Patient-reported outcome measures in presbyopia: a literature review
...
Abstract
Presbyopia is the age-related loss of near-distance focusing ability. The aim of this study was to identify patient-reported outcome measures (PROMs) used in clinical trials and quality-of-life studies conducted in individuals with presbyopia and to assess their suitability for use in individuals with phakic presbyopia. Literature searches were performed in Medline and Embase up until October 2017. Specific search terms were used to identify presbyopia studies that included a PROM. All clinical trials with PROM-supported endpoints in presbyopia were identified on ClinicalTrials.gov. Further searches were conducted to retrieve articles documenting the development and psychometric evaluation of the PROMs identified. A total of 703 records were identified; 120 were selected for full-text review. Twenty-one clinical trials employed PROMs to support a primary or secondary endpoint. In total, 13 PROMs were identified; a further 23 publications pertaining to the development and validation of these measures were retrieved. Most PROMs were developed prior to release of the Food and Drug Administration (FDA) 2009 patient-reported outcome guidance and did not satisfy regulatory standards. The Near Activity Visual Questionnaire (NAVQ) was identified as the most appropriate for assessing near-vision functioning in presbyopia. While the NAVQ was developed in line with the FDA guidance, the items do not reflect changes in technology that have occurred since the questionnaire was developed in 2008 (eg, the increase in smartphone use), and the measure was not validated in a purely phakic presbyopia sample. Further research is ongoing to refine the NAVQ to support trial endpoints related to changes in near-vision functioning associated with phakic presbyopia.
Background
Presbyopia is a common age-related vision disorder characterised by a progressive inability to focus on near objects. It is hypothesised to be caused either by weakening of the ciliary muscles or by loss of lens elasticity that prevents a change in focal point, the latter being considered the primary causative mechanism. The condition occurs both in individuals with an intact natural crystalline lens (phakic presbyopia) and in those who have undergone an invasive surgical procedure to extract the natural crystalline lens (pseudophakic presbyopia).1–4 An estimated 1.3 billion people were living with presbyopia worldwide in 2017, a number projected to increase to 1.8 billion by 2050.5 The lens of the human eye is usually able to change shape in order to focus light onto the retina, enabling individuals to see objects at both near and far distances. The lens loses flexibility throughout life until, after the age of 40 years, it cannot change shape easily, leading to difficulty focusing on near-distance objects and performing near-vision activities, such as reading or threading a needle.1 There are currently no approved therapies that reverse the normal ageing process that causes presbyopia; instead, current management focuses on either optical correction with medical devices (eg, spectacles, contact lenses) or surgical intervention (eg, corneal inlays, corneal refractive procedures and intraocular lens (IOL) replacement).6 Until recently, the assessment of treatment outcomes in both clinical trials and clinical practice relied primarily on clinician-reported biomedical parameters, mostly based on Snellen or logarithm of the minimum angle of resolution (logMAR) visual acuity testing. However, regulators and other relevant stakeholders now place increasing emphasis on incorporating the patient voice in clinical trials.7 One way to include the patient perspective when evaluating treatment efficacy is through a patient-reported outcome measure (PROM)-supported endpoint.8–10
A patient-reported outcome (PRO) is a health outcome reported directly by the patient (ie, study participant) about the status of the patient’s health, without amendment or interpretation of the patient’s response by a clinician or anyone else.11 In 2009, the US Food and Drug Administration (FDA) released its influential ‘Guidance for Industry on Patient-Reported Outcome Measures’, which describes the criteria used to review and evaluate PROs intended to support claims in approved medical product labelling.12 Earlier, in 2005, the European Medicines Agency published a reflection paper providing broad recommendations on health-related quality-of-life (HRQoL) evaluation in the context of clinical trials.13 Beyond supporting regulatory evaluation, marketing authorisation and product label claims, the data generated by PROM-supported endpoints can also help reimbursement authorities, clinicians and patients quantify the added benefit of a treatment.
For PROM data to support an FDA product label claim, the PROM must assess relevant and important concept(s) of interest, must have been developed and validated to the standards specified in regulatory guidance (including evidence of both content validity and psychometric validity in the population of interest) and must be included in a well-designed and adequately controlled clinical trial. Hence, reviewing and critically evaluating existing PROMs is an important first step in identifying a PROM that assesses the concept of interest (the ‘thing’ being measured; here, near-vision functioning) in a defined context of use (here, as a clinical trial endpoint in adults with phakic presbyopia, ie, with an intact natural crystalline lens). The objective of the current literature review was to identify the most commonly cited PROMs designed for use in presbyopia or similar conditions, to critically evaluate the evidence for their content validity and psychometric properties and, on that basis, to judge their adequacy to support endpoints in a Novartis pivotal trial testing a new pharmacological therapy for individuals with phakic presbyopia.
Methods
Patient and public involvement
As this is a review of the available literature on PROMs used in presbyopia, patients were not involved in this research project.
Phase 1 (search strategy and screening)
A comprehensive review of bibliographic databases and other sources (ClinicalTrials.gov, Patient-Reported Outcome and Quality of Life Instruments Database (PROQOLID)) was conducted to identify PROMs that have been used in individuals with presbyopia. Literature searches were performed in Medline, Embase and Evidence-Based Medicine Reviews (Cochrane Database of Systematic Reviews and Database of Abstracts of Reviews of Effects) via Ovid SP up until October 2017 (when the search was conducted) using specific search strings (table 1). Limits were placed on the searches to exclude articles in languages other than English and studies not conducted in humans.
Table 1
Search strings used in the targeted literature search
Conference abstracts, case studies and case reports were not considered for further screening. Studies of populations not exclusively comprising individuals with presbyopia were excluded, as were studies that did not include a PROM. Studies assessing treatment satisfaction in patients with presbyopia using a non-standardised measure, such as a single-item visual analogue scale or a numeric analogue scale, were excluded. PROMs designed to assess only dry eye symptoms resulting from contact lens use in individuals with presbyopia were also excluded from further analysis.
First-level screening was based on the title and abstract of each citation. Full-text copies of the studies in individuals with presbyopia were then obtained for a second round of review, which involved critical full-text appraisal against the criteria above.
Phase 2 (evaluation of PROMs)
To assess the PROMs identified in phase 1 on the basis of available information about their development and psychometric validation, additional publications were accessed via PROM-specific searches using Ovid, bibliographic searches and information provided by the PROM developers (either online or on request). Information about development and psychometric validation was extracted and compared across the measures. Only PROMs for which the development and validation history was documented in an accessible full-text publication were included in the review. The properties of the PROMs were assessed against the FDA guidance for PROMs submitted as a clinical endpoint for drug approval to support a labelling claim (online supplementary table S1).12 Information on the use of modern psychometric techniques, such as Rasch analysis, was also reviewed where available. Rasch models transform ordinal raw scores into linear, interval-level measures and place item difficulty and respondent ability on a common (logit) scale, which eases interpretation. Face validity, including examination of item wording and missing concepts, was further evaluated for PROMs that satisfied most of the psychometric assessment criteria.
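To illustrate the common scale described above, the minimal sketch below implements the dichotomous (yes/no) Rasch model, in which the probability that a person endorses an item depends only on the difference between person ability θ and item difficulty b, both in logits. It is a simplified illustration only: questionnaires with Likert-type response options would require a polytomous extension (such as the rating scale or partial credit model), and the function names are hypothetical.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b)).

    theta: person ability (eg, near-vision functioning) in logits.
    b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p: float) -> float:
    """Log-odds of a probability; the inverse of the logistic function above."""
    return math.log(p / (1.0 - p))


# When ability equals difficulty, the person has a 50% chance of endorsing the item.
print(rasch_probability(1.0, 1.0))  # 0.5

# Equal steps on the probability (raw-score) scale are unequal steps in logits,
# which is why ordinal raw scores are not interval-level measures.
print(logit(0.6) - logit(0.5))    # step near the middle of the scale
print(logit(0.95) - logit(0.85))  # larger step near the extreme of the scale
```

Because person and item share one logit scale, item targeting (whether item difficulties match the abilities of the sample) can be inspected directly, which is the basis for the ceiling, floor and targeting findings discussed in the Results.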
Results
The literature search identified a total of 703 unique records; after the first round of title and abstract screening, 120 were selected for full-text review (figure 1). A search of ClinicalTrials.gov identified 121 clinical studies in presbyopia, of which only 21 mentioned a PROM used to support either a primary or secondary endpoint. Seventeen studies, with varied designs and sample sizes ranging from 26 to 7890 participants, were found to meet the inclusion criteria following full-text review (online supplementary table S2).
Figure 1
Flow chart illustrating the study selection process. PRO, patient-reported outcome.
In total, 13 PROMs designed to assess vision outcomes (including symptoms and/or HRQoL impact) of presbyopia or similar vision conditions were identified. Of the 13 measures, only one was presbyopia-specific; 11 were generic eye disease measures used across a range of patient groups, including those with refractive correction (refractive surgery, spectacles and contact lenses) and cataracts; and one was a numeric rating scale (NRS) assessing overall vision satisfaction and ocular comfort. Based on relevance of content (ie, excluding PROMs measuring dry eye resulting from contact lens use, and the NRS), nine PROMs were deemed suitable for further assessment of psychometric properties (table 2). The nine-item version of the National Eye Institute Visual Function Questionnaire (NEI VFQ-9) is an abbreviated version of the NEI VFQ-25 and so was not counted separately within the nine PROMs identified.
Table 2
Patient-reported outcome measures in presbyopia considered for evaluation of psychometric characteristics
Phase 2 searches were conducted to evaluate and compare the psychometric properties of the nine PROMs meeting the phase 1 inclusion criteria. PROM-specific Ovid searches identified 116 abstracts, of which 20 had already been identified in the phase 1 searches and 32 were not relevant; the remaining 64 were selected for full-text review, and 23 of these were included in the final review. Information on the psychometric properties of these nine PROMs is presented in table 3.
Table 3
Evaluation of psychometric characteristics of the patient-reported outcome measures in presbyopia
Content validity
Although patient input was sought in the development of most PROMs reviewed, individuals with presbyopia were involved in item generation for only six PROMs: the Near Activity Visual Questionnaire (NAVQ), National Eye Institute Visual Function Questionnaire (NEI VFQ-25), Quality of Vision (QoV), National Eye Institute Refractive Error Quality of Life Instrument-42 (NEI RQL-42), Freedom From Glasses Value Scale (FGVS) and Refractive Status and Vision Profile (RSVP). While the NAVQ was the only presbyopia-specific PROM identified, individuals with pseudophakic presbyopia were included in its development along with individuals with phakic presbyopia.14 15 The FGVS development sample included individuals diagnosed with cataracts or presbyopia who had undergone IOL implantation surgery.16 Refractive error focus groups were used to derive items for the NEI RQL-42 and RSVP,17 18 whereas content for the NEI VFQ-25 was informed by 26 condition-specific focus groups, in line with its intended objective of evaluating vision-related quality of life across several common eye conditions, including cataracts, age-related macular degeneration, diabetic retinopathy and glaucoma.19 20 Participants involved in item refinement for the QoV included individuals with and without refractive correction.21 The remaining three PROMs, Catquest-9SF, Visual Function Index (VF-14) and Cataract TyPE Spec, were developed with input from individuals with cataract only.22–24
Reliability
The internal consistency (Cronbach’s alpha) of the NAVQ met the acceptability threshold of >0.70,14 as did that of the near vision subscales of the NEI VFQ-25, NEI RQL-42 and Cataract TyPE Spec (range 0.85–0.95).24–26 While the high internal consistency of the NAVQ (0.945) and Cataract TyPE Spec (0.94) indicates that their items are highly correlated and therefore measure a common concept of near vision, alpha coefficients this high may indicate some redundancy, suggesting that item reduction may be possible. Acceptable test–retest reliability was found for the NAVQ and for the near vision subscales of the NEI VFQ-25 and NEI RQL-42 (intraclass correlation coefficient range 0.72–0.91), whereas test–retest reliability was not reported for Cataract TyPE Spec. Among measures without a separate near-vision dimension, internal consistency was acceptable (>0.70) for the FGVS, QoV and RSVP; test–retest reliability was strong for the QoV (>0.70), modest for the RSVP (intraclass correlation of 0.66) and not reported for the FGVS and NEI VFQ-9.18 21 27 28 As reliability values for Catquest-9SF, VF-14 and Cataract TyPE Spec were not reported specifically for individuals with presbyopia, these were not considered in the analysis.
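Cronbach’s alpha is computed as α = k/(k − 1) × (1 − Σσ²ᵢ/σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i and σ²ₜ the variance of the total score. The minimal sketch below computes it from a respondent-by-item score matrix; the response data are hypothetical, and the use of sample variances is an assumption (any consistent variance estimator gives the same ratio).

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix where scores[i][j] is respondent i's answer to item j."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])  # variance of summed scores
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses of five respondents to four near-vision items (1-4 scale).
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}", "acceptable" if alpha > 0.70 else "below threshold")
```

Items that are exact copies of one another give α = 1, the extreme form of the redundancy that very high values, such as those reported for the NAVQ and Cataract TyPE Spec, may point to.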
Construct validity
In assessments of convergent validity, the NAVQ showed moderate correlations with near visual acuity (Pearson’s correlation coefficient (r)=0.32) and critical print size (r=0.27).14 Strong correlations (r=0.65–0.70) were observed between Early Treatment Diabetic Retinopathy Study visual acuity and the NEI VFQ-25 near vision, distance vision and general vision subscales.25 Most NEI RQL-42 subscales were significantly associated with refractive error in both the better-seeing and worse-seeing eyes.26 For the QoV, logMAR visual acuity correlated with all three QoV subscales (frequency scale r=0.72, severity scale r=0.64 and bothersome scale r=0.35).21 29 No evidence was reported of the FGVS correlating with objective clinical parameters or subjective measures of visual function.30 RSVP scores were moderately associated with patient satisfaction (r=−0.41) and rating of vision (r=−0.42).31
Discriminant validity of the NAVQ was also supported by a high area under the receiver operating characteristic curve (0.91), a Rasch separation index of 2.92 and good overall fit to the Rasch model, with negligible ceiling and floor effects.14 NAVQ scores also discriminated between individuals using different kinds of correction, such as IOLs, contact lenses and varifocal spectacles, supporting known-groups validity. For the NEI VFQ-25, near-vision and distance-vision scores significantly discriminated between participants in the reference group (better vision) and those with poor vision.25 Although the psychometric performance of the NEI VFQ-25 based on classic test theory analyses has been shown to be adequate, Rasch analyses have demonstrated performance limitations in the near vision subscale and in the general design of the NEI VFQ-25 structure, including a substantial ceiling effect.32–34 NEI RQL-42 scores differed significantly between subgroups defined by refractive error correction (no correction; postsurgery with no correction; glasses; multifocal glasses; contact lenses).26 In a Rasch validation study, however, the NEI RQL-42 near vision subscale demonstrated poor discriminative ability, with a person separation index of 0.71, below the acceptable threshold of 2.0.35 For the QoV, discriminant ability was supported by acceptable person separation indices of 2.08, 2.10 and 2.01 for the frequency, severity and bothersome scales, respectively.21 29 FGVS scores appeared to discriminate between individuals who still wore glasses after surgery and those who had gained spectacle independence.30 RSVP scores were shown to distinguish between individuals with different degrees of refractive error; however, Rasch analysis revealed several redundant and misfitting items and poor item-to-person targeting that could diminish its discriminative ability.31 The construct validity of the remaining measures was evaluated neither in a presbyopia-specific sample nor in a diverse sample that included a subset of individuals with presbyopia.22–24
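A person separation index expresses how well a scale spreads respondents out relative to its measurement error. The minimal sketch below assumes one common formulation, G = SD_true / RMSE, where SD_true² = SD_observed² − MSE and MSE is the mean squared standard error of the person measures; the associated separation reliability is G²/(1 + G²), so the threshold of 2.0 cited above corresponds to a reliability of 0.80. The cited studies may have used other software or definitions, and the person measures and standard errors below are hypothetical.

```python
import math
from statistics import mean, pvariance


def person_separation(measures: list[float], standard_errors: list[float]) -> tuple[float, float]:
    """Return (separation index G, separation reliability) for Rasch person measures in logits."""
    observed_variance = pvariance(measures)                   # spread of estimated person measures
    error_variance = mean(se ** 2 for se in standard_errors)  # mean square measurement error
    true_variance = max(observed_variance - error_variance, 0.0)  # clamp: no negative variance
    g = math.sqrt(true_variance / error_variance)
    reliability = g ** 2 / (1.0 + g ** 2)
    return g, reliability


# Hypothetical person measures (logits) and their standard errors.
g, rel = person_separation([-3.0, -1.0, 1.0, 3.0], [1.0, 1.0, 1.0, 1.0])
print(f"G = {g:.2f}, reliability = {rel:.2f}")  # G = 2.00, reliability = 0.80
```

Under this formulation, a scale whose measurement error is large relative to the spread of respondents (as with poor item-to-person targeting) yields a low G even when the sample itself is heterogeneous.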
Responsiveness
In a randomised, controlled, cross-over trial assessing the performance of a commercially available contact lens, Biofinity users showed significantly greater improvements in NAVQ scores than OASYS contact lens users (p=0.047).36 Gundersen and Potvin37 reported that the QoV frequency, severity and bothersome scores were more sensitive than the NEI VFQ-25 in detecting differences between multifocal and monofocal IOLs. Furthermore, in a cross-over study comparing two presbyopic soft contact lens modalities, only 3 of the 13 NEI RQL-42 subscales (clarity, vision and appearance) showed significant improvements in scores compared with the habitual correction method.38 Gierek-Ciaciura et al39 reported improved VF-14 scores in 83% of individuals after implantation of multifocal IOLs, although postoperative scores did not differ significantly between individuals with different IOLs. Following the bilateral Laser Anterior Ciliary Excision (LaserACE) procedure, Catquest-9SF scores changed significantly, from a mean patient satisfaction score of −1.00 before surgery to 0.33 after surgery (p=0.000016).40 Neither responder definitions nor minimal important difference thresholds were reported for any of the reviewed questionnaires in individuals with presbyopia.
Critical evaluation of the NAVQ
As the NAVQ was the only disease-specific measure identified at the time of the literature review, its face validity was assessed further (figure 2). The language used in the instructions was judged open to different interpretations by respondents who use contact lenses rather than reading spectacles. Furthermore, because the instructions do not specify a recall period, participants could use different timeframes when answering.
Issues identified with the content included the use of examples of differing difficulty within the same item (items 1, 5, 6 and 7); examples not relevant to modern life (such as telephone directories, and letters and electricity bills received by post); and examples measuring concepts other than near-vision ability (such as writing, playing card games and gardening), which assess manual dexterity and may lead to spurious scores. In addition, the items lacked more relevant examples, such as the ease of typing on a smartphone or tablet, reading on an electronic device (such as a smartphone, tablet or computer screen) and the impact of lighting conditions on performing routine activities.
Discussion
This literature review identified PROMs developed for use in individuals with presbyopia and compared their properties against the regulatory requirements for supporting product label claims.12 Of the nine unique measures evaluated, only the NAVQ was developed specifically to measure difficulties in near-vision function in individuals with presbyopia. The other PROMs had limitations, including a lack of focus on presbyopia and insufficient evidence of strong psychometric properties from modern psychometric methods. Modern psychometric methods based on item response theory, such as Rasch measurement theory, provide a robust approach to examining validity and help overcome two key limitations of traditional validation methods based on classic test theory: (1) scores are sample-dependent and scale-dependent; and (2) the standard error of measurement around individual respondents’ scores is assumed to be constant regardless of the person’s location on the scale.41 42
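The second limitation can be made concrete. Classic test theory applies a single standard error of measurement, SEM = SD × √(1 − reliability), to every respondent, whereas under the dichotomous Rasch model the standard error of a person estimate is 1/√I(θ), where the test information I(θ) = Σ Pᵢ(θ)(1 − Pᵢ(θ)) depends on how well the items target that person. The minimal sketch below assumes a dichotomous model and uses hypothetical item difficulties and summary statistics; it shows the Rasch standard error growing for persons located far from the items, as happens at a ceiling or floor.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch probability of endorsing an item of difficulty b at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_se(theta: float, difficulties: list[float]) -> float:
    """Standard error of a person estimate at ability theta: 1 / sqrt(test information)."""
    information = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(information)


def ctt_sem(sd: float, reliability: float) -> float:
    """Classic test theory SEM: one value applied to every respondent."""
    return sd * math.sqrt(1.0 - reliability)


items = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties (logits)
for theta in (-4.0, 0.0, 4.0):
    print(f"theta = {theta:+.1f}: Rasch SE = {rasch_se(theta, items):.2f}")
print(f"CTT SEM (SD = 10, reliability = 0.91): {ctt_sem(10.0, 0.91):.2f}")
```

The Rasch standard error is smallest where the items are concentrated and rises towards the extremes, while the classic test theory value is the same for every respondent.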
While the NAVQ has undergone rigorous psychometric analysis and was developed with input from individuals with presbyopia, its validation study was performed in a mixed sample of individuals with pseudophakic and phakic presbyopia, so the results may not be generalisable specifically to phakic presbyopia. The face validity analysis indicated that the instructions need rewording, including the addition of a short recall period of 1 week, that the relevance of the examples should be reassessed, and that irrelevant examples should be replaced with more contemporary ones.
Examples given in the FDA PRO guidance of modifications to an original measure that can alter participants’ responses to the same set of questions, and that therefore require qualitative evidence to establish content validity, include (1) changing the order of items, item wording, response options or recall period, or deleting portions of a questionnaire; (2) using the questionnaire in a setting, population or condition different from the one for which it was originally developed; (3) changing the instructions or their placement within the PROM; (4) switching the mode of administration from paper to electronic format; and (5) changing the timing of, or procedures for, PROM administration within the clinic visit.12 Thus, for the NAVQ to qualify as a fit-for-purpose measure for supporting product labelling claims after the proposed modifications, two forms of evidence would be necessary, both in the targeted population of interest: a qualitative study confirming the content validity and conceptual framework of the revised measure, and evidence generated in a well-designed trial to support its psychometric properties.
Although this literature review was confined to PROMs used in presbyopia studies, it complements the findings of a review by Kandel et al,43 which also identified the NAVQ as one of the highest-quality questionnaires for measuring refractive surgery outcomes related to activity limitations. Regarding limitations, some relevant literature may have been missed because the search was restricted to a few selected databases, although a bibliographic search was also performed to help identify all relevant PROMs in presbyopia. Furthermore, the properties of culturally adapted versions of the PROMs were not assessed, and some of the identified measures might have performed better in non-English-speaking populations. However, such data would be unlikely to change the conclusions here, as the NAVQ is the only presbyopia-specific measure with the potential to support product label claims, albeit with further modification. Because no conceptual model based on patient and clinician input was available, the final selection of PROMs for detailed review was not informed by a concept-mapping exercise. The results of a recently conducted qualitative study in phakic presbyopes and healthcare practitioners to develop a conceptual model for the NAVQ modification will be reported in a future publication. Finally, this literature review did not meet the criteria to be considered a ‘systematic review’. However, the authors believe that the approach was sufficiently structured and rigorous (eg, through the use of defined search terms and inclusion criteria) and that ‘systematic’ literature review methods are not typically judged necessary or appropriate for adequately identifying and evaluating PROMs.
Conclusion
No single PROM fully adhered to the quality standards detailed in the US FDA Guidance for Industry on Patient-Reported Outcome Measures (2009), which represents best practice for PROM development and validation. The NAVQ has the most potential to support trial endpoints related to changes in near-vision functioning associated with presbyopia, although it requires modification. Further research is ongoing to confirm the content validity and psychometric validity of a revised NAVQ in this specific population.