Original Research

Patient-reported outcome measures in ophthalmology: too difficult to read?

Abstract

Objective Patient-reported outcome measures (PROMs) are commonly used in clinical trials and research. Yet, in order to be effective, a PROM needs to be understandable to respondents. The aim of this cross-sectional analysis was to assess reading level of PROMs validated for use in common eye conditions.

Methods and analysis Readability measures determine the level of education a person is expected to have attained to be able to read a passage of text; readability was calculated using the Flesch-Kincaid Grade Level, FORCAST and Gunning-Fog tests within the readability software package Oleander Readability Studio 2012.1. Forty PROMs, previously validated for use in at least one of age-related macular degeneration, glaucoma and/or diabetic retinopathy, were identified for inclusion via a systematic literature search. The American Medical Association (AMA) and National Institutes of Health (NIH) recommend that patient materials should not exceed a sixth-grade reading level. The number of PROMs exceeding this level was calculated.

Results Median (IQR) readability scores were 7.9 (5.4–10.5), 9.9 (8.9–10.7) and 8.4 (6.9–11.1) for Flesch-Kincaid Grade Level, FORCAST and Gunning-Fog test, respectively. Depending on metric used, this meant 61% (95% CI 45% to 76%), 100% (95% CI 91% to 100%) and 80% (95% CI 65% to 91%) exceeded the recommended threshold.

Conclusion Most PROMs commonly used in ophthalmology require a higher reading level than that recommended by the AMA and NIH and likely contain questions that are too difficult for many patients to read. Greater care is needed in designing PROMs appropriate for the literacy level of a population.

Key messages

What is already known about this subject?

  • Patient-reported outcome measures (PROMs) are being used increasingly to shape policy and practice in healthcare. Previous research has shown that PROMs in other healthcare disciplines have readability levels beyond the average adult. This may limit the population that PROMs represent.

What are the new findings?

  • The majority of PROMs used in three common eye conditions (age-related macular degeneration, glaucoma and diabetic retinopathy) do not meet public health recommended readability levels.

How might these results change the focus of research or clinical practice?

  • Greater care is needed in choosing and designing PROMs appropriate for the literacy level of a population.

Introduction

Health literacy, defined as ‘people’s knowledge, motivation and competences to access, understand, appraise, and apply health information in order to make judgements and take decisions in everyday life concerning healthcare, disease prevention and health promotion to maintain or improve quality of life during the life course’1 has become a well-known term in medical and healthcare communications. At its simplest level health literacy is related to assessment of reading and literacy levels required to engage with one’s own health. Studies assessing the reading level of patient educational materials, such as information websites and leaflets, in ophthalmology suggest a majority of patient information is currently not easily readable.2 3

Patient-reported outcome measures (PROMs) are an increasingly used endpoint in clinical trials.4 5 A clinical endpoint should be a ‘clinically meaningful measure of how a patient feels, functions or survives’.6 In other words, the outcome should be ‘relevant to the patient’.7 In ophthalmology, it has been acknowledged that traditional clinical measures, such as visual acuity, do not reflect the patient’s experience or the impact of disease on patients’ lives8 and PROMs are often used as outcome measures in ophthalmic clinical trials.9–14

There have been several calls to incorporate PROMs into routine practice.5 15–17 PROMs have been used routinely preoperatively and postoperatively in hernia, hip, knee and varicose vein surgery in the UK.15 In the field of ophthalmology, they have been piloted before and after cataract surgery in New Zealand, Sweden and the Netherlands.15 18–20 Yet, this incorporation into routine practice is currently the exception rather than the norm.21

If PROMs are to be used effectively to shape policy and manage patients, then the instruments designed to elicit them ought to be legible and understandable. Studies analysing PROMs used in orthopaedic surgery,22 oral health23 and, most recently, audiology24 suggest that the majority of PROMs may have readability levels beyond that of the average adult. Kroll et al25 reflect on the parts of society that might be most under-represented as a result of this; indeed, Baker et al26 report poorer reading ability to be associated with poorer health. In other words, the people for whom PROMs may be most pertinent may be the least able to interact with them.

The aim of this study, therefore, was to assess the readability of commonly used PROMs in ophthalmology, and compare these to each other and to population norms for literacy levels.

Materials and methods

Selection of PROMs

A literature search was conducted on 8 August 2018 using search terms related to PROMs and common ophthalmological conditions ([“patient reported outcome measure*” or “PROM” or “questionnaire” or “quality of life” or QoL] and [glaucoma or macula or AMD or ARMD or “diabetic retinopathy” or “diabetic macular oedema” or “diabetic macular edema” or “maculopathy”] and [validat* or rasch or develop*]) in order to identify PROMs that had been used in at least one of age-related macular degeneration, glaucoma or diabetic retinopathy to assess quality of life or visual function. These three conditions were chosen because they are leading causes of blindness both in the UK and worldwide.27 PROMs were screened for eligibility by three independent researchers (DJT, LE, LJ) using Covidence (www.covidence.org), a computerised literature review management tool. Literature search results were imported directly into Covidence, where duplicate studies and studies reporting on duplicate PROMs were removed. The title and abstract for each study were screened by two authors independently, followed by the full-text articles of those deemed to be relevant. Any disagreements at either stage were discussed until consensus was reached. PROMs were excluded if they had not been administered in the English language, or if they were designed for use in children. PROMs assessing knowledge or health beliefs about a condition were also excluded.

Readability measures

Readability measures estimate the level of education a reader needs in order to understand a passage of text. Readability was calculated using the Flesch-Kincaid Grade Level test, the FORCAST test and the Gunning-Fog index within the readability software package Oleander Readability Studio 2015 (Oleander Software, Vandalia, Ohio, USA).

Flesch-Kincaid Grade Level test

The Flesch-Kincaid Grade Level test28 is one of the most widely used readability measures, with over 3000 citations in research literature alone. It takes into account both sentence length and syllables per word. Output is in the form of ‘grade level’, the minimum USA grade level which the text is predicted to be suitable for. For example, a Flesch-Kincaid Grade Level score of five would suggest that a text is suitable for those in USA grade 5 (aged 10–11) or higher.
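As an illustration, the published Flesch-Kincaid Grade Level formula is 0.39 × (average words per sentence) + 11.8 × (average syllables per word) minus 15.59. The Python sketch below applies that formula. The sentence splitter and the vowel-group syllable counter are simplifying assumptions introduced here for illustration; they are not the methods used by Readability Studio, so scores from this sketch may differ from those reported in this study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a silent final 'e' (as in "have") as non-syllabic, but keep "-le" (as in "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


if __name__ == "__main__":
    # Hypothetical questionnaire item, not taken from any PROM in this study.
    print(flesch_kincaid_grade("Do you have trouble reading small print?"))
```

Because both terms grow with sentence length and word length, shortening sentences and replacing polysyllabic words are the two direct ways to lower the score.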

FORCAST test

The FORCAST test,29 initially designed for assessing readability of US military technical reading materials, is considered the most appropriate test for assessing forms, questionnaires and lists because it does not rely on text being in sentence format. Rather, it is calculated taking into account frequency of monosyllabic words in the document. As with Flesch-Kincaid, output is in the form of school grade level.
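The FORCAST formula is grade level = 20 minus (N / 10), where N is the number of single-syllable words in a 150-word sample. The Python sketch below is a minimal illustration of that calculation. Two things are assumptions made here rather than part of the source: the monosyllable check is a crude vowel-group heuristic, and texts that are not exactly 150 words long are handled by scaling the count to a 150-word equivalent.

```python
import re


def is_monosyllabic(word: str) -> bool:
    """Crude heuristic (assumption): at most one vowel group, ignoring a silent final 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return groups <= 1


def forcast_grade(text: str) -> float:
    """FORCAST grade level: 20 - (monosyllabic words per 150 words) / 10."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    monosyllables = sum(is_monosyllabic(w) for w in words)
    # Scale the count to the 150-word sample size the formula is defined for (assumption).
    per_150_words = monosyllables * 150 / len(words)
    return 20 - per_150_words / 10
```

Note that sentence boundaries never enter the calculation, which is why the measure can be applied to response options and lists as well as to full sentences.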

Gunning-Fog index

The Gunning-Fog index30 is calculated from the average sentence length, and the number of polysyllabic words in a document. Its output is in the form of grade level. This measure was included because of its wide previous use in literature on readability of written healthcare materials.31
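In its commonly cited form, the Gunning-Fog index is 0.4 × (average sentence length + percentage of words with three or more syllables). The Python sketch below takes pre-computed counts as input, so it makes no assumption about how words, sentences or syllables are identified; the function and parameter names are hypothetical. Gunning's original rules also exclude some polysyllabic words (for example, proper nouns), which this simplified sketch leaves to whoever supplies the counts.

```python
def gunning_fog(n_words: int, n_sentences: int, n_complex_words: int) -> float:
    """Gunning-Fog index from raw counts.

    n_complex_words is the number of words with three or more syllables.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("n_words and n_sentences must be positive")
    average_sentence_length = n_words / n_sentences
    percent_complex = 100 * n_complex_words / n_words
    return 0.4 * (average_sentence_length + percent_complex)


if __name__ == "__main__":
    # Hypothetical document: 150 words, 10 sentences, 15 complex words.
    print(gunning_fog(150, 10, 15))
```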

Data analysis

The American Medical Association (AMA) and the National Institutes of Health (NIH) recommend that the readability of patient materials should not exceed the reading level of a child in the sixth grade (aged 11–12, see table 1).32 33 (For context, this is the readability level of J K Rowling’s Harry Potter and the Philosopher’s Stone.) The number of PROMs requiring a reading level exceeding this threshold was calculated for each readability measure.

Table 1

USA school grade levels with corresponding typical age group
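The threshold analysis described above can be sketched in Python as follows. Two points are assumptions rather than details given in the source: a PROM is counted as exceeding the threshold when its score is strictly greater than 6.0, and the 95% CI is computed as an exact (Clopper-Pearson) binomial interval, since the method used for the intervals is not stated. The scores in the usage example are invented for illustration.

```python
from math import comb


def count_exceeding(scores, threshold=6.0):
    """Number of grade-level scores strictly above the threshold (assumed rule)."""
    return sum(score > threshold for score in scores)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) == target, where f decreases as p increases."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes out of n."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binomial_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binomial_cdf(x, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    scores = [5.2, 6.0, 7.9, 9.9, 10.4]  # hypothetical grade-level scores
    k = count_exceeding(scores)
    print(k, len(scores), clopper_pearson(k, len(scores)))
```

Under these assumptions, a measure for which every one of 40 PROMs exceeds the threshold has a lower confidence bound of about 91%.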

Results

From a total of 47 PROMs eligible for inclusion based on the literature, 40 were available to access. These were inputted to the Readability Studio in full (including all response options and instructions).

Median (IQR) readability scores were 7.9 (5.4–10.5), 9.9 (8.9–10.7) and 8.4 (6.9–11.1) for the Flesch-Kincaid Grade Level test, the FORCAST test and the Gunning-Fog test, respectively. In other words, the PROMs evaluated would be expected to be understood, on average, by an individual with a reading age of 13–14, 15–16 or 13–14 years, respectively. Results remained unchanged when instructions were omitted from all PROMs. Depending on the metric used, this meant that 61% (95% CI 45% to 76%), 100% (95% CI 91% to 100%) and 80% (95% CI 65% to 91%) of PROMs fell outside the 6th grade reading level recommended by the AMA and NIH (figure 1).
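The conversion from grade level to reading age used above can be sketched in Python. The mapping assumes that USA grade g corresponds to ages g + 5 to g + 6, which matches the two examples given in the text (grade 5, aged 10–11; sixth grade, aged 11–12). Rounding each score to the nearest whole grade is an assumption made here; it reproduces the reading ages quoted for the three medians.

```python
import math


def grade_to_age_range(grade_score: float) -> tuple[int, int]:
    """Typical age range for a USA grade-level score (assumed mapping: grade g -> ages g+5 to g+6)."""
    grade = math.floor(grade_score + 0.5)  # round to the nearest whole grade (assumption)
    return grade + 5, grade + 6


if __name__ == "__main__":
    for name, median_score in [("Flesch-Kincaid", 7.9), ("FORCAST", 9.9), ("Gunning-Fog", 8.4)]:
        print(name, grade_to_age_range(median_score))
```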

Figure 1

Bar charts showing frequency of patient-reported outcome measures (PROMs) at each grade level, depending on metric used. The black dotted line denotes the 6th grade level; it is recommended that patient materials do not exceed this level.

Characteristics of PROM text

Median word count for PROMs was 416 words (IQR 259–734; minimum words (min) 75, maximum words (max) 4601); 18% of included PROMs had word counts exceeding 1000 words. Documents comprised a median of 10% complex (3+ syllable) words (IQR 8%–13%; min 4%, max 2%) and a median of 29% long (6+ character) words (IQR 23%–34%; min 16%, max 50%).
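Descriptive statistics of this kind can be sketched in Python as below. The tokenisation rule (runs of letters and apostrophes), the function names and the choice of the "inclusive" quartile method are assumptions made for illustration; Readability Studio may count words and compute quartiles differently. Counting complex (3+ syllable) words would additionally need a syllable counter and is omitted here.

```python
import re
from statistics import median, quantiles


def text_profile(text: str) -> dict:
    """Word count and percentage of long (6+ character) words in a document."""
    words = re.findall(r"[A-Za-z']+", text)
    # Apostrophes are not counted towards word length (assumption).
    long_words = [w for w in words if len(w.replace("'", "")) >= 6]
    pct_long = 100 * len(long_words) / len(words) if words else 0.0
    return {"word_count": len(words), "pct_long": pct_long}


def median_iqr(values):
    """Median with lower and upper quartiles across a set of documents."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


if __name__ == "__main__":
    # Hypothetical word counts for five documents, not data from this study.
    print(median_iqr([120, 260, 410, 730, 1500]))
    print(text_profile("Please describe any trouble you have with reading."))
```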

Discussion

An estimated 30 million US adults and 10 million UK adults cannot read beyond a 3rd grade level34 35; almost all PROMs assessed in this study (using any formula) had a readability level beyond this. This is noteworthy. The study’s results suggest that most PROM questionnaires commonly used in ophthalmology require a reading comprehension level higher than that recommended by the AMA and NIH for patient material. Moreover, when assessed solely using the FORCAST measure, which is the measure recommended for assessing questionnaire and survey text, all PROMs included in this study fell outside recommended readability levels.

These results support findings from literature in other healthcare disciplines. For example, using the FORCAST measure, PROMs used in oral disease23 and audiology24 were also consistently found to exceed the recommended 6th grade reading level. Another study (although using a different readability formula) reported the average reading level of 59 PROMs used in orthopaedics to equate to that of a 16–18 year old. Beyond the readability scores, several PROMs were also found to be markedly lengthy. Eighteen per cent of PROMs were longer than 1000 words. For context, that is almost half the length of this manuscript. The brevity of PROMs has been highlighted in previous literature as a priority to both ophthalmology patients and clinicians; qualitative findings on important factors in questionnaire design from one study36 include quotations like ‘must be short, practical and useful’, ‘you need something that’s basic and easy to fill in’ and ‘keeping it obviously as brief as possible’.

As the use of PROMs becomes more widespread, it is crucial that their content is accessible and understandable to the majority of their target population. Missing data in PROMs has been reported as a major problem,37 and it is certainly possible that readability may be a contributing factor to lack of motivation to complete PROMs.38 Prospective confirmation of this with empirical data would be helpful. Indeed, population data has shown that people with lower literacy levels are less likely to participate in volunteer activities and more likely to report poor health than those with high literacy levels.39 The NEI-VFQ 25 is currently the most commonly used PROM in ophthalmology and had a reading grade level of 10th grade using the FORCAST measure, and 9th grade using both Flesch-Kincaid and Gunning-Fog. This is equivalent to the readability of Moby Dick by Herman Melville.

The results of this study have several implications for future practice and research. Importantly, PROMs that have already gone through extensive development and validation processes should not simply be discarded. However, when choosing from existing PROMs and when designing new PROMs, substantial attention should be paid to the complexity of language used, particularly with respect to word length. It is worth noting that current recommended practice for PROM design includes input from participants in the form of qualitative investigation, from which items are derived using participants’ own language. Therefore, the vocabulary used in PROMs should ideally align with that of a sample of the target population. Yet, the representation of individuals with lower literacy levels in these samples remains an inherent problem. When designing new PROMs, a number of steps can be taken to improve readability. Patients should be involved at each stage of the PROM development process and outreach exercises should be undertaken to ensure that these individuals represent a broad sample of the target population. It is advisable to avoid using technical language, use short sentences, write questions in a conversational style and use words and language consistently.24 40 41 Resources such as the Centers for Disease Control and Prevention’s Plain Language Thesaurus for Health Communications,42 and the Living Word Vocabulary43 may be useful for assessing the readability level of certain words, and finding replacements where appropriate. This should be applied to items, response options and instructions. Where use of technical jargon is unavoidable, one may provide a simple glossary of terms used.

The methodology was a key strength of this study. The selection of PROMs was done systematically. A range of validated readability measures relevant to the study’s aims was used, each of which has been well described in the literature and used in readability studies of PROMs in different disciplines. In addition, this is the first study of its kind in ophthalmology, and it highlights an important limitation and a factor to consider when using PROMs. Results are limited by the fact that the analysis was restricted to PROMs that had been used in AMD, glaucoma and/or diabetic retinopathy. These conditions were chosen because they are three of the leading causes of blindness both in the UK and worldwide.27 However, future work ought to systematically review readability of PROMs used across ophthalmic conditions and should then concentrate on ensuring PROMs across ophthalmology are at an appropriate readability level for their targeted respondents. Furthermore, results are discussed in the context of population norms and general public health guidance, but there are no data available on the literacy levels of specific patient populations.

While careful consideration was given to choosing readability measures appropriate to the study’s aims, each of these formulae comes with its own set of limitations. No readability measure is a perfect measure of comprehension.23 In addition, other factors, such as formatting, font and font size used, and method of administration, may all impact the final comprehensibility of a PROM. This is a particularly pertinent consideration for PROMs that may be used among a visually impaired population where reading a PROM in its traditional format may not be possible. Readability scores cannot be applied to situations where a PROM is read out loud to the participant; the ‘listenability’ of a piece of text does not equate to its readability, and listening skills have been recognised as distinct from reading skills.44–46 PROMs may be subject to other weaknesses beyond the scope of this study, such as those relating to their psychometric properties, or difficulties establishing unidimensionality. Finally, the AMA and NIH guidelines used as a benchmark reference are based on US literacy levels and may not be appropriate guidelines for literacy levels in other English-speaking countries. While there are no specific standards in the UK for written health materials, the government recommends that public-facing written material should not exceed the reading level of a 9-year-old (3rd grade).47 If this standard were applied to the PROMs identified in this study, 93%–100% of PROMs (depending on readability measure used) would fall outside the threshold.

To summarise, most PROM questionnaires and instruments used in three common eye conditions require a literacy level higher than that recommended by the AMA and NIH for patient material. It is likely that a majority of PROMs use language at a level too advanced for most patients to read easily. Greater care is required in choosing and designing PROMs appropriate for the literacy level of a population.