Vision Science

Efficacy of vision-based treatments for children and teens with amblyopia: a systematic review and meta-analysis of randomised controlled trials

Abstract

Objective To identify differences in efficacy between vision-based treatments for improving visual acuity (VA) of the amblyopic eye in persons aged 4–17 years old.

Data sources Ovid Embase, PubMed (Medline), the Cochrane Library, VisionCite and Scopus were systematically searched from 1975 to 17 June 2020.

Methods Two independent reviewers screened search results for randomised controlled trials of vision-based amblyopia treatments that specified change in amblyopic eye VA (logMAR) as the primary outcome measure. Quality was assessed via risk of bias and GRADE (Grading of Recommendations, Assessment, Development, and Evaluations).

Results Of the 3346 studies identified, 36 were included in a narrative synthesis. A random effects meta-analysis (five studies) compared the efficacy of binocular treatments versus patching: mean difference −0.03 logMAR; 95% CI 0.01 to 0.04 (p<0.001), favouring patching. An exploratory study-level regression (18 studies) showed no statistically significant differences between vision-based treatments and a reference group of 2–5 hours of patching. Age, sample size and pre-randomisation optical treatment were not statistically significantly associated with changes in amblyopic eye acuity. A network meta-analysis (26 studies) comparing vision-based treatments to patching 2–5 hours found one statistically significant comparison: a combined category pooling the combination and binocular treatment arms was favoured over patching 2–5 hours (standard mean difference 2.63; 95% CI 1.18 to 4.09). However, this result was an indirect comparison calculated from a single study. A linear regression analysis (17 studies) found a significant relationship between adherence and effect size, but the model did not completely fit the data: regression coefficient 0.022; 95% CI 0.004 to 0.040 (p=0.02).

Conclusion We found no clinically relevant differences in treatment efficacy between the treatments included in this review. Adherence to the prescribed hours of treatment varied considerably and may have had an effect on treatment success.

Key messages

What is already known about this subject?

  • Previous meta-analyses or systematic reviews comparing patching to binocular treatments have found no difference or insufficient data to draw any conclusions.

  • Adherence rates to amblyopia treatments range widely and can be quite poor.

What are the new findings?

  • None of the treatments assessed was clinically different from 2 to 5 hours of patching.

  • Adherence rates are low in many studies, which may affect treatment success.

How might these results change the focus of research or clinical practice?

  • Our results suggest that clinicians have multiple treatment options that they can select based on the needs of their patients.

  • Variability exists in the efficacy of various treatments, in terms of improving amblyopic eye visual acuity.

  • Future studies are encouraged to use objective measures of adherence, where possible, to better understand the true effect of amblyopia treatments.

Introduction

Amblyopia is a neurodevelopmental visual disorder that affects between 0.34% and 3.9% of the population.1 2 Unilateral amblyopia is typically defined as visual acuity (VA) worse than 20/30 in an otherwise healthy eye, alongside a two-line interocular VA difference.3 However, visual deficits caused by amblyopia extend beyond reduced VA and encompass broader deficits such as impaired contrast sensitivity, stereopsis, spatial localisation and global form and motion perception.4–10 These deficits may adversely impact everyday tasks such as reading or playing sports.11–13 Amblyopia also limits career opportunities in fields such as military service, law enforcement, aviation and surgery,3 due to minimum standards of VA and binocularity in these professions.

Unilateral amblyopia results from abnormal visual experience early in life, typically caused by an eye misalignment (strabismus), a significant refractive difference between the eyes (anisometropia) or both (mixed). Deficits arise from impaired cortical processing of visual input from the eye that is chronically defocussed or misaligned.14 While the exact pathophysiology of amblyopia remains unknown, recent evidence suggests that it is a disorder of binocular vision where interocular suppression may play a key role in the resulting visual deficits.15

This systematic review considers vision-based amblyopia treatments that manipulate visual input to the brain, with the intention of changing cortical processing. Conventionally, vision-based amblyopia treatments targeting only the nonamblyopic fellow eye are referred to as monocular treatments. Examples include patching of the fellow eye and the use of atropine drops16 or Bangerter filters17 to reduce fellow eye image quality. These treatments have been shown to effectively improve amblyopic eye VA when treatment adherence is maintained.15 17 18 More recently, binocular approaches that rebalance the strength of visual input between the two eyes19 20 have been developed to overcome interocular suppression and encourage simultaneous perception.21 22 Binocular treatments are designed to improve both amblyopic eye VA and binocular visual function.20 23–28

A number of randomised controlled trials (RCTs) over the past two decades have evaluated the efficacy of monocular (eg, patching, atropine and Bangerter filters) and binocular treatments for improving amblyopic eye VA. Comparisons of vision-based treatments for patients with amblyopia have been examined in systematic reviews comparing patching against atropine29–31 or binocular treatments against patching.32–34 Only one review35 included a meta-analysis, which was limited to two studies and two treatments. In general, published systematic reviews and meta-analyses found no significant differences between the various vision-based amblyopia treatments.36

Treatment adherence, the time the participant spends engaged in the therapy, is a key factor that is often overlooked when assessing treatment efficacy. Poor adherence leads to reduced treatment efficacy.37 38 Holmes et al35 attributed the lack of a treatment effect from their binocular approach to extremely poor adherence, as opposed to the method of the treatment itself. That is, the participants simply were not as engaged as expected. Studies of patching reveal that self-reported adherence rates are variable, ranging from 49% to 87%.38 Therefore, adherence rates can be quite low for children undergoing various types of amblyopia treatments, and this must be considered when determining the true effect of any given treatment.

We conducted this systematic review and meta-analysis to assess the comparative efficacy of vision-based treatments for improving VA of the amblyopic eye. Furthermore, we were interested in how treatment effect size may be impacted by adherence. Our review includes a large sample of RCTs, with a subanalysis of adherence rates.

Methods

Search strategy

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed in conducting this review.39 The research question and literature search keywords were devised following consultation with a team of clinical and research experts (see online supplemental materials). We used the Population, Intervention, Comparator, Outcome, Time, and Setting (PICOTS) framework (Cochrane Handbook for Systematic Reviews of Interventions)40 to specify the parameters of the research question, develop the literature search strategy and devise the eligibility criteria for inclusion of studies in the review (table 1).

Table 1
Population, Intervention, Comparator, Outcome, Time, Setting (PICOTS) framework

An information specialist (CC) used the PICOTS to build a comprehensive search strategy for the following databases: PubMed (Medline), Ovid Embase, The Cochrane Library, Scopus and VisionCite. The initial search strategy was developed for PubMed (Medline) and the syntax and search terms were adapted to the other databases. Where available, controlled vocabulary such as medical subject headings was included in the search strategies. The database searches were last updated on 17 June 2020 and the search results were limited to English-language articles. The search strategy and PRISMA checklist are available as online supplemental materials.

Screening

Retrieved citations were imported into RefWorks (ProQuest LLC) for duplicate removal; remaining citations were transferred to DistillerSR (Evidence Partners) for screening by two independent reviewers (AC, TAB) at three levels: title, abstract and full text (figure 1). A third independent reviewer (WB) resolved discrepancies at the abstract and full-text levels. Citations generating discrepancies at title screening were advanced to abstract screening. Article eligibility criteria governing screening were:

Inclusion:

  • RCTs.

  • Full-text published in English.

  • Published between 1975 and 17 June 2020.

  • Investigated one of the following vision-based treatments: patching or Bangerter filters, atropine, binocular treatments (any treatment using both eyes together, excluding optical treatment); combination treatments (any combined treatment that involved patching in addition to another intervention) or optical treatment.

  • At least one group in the study included a vision-based treatment (eg, the other group could be a placebo).

Figure 1

Flowchart of article screening and selection. NMA, network meta-analysis.

Exclusion:

  • Grey literature, conference abstracts, letters, commentaries, review articles or study designs other than RCTs.

  • Studies that only investigated treatments that could be categorised as placebos (eg, a monocular version of a video game as the control group for a binocular game) or that did not directly manipulate visual input to the brain (eg, acupuncture).

Data extraction

Two reviewers (AC, TAB) independently performed double data entry to extract the following information from each study: starting and final sample sizes, mean and SD of age in each group (or overall, if not available), treatment type, treatment dosage, mean and SD of change in VA of the amblyopic eye from baseline in logMAR, 95% CIs of mean difference between treatments, study duration, setting (whether the treatment was prescribed for use at home or in-office) and treatment adherence rates.

Risk of bias

AC and TAB independently assessed the risk of bias (RoB) of the included RCTs at the study level using the Cochrane Risk of Bias 2 tool (22 August 2019 version).40 WB resolved all RoB disagreements. If information related to RoB was not reported, the authors of the study were contacted by e-mail for clarification. Some studies did not mask the outcome assessor, but the concern of it introducing bias was often mitigated through the use of well-validated and automated VA systems. Since poor adherence is a well-documented issue with patching,38 the risk of bias assessment included treatment adherence. For these studies, adherence was primarily based on participant reports.

To assess whether adherence affected the effect sizes (Hedges’ g) of treatment comparisons, we regressed Hedges’ g onto the adherence rates for the 17 studies that reported adherence data for all treatment and comparator groups.
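As a minimal sketch of this kind of study-level regression, the Python snippet below fits an ordinary least-squares line of Hedges’ g on adherence. The data points are hypothetical placeholders, not values extracted from the included trials, and the function name `fit_line` is introduced here only for illustration; it is not the authors' code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL data, for illustration only:
# x = % of patients with "excellent" adherence, y = Hedges' g of the comparison.
adherence_pct = [10.0, 30.0, 50.0, 70.0, 90.0]
hedges_g_values = [0.0, 0.2, 0.1, 0.5, 0.7]

intercept, slope = fit_line(adherence_pct, hedges_g_values)
print(f"slope per percentage point: {slope:.4f}, intercept: {intercept:.3f}")
```

The slope is the change in Hedges’ g per unit of adherence. If adherence is expressed in percentage points (an assumption here), a coefficient such as the 0.022 reported in the Results would correspond to an increase of about 0.22 in Hedges’ g for a 10-point rise in adherence.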

Meta-analysis

We conducted a meta-analysis (five studies) comparing patching to binocular treatments.35 41–44 We used the inverse variance method, the DerSimonian-Laird estimator for τ² and a random effects model to obtain a pooled mean difference and 95% CI from the study-specific mean differences. There was a high degree of heterogeneity between the studies, with I²=80%; χ²=19.74 (p<0.001), and τ²=0.0017. We used the ‘meta’ package in R V.4.0.2 (The R Foundation for Statistical Computing, Vienna, Austria) to conduct the meta-analyses. GRADEpro (Grading of Recommendations, Assessment, Development, and Evaluations) software (Hamilton, ON: Evidence Prime) was used to evaluate the overall certainty of evidence.
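To make the pooling step concrete, the following Python sketch implements inverse-variance random effects pooling with the DerSimonian-Laird estimator. It is an illustration under stated assumptions, not a reproduction of the R analysis: the study effects and variances are hypothetical, the 95% CI uses a normal approximation, and the function names are introduced here for convenience.

```python
import math


def i_squared(q, n_studies):
    """I^2 = (Q - df) / Q with df = n_studies - 1, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (n_studies - 1)) / q)


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random effects model."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching effects and variances for >= 2 studies")
    weights = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),  # normal approximation
        "tau2": tau2,
        "q": q,
        "i2": i_squared(q, len(effects)),
    }


# HYPOTHETICAL mean differences (logMAR) and variances, for illustration only.
example = dersimonian_laird(
    [0.01, 0.05, 0.02, 0.06, 0.00],
    [0.0004, 0.0003, 0.0005, 0.0004, 0.0006],
)
print(example)
```

As a consistency check on the reported heterogeneity statistics, `i_squared(19.74, 5)` evaluates to about 0.80, which agrees with the reported I² of 80% for five studies.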

Study-level regression

We conducted an exploratory regression analysis at the study level to examine the relative effect of different treatments on VA. The dependent variable was the treatment-specific improvement in mean amblyopic eye VA from baseline to the end of the trial, as reported in each RCT. The unit of measuring VA was the logarithm of the minimum angle of resolution (logMAR). We included patching 2–5 hours, patching 6–11 hours, patching 12 or more hours, atropine, binocular treatment, combination treatment and intermittent patching (30 s on, 30 s off, using specialised glasses) in the regression analysis. Atropine, binocular treatments and combination treatments did not have a sufficient number of studies to permit separation by dosage.

We modelled each treatment as a dummy variable and used patching 2–5 hours as the reference category. The regression coefficients represented the change in VA of the amblyopic eye for each treatment compared with patching 2–5 hours. Patching 2–5 hours was chosen as the reference because it was the most common treatment dosage employed across RCTs.45 46 We controlled for patient mean age (or median age if the RCT did not report mean age), sample size and whether participants were given optical treatment for four or more weeks prior to the start of the trial.

Since each RCT evaluated two treatments, we modelled ‘study’ as a group-level, random effects variable and fit a restricted maximum likelihood linear mixed model to the data. The other variables (age, sample size and whether spectacles were prescribed at least 4 weeks prior to the start of the trial) were treated as fixed effects. We used the ‘lme4’ package in R V.4.0.2 to conduct the analysis.

Network meta-analysis

To infer relationships between a broader number of treatments beyond those that were directly investigated in head-to-head trials, we undertook a frequentist network meta-analysis (NMA). We used a random effects model to conduct the NMA and measured statistical heterogeneity using the χ² test and I² statistic. For each direct treatment comparison, we extracted the treatment-specific mean changes in logMAR over follow-up and obtained a common effect size, namely, Hedges’ g (a type of standard mean difference (SMD)). Studies that were missing sufficient data to calculate Hedges’ g were excluded from the analysis. Patching treatments were separated into four categories based on the daily prescribed dosage. Combination treatments were separated by daily prescribed dosage and whether the additional activities were performed at near or at distance. Three studies used a three-arm treatment design, with active therapies including two different binocular treatments47 or a combination treatment and binocular treatment.43 44 The active treatments were combined, and then the SMD was calculated for a combined active category and patching 2–5 hours.
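The effect size used here can be sketched in a few lines of Python. The snippet computes Hedges’ g from group means, SDs and sample sizes using the common approximate small-sample correction, together with an approximate standard error. The function name, argument order and example values are hypothetical, and inputs are assumed to be VA improvements (positive values meaning better acuity), so that g > 0 favours the first group.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, se_g) for treatment versus comparator.

    Assumes the means are improvements, so a positive g favours treatment.
    """
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least 2 participants")
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd  # Cohen's d
    # Approximate small-sample correction factor
    correction = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)
    g = correction * d
    # Approximate variance of d, then scaled by the correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    se_g = correction * math.sqrt(var_d)
    return g, se_g


# HYPOTHETICAL arms: mean improvement 0.25 vs 0.15 logMAR, SD 0.10, n=20 each.
g, se = hedges_g(0.25, 0.10, 20, 0.15, 0.10, 20)
print(f"Hedges' g = {g:.3f} (SE {se:.3f})")
```

Because the correction factor approaches 1 as sample sizes grow, g and Cohen's d differ mainly in small trials, which is why a corrected measure is a reasonable choice for a literature of this kind.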

Certainty of treatment efficacy was ranked using p scores, which are analogous to surface under the cumulative ranking curve scores.48 We generated plots to estimate the proportion of direct and indirect evidence contributing to each possible comparison, minimal parallelism and mean path length. Furthermore, we explored the possibility of publication bias using a comparison-adjusted funnel plot and Egger’s test (see online supplemental materials). We used the ‘esc’, ‘netmeta’, and ‘dmetar’ packages in R V.4.0.2 to conduct the NMA.
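For readers unfamiliar with p scores, the sketch below shows how they are commonly defined in frequentist NMA: for each treatment, the score is the average, over all competing treatments, of the one-sided probability that the treatment is better than the competitor. The treatment names, effect estimates and standard errors are hypothetical, larger estimates are assumed to be better, and this is not the 'netmeta' implementation.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_scores(estimates, se_diff):
    """Compute a P-score for each treatment.

    estimates: dict mapping treatment -> effect versus a common reference
               (assumed: larger is better).
    se_diff:   dict mapping (a, b) -> standard error of the difference a - b;
               either ordering of the pair may be supplied.
    """
    scores = {}
    for a in estimates:
        probs = []
        for b in estimates:
            if a == b:
                continue
            se = se_diff[(a, b)] if (a, b) in se_diff else se_diff[(b, a)]
            # One-sided probability that a is better than b
            probs.append(normal_cdf((estimates[a] - estimates[b]) / se))
        scores[a] = sum(probs) / len(probs)
    return scores


# HYPOTHETICAL network of three treatments, for illustration only.
est = {"A": 1.0, "B": 0.5, "C": 0.0}
ses = {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "C"): 0.5}
print(p_scores(est, ses))
```

By construction the scores average 0.5 across treatments, so a value close to 1 indicates a treatment that is estimated to outperform nearly all others with little uncertainty in the pairwise differences.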

Patient and public involvement statement

It was not feasible to involve patients or the public in the design, conduct, reporting or dissemination of this project, as it is a meta-analysis on research that has already been conducted.

Results

Following duplicate removal, 3346 citations advanced to the screening phase. We ultimately included 36 RCTs (1%) in the narrative synthesis. Of these 36, 5 RCTs (14%) were included in the meta-analysis, 18 in the regression analysis (50%) and 26 in the NMA (72%). The κ for the two screeners was 0.77 at the title and abstract levels (combined) and 1.00 at the full-text screening level.
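Assuming the reported κ is Cohen's kappa for two raters making include/exclude decisions, the agreement statistic can be sketched as follows. The counts in the example are hypothetical and are not the screening counts of this review.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters and a binary include/exclude decision."""
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("no decisions to compare")
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates for each rater
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance from the marginal rates
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL screening table: 40 agreed includes, 40 agreed excludes,
# and 10 disagreements in each direction.
print(round(cohens_kappa(40, 10, 10, 40), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters in screening because most citations are excluded by both reviewers.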

Narrative synthesis of included studies

All types of vision-based treatments produced VA improvements ranging from 0.06 logMAR to 0.48 logMAR, except for two studies (Pawar et al.; Lee et al.) in which VA declined after patching49 or patching combined with perceptual learning.44 While most treatments led to improved VA from baseline, less than half of the included RCTs (n=17) reported clinically meaningful improvement, which is conventionally defined as a mean improvement in VA of >2 lines (or 0.2 logMAR).1 The most common treatments to achieve this threshold were patching or Bangerter filters (14 conditions) and combination treatments (9). In only 5 of these 17 studies did the active treatment show a statistically significant difference in amblyopic eye VA improvement from the control group. Therefore, it is rare for studies to show both clinical (an improvement of at least 0.2 logMAR) and statistical significance.

Figure 2 shows the frequency with which each treatment category appeared in the 36 included RCTs, with patching being the most common therapy. Placebo treatments were the least common comparison, likely due to concerns over delaying treatment for young patients. The range of mean ages of participants in the included RCTs was 4.0–14.3 years. Only 10 RCTs had a mean age that was >7 years.

Figure 2

Frequency of vision-based treatments in the literature.

Treatment adherence

Of the 36 included studies, risk of bias was low in 17 and high in 12 (see online supplemental materials). The main reason for high risk of bias was poor adherence rates (seven studies). Adherence to amblyopia treatments was most commonly measured in the literature according to categories set by the Pediatric Eye Disease Investigator Group (PEDIG).50 PEDIG classifies adherence for individual study participants using a percentage score that is calculated by dividing the reported actual dose by the examiner’s prescribed dose. These scores were grouped into four categories: ‘excellent’ (76%–100%), ‘good’ (51%–75%), ‘fair’ (26%–50%) and ‘poor’ (0%–25%). Using these four categories, PEDIG reports the number or percentage of patients in a treatment arm that achieves ‘excellent’ adherence.
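The classification described above can be sketched as follows. The band edges come from the text; how fractional percentages between bands (for example 75.5%) and values above 100% are handled is not stated in the source, so the choices below (fractions go to the higher band, values above 100% count as 'excellent') are assumptions, as are the function names.

```python
def adherence_percent(actual_dose, prescribed_dose):
    """Reported actual dose as a percentage of the prescribed dose."""
    if prescribed_dose <= 0:
        raise ValueError("prescribed dose must be positive")
    if actual_dose < 0:
        raise ValueError("actual dose cannot be negative")
    return 100.0 * actual_dose / prescribed_dose


def pedig_category(percent):
    """Map an adherence percentage to one of the four PEDIG categories.

    Assumption: values strictly above a band's upper edge fall in the next band.
    """
    if percent < 0:
        raise ValueError("percentage cannot be negative")
    if percent > 75:
        return "excellent"
    if percent > 50:
        return "good"
    if percent > 25:
        return "fair"
    return "poor"


def share_excellent(percents):
    """Percentage of participants in an arm classified as 'excellent'."""
    if not percents:
        raise ValueError("no participants")
    excellent = sum(1 for p in percents if pedig_category(p) == "excellent")
    return 100.0 * excellent / len(percents)


# HYPOTHETICAL arm: reported hours of patching out of 2 prescribed hours per day.
arm = [adherence_percent(hours, 2.0) for hours in (2.0, 1.5, 1.8, 0.4)]
print([pedig_category(p) for p in arm], share_excellent(arm))
```

Summarising an arm by the share of participants with 'excellent' adherence discards information about how far the remaining participants fell short, which is one reason a study-level regression on this summary can only approximate the underlying dose-response relationship.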

Twenty-one of the 36 studies fully reported subjective adherence using the PEDIG classification standards. Over three-quarters of patients achieved ‘excellent’ adherence in only 10 studies. In six studies, fewer than half of patients reported excellent adherence; the lowest rate was in a study by Manh et al, wherein only 13% of patients reported excellent adherence.42 Given this variation, it was necessary to examine whether poor adherence influenced the published improvements in VA.

Figure 3 shows the linear regression line between Hedges’ g and adherence rates. When looking at the 17 studies that fully reported adherence rates, the linear regression was significant, demonstrating that treatments with high adherence rates showed larger effect sizes favouring the intervention treatment: regression coefficient 0.022; 95% CI 0.004 to 0.040 (p=0.020). However, the model does not fully explain the data. The regression line may exaggerate the relationship of adherence and effect size.

Figure 3

Histogram (A) examined the Hedges’ g of 12 studies with unreported or incomplete (eg, only reporting adherence rates for the active treatment) adherence data. The data for these studies do not appear to be biased. Scatterplot (B) shows the linear regression comparing effect size of each of the 17 studies as a function of reported adherence (with adherence defined as the percentage of patients achieving “excellent” adherence). Only studies with reported adherence data are included in this scatterplot.

Meta-analysis

Binocular treatment versus patching

We performed a meta-analysis on five RCTs35 41–44 comparing the mean VA improvement for binocular treatments against patching. Figure 4 shows the difference between patching and binocular treatments, which was statistically significant at the 5% level (−0.03 logMAR; 95% CI 0.01 to 0.04). However, this difference is less than two letters and is not clinically significant. There was a high degree of heterogeneity between the studies, with I²=80% and χ²=19.74 (p<0.001). The overall GRADE certainty of evidence for these five studies was low. This rating was due to serious concerns with inconsistency (high heterogeneity) and imprecision (wide CIs).
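The conversion between logMAR, chart lines and letters that underlies this interpretation can be sketched as below. It assumes a standard logMAR (ETDRS-style) chart with 0.1 logMAR per line and five letters per line, so each letter is worth 0.02 logMAR; the function names are introduced here for illustration.

```python
import math

LOGMAR_PER_LINE = 0.1   # one chart line, assuming a standard logMAR chart
LETTERS_PER_LINE = 5    # assumption: ETDRS-style chart with 5 letters per line


def logmar_to_lines(delta_logmar):
    """Convert a logMAR difference to chart lines."""
    return delta_logmar / LOGMAR_PER_LINE


def logmar_to_letters(delta_logmar):
    """Convert a logMAR difference to letters (0.02 logMAR per letter)."""
    return logmar_to_lines(delta_logmar) * LETTERS_PER_LINE


def snellen_to_logmar(numerator, denominator):
    """Convert a Snellen fraction such as 20/30 to logMAR."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("Snellen terms must be positive")
    return math.log10(denominator / numerator)


print(round(logmar_to_letters(0.03), 1))     # size of the pooled difference, in letters
print(round(logmar_to_letters(0.2), 1))      # the 0.2 logMAR threshold, in letters
print(round(snellen_to_logmar(20, 30), 3))   # the 20/30 definition, in logMAR
```

Under these assumptions a 0.03 logMAR difference is about 1.5 letters, consistent with the statement that it is less than two letters, whereas the 0.2 logMAR threshold for clinical relevance corresponds to 10 letters (two lines).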

Figure 4

Forest plot comparing patching to binocular treatments.

Comparison of multiple vision-based treatments

The exploratory regression comparing any treatment to patching 2–5 hours contained 18 studies. None of the treatments showed a statistically significant difference relative to patching 2–5 hours per day (see online supplemental materials). Furthermore, all treatments showed less than a one letter difference in VA compared with 2–5 hours of patching. Sample size, spectacle use and mean (median) age were not associated with improvements in amblyopic eye VA from baseline in the included RCTs.

An NMA compared all treatments to patching 2–5 hours; the values in the forest plot therefore represent the SMD of the treatment in question versus patching 2–5 hours. SMD >0 favours the treatment in question; SMD <0 favours patching 2–5 hours.

The high level of heterogeneity (I²=75.7%) in the NMA confirmed our decision to employ a random effects model. Twenty-six studies were included in the NMA, comparing 14 vision-based therapies to patching 2–5 hours and yielding 26 (direct and indirect) pairwise comparisons (figure 5). Most treatment comparisons involved patching or combination treatments.

Figure 5

Network graph of direct pairwise treatment comparisons. As the number of studies with a specific direct comparison increases, so does the thickness of the line.

The only SMD comparison that reached statistical significance was between the combined binocular and combination group and patching 2–5 hours, with the combined group showing the greater effect (SMD=2.63, 95% CI 1.18 to 4.09). The p score for the combined binocular group was 0.9988, indicating a high level of certainty for the efficacy of this treatment (see figure 6). However, the finding is from an indirect comparison, and only one of the included RCTs contains this type of therapy. The funnel plot did not show substantial evidence of asymmetry and Egger’s test suggested that publication bias was not present (p=0.1151) (see online supplemental materials).

Figure 6

Forest plot of SMD and P-scores of treatments. The treatments are ranked from highest P-score (most efficacious) to lowest. SMD, standard mean difference.

The results of the NMA should be interpreted with caution. Out of 105 total unique network estimates (treatment comparisons), only 20 contained some proportion of direct evidence (median proportion=0.69; IQR=0.60). The remaining 85 estimates were based entirely on indirect evidence. For 90 of 105 estimates, the minimum number of independent paths contributing to the effect size estimate on an aggregated level (minimal parallelism) was 1; larger numbers of paths support more robust estimates, with the median number of paths being 2.1 (IQR=0.76) in the 15 comparisons with >1 minimum path. For mean path length, which characterises the degree of indirectness of an effect size estimate, values >2 indicate the need to interpret the estimate in question with caution. We found mean path lengths >2 in 80 of the 105 network estimates (plots available from the authors on request).

Discussion

The objective of this systematic review and meta-analysis was to identify an optimal vision-based treatment for improving amblyopic eye VA in 4 to 17 year olds. Our analyses uncovered no clinically important differences between any of the treatments and patching 2–5 hours. Our adherence analysis revealed that poor adherence may be a factor in reducing treatment efficacy and may have affected our results. With high or unclear risk of bias in almost half the included RCTs, the findings of this review should be interpreted with caution.

Our results are similar to those of a previous NMA, which showed no significant difference between various amblyopia treatments and concluded that more research is needed.36 Several literature reviews have specifically compared the efficacy of binocular treatments to patching. A review by Pineles et al did not recommend the use of binocular treatments,33 while other systematic literature reviews concluded that more research was required before making any conclusions about binocular treatments.34 51 More RCTs were available at the time of our literature search than for these earlier reviews, but the overall strength of evidence for this comparison was low, which implies that further research is still required.

For the NMA, although the result was not significant, we did not expect placebos to rank as more efficacious than patching 2–5 hours. This result may have arisen because the comparison was indirect and only two studies used a placebo group. Furthermore, the adherence rate for the treatment group of one of the studies was very poor,52 which may explain why the placebo group is ranked as the second-best treatment in the NMA. Nonetheless, it is interesting to see how similar all vision-based treatments appear to be in terms of improving amblyopic eye VA. This implies that clinicians may have multiple treatment options. However, amblyopic eye VA improvements, in general, were small, as fewer than half of the studies reported an improvement greater than 2 logMAR lines.

Strengths and limitations of the literature

One of the limitations of the literature is that the relatively small number of RCTs prevented us from conducting subanalyses by age or by dosage.

Our exploratory regression analysis showed that optical treatment prior to instituting another form of vision-based treatment was not significantly related to VA improvement. Since the studies that used optical treatment prescribed spectacles to patients in every group, it was impossible to directly compare the effect of optical treatment to no optical treatment. Additionally, optical treatment durations were variable across many RCTs, with some employing a defined length of time (ranging widely from 4 to 18 weeks) and others waiting until the VA improvement reached a plateau.

Although our exploratory regression did not find an effect for age, it should be noted that 73% of the included RCTs featured a mean age of <7 years. It is possible we did not have a sufficiently wide range of ages to discern an effect.

Our meta-analyses revealed a high level of imprecision in the included studies, evidenced by wide CIs passing through the null value. A likely explanation for this variability is poor treatment adherence. It is critical to consider how low treatment adherence can negatively affect treatment efficacy.37 38 Poor adherence was the largest source of potential bias in studies, as identified in the RoB ratings. Of the studies that reported adherence rates, fewer than half had what would be considered good treatment adherence. It is also important to note that adherence data were almost entirely subjective. Many treatments took place at home, unsupervised by the experimenters and in uncontrolled environments. Adherence was reported by parents in the form of diaries or calendars. Subjective reports regularly overestimate adherence rates when compared with objective measures.38 53 54 For example, Holmes et al prescribed a binocular video game treatment to be played at home and found that the average of parent-reported adherence was 66.7% of the total prescribed treatment time, while the game data revealed adherence to be 22.2%.35 Since the subjective adherence rates reported are likely higher than the actual adherence rate, this limits our ability to assess the true impact of adherence. However, these potentially inflated adherence rates were still poor, implying that the problem is more pronounced than what is reported here. Our linear regression showed a significant relationship between effect size and subjective adherence rates. However, the model does not fully explain the data, so this relationship may be exaggerated.

Where possible, robust objective measures should be used to ensure accuracy. Patching adherence can be objectively measured using occlusion dose monitors, which are modified eye patches that contain a battery and the ability to log data about the amount of time the patch is in contact with the skin around the eye.55 Some video game treatments can measure the amount of time a game is turned on or the number of log-ins, but there is no guarantee that the patient is actually looking at the screen while the game is powered on. The simplest option for ensuring adherence objectively is to administer treatment under supervised laboratory conditions, however cumbersome it may be for caregivers.

Strengths and limitations of the review

The major strength of this review is the comprehensive analysis of multiple vision-based therapies drawn from five different databases.3 56 We also included studies that could not be meta-analysed (due to insufficient data reported) in our systematic review to piece together a complete look at the relevant literature. Our results suggest that practitioners have a variety of equally effective treatments at their disposal and should be able to consider both patient and caregiver preferences in the management of amblyopia.

Another strength is the analysis of adherence rates. Previously, Li et al performed an NMA examining various vision-based treatments in patients with amblyopia and concluded that there was no clinically significant difference in the efficacy of these treatments.36 However, this study did not assess adherence rates, which we found to greatly impact the risk of bias rating. The goal of our adherence analysis was to control for adherence as much as possible when assessing treatment efficacy.

Conclusion

Vision-based treatments for amblyopia produce improvements in amblyopic eye VA for patients aged 4 to 17 years, but these improvements are not clinically significantly different from 2 to 5 hours of patching. Adherence must be considered when interpreting this result because many studies had poor or unreported adherence. One critical factor to consider for future studies is objective adherence monitoring, which may explain low treatment effects and high variability in a number of studies.

New vision-based treatments, such as binocular games, continue to be developed19 and may change the landscape of available treatment options for clinicians in 5–10 years’ time. It is imperative that the literature continues to be surveyed as new studies arise and our understanding of amblyopia evolves.