Discussion
Using data from the MDM trial, we could, for the first time, evaluate the advantages and disadvantages of the ROP activity scale in a prospective, randomised controlled study, the setting in which the scale was intended to be used. Our study showed that ROP-ActS did not discriminate between the treatment groups better than the MDM study’s originally defined primary variable, severe ROP (stage 3 and type 1 ROP), although the AA:DHA treatment group had numerically lower mean levels of ROP-ActS than the control group.
The choice of a study’s primary endpoint is crucial when we are interested in demonstrating an effect of a therapy. Besides the chosen statistical methodology, the variable type and the expected effect size, which affect the study’s sample size, the primary variable must be the most clinically relevant candidate, defined a priori and clearly, validated, reliable and evidence based in order to follow recommendations from the authorities.17 The MDM trial’s primary variable, severe ROP, was an a priori and clearly defined variable. The primary variable was validated and reliable since the diagnosis protocol for ROP stages and type 1 ROP was followed by all investigating ophthalmologists at the study centres, as recommended.9 16 It also fulfilled the requirement of clinical relevance as it identifies infants with the highest risk for end-stage ROP. Furthermore, both preclinical and clinical studies have demonstrated that higher levels of AA and DHA fatty acids have protective effects against pathological neovascularisation and severe ROP. Hence, the study’s primary variable was also evidence based.18–21
Given the information above, we ask ourselves: What advantage would be gained by introducing an ROP-ActS in such a trial?
First, a well-ordered continuous scale would be a more sensitive measure, allowing even smaller confirmatory clinical trials to succeed, provided that the treatment of interest benefits a cohort uniformly along the scale. In the retrospective validation study of ROP-ActS, including 535 infants and 3324 ROP screening examinations, we concluded that the original ROP-ActS was relatively well ordered against the short-term outcome, treated ROP. That study proposed switching scores 3 (zone II stage 1) and 5 (zone III stage 3).12 In the current study, 24% of infants with score 3 and 22% of infants with score 5 eventually received ROP treatment, compared with 33% and 14%, respectively, in the retrospective study. Switching scores 3 and 5 would therefore be less well supported by this study than by the previous one, and the scale’s well-ordering characteristic against various outcomes needs further evaluation.
Second, an approximately linear continuous scale would imply better prediction ability and facilitate the statistical evaluation. Studying a continuous scale as a linear variable in a regression analysis would mean that only one parameter needs to be estimated, implying a simpler statistical analysis. Besides increasing the power, this would also contribute to a well and clearly defined study variable. Achieving a completely linear scale with regard to different populations, outcomes and treatments would be a monumental, if not impossible, exercise. However, it is important to continue to look for a scale that is approximately linear against the most commonly studied outcomes. It should be emphasised that in assessing the linearity of a scale, we must evaluate both its linearity in predicting a certain outcome and its linearity with respect to the effect of the studied treatment. For example, in the current study, we reported similar percentages in the treatment and non-treatment groups for infants with no ROP, but noted treatment differences for more severe ROP. A discrete linear scale does not need to have the same difference between each pair of consecutive scores. Neither does it need to have one score per combination of the three variables. Some combinations might have the same risk for the outcome of interest, and some combinations might not be clinically justifiable. In the retrospective and the current study, no infants with scores 4 (zone III stage 1+), 6 (zone III stage 2+), 11 (zone II stage 1+), 15 (zone I stage 1+) and 17 (zone I stage 2+) were observed. More extensive future studies will reveal whether those scores are non-existent or just rare.
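The parameter-count argument above can be illustrated with a minimal sketch. It assumes a regression in which the score is entered either as a single linear term or as a categorical factor with one indicator per level; the ten score values, the function names and the use of NumPy are illustrative assumptions, not part of the MDM analysis.

```python
import numpy as np

# Hypothetical ROP-ActS scores for ten infants (illustrative values, not trial data).
scores = np.array([1, 1, 2, 3, 5, 7, 8, 9, 12, 13])

# Scores 1-18 are the range examined in the current and the retrospective study.
levels = np.arange(1, 19)


def linear_design(scores):
    """Intercept plus a single slope column: the score is one linear term."""
    return np.column_stack([np.ones(len(scores)), scores])


def categorical_design(scores, levels):
    """Intercept plus one indicator column per level, with the first level as reference."""
    indicators = [(scores == level).astype(float) for level in levels[1:]]
    return np.column_stack([np.ones(len(scores))] + indicators)


X_lin = linear_design(scores)
X_cat = categorical_design(scores, levels)

# Score-related parameters to estimate (columns minus the intercept).
n_params_linear = X_lin.shape[1] - 1
n_params_categorical = X_cat.shape[1] - 1

print("linear coding:", n_params_linear, "parameter")
print("categorical coding:", n_params_categorical, "parameters")
```

Under these assumptions the linear coding needs one score parameter, whereas the categorical coding of an 18-level scale needs 17, many of them tied to levels that are rarely or never observed.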
Given the rare observation of certain combinations of the three variables currently comprising the scale (stage, zone and plus disease), the methodical development of ROP-ActS and the evaluation of its linearity and well-ordering characteristic in relation to various ROP outcomes would require large data sets, including longitudinal follow-up. The recently published third edition of the International Classification of ROP, prompted by subjectivity in diagnosis, by regression and reactivation of the disease following new therapies, and by new features identified through innovations in imaging, has further refined the classification of ROP with new ROP-related metrics.22 Therefore, ROP experts should help decide whether any of these new metrics are suitable for incorporation into the scale.
Third, early postnatal values of a well-ordered and approximately linear scale might detect a treatment effect during early follow-up, which could shorten the duration of clinical trials.
Last but not least, longitudinal postnatal values of a well-ordered and approximately linear scale could be used to identify different patterns of ROP development in different subgroups, which is important for understanding physiological processes. In an exploratory study, we visually distinguished different profiles for the two sexes and GA strata in the investigational product and control groups. Those interactions were significant. Larger data sets could be used to identify mean levels and ranges for specific GAs or other relevant subgroups.
The preventive effect of AA:DHA supplementation was significant at the most severe end of the disease scale (severe ROP, 20% vs 37%) but not overall (any ROP, 58% vs 60%) in the AA:DHA versus the control group. Therefore, on the one hand, a continuous severity scale would not be a better primary variable candidate for therapies that do not uniformly reduce the incidence along the entire severity scale. On the other hand, therapies that act uniformly across the complete severity profile of the disease would benefit from a continuous severity scale rather than a dichotomous variable.
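The contrast between the two endpoints can be made explicit with simple arithmetic on the percentages reported above. The sketch below computes the absolute risk difference and the relative risk for each endpoint; the function and variable names are illustrative, and no confidence intervals are derived because group sizes are not restated in this section.

```python
# Percentages reported in the text, as (AA:DHA group, control group).
reported = {
    "severe ROP": (20.0, 37.0),
    "any ROP": (58.0, 60.0),
}


def risk_difference(treated_pct, control_pct):
    """Absolute risk difference in percentage points (control minus treated)."""
    return control_pct - treated_pct


def relative_risk(treated_pct, control_pct):
    """Risk in the treated group relative to the control group."""
    return treated_pct / control_pct


summary = {
    name: (risk_difference(t, c), relative_risk(t, c))
    for name, (t, c) in reported.items()
}

for name, (rd, rr) in summary.items():
    print(f"{name}: risk difference {rd:.0f} percentage points, relative risk {rr:.2f}")
```

The difference is concentrated in the severe category (17 percentage points) and almost absent for any ROP (2 percentage points), which is the non-uniform pattern that a dichotomous severe-ROP endpoint captures more directly than a scale spanning all severities.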
The strength of this study is its prospective, randomised clinical trial design with well-defined and controlled study variables. The analyses were predefined. For the purpose of validating the scale, the relatively small sample size in relation to the rare studied outcomes and certain combinations of the scale is a limitation. The power calculation was performed before the analyses of this substudy began, although the treatment effect it assumed proved larger than the effect observed with the current version of the scale. In the current and the retrospective study, only scores 1–18 were examined because data on longer-term outcomes were unavailable. Additionally, post-treatment validation is still lacking for the scale. However, with ROP-ActS planned as one of its secondary variables, the FIREFLEYE trial, which aims to compare the effect of Eylea versus laser therapy, will hopefully make post-treatment validation feasible.11
In this study, we conclude that the MDM study’s primary variable, severe ROP, was a better candidate for discriminating the preventive effect of AA:DHA supplementation versus no supplementation than the evaluated ROP-ActS, ROP stage, zone or plus disease. The ROP-ActS requires further validation, and potentially an update, using large data sets with longitudinal follow-up to assess its well-ordering and linearity against relevant short-term and long-term outcomes and ROP treatments.