Background
Prematurity is defined as birth before 37 weeks of pregnancy, classified as either extremely preterm (<28 weeks), very preterm (28–32 weeks) or moderate to late preterm (32–37 weeks).1 There are an estimated 15 million live preterm deliveries each year, primarily in low-income and middle-income countries (LMICs). While the number of preterm births is increasing each year globally, so too are rates of survival into adulthood, in large part due to improvements in neonatal intensive care facilities and technology (especially in LMICs). However, with many regions still lacking access to such advancements, the maternal and fetal sequelae of prematurity, including retinopathy of prematurity (ROP), are becoming increasingly consequential.1–3
ROP is a proliferative retinal vasculopathy and one of the most common avoidable causes of childhood blindness, responsible for vision loss in more than 30 000 children annually.4 5 Low gestational age, low birth weight and supplemental oxygen at birth are its major risk factors. Recognising the pertinent screening window and achieving timely diagnosis and management are challenging because of the shortage of paediatric ophthalmic specialists. Even where specialists are available, variation in classification standards, equipment, examination technique and treatment thresholds leads to poor diagnostic concordance, even among experts.6 7
Artificial intelligence (AI) algorithms use input data to generate clinical predictions mathematically, and the extensive use of ancillary imaging in ophthalmology makes them especially pertinent for aiding diagnosis and informing management of conditions such as ROP.8 9 To date, AI has been widely applied to the detection of ophthalmological conditions, including diabetic retinopathy, age-related macular degeneration, glaucoma and ROP, and has been shown to perform on par with, or better than, human clinicians.10–12 By minimising the biases inherent to AI and externally validating algorithms to ensure consistent and replicable predictions, AI has powerful potential to mitigate the consequences of relative specialist scarcity and to provide cost-effective diagnosis and decision-making. However, no commercial AI tools are currently clinically approved for ROP screening.13
The application of teleophthalmology to ROP diagnosis and management, particularly to reach underserved remote areas, has created an opportunity to collect rich volumes of ophthalmic imaging data, which can be used secondarily to further AI development in this field.7
Prioritising generalisability, fairness and reproducibility when developing AI algorithms is essential to promote non-discriminatory models. By way of definition, ‘generalisability’ is the ability to provide accurate predictions in a new sample of patients not included in the original training population,14 ‘fairness’ is the assurance that AI systems are not biased in their predictions for subpopulations,15 and ‘reproducibility’ is the system’s capacity to replicate its accuracy in patients not included in model development.14 Code and data sharing are crucial to facilitating generalisable and reproducible research and validation studies, and they enable an understanding of how models can be adapted and applied to the heterogeneous patient populations globally who stand to benefit most from advances in AI in ophthalmology.11 12 16
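As a purely illustrative sketch, not drawn from any of the reviewed studies, the Python snippet below shows one way these concepts might be quantified for a binary ROP classifier: discrimination on an external cohort speaks to generalisability, while per-subgroup performance speaks to fairness. All data, subgroup labels and variable names are hypothetical.

```python
# Illustrative sketch only: hypothetical external validation of a binary
# ROP classifier, with overall and per-subgroup discrimination.
import numpy as np
from sklearn.metrics import roc_auc_score


def subgroup_auc(y_true, y_score, groups):
    """Return AUC per subgroup (eg, screening site, sex or birthweight band)."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        # AUC is undefined if a subgroup contains only one outcome class
        if len(np.unique(y_true[mask])) == 2:
            results[g] = roc_auc_score(y_true[mask], y_score[mask])
        else:
            results[g] = float("nan")
    return results


# Hypothetical external validation cohort, distinct from the training population
y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0])                     # referral-warranted ROP labels
y_score = np.array([0.1, 0.8, 0.3, 0.7, 0.9, 0.2, 0.6, 0.4])    # model-predicted probabilities
site = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])        # hypothetical screening site

overall_auc = roc_auc_score(y_true, y_score)        # generalisability: performance on the new cohort
per_site_auc = subgroup_auc(y_true, y_score, site)  # fairness: performance across subgroups
print(overall_auc, per_site_auc)
```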
The risk of biased algorithms is a prominent concern in the development and implementation of safe AI and must be addressed to avoid perpetuating existing healthcare disparities. Given the nature of the patient population undergoing ROP screening and management, medicolegal considerations must also be addressed before AI can be implemented safely in this space.17
Here, we review ROP studies that implement AI techniques, compare dataset and algorithm characteristics, and analyse efforts to ensure fairness, generalisability and reproducibility of findings.