Discussion
In this study, we developed and validated a CFDLS classifying between clinically significant and non-significant PCO in retroillumination images. The CFDLS showed a robust performance in detecting clinically significant PCO with a sensitivity of 0.84, a specificity of 0.92 and an AUC of 0.9661 (95% CI 0.93 to 0.99) on external validation. This proof of concept shows that CFDLS can be used to develop potential decision support tools and enables clinicians to expand into the clinical research of AI and explore novel use cases of AI applications.
The validation dataset was created in 2002 and consists of images with different degrees of PCO. Findl et al already used the dataset for comparison of four methods (subjective analysis, Evaluation of Posterior Capsule Opacification [EPCO], Posterior Capsule Opacity [POCO] and Automated Quantification of After-Cataract [AQUA] I studies) of PCO quantification.15 Moreover, Kronschläger et al applied the same dataset in creating an automated qualitative and quantitative assessment tool of PCO, that is, AQUA II.20 AQUA II already showed excellent validity and repeatability. Projecting light into the eye causes Purkinje spots on each tissue interface. Additionally, internal reflections of the optics of the system used may appear. Those light spots and reflections (figure 1) on the image cover pathological changes of the posterior capsule and therefore are missing in the grading process. Because the Purkinje spots change their position in slightly different directions of gaze, merging of images of different gaze positions enables the removal of light-reflection artefacts.21 However, at the time of data collection, this method was yet not published.
A rigorous and sound grading process to establish a ground truth is especially important when labels are provided to develop DL classifiers.22 Krause et al demonstrated the importance of arbitration to improve the algorithmic performance for diabetic retinopathy grades.23 Whereas previous grading approaches for PCO focused on quantitative human grading,15 for this study, we decided to proceed with a binary grading that was aiming to label clinical significance. The rationale behind this is based on the visual significance of the inner area of 3 mm; binary labels were used to reduce the risk of PCO underestimation.24 Good intraobserver and interobserver variabilities were achieved by the three expert graders.
Applications using AI are heading towards all fields of medicine. A recent survey from the American College of Radiology showed that 30% of radiologists were using AI in some form in their clinical practice.25 Teleophthalmology may serve as a solution to increasing demands and stretched services in the field of cataract surgery.8 Wu et al have presented results of a universal AI model for a collaborative management of cataract, with referral decisions for preoperative and postoperative grading, requiring a large dataset for training and bespoke modular architectures.8 The model performance in our study in the external validation was respectable, with a sensitivity of 0.84 and a specificity of 0.92, with an AUC of 0.9661, warranting further research using larger datasets.
With CFDLS, clinicians now have the opportunity to explore clinical datasets using cloud-based APIs and GUIs. The ability to understand the complexity of clinical data in combination with code-free platforms will allow clinicians to further explore clinical use cases. Although little coding skills are required to train such bespoken models, the process of data preparation remains to be a major part of such studies. Furthermore, clinicians need to have a good understanding of the importance of labelling, grading, training, well-balanced distributions and potential hidden confounders when developing CFDLS.26 Automated CFDLSs have been shown to perform comparably with bespoke classifiers in ophthalmology and other fields of medicine. On the other hand, lack of adjustable model architectures during training as well as the ‘black box’ phenomena may serve as limitations.27 As explainability methods still remain to be challenging, Ghassemi et al have argued that rigorous internal and external validations serve as a more achievable goal to evaluate the performance of DL systems.28 Classifiers as developed in this study could help to exclude PCO in triage settings and could be externalised into smartphone-based home screening applications. Once revalidated, it may serve as a decision support tool in a referral refinement process. This proof of concept shows that clinicians can use AI to explore novel applications in ophthalmology outside the classic domains of retinal imaging and glaucoma.
Error auditing showed interestingly that peripheral PCO was noted in the cases to be predicted as false positive. This could be refined by first incorporating a preprocessing step of peripheral cropping. Second, formal occlusion testing of the periphery would bolster this justification but was outside the remit of this project. The importance of error auditing in AI cannot be underestimated to identify and prevent algorithmic bias both inside and outside of healthcare.29 30
The limitations of our study include the size of the datasets and the setting of a single centre with a mainly Caucasian population. No multifocal lenses were included in the dataset as the curation predated multifocal IOLs. The model design of the CFDLS in terms of model architecture and hyperparameters is not transparent; it has the potential to diminish machine learning explainability even further due to a lack of understanding of the model architectures and parameters used. Preprocessing of the images limits the scalability but could be incorporated in a more user-friendly application prior to incorporation. To explore generalisation, further evaluation using a larger dataset representing a multiethical population therefore is warranted.
In conclusion, we trained a CFDLS to classify between significant and non-significant PCO on retroillumination images with high sensitivity and specificity. Moreover, the CFDLS equaled human expert graders in reliability. This CFDLS for PCO serves as proof of concept to support the decision whether PCO needs to be addressed by yttrium aluminium garnet capsulotomy, possibly even in a teleophthalmological or triage setting.