Introduction

With the growing prevalence of type 2 diabetes, diabetic retinopathy is set to become the leading cause of visual impairment and blindness [1]. Screening for diabetic retinopathy among diabetic patients is an effective method to decrease the disease burden by early diagnosis and timely medical treatment [1]. Many countries have therefore adopted an annual or biennial screening programme which is usually integrated within regular diabetes care [2–5]. Nevertheless, a ‘one size fits all’ approach is costly and time-consuming, as only a minority of type 2 diabetic patients eventually develop sight-threatening retinopathy (STR), which is mainly dependent on several well-known risk factors such as diabetes duration, HbA1c, blood pressure and early stages of retinopathy [6–9]. Consequently, patients at low risk of STR may be over-screened by using a predefined screening interval, while those at high risk of developing STR may be missed. A personalised screening interval in which the individual risk of developing STR is taken into account might therefore be a more appropriate or cost-effective method to screen for STR in type 2 diabetic patients [2, 3, 10–16].

Aspelund et al recently developed a model, based on diabetes-related risk factors, in which they aimed to achieve a more personalised approach to screening for STR in diabetes [17]. However, assessment of the accuracy and validity of a model should be performed before applying it to other populations. The Diabetes Care System West-Friesland (DCS) is a large prospective study in which diabetes-related risk factors and complications are extensively measured in patients with type 2 diabetes, offering a unique opportunity to validate the model. The aim of this study was therefore to validate the personalised screening model of Aspelund et al [17] by assessing the accuracy of predicting STR.

Methods

Study population

The data of type 2 diabetic patients included in the dynamic cohort of the DCS were used. The DCS coordinates regional care for type 2 diabetes using a centrally organised database available to all involved care givers. Diabetes nurses and dietitians perform annual follow-up examinations, including glucose control, cardiovascular risk profiling and identifying the presence of micro- and macrovascular complications [18].

In the study period between 1998 and 2010, the DCS cohort consisted of 8,308 type 2 diabetic patients. The cohort is dynamic, meaning that each year newly diagnosed diabetic patients as well as patients with a longer duration of diabetes enter the DCS cohort. For each patient, the year of entry was considered as baseline. Because current type 2 diabetes guidelines recommend annual or biennial screening for retinopathy [2–5], in most patients diagnostic fundus photography was performed within 2 years after baseline. We therefore included patients in whom the results of graded fundus photographs and variables used to estimate screening frequency were present at baseline (n = 5,483). In patients with missing fundus photograph results at baseline but in whom fundus photograph results were present 1 or 2 years after baseline (n = 1,030), the measurement taken 1 or 2 years after baseline was considered to be the baseline measurement. Of the 6,513 patients, we excluded patients in whom STR was already present at baseline (EURODIAB grade 3, 4 or 5) (n = 61). For 3,133 patients, no graded fundus photographs within 1 year after model-based time of screening were available. Patients who had no graded fundus photographs available within 1 year after model-based time of screening had a significantly higher age (+1.4 years), shorter diabetes duration (−0.3 years), lower HbA1c (−0.7%), lower systolic blood pressure (−5.5 mmHg) and lower diastolic blood pressure (−5.1 mmHg) at baseline. Furthermore, those patients had a significantly lower prevalence of grade 1 retinopathy (3.5%) and a comparable prevalence of grade 2 retinopathy (1.4%) at baseline and were assigned a significantly longer (+9 months) screening interval compared with those included in the current analyses.

Thus, 3,319 type 2 diabetic patients remained for the current analyses. Patients who were included but censored due to incomplete follow-up (n = 991) had a significantly higher age (+6.5 years), longer diabetes duration (+2.4 years), lower HbA1c (−0.21%) and lower diastolic blood pressure (−3.7 mmHg) compared with patients with complete 60-month follow-up data. Mortality rates did not significantly differ between patients with complete (17.5%) or incomplete (16.3%) follow-up (p = 0.2).
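The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch using only the counts reported in the text; the variable names are illustrative and not part of the study.

```python
# Counts as reported in the Study population section.
with_baseline_photo = 5483      # graded photograph and model variables at entry
photo_within_two_years = 1030   # baseline shifted to 1 or 2 years after entry
str_at_baseline = 61            # EURODIAB grade 3, 4 or 5 at baseline (excluded)
no_photo_after_model_time = 3133  # no graded photograph within 1 year of model-based screening time

eligible = with_baseline_photo + photo_within_two_years
analysed = eligible - str_at_baseline - no_photo_after_model_time

print(eligible)  # 6513
print(analysed)  # 3319
```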

Confirmation of diabetes and diabetes duration

Type 2 diabetes was considered confirmed if at least one of the following was reported by the patient’s general practitioner: (1) one or more classic symptoms (excessive thirst, polyuria, weight loss, hunger or pruritus) and fasting plasma glucose ≥7.0 mmol/l or random plasma glucose ≥11.1 mmol/l; (2) at least two elevated plasma glucose levels on different occasions (fasting glucose ≥7.0 mmol/l or random plasma glucose ≥11.1 mmol/l) in the absence of symptoms [19]. Diabetes duration was calculated from the date of diabetes diagnosis until the date of the baseline measurement.

Measurements

Annual physical examinations were performed following a standardised protocol. Weight and height were measured with patients barefoot and wearing light clothes. HbA1c was measured using HPLC. Fasting plasma glucose was measured by means of a hexokinase method (Roche Diagnostics, Mannheim, Germany). Levels of total cholesterol, HDL-cholesterol and triacylglycerol were measured using enzymatic techniques (Boehringer-Mannheim, Mannheim, Germany). Systolic and diastolic blood pressures were measured on the right arm after 5 min resting in a seated position using a random-zero sphygmomanometer (Hawksley-Gelman, Lancing, UK). Information on year of onset of diabetes and on country of birth of the patient and the patient’s parents was obtained by self-report [9]. Patients were categorised into two ethnic groups: Western (European countries [except for Turkey], Indonesia, USA and Oceania) and non-Western (Turkey, Morocco, Surinam, Aruba, Netherlands Antilles, Africa, Asia and Latin America).

Diabetic retinopathy

From 1998 until 2000, fundus photography of both eyes was performed with a Kowa Pro Fundus camera fitted with a green filter (Kowa Optical Industry, Torrance, CA, USA). From the beginning of 2000 until 2004, fundus photography of both eyes was performed with a non-mydriatic Canon CR5 camera (Canon, Tokyo, Japan). From 2004, fundus photography of both eyes was performed with a non-mydriatic Topcon TRC NW 100 camera (Topcon, Tokyo, Japan) [9]. All participants were examined with 45° fundus photographs. One photograph was centred on the macula and the other nasally, with the optic disc one disc diameter from the temporal edge. Mydriasis with 0.5% tropicamide and 2.5% phenylephrine eye drops was performed when the non-mydriatic photograph was not gradable. All photographs were graded by an experienced ophthalmologist who is trained as a retinal specialist. Each patient’s grade of retinopathy was based on the grading of the worse eye. All photographs were graded according to the EURODIAB classification score, in which grade 0 is ‘no retinopathy’, grade 1 is ‘minimal non-proliferative retinopathy’ (one or a few scattered haemorrhages or microaneurysms), grade 2 is ‘moderate non-proliferative retinopathy’, grade 3 is ‘severe non-proliferative or preproliferative retinopathy’, grade 4 is ‘photocoagulated retinopathy’ and grade 5 is ‘proliferative retinopathy’ [20]. Because patients with grades 3–5 retinopathy are usually referred to an ophthalmologist for assessment and/or treatment, grades 3–5 were included in the outcome measure and considered as ‘STR’. If abnormalities were seen near the macula and maculopathy was suspected, the EURODIAB grading was set to at least 3 and the patient was referred to an ophthalmologist.
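The grading rules in this section can be summarised in a short sketch. It is illustrative only: the class and function names are hypothetical, and it encodes just the three rules stated above (grade of the more severely affected eye, a floor of grade 3 when maculopathy is suspected, and STR defined as grades 3–5).

```python
from enum import IntEnum


class EurodiabGrade(IntEnum):
    """EURODIAB grades as listed in the text."""
    NO_RETINOPATHY = 0
    MINIMAL_NON_PROLIFERATIVE = 1
    MODERATE_NON_PROLIFERATIVE = 2
    SEVERE_NON_PROLIFERATIVE_OR_PREPROLIFERATIVE = 3
    PHOTOCOAGULATED = 4
    PROLIFERATIVE = 5


def patient_grade(left_eye: int, right_eye: int,
                  maculopathy_suspected: bool = False) -> EurodiabGrade:
    """Patient-level grade: the more severely affected eye, at least 3 if maculopathy is suspected."""
    grade = max(left_eye, right_eye)
    if maculopathy_suspected:
        grade = max(grade, 3)
    return EurodiabGrade(grade)


def is_str(grade: int) -> bool:
    """Sight-threatening retinopathy is defined here as EURODIAB grades 3-5."""
    return grade >= 3


print(patient_grade(1, 3).name)                                 # SEVERE_NON_PROLIFERATIVE_OR_PREPROLIFERATIVE
print(is_str(patient_grade(0, 1)))                              # False
print(is_str(patient_grade(0, 0, maculopathy_suspected=True)))  # True
```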

Statistical analyses

Variables are presented as percentages, means (±SD) or medians (interquartile range) in case of a skewed distribution. Differences in baseline characteristics between patients with and without grades 1–2 retinopathy at baseline were tested by Student’s t test for normally distributed variables and the Mann–Whitney U test for variables with a skewed distribution. Differences in proportions were tested by the χ² test. Using the model of Aspelund et al, STR risk and an accompanying screening interval ranging from 6 to 60 months were calculated for each patient based on sex, HbA1c, systolic blood pressure, presence of retinopathy and diabetes duration [17]. The model is described briefly in the electronic supplementary material.

In patients with incident STR during follow-up, we checked whether STR occurred before or after the model-based time of screening. Outcomes of omitted fundus photographs according to the model and potentially missed cases of retinopathy grades 3–5 were checked. For the total population, we calculated the reduction in screening frequency by comparing the mean screening interval of the population assigned by the model to annual and biennial screening. The predictive accuracy of the model was estimated using calibration and discrimination techniques. Calibration is the ability of the model to predict the number of observed cases of STR during follow-up and it was visually checked by plotting the predicted risk of developing STR against the observed incidence of STR. Participants were grouped into quintiles of predicted STR risk during 60 months of follow-up, which was calculated in the first step of the model. Using Poisson regression for survival data, the observed incidence of STR was calculated within each quintile, taking into account censored data [21]. Within each quintile, observed STR incidence was plotted against predicted risk. Discrimination is the ability to distinguish those who develop STR during follow-up from those who do not. Discriminatory ability was estimated by calculating Harrell’s C-statistic (similar to the area under the receiver operating characteristic curve), censoring missing data from patients who did not have complete 60-month follow-up data [22]. Discriminatory power is graded as poor for a C-statistic below 0.6, moderate between 0.6 and 0.8, and good for >0.8.
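To make the discrimination measure concrete, the sketch below computes Harrell's C-statistic for right-censored data in plain Python and applies the grading thresholds given above. It is an illustration, not the analysis code used in the study (which was run in SPSS and R): the function names and the toy data are hypothetical, pairs with tied follow-up times are simply skipped, and a value of exactly 0.8 is treated as moderate because the text reserves 'good' for values above 0.8.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs in which the patient with the
    earlier event also has the higher predicted risk (risk ties count 0.5).
    A pair is comparable when the shorter follow-up time ends in an event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def grade_discrimination(c):
    """Thresholds from the text: poor < 0.6, moderate 0.6-0.8, good > 0.8."""
    if c < 0.6:
        return "poor"
    if c <= 0.8:
        return "moderate"
    return "good"


# Hypothetical toy data: follow-up in months, event flag (1 = STR), predicted risk.
times = [10, 20, 30, 40]
events = [1, 1, 0, 0]
risks = [0.9, 0.5, 0.3, 0.1]

print(harrell_c(times, events, risks))  # 1.0
print(grade_discrimination(0.83))       # good
```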

Finally, we investigated whether updating the model for the observed risk of developing STR within our study population improved calibration and discrimination of the model. Updating the model was performed by calculating the calibration factor: (1 − the proportion of patients without STR after 60 months)/mean risk calculated by the model. The calibration factor was then used to update the risk function calculated by the model: calibration factor × mean risk calculated by the model.
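The updating step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names and the example risks are hypothetical, and the factor is applied to each patient's predicted risk, whereas the text phrases the update in terms of the mean risk calculated by the model. Capping updated risks at 1 is also an assumption of the sketch.

```python
def calibration_factor(proportion_without_str, predicted_risks):
    """Observed 60-month STR risk divided by the mean model-predicted risk."""
    observed_risk = 1.0 - proportion_without_str
    mean_predicted = sum(predicted_risks) / len(predicted_risks)
    return observed_risk / mean_predicted


def update_risks(predicted_risks, factor):
    """Rescale predicted risks by the calibration factor (capped at 1)."""
    return [min(1.0, factor * risk) for risk in predicted_risks]


# Hypothetical example: mean predicted risk 5%, observed risk 2.5%.
predicted = [0.02, 0.04, 0.06, 0.08]
factor = calibration_factor(0.975, predicted)
updated = update_risks(predicted, factor)

print(round(factor, 3))                        # 0.5
print(round(sum(updated) / len(updated), 3))   # 0.025
```

With a factor below 1, as in this toy example, every predicted risk is scaled down, which is the direction of correction needed when a model overestimates risk.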

All statistical analyses were conducted using SPSS version 20 (SPSS, Chicago, IL, USA) and R version 3.0.2 (Vienna, Austria) for Windows.

Results

A total of 3,319 type 2 diabetic patients were included in the study, of whom 339 (10.2%) presented grade 1 or 2 retinopathy at baseline, mostly mild non-proliferative retinopathy (grade 1) (9.1%). During a mean of 53 months of follow-up, 76 patients (2.3%) developed STR. Patients with prevalent grade 1 or 2 retinopathy at baseline were significantly older, had longer diabetes duration and higher HbA1c levels compared with patients without retinopathy at baseline (Table 1).

Table 1 Baseline characteristics of the type 2 diabetic patient population stratified by retinopathy grade (EURODIAB) at baseline

Table 2 shows the model-based screening interval for the total population (mean [SD] 31.0 [20.0] months), stratified for patients with different grades of retinopathy at baseline. With progressing grades of retinopathy at baseline, the model-based screening interval shortened and the incidence of STR increased.

Table 2 Model-based screening interval and cases of STR for the total population, stratified by grade of retinopathy (EURODIAB) at baseline

Using the model, 2,468 patients (74.4%) were assigned a screening interval longer than 12 months and 1,755 patients (52.9%) were assigned a screening interval longer than 24 months. When using the model, screening frequency could be reduced by 23% compared with biennial screening, and a reduction of 61% could be achieved compared with annual screening.
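The reported reductions are consistent with a simple comparison of intervals. The sketch below assumes that screening frequency is inversely proportional to the screening interval, so that the reduction equals 1 − (fixed interval / mean model-based interval); the text does not state the formula, but this assumption reproduces the reported 61% and 23% from the mean model-based interval of 31.0 months. The function name is illustrative.

```python
def screening_reduction(mean_model_interval_months, fixed_interval_months):
    """Relative reduction in screening frequency versus a fixed interval,
    assuming frequency is proportional to 1 / interval."""
    return 1.0 - fixed_interval_months / mean_model_interval_months


mean_interval = 31.0  # mean model-based interval in months (Table 2)

print(round(100 * screening_reduction(mean_interval, 12)))  # 61 (vs annual)
print(round(100 * screening_reduction(mean_interval, 24)))  # 23 (vs biennial)
```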

A total of 76 patients developed STR during follow-up. Of these, 67 patients (88.2%) developed STR after the model-based time of screening (mean [SD] 25.5 [22.2] months), meaning that the model-based screening interval was safe in these patients. Of these, five patients (7.5%) developed STR within 1 year after baseline; according to the model, screening should have been performed after 6 months. Of all the patients who developed STR during a mean of 53 months of follow-up, nine (11.8%) developed STR before the model-based time of screening (mean [SD] 24.3 [13.2] months), which is later than current care. Patients who were potentially missed by the model had significantly lower systolic blood pressure levels (127 vs 146 mmHg) at baseline compared with patients screened in time. HbA1c level was not significantly different between the two patient groups.

The C-statistic of the personalised screening model was 0.83 (95% CI 0.74, 0.92), indicating good discriminatory power. Calibration of the model is depicted in Fig. 1a. The figure shows the observed incidence of STR within each quintile of the predicted risk of developing STR. The model overestimates the risk of developing grades 3–5 retinopathy. Figure 1b shows the calibration plot of the observed incidence of STR against the predicted incidence of STR after updating the model to the observed risk in our study population. Compared with the initial calibration plot, the observed incidence within quintiles of the updated predicted risk lies closer to the diagonal line, meaning that updating the model improved calibration of the model in our study population. When using the updated model, the screening frequency could be reduced by 29% compared with biennial screening and a reduction of 65% could be achieved compared with annual screening. Of all the patients who developed STR during a mean of 53 months of follow-up, 13 developed STR before the updated model-based time of screening (mean [SD] 25.5 [14.3] months), which is later than current care.

Fig. 1 Observed incidence of STR (EURODIAB grades 3–5) within quintiles of predicted risk according to the model (a) and according to the updated model based on STR incidence in the study population (b)

Discussion

This study is the first to validate the model for personalised screening of diabetic retinopathy of Aspelund et al [17]. Compared with conventional annual or biennial screening intervals, use of the personalised screening model can attain a substantial reduction in the number of screenings, while safety and efficacy are barely compromised. Furthermore, we showed that the model has good discriminatory power but overestimates the risk of developing STR.

Strengths of the present study include the large sample size and its long-term follow-up with measurement of type 2 diabetes-related risk factors and complications. Furthermore, fundus photographs were graded using the internationally accepted EURODIAB classification system [20], which enhances generalisability of our results to other populations.

Using the model, a reduction in screening frequency of 23% or 61% could be achieved compared with biennial or annual screening, respectively. These reductions are comparable to those of Aspelund et al, who showed in a Danish population used to test the fit of the model that 59% of the screenings could be safely omitted. Other studies have already shown that annual screening intervals can be safely prolonged for individuals at low risk of developing STR [2, 3, 10–16]. The model of Aspelund et al also integrated a shorter (<12 months) screening interval for patients at high risk of STR, perhaps diminishing the time to diagnosis of STR.

We showed that the model overestimated the risk of developing retinopathy in the present study cohort, which was also true in the Danish cohort used to test the fit of the model [17]. There are several possible reasons for this overestimation. First, the present study cohort consists of a well-treated group of type 2 diabetic patients, in whom glucose and blood pressure control are very well maintained. This may result in less precise estimates of the relative contribution of the independent risk factors in the model. Second, baseline prevalence of grades 1–2 retinopathy in our cohort is lower than in the cohorts used by Aspelund et al. Third, estimation of the risk could have been less precise due to the relatively low incidence of grades 3–5 retinopathy (2.3%). Updating the model to the observed risk of STR in our population improved the calibration of the model but did not lead to increased predictive accuracy. Updating the model before using it in populations with a different prevalence of STR is therefore not recommended.

Patients who developed STR during follow-up and were missed by the model had significantly lower baseline systolic blood pressure levels, which led to an underestimation of STR risk. Taking into account previous trends in risk factors, medication use and more specific information on the presence of early stages of retinopathy (grades 1–2) might increase the accuracy of the model in estimating patients’ risk of STR.

In the present analyses, the development of STR was determined using the follow-up data until the first fundus photograph after the model-based screening interval. This might have led to a less precise estimation of the predictive accuracy of the model. However, the time between model-based time of screening and actual fundus photograph results was restricted to a maximum of 1 year.

Mainly due to large model-based screening intervals and the dynamic nature of the cohort, a considerable number of patients had incomplete follow-up. The estimated risk of STR in patients with incomplete follow-up was lower than in patients with 5 years of follow-up. The predictive accuracy of the model was determined by estimating calibration and discrimination of the model while censoring patients with incomplete follow-up. Taking into account patients with incomplete follow-up might have led to a more precise estimate of the accuracy of the model.

We validated the model in a mainly white population. Before extrapolation of the results to patients of other ethnic origins, the model should first be validated in these groups.

The current study was performed in a real-life setting and the fundus photographs were graded by only one grader. We therefore have no information on inter- and intra-rater reliability. The fundus photographs were graded by an experienced ophthalmologist who was trained as a retinal specialist, enhancing reliable grading in a real-life setting [23]. Still, some cases of STR might have been misclassified by the grader, due to the possibility of over-grading background retinopathy as preproliferative retinopathy and due to the enhanced possibility of missing fine exudates by retinal specialists [24]. Differences in grading protocols may also affect the validity of the personalised screening model. Validation before implementing the model in populations where other grading protocols are used is therefore recommended.

A limitation of personalised screening is that it is dependent on the accuracy and punctuality of the care system and care professionals. The use of personalised screening intervals makes the system more prone to errors, such as a missed fundus screening. It is therefore recommended to use a computerised system when applying the model in clinical care so that reminders for screening are automatically generated to ensure safe follow-up procedures.

To conclude, we have validated a new model for personalised diabetic retinopathy screening and demonstrated that a large reduction in screenings for retinopathy was achieved in this well-managed population with a relatively short duration of diabetes and an extremely low incidence of STR. Further improvement of the model might decrease the number of missed cases of STR. Use of the model for personalised screening may eventually help to reduce healthcare use and costs of diabetes care.