Introduction
Inherited retinal diseases (IRDs) are a leading cause of blindness in children and working-age adults in many countries.1–4 Variants in over 250 genes are implicated. A number of questions remain unresolved regarding the spectrum of variants and the mechanisms of disease.2 Some associated genes are ubiquitously expressed, yet pathogenic variants appear to give rise only to IRD.5 A number of genes show mutational hotspots, while other regions rarely harbour disease-causing variants, either because the regions are highly conserved or because polymorphisms in them rarely cause disease. Identifying genes, or genetic regions, with particular characteristics might shed light on the selection pressures involved, and also help in the future interpretation of novel variants.6 7 The range of genes and variants involved in IRDs has recently been reviewed comprehensively by Schneider et al,8 who discussed, among other things, the prevalence of different types of variant and their amenability to various gene-based therapeutic approaches.
Metrics derived from large genomic datasets can identify those genes in which loss of function variants appear to be under-represented (conventionally termed ‘haploinsufficient’ genes).9 10 These metrics indicate genes in which heterozygosity for loss of function variants is selected against, presumably due to a survival or molecular disadvantage.11 Genes can also be interrogated as to whether missense changes are significantly under-represented or over-represented. Variants that affect vision, particularly if the effects are mild or manifest late in life, may not have a strong effect on survival or reproductive success, and so may leave these metrics unaffected. However, exploring these metrics for IRD genes might still yield insights into those genes, potentially highlighting particularly conserved pathways, and could improve our understanding of the mutational landscape of IRD-associated genes more generally.
For this study, we curated a list of IRD genes from the Retinal Information Network online resource (https://sph.uth.edu/retnet/) and investigated the above metrics in two large genomic databases, namely the Genome Aggregation Database (gnomAD)10 and the DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources (DECIPHER).12 Both databases were used to identify genes with predicted ‘loss of function intolerance’, and the gnomAD resource was additionally used to identify those in which missense variants were over-represented or under-represented. Genes of interest were evaluated in terms of associated pathways using the online gene ontology resource Protein Analysis THrough Evolutionary Relationships (PANTHER).13
The parameters investigated have been computed for each gene as a whole (based on the range of variants observed in the large datasets), rather than for any specific variants within the genes. Such parameters have been used, with some success, to identify candidate genes in whole genome data from patients in whom no molecular cause has yet been identified.14 In the present study, we took the converse approach: starting from genes already known to be associated with retinal disease, we asked which of these show, in the general population, an under-representation of loss of function variants, and which show an under-representation or over-representation of missense variants.
We were interested in any patterns that emerged, including the proportion of IRD-associated genes classified as having an under-representation of loss of function variants and whether particular modes of inheritance were more common in this group. Similar investigations have been performed for loss of function intolerant genes in general,15 but our study focused specifically on IRD genes. We also explored whether such genes were more often associated with syndromic disease, and whether certain pathways were over-represented. Identifying genes with outlying propensities for missense variants could also be useful: those IRD genes in which missense variants are over-represented may constitute ‘noisy genes’, such that missense variants in these genes, when found in patients, should be interpreted with caution.
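The gene-level approach described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study. The record type, function names, gene labels and metric values are all hypothetical. The cut-offs are assumptions: a probability of loss of function intolerance (pLI) of at least 0.9 is a commonly used convention, and the missense Z score threshold of 3.09 is an illustrative choice. The sketch also assumes the convention that a positive missense Z score means fewer missense variants were observed than expected.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed cut-offs, not taken from the text above.
PLI_CUTOFF = 0.9          # commonly used convention for LoF intolerance
MISSENSE_Z_CUTOFF = 3.09  # illustrative threshold for outlying missense Z


@dataclass(frozen=True)
class GeneConstraint:
    """Hypothetical per-gene record of constraint metrics."""
    gene: str
    pli: float          # probability of loss of function intolerance (0 to 1)
    missense_z: float   # positive: fewer missense variants than expected
    inheritance: str    # e.g. "AD", "AR", "XL"


def is_lof_intolerant(g: GeneConstraint) -> bool:
    """True if loss of function variants appear under-represented."""
    return g.pli >= PLI_CUTOFF


def missense_class(g: GeneConstraint) -> str:
    """Classify a gene by whether missense variants are outlying."""
    if g.missense_z >= MISSENSE_Z_CUTOFF:
        return "under-represented"
    if g.missense_z <= -MISSENSE_Z_CUTOFF:
        return "over-represented"
    return "as expected"


def summarise(genes: list[GeneConstraint]) -> tuple[float, Counter, list[str]]:
    """Return the proportion of LoF-intolerant genes, their inheritance
    modes, and the names of candidate 'noisy' genes."""
    intolerant = [g for g in genes if is_lof_intolerant(g)]
    proportion = len(intolerant) / len(genes) if genes else 0.0
    modes = Counter(g.inheritance for g in intolerant)
    noisy = [g.gene for g in genes if missense_class(g) == "over-represented"]
    return proportion, modes, noisy


# Made-up placeholder records, for illustration only (not real gene data).
EXAMPLE_GENES = [
    GeneConstraint("GENE_A", pli=0.99, missense_z=4.0, inheritance="AD"),
    GeneConstraint("GENE_B", pli=0.00, missense_z=-3.5, inheritance="AR"),
    GeneConstraint("GENE_C", pli=0.95, missense_z=0.5, inheritance="XL"),
    GeneConstraint("GENE_D", pli=0.10, missense_z=1.0, inheritance="AR"),
]

if __name__ == "__main__":
    proportion, modes, noisy = summarise(EXAMPLE_GENES)
    print(f"LoF-intolerant proportion: {proportion:.2f}")
    print(f"Inheritance among LoF-intolerant genes: {dict(modes)}")
    print(f"Candidate 'noisy' genes: {noisy}")
```

In practice the records would be populated from the per-gene constraint tables of the chosen database rather than typed by hand, and the cut-offs would be set according to the conventions of that resource.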