Abstract

Over 100 disease-gene associations have been identified by whole-exome sequencing since the first reports in 2010, leading to a revolution in rare disease-gene discovery [1,2]. However, many cases remain unsolved because ~100-1000 loss-of-function candidate variants are still left after removing those deemed common, low quality or non-pathogenic. In some cases it may be possible to use multiple affected individuals, linkage data, identity-by-descent inference, identification of de novo heterozygous mutations from trio analysis, or prior knowledge of affected pathways to narrow down to the causative variant [3]. Where this is not possible or successful, one approach is to use phenotype data to evaluate whether a particular candidate variant is likely to result in the patient's clinical manifestations. Model organism phenotype data represents a highly pertinent but under-utilised resource for such disease gene discovery. Whilst some 1800 human genes were associated with Human Phenotype Ontology (HPO) annotations at the time of publication, a further 5700 genes have been shown to have phenotype data available from mouse and zebrafish model organism databases [4]. We have previously developed algorithmic approaches to semantically compare disease phenotypes with mouse and zebrafish phenotypes for disease candidate gene identification [5-8], and have reported that comparisons to mouse phenotype data can dramatically increase the performance of exome analysis prioritization [9]. In the work presented here, for each candidate variant in the exome we compare the patient's phenotypes with those of known diseases as well as with mouse and zebrafish phenotypes. Where phenotype data is not available for a candidate, we use its proximity in protein-protein networks to genes with phenotype data to inform on candidacy based on guilt-by-association.
Combining this output with measures of variant candidacy such as pathogenicity and allele frequency improves performance synergistically: the causative variant is identified as the top hit in up to 96% of exomes for known associations and 49% of exomes for previously undescribed associations. Our software, Exomiser, is openly available to use at our website [http://www.sanger.ac.uk/resources/databases/exomiser/query] and for download to perform local analysis. We are currently collaborating with the NIH Undiagnosed Diseases Program to achieve diagnosis of problematic cases through exome analysis. In conclusion, our results clearly show the value of collecting comprehensive clinical phenotype data for translational bioinformatics. Future work will focus on producing a robust solution for clinical diagnostics.
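The prioritization idea described above can be illustrated with a minimal sketch. It is not the Exomiser implementation: the `Candidate` fields, the 1% frequency cutoff, the 0.5 damping factor for network neighbours and the simple averaging of scores are all assumptions chosen for illustration, and the gene names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    gene: str
    pathogenicity: float                     # assumed 0..1 predicted deleteriousness
    allele_frequency: float                  # population frequency, 0..1
    phenotype_score: Optional[float] = None  # assumed 0..1 similarity; None if no phenotype data


def variant_score(c, max_frequency=0.01):
    """Hypothetical rule: drop variants at or above the cutoff, down-weight the rest."""
    if c.allele_frequency >= max_frequency:
        return 0.0
    return c.pathogenicity * (1.0 - c.allele_frequency / max_frequency)


def gene_score(c, neighbours, known_scores, damping=0.5):
    """Use direct phenotype similarity if available; otherwise fall back to the
    best damped score among network neighbours (guilt-by-association)."""
    if c.phenotype_score is not None:
        return c.phenotype_score
    linked = [known_scores[g] for g in neighbours.get(c.gene, ()) if g in known_scores]
    return damping * max(linked) if linked else 0.0


def rank(candidates, neighbours, known_scores):
    """Sort candidates by the mean of variant and gene scores, best first."""
    def combined(c):
        return 0.5 * (variant_score(c) + gene_score(c, neighbours, known_scores))
    return sorted(candidates, key=combined, reverse=True)


candidates = [
    Candidate("GENE_A", pathogenicity=0.9, allele_frequency=0.0, phenotype_score=0.2),
    Candidate("GENE_B", pathogenicity=0.8, allele_frequency=0.0),  # no phenotype data
    Candidate("GENE_C", pathogenicity=1.0, allele_frequency=0.02, phenotype_score=0.9),
]
neighbours = {"GENE_B": ["GENE_X"]}  # toy protein-protein network
known_scores = {"GENE_X": 0.9}       # phenotype score of a network neighbour

ranked_genes = [c.gene for c in rank(candidates, neighbours, known_scores)]
```

In this toy input, GENE_B has no phenotype data of its own and is scored through its network neighbour, while GENE_C matches the phenotype well but is penalised for being too common.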


