Abstract

Although the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines for variant interpretation are widely used in clinical genetics, these knowledge-based guidelines leave room for improvement. We statistically assessed the average deleteriousness of start-lost, stop-lost, and in-frame insertion and deletion (indel) variants and extracted deleterious subsets, guided by the proportions of rare variants in the general population of the Genome Aggregation Database (gnomAD). We also constructed a machine learning-based model that scores the pathogenicity of start-lost variants (the PoStaL model) by predicting possible translation initiation sites on transcripts with deep learning and training a random forest on known pathogenic and likely benign variants. The proportion of rare variants was highest in stop-lost variants, followed by in-frame indels and start-lost variants, suggesting that the criteria in the ACMG/AMP guidelines assigning PVS (pathogenic very strong) to start-lost variants and PM (pathogenic moderate) to stop-lost and in-frame indel variants would not be appropriate. Regarding deleterious subsets, stop-lost variants introducing extensions of more than 30 amino acids and in-frame indels computationally predicted to be damaging were enriched for rare and known pathogenic variants. For start-lost variants, the PoStaL model outperformed existing tools. We also provide comprehensive lists of the PoStaL scores for start-lost variants and of the lengths of the amino acid extensions introduced by stop-lost variants. Our study could contribute to refinement of the ACMG/AMP guidelines, provides resources for future investigation, and offers an example of how to improve knowledge-based frameworks with data-driven approaches. The study was supported by grants from the Japan Agency for Medical Research and Development (AMED) and the Japan Society for the Promotion of Science (JSPS).
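The rare-variant comparison described above can be illustrated with a minimal sketch. It assumes each variant is given as a (consequence, allele count, allele number) tuple and that "rare" means an allele frequency below a cutoff; the function name `rare_proportion`, the cutoff `max_af=1e-4`, and the toy records are all hypothetical and are not taken from the study or from gnomAD.

```python
from collections import defaultdict


def rare_proportion(variants, max_af=1e-4):
    """Return {consequence: fraction of variants with allele frequency < max_af}.

    `variants` is an iterable of (consequence, allele_count, allele_number).
    The cutoff `max_af` is an illustrative assumption, not the study's value.
    """
    totals = defaultdict(int)
    rare = defaultdict(int)
    for consequence, allele_count, allele_number in variants:
        if allele_number == 0:
            continue  # no called genotypes, so the frequency is undefined
        totals[consequence] += 1
        if allele_count / allele_number < max_af:
            rare[consequence] += 1
    return {c: rare[c] / totals[c] for c in totals}


# Made-up toy records purely to exercise the function (not real data).
toy = [
    ("stop_lost", 1, 100000),
    ("stop_lost", 2, 100000),
    ("start_lost", 1, 100000),
    ("start_lost", 500, 100000),
    ("inframe_indel", 3, 100000),
    ("inframe_indel", 5, 100000),
    ("inframe_indel", 1, 100000),
    ("inframe_indel", 2000, 100000),
]
result = rare_proportion(toy)
print(result)
```

A higher proportion of rare variants in a class is read as a sign of stronger purifying selection, and hence greater average deleteriousness; a real analysis would also need to account for factors such as mutation rate and sample size, which this sketch ignores.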
