Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization

Yu-Shan Huang, Feipei Lai, Hsin Wang, Paul Wuh-Liang Hwu, Yu-Chang Chune, Yi-Lin Lin, Ching Hsu, Ni-Chung Lee, I-Cheng Liao

doi:10.2196/37701

Abstract

Background: In recent years, the rapid development of next-generation sequencing (NGS) technology has made it possible to sequence an entire human genome in a short period. As a result, NGS is now widely used in clinical diagnostic practice, especially for the diagnosis of hereditary disorders. Although exome data on single-nucleotide variants (SNVs) can be generated using these approaches, processing a patient's DNA sequence data requires multiple tools and complex bioinformatics pipelines.

Objective: This study aims to help physicians automatically interpret the genetic variation information generated by NGS in a short period. Currently, to determine the true causal variants of a patient with a genetic disease, physicians often need to review numerous features of every variant manually and search the literature in different databases to understand the effect of each genetic variation.

Methods: We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panels as the training set and integrated variant annotations from multiple genetic databases for model training. The resulting model ranked SNVs and output the most likely disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We fed the sequencing data, together with phenotypic information automatically extracted from the patients' electronic medical records by a keyword extraction tool, into our machine learning model.

Results: We located 92.5% (124/134) of the causative variants within the top 10 of the ranking list, among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer assigned the target gene to the top rank for 61.1% (66/108) of the patients, followed by Variant Prioritizer, which did so for 44.4% (48/108) of the patients. The cumulative rank results showed that AI Variant Prioritizer had the highest accuracy at ranks 1, 5, 10, and 20, outperforming the other tools. After adopting Human Phenotype Ontology (HPO) terms by looking up the databases, the proportion ranked in the top 10 increased to 93.5% (101/108).

Conclusions: We successfully used WES sequencing data and free-text phenotypic information on each patient's disease, automatically extracted by the keyword extraction tool, for model training and testing. By interpreting our model, we identified which variant features are important. We also achieved a satisfactory result in finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the proportion ranked in the top 10 increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and the model has been used to support genetic diagnosis at National Taiwan University Hospital.
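
The cumulative rank accuracy reported in the Results can be illustrated with a minimal sketch. The function name `cumulative_rank_accuracy` and the toy ranks below are hypothetical and are not the study's code or data; the sketch only assumes that each case has a 1-based rank for its target gene or variant (or `None` if the target is absent from the list) and shows how a top-k hit rate is derived from such ranks.

```python
from typing import Dict, Iterable, Optional, Sequence


def cumulative_rank_accuracy(
    ranks: Sequence[Optional[int]],
    cutoffs: Iterable[int] = (1, 5, 10, 20),
) -> Dict[int, float]:
    """Return, for each cutoff k, the fraction of cases whose target
    is ranked at position k or better.

    ranks: one 1-based rank per case; None means the target was not
           present in the ranked list at all (counted as a miss).
    """
    if not ranks:
        raise ValueError("ranks must contain at least one case")
    total = len(ranks)
    return {
        k: sum(1 for r in ranks if r is not None and r <= k) / total
        for k in cutoffs
    }


# Toy example (hypothetical ranks for 10 cases, not study data).
toy_ranks = [1, 1, 3, 7, 12, None, 2, 1, 25, 4]
toy_result = cumulative_rank_accuracy(toy_ranks)

# The percentages in the abstract follow from the reported counts.
top1_pct = round(100 * 66 / 108, 1)            # AI Variant Prioritizer, rank 1
top10_with_hpo_pct = round(100 * 101 / 108, 1)  # top 10 after adopting HPO terms
```

Because the measure is cumulative, the value at a larger cutoff can never be lower than the value at a smaller one, which is why a single curve over ranks 1, 5, 10, and 20 is enough to compare tools.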

Journal: JMIR Bioinformatics and Biotechnology	Publication Date: Sep 15, 2022
License type: cc-by
