Abstract

Detection of somatic mutation in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only without the paired normal samples. To include these types of data for extensive studies on the process of tumorigenesis, it is necessary to develop an approach for identifying somatic mutations using tumor exome sequencing data only. In this study, we designed a machine learning approach using Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data and we integrated this into a pipeline called DNN-Boost. The XGBoost algorithm is used to extract the features from the results of variant callers and these features are then fed into the DNN model as input. The XGBoost algorithm resolves issues of missing values and overfitting. We evaluated our proposed model and compared its performance with other existing benchmark methods. We noted that the DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from paired tumor-normal exome data and tumor-only exome data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call