Abstract

Genome-wide association studies (GWAS) have shown that most common genetic variants associated with complex diseases lie in non-coding regions, and many of these associated variants may not themselves be causal. In addition, the small number of validated non-coding functional variants makes it difficult to develop an effective supervised learning model. Improving the accuracy of predicting non-coding causal variants has therefore become critical. This study aims to build a transfer learning-based machine learning method for predicting regulatory variants that overcomes the problem of limited sample size. We present transfer support vector machine (TSVM), a supervised learning method for predicting regulatory variants validated by massively parallel reporter assays (MPRA). First, a convolutional neural network trained with transfer learning extracts features. Second, the extracted features are selected with a random forest. Third, the selected features are used to train a support vector machine for classification. We performed scale sensitivity experiments on the MPRA dataset and validated the effectiveness of transfer learning. The model achieves an MCC of 0.326 and an AUC of 0.720, both higher than those of the state-of-the-art method. The source code of TSVM is available at https://github.com/biomg/TSVM.
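The abstract reports performance as MCC (Matthews correlation coefficient) and AUC (area under the ROC curve). As a reading aid, the sketch below shows how these two metrics are conventionally computed from binary labels and classifier scores. It is a minimal, dependency-free illustration, not the authors' evaluation code; the function names `mcc` and `auc` and the toy data are hypothetical. MCC is computed from thresholded predictions, while AUC uses the raw scores and equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row/column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC via pairwise comparison: fraction of (positive, negative)
    pairs where the positive scores higher; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = regulatory variant, 0 = non-regulatory.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]
    y_pred = [1 if s > 0.5 else 0 for s in scores]  # threshold at 0.5
    print(f"MCC = {mcc(y_true, y_pred):.3f}")  # MCC = 0.707
    print(f"AUC = {auc(y_true, scores):.3f}")  # AUC = 0.889
```

MCC ranges from -1 to 1 (0 corresponds to chance-level prediction) and stays informative on imbalanced label sets, which is why it is often reported alongside AUC for variant classification. The pairwise AUC loop is quadratic in the number of examples; for large datasets a rank-based formula or a library routine is preferable.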
