A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources.

Lin Yuan, Jing Zhao, Tao Sun, Zhen Shen

doi:10.3389/fgene.2021.696956

Abstract

Copy number variation (CNV) may contribute to the development of complex diseases. However, because the mechanism of path association is complex and sufficient samples are lacking, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data provides an opportunity to design a new machine learning framework to predict potential disease-related CNVs. In this paper, we develop a novel machine learning approach, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations from a data set containing CNV data, disease state labels, and gene data. CNVs, genes, and diseases are connected through edges and together constitute a biological association network. To construct this network, we first use a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. We then use L1-penalized logistic regression (LLR) to detect genes related to disease. A stability selection strategy, which can effectively reduce false positives, is applied to both the self-adaptive BM and LLR steps. Finally, a weighted path search algorithm finds the top D path associations and the important CNVs. Experimental results on both simulated and prostate cancer data show that IHI-BMLLR performs significantly better than two state-of-the-art CNV detection methods (CCRET and DPtest) under false-positive control. Furthermore, applying IHI-BMLLR to prostate cancer data revealed significant path associations. Three new cancer-related genes were discovered in these paths; they remain to be verified by future biological research.
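As a rough illustration of two of the steps named in the abstract, the sketch below computes the standard biweight midcorrelation between two vectors and ranks CNV-gene-disease paths by a simple weight product. It is not the authors' implementation: the abstract does not give the self-adaptive BM formula, the LLR fitting with stability selection, or the path scoring rule, so the standard BM definition is used and the function names (`bicor`, `top_paths`), the constant 9 in the weight scale, and the product-of-absolute-weights score are all assumptions made here for illustration.

```python
import heapq
import math
from statistics import median


def _bicor_terms(values):
    """Return normalized, biweight-weighted deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)  # 9 is the conventional tuning constant
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # outliers get weight 0
        terms.append((v - med) * w)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Standard (not self-adaptive) biweight midcorrelation of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_bicor_terms(x), _bicor_terms(y)))


def top_paths(cnv_gene_weights, gene_disease_weights, d):
    """Rank CNV -> gene -> disease paths by an assumed score.

    cnv_gene_weights: {(cnv, gene): correlation}, e.g. from bicor.
    gene_disease_weights: {gene: coefficient}, e.g. from an L1-penalized
    logistic regression fitted elsewhere (not shown here).
    The score |w_cnv_gene| * |w_gene_disease| is a hypothetical choice.
    """
    scored = []
    for (cnv, gene), w_cg in cnv_gene_weights.items():
        score = abs(w_cg) * abs(gene_disease_weights.get(gene, 0.0))
        if score > 0:
            scored.append((score, cnv, gene))
    return heapq.nlargest(d, scored)
```

Because the biweight weights drop to zero for observations far from the median, a single extreme value has little influence on the correlation, which is the usual motivation for preferring BM over Pearson correlation on noisy expression data.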

Journal: Frontiers in Genetics	Publication Date: Jun 29, 2021
Citations: 9	License type: CC BY 4.0

Similar Papers

Genome-wide Transcriptome Profiling Reveals the Functional Impact of Rare De Novo and Recurrent CNVs in Autism Spectrum Disorders
Rui Luo ... Daniel H Geschwind
The American Journal of Human Genetics | VOL. 91
21 Jun 2012

Computational analysis of copy number variation in plant genomes
Raúl Y Wijfjes
09 Dec 2021

Clinical Selection of Prenatal Diagnostic Techniques Following Positive Noninvasive Prenatal Screening Results in Southwest China.
Xiaosha Jing ... Quanfang Zhou
Frontiers in Genetics | VOL. 12
28 Jan 2022

Validity of the Family‐Based Association Test for Copy Number Variant Data in the Case of Non‐Linear Intensity‐Genotype Relationship
Manuela Zanda ... Suna Onengut
Genetic Epidemiology | VOL. 36
12 Sep 2012
