Abstract
Reconstructing gene regulatory networks (GRNs) is crucial for understanding biological processes and holds potential for developing personalized treatment. Yet it remains an open problem, as state-of-the-art algorithms are often unable to process large amounts of data within a reasonable time. Furthermore, many existing methods predict numerous false positives and have limited capabilities to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. We have benchmarked KBoost against other high-performing algorithms using three different datasets. The results show that our method compares favorably to other methods across datasets. We have also applied KBoost to a large cohort of close to 2000 breast cancer patients and 24,000 genes, which took less than 2 h on standard hardware. Our results show that molecularly defined breast cancer subtypes also feature differences in their GRNs. An implementation of KBoost in the form of an R package is available at https://github.com/Luisiglm/KBoost and as a Bioconductor software package.
Highlights
Reconstructing gene regulatory networks is crucial to understand biological processes and holds potential for developing personalized treatment
The results show that KBoost performs well in both datasets (Table 1)
The results show that KBoost compares favorably to most algorithms and has a similar overall performance as ENNET, a tree gradient boosting algorithm (Table 3)
Summary
Reconstructing gene regulatory networks (GRNs) is crucial for understanding biological processes and holds potential for developing personalized treatment. Several groups have used algorithms based on different mathematical formulations to infer GRNs from gene expression data. These include Bayesian networks, correlation metrics, mutual information methods, and parametric and non-parametric regression. A seminal paper published in 2012 showed that correlation, mutual information and Bayesian networks tended to perform far worse than methods based on regression [3]. For this reason, in this work we focused only on regression-based GRN inference methods. Regression-based GRN inference methods build a mathematical model of the expression levels of a target gene given the expression levels of different transcription factors (TFs). The central assumption in these methods is that if the expression level of a TF predicts the expression level of a target gene, the TF is likely regulating it.
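The central assumption of regression-based inference can be illustrated with a minimal sketch. The code below is not KBoost: it replaces kernel PCA regression, boosting and Bayesian model averaging with plain ridge regression, purely to show how a TF's predictive weight for a target gene becomes an edge score. The function name `infer_grn_ridge`, the penalty `alpha` and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def infer_grn_ridge(expr, tf_idx, alpha=1.0):
    """Score TF -> target edges with one ridge regression per target gene.

    expr   : (n_samples, n_genes) expression matrix.
    tf_idx : column indices of the candidate TFs (hypothetical input format).
    alpha  : ridge penalty (illustrative default, not taken from the paper).
    Returns an (n_tfs, n_genes) array of absolute standardized coefficients.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene so coefficients are comparable across TFs.
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-12)
    n_genes = z.shape[1]
    scores = np.zeros((len(tf_idx), n_genes))
    for g in range(n_genes):
        # A TF is never used to predict its own expression.
        keep = [k for k, t in enumerate(tf_idx) if t != g]
        X = z[:, [tf_idx[k] for k in keep]]
        y = z[:, g]
        # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y
        beta = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
        scores[keep, g] = np.abs(beta)
    return scores


# Synthetic example: gene 3 is driven by TF 0 only.
rng = np.random.default_rng(0)
tfs = rng.normal(size=(200, 3))
target = 2.0 * tfs[:, 0] + 0.1 * rng.normal(size=200)
expr = np.column_stack([tfs, target])
scores = infer_grn_ridge(expr, tf_idx=[0, 1, 2])
top_tf_for_gene3 = int(scores[:, 3].argmax())
```

In this toy setting the highest-scoring TF for gene 3 should be the one that actually generated it. Real expression data are far noisier, which is one reason such methods can produce the false positives discussed above.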