Abstract

With the digital universe now having surpassed the zettabyte threshold, the push is on to expand advanced high-performance computing infrastructures that can manage and store these data in ways that facilitate mining, and then to develop and apply more sophisticated mathematical algorithms to extract knowledge from them. The promise of effectively mining big data is nothing less than achieving a higher level of understanding in nearly every facet of life, from climate change to the complex patterns in financial markets to the complexity of living systems. The life sciences stand poised to lead in both the generation of big data and the realization of dramatic benefit from it, whether predicting and preventing the next big outbreak or uncovering the best ways to treat, prevent, and cure common human diseases. We can now score variations in DNA across whole genomes; RNA levels and alternative isoforms, metabolite levels, and protein levels and protein state information across the transcriptome, metabolome, and proteome; and methylation status across the methylome. We can also construct extensive protein–protein and protein–DNA interaction maps, all in a comprehensive fashion and at the scale of populations of individuals. Interactions among these molecular entities define the complex web of biological processes that give rise to all higher-order phenotypes, including disease. The development of analytical approaches that simultaneously integrate different dimensions of data is essential if we are to extract meaning from these large-scale data and elucidate the complexity of living systems. In this chapter I describe a number of analytical approaches aimed at inferring causal relationships among variables in very large-scale datasets by leveraging DNA variation as a systematic perturbation source.
The causal inference procedures are also shown to enhance the ability to reconstruct truly predictive, probabilistic causal gene networks that reflect the biological processes underlying complex phenotypes such as disease. I detail examples of how to construct these network models by integrating many different dimensions of data simultaneously, and then how to apply them to uncover causal relationships among molecular phenotypes such as gene expression, and between molecular and higher-order traits such as disease.
