Abstract

In the past few decades, we have witnessed tremendous advancements in biology, life sciences and healthcare. These advancements are due in no small part to the big data made available by various high-throughput technologies, ever-advancing computing power, and algorithmic advancements in machine learning. In particular, big data analytics, including statistical and machine learning, has become an essential tool in these rapidly developing fields. As a result, the subject has drawn increased attention, and many review papers on it have been published in just the past few years. Unlike existing reviews, this work focuses on the application of systems engineering principles and techniques in addressing some of the common challenges in big data analytics for biological, biomedical and healthcare applications. Specifically, this review focuses on three key areas in biological big data analytics where systems engineering principles and techniques have been playing important roles: the principle of parsimony in addressing overfitting, the dynamic analysis of biological data, and the role of domain knowledge in biological data analytics.

Highlights

  • Massive quantities of data are being generated in biology, the life sciences and healthcare industries and institutions, which hold the promise of advancing our understanding of various biological systems and diseases, developing new biocatalysts and drugs, and delivering more affordable and effective patient care

  • We propose a knowledge-guided machine learning (ML) approach by defining model structures based on domain knowledge and hypothesis

  • Although there are applications where domain knowledge may appear to play a lesser role, we demonstrate with ample examples that, for analyzing biological big data, domain knowledge has played and will continue to play a significant role


Summary

Introduction

Massive quantities of data are being generated in biology, the life sciences and healthcare industries and institutions, which hold the promise of advancing our understanding of various biological systems and diseases, developing new biocatalysts and drugs, and delivering more affordable and effective patient care. To get a big picture of the research in the biological big data analytics field, we conducted a search on the Web of Science using the exact phrase “big data” together with any of the following words or phrases: biology, “life science”, healthcare, “health care”, biomedical, disease, and cancer. The journal articles on big data analytics in biology, life sciences and healthcare, and their citation numbers in the past decade, are based on this Web of Science search. Given the fast-growing nature of the field, many review papers have been published in just the past few years. For ease of reading, these distinctions are largely ignored in this work.
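The search described above combines one required exact phrase with a set of alternative terms. As a minimal sketch only, the logic can be written as a boolean query string. The helper names (`quote_if_phrase`, `build_query`) and the `AND`/`OR` syntax are assumptions for illustration; the source does not state the exact query syntax or search fields that were used.

```python
# Illustrative sketch: the helper names and the AND/OR syntax are assumptions,
# not the authors' actual query.

REQUIRED_PHRASE = "big data"
DOMAIN_TERMS = [
    "biology",
    "life science",
    "healthcare",
    "health care",
    "biomedical",
    "disease",
    "cancer",
]


def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(required: str, any_of: list) -> str:
    """Combine one required phrase with an OR-group of alternative terms."""
    alternatives = " OR ".join(quote_if_phrase(t) for t in any_of)
    return f"{quote_if_phrase(required)} AND ({alternatives})"


query = build_query(REQUIRED_PHRASE, DOMAIN_TERMS)
print(query)
```

Quoting only the multi-word terms mirrors the original wording, where “big data”, “life science” and “health care” are given as exact phrases and the remaining terms as single words.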

Principle of Parsimony in Addressing Overfitting
Checking for Overfitting
Reducing Parameter Space
Increasing Sample Space
Summary and Discussion
Dynamic Analysis of Biological Data
Dynamic Metabolic Flux Analysis
Dynamic Analysis of Signal Transduction Networks
Integrated Dynamic Analysis of Multi-Omics Data
Other Applications of Dynamic Data Analysis
The Role of Domain Knowledge in Biological Data Analytics
Knowledge-Guided Unsupervised Learning
Knowledge-Guided Supervised Learning
Knowledge-Guided Feature Engineering and Feature Selection
Findings
Conclusions