Abstract

In the “Omics” era of the life sciences, data come in many forms, representing information at various levels of biological systems: the genome, transcriptome, epigenome, proteome, and metabolome, as well as molecular imaging, molecular pathways, different populations of people, and clinical/medical records. Biological data are big, and their scale has already gone well beyond the petabyte (PB) and even the exabyte (EB) level. Nobody doubts that these data will create enormous value, provided scientists can overcome several challenges, e.g., how to handle the complexity of the information, how to integrate data from highly heterogeneous resources, and which principles or standards to adopt when dealing with big data. Tools and techniques for analyzing big biological data enable us to translate massive amounts of information into a better understanding of basic biomedical mechanisms, which can then be applied to translational or personalized medicine. Today, big data is one of the hottest topics in information science, but the concept can be misleading or confusing. The name itself suggests a huge amount of data, which, however, represents only one aspect. In general, big data has four important features, the so-called four V’s: the volume of the data, the velocity of processing it, the variability of its sources, and the veracity of its quality. These four hallmarks of big data call for dedicated theory and technology; at present, however, there is no satisfactory solution. More and more biologists are now involved with big data owing to the rapid advance of high-throughput biotechnologies. As an example, the Human Genome Project drew on the expertise, infrastructure, and people of 20 institutions and took 13 years and over $3 billion to determine the structure of a whole genome of approximately three billion nucleotides. Today, a whole human genome can be sequenced for $1000 within three days. We spent decades struggling to collect enough biological and biomedical data; now that big data overwhelms us, are we ready to face the challenge? The new bottleneck in biology is how to reveal the essential mechanisms of biological systems by making sense of big, noisy data. The life sciences today need more robust, expressive, computable, quantitative, accurate, and precise ways to handle big data. In fact, recent work in this area has already brought remarkable advantages and opportunities, implying the central role of bioinformatics and bioinformaticians in future biological and biomedical research. In the following text, we describe several aspects of big biological data based on our recent studies.
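To put these figures in perspective, the following is a minimal back-of-envelope sketch in Python. It uses only the round numbers cited above; the 1-byte-per-base storage estimate and the million-genome extrapolation are illustrative assumptions, not figures from the paper.

```python
# Round figures cited in the abstract (approximate, not measurements).
HGP_COST_USD = 3e9             # Human Genome Project: over $3 billion
HGP_DURATION_DAYS = 13 * 365   # roughly 13 years of work
TODAY_COST_USD = 1_000         # ~$1000 per whole genome today
TODAY_DURATION_DAYS = 3        # ~3 days per whole genome today
GENOME_BASES = 3e9             # ~3 billion nucleotides per human genome
BYTES_PER_BASE = 1             # assumption: uncompressed, 1 byte per base

cost_reduction = HGP_COST_USD / TODAY_COST_USD
speedup = HGP_DURATION_DAYS / TODAY_DURATION_DAYS
genome_gb = GENOME_BASES * BYTES_PER_BASE / 1e9

print(f"cost reduction: ~{cost_reduction:,.0f}x")  # ~3,000,000x cheaper
print(f"speedup:        ~{speedup:,.0f}x")         # ~1,600x faster
print(f"one genome:     ~{genome_gb:.0f} GB of raw sequence")

# Extrapolation (assumption): a million such genomes already amount to
# ~3 PB of raw sequence alone, before raw reads, quality scores, or the
# other omics layers mentioned above are counted.
print(f"1M genomes:     ~{genome_gb * 1e6 / 1e6:.0f} PB")
```

Even under these deliberately conservative assumptions, sequence data alone reaches the petabyte scale quickly, which is why the abstract's PB-to-EB claim is plausible once the other omics layers are included.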
