Abstract

The last decade has witnessed an explosion in the amount of available biological sequence data, driven by the rapid progress of high-throughput sequencing projects. However, the volume of biological data has become so great that traditional data analysis platforms and methods can no longer meet the need to perform data analysis tasks rapidly in the life sciences. As a result, both biologists and computer scientists face the challenge of gaining profound insight into the deepest biological functions from big biological data, which in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are needed, along with efficient and scalable algorithms that can take advantage of them. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. We then provide a taxonomy of biological data analysis applications and survey how they have been mapped onto various computing platforms. After that, we present a case study comparing the efficiency of different computing platforms on the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.

Highlights

  • The rest of this paper is organized as follows: in Section 2 we present the characteristics of big biological data and popular computing platforms

  • We have presented a survey of computing platforms for big biological data analytics in this paper

  • We have discussed the characteristics of these two categories of problems as well as appropriate computing platforms used to solve them

Summary

Introduction

Big biological data analysis problems have very high computational requirements even when the corresponding algorithms have polynomial time complexity [2]. HPC may provide an efficient tool for solving these problems. This is a new area of biological sciences in which computational methods are essential to the progress of the experimental science, and in which algorithms and experimental techniques are being developed side by side. We present a survey and taxonomy of HPC big biological data analysis applications on various computing platforms. The rest of this paper is organized as follows: in Section 2 we present the characteristics of big biological data and popular computing platforms.
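
To make the point about polynomial-time algorithms concrete, the following is a minimal sketch of a quadratic-time local alignment score (Smith-Waterman with a linear gap penalty) together with the GCUPS metric (billions of dynamic-programming cell updates per second) commonly used to report alignment throughput. The scoring parameters, the function names `smith_waterman_score` and `gcups`, and the sequence sizes used below are illustrative assumptions, not values taken from the paper.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of two sequences.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming matrix. The number of cell updates is
    len(query) * len(target), i.e. quadratic in sequence length.
    The scoring values are assumed defaults chosen for illustration.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one query/target comparison."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
    # Hypothetical workload: a 1,000-residue query against 10^9 residues.
    cells = 1_000 * 1_000_000_000
    print(f"{cells:.1e} cell updates")
```

Under these assumed sizes, a single query requires 10^12 cell updates, so even at a throughput of tens of GCUPS one search takes on the order of tens of seconds, and a batch of many queries quickly grows to hours. This is the sense in which a polynomial-time algorithm can still demand HPC resources.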

Characteristics of Big Biological Data Analytics
Computing Platforms and Programming Models
Taxonomy
Whole Genome Sequence
Case Study
Intel MIC
70 GCUPS 62 GCUPS 45 GCUPS 72 GCUPS
High Performance Computing
Performance Scalability
Programming Productivity
Conclusion