Abstract

Bioinformatics has moved from in-house computing infrastructure to cloud computing in order to handle the vast quantity of biological data. This shift enables a large number of collaborating researchers around the world to share their work. At the same time, retrieving biological data over the internet is becoming more and more difficult because of the data's explosive growth and frequent changes. Various efforts have been made to address the problems of data discovery and delivery in the cloud framework, but most of them are hindered by their reliance on a MapReduce master server to track all available data. In this paper, we propose an alternative approach, called PRKad, which exploits a Peer-to-Peer (P2P) model to achieve efficient data discovery and delivery. PRKad is a Kademlia-based implementation with Round-Trip-Time (RTT) as the associated key, and it locates data according to a Distributed Hash Table (DHT) and the XOR metric. The simulation results show that PRKad retrieves data with low link latency. As an interdisciplinary application of P2P computing to bioinformatics, PRKad also provides good scalability for serving a greater number of users in dynamic cloud environments.
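
For readers unfamiliar with the XOR metric mentioned above, the following is a minimal Python sketch of how a standard Kademlia-style DHT measures the distance between identifiers and picks the nodes closest to a data key. It is not the PRKad implementation: the function names (`make_id`, `xor_distance`, `bucket_index`, `closest_nodes`) and the parameter `k` are hypothetical, and the sketch does not attempt to show how PRKad associates RTT with the key, since the abstract does not specify it.

```python
import hashlib

ID_BITS = 160  # the original Kademlia design uses 160-bit identifiers


def make_id(name: str) -> int:
    """Hash an arbitrary string (node address or data name) to a 160-bit integer ID."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Position of the highest bit in which the two IDs differ.

    A Kademlia node keeps one routing bucket per such position.
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not place itself in a bucket")
    return distance.bit_length() - 1


def closest_nodes(key: int, node_ids: list[int], k: int = 3) -> list[int]:
    """Return the k known node IDs closest to `key` under the XOR metric."""
    return sorted(node_ids, key=lambda node_id: xor_distance(node_id, key))[:k]
```

With small illustrative IDs, `closest_nodes(5, [1, 4, 7, 12], k=2)` returns `[4, 7]`, because the XOR distances from 5 are 4, 1, 2, and 9 respectively. In a real lookup the key would come from `make_id` applied to the dataset name, and the search would repeat iteratively against the nodes returned at each step.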

Highlights

  • Today, new technologies in genomics and proteomics generate biological data at an exponentially growing rate

  • Cloud computing is regarded as a key approach for processing such planet-scale data, and many bioinformatics applications have migrated to cloud environments [4,5,6,7]

  • Bioinformatics clouds depend heavily on data, since data are fundamental to gaining biological insights


Summary

Introduction

Today, new technologies in genomics and proteomics generate biological data at an exponentially growing rate. Cloud computing is regarded as a key approach for processing such planet-scale data, and many bioinformatics applications have migrated to cloud environments [4,5,6,7]. How effectively data can be located within this deluge is often overlooked in cloud computing, yet it is a key problem. We address the existing problems in data discovery with the aim of retrieving up-to-date data with less complexity and delay. To that end, the high computing capability of the P2P framework is adopted as a dynamic cloud infrastructure to meet the challenge posed by massive datasets [11,12,13].
