Abstract
Bioinformatics has moved from in-house computing infrastructure to cloud computing to cope with the vast quantity of biological data. This shift enables a large number of collaborating researchers around the world to share their work. However, retrieving biological data over the Internet is becoming increasingly difficult because of the data's explosive growth and frequent changes. Various efforts have been made to address the problems of data discovery and delivery in the cloud framework, but most of them are hindered by their reliance on a MapReduce master server to track all available data. In this paper, we propose an alternative approach, called PRKad, which exploits a Peer-to-Peer (P2P) model to achieve efficient data discovery and delivery. PRKad is a Kademlia-based implementation that uses Round-Trip Time (RTT) as the associated key and locates data using a Distributed Hash Table (DHT) and the XOR distance metric. Simulation results show that PRKad achieves low link latency when retrieving data. As an interdisciplinary application of P2P computing to bioinformatics, PRKad also scales well, serving a larger number of users in dynamic cloud environments.
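For readers unfamiliar with Kademlia, the XOR metric mentioned above can be sketched in a few lines. This is a minimal illustration of standard Kademlia only: IDs are plain integers derived with SHA-1, and the helper names (`node_id`, `xor_distance`, `bucket_index`, `k_closest`) are hypothetical. The abstract does not specify how PRKad incorporates RTT as the associated key, so that part is deliberately not modeled here.

```python
# Minimal sketch of Kademlia's XOR distance metric and closest-node selection.
# Illustrates standard Kademlia, NOT PRKad's RTT-keyed variant (not specified here).
# All function names are hypothetical, chosen for illustration.
import hashlib

ID_BITS = 160  # SHA-1 digest length, as in the original Kademlia design


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from an arbitrary string via SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, interpreted as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket index for other_id: position of the highest bit where the IDs differ."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in its routing table")
    return d.bit_length() - 1


def k_closest(key: int, node_ids: list[int], k: int = 3) -> list[int]:
    """Return the k known nodes closest to key under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    # Small 4-bit IDs keep the example readable.
    nodes = [0b0001, 0b0100, 0b0111, 0b1000]
    print(k_closest(0b0110, nodes, k=2))  # [7, 4]
```

Because XOR is symmetric and `xor_distance(x, x) == 0`, every node agrees on which peers are "closest" to a given key, which is what lets a DHT route a lookup toward the responsible nodes without a central master server.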
Highlights
Today, new technologies in genomics/proteomics generate biological data at an exponential rate
Cloud computing has been regarded as a key approach for processing such planet-scale data, and many bioinformatics applications have been migrated to cloud environments [4,5,6,7]
Bioinformatics clouds are heavily dependent on data, as data are fundamentally crucial for gaining biological insights
Summary
Today, new technologies in genomics/proteomics generate biological data at an exponential rate. Cloud computing has been regarded as a key approach for processing such planet-scale data, and many bioinformatics applications have been migrated to cloud environments [4,5,6,7]. The efficiency of locating data within this deluge is often overlooked in cloud computing, yet it is a key problem. We address existing problems in data discovery with the aim of retrieving up-to-date data with less complexity and delay. To this end, a P2P framework, with its high computing capacity, is adopted as a dynamic cloud infrastructure to meet the challenge posed by massive datasets [11,12,13].