Abstract

With recent advancement in computational biology, high throughput next generation sequencing technology has become a de facto standard technology for genes expression studies including DNAs, RNAs and proteins. As a promising technology, it has significant impact on medical sciences and genomic research. However, it generates several millions of short DNA and RNA sequences with several petabytes size in single run. In addition, the raw sequencing datasets such as RNAs are increasing exponentially leading to a big data analytics issue in computational biology. Due to the explosive growth of RNA sequences, the timely classification of RNAs sequence into piRNAs and non-piRNAs have become a challenging issue for traditional technology and conventional machine learning algorithms. Parallel and distributed computing models along with deep neural network have become a major computing platform for big data analytics now required in the field of computational biology. This paper presents a computational model based on parallel deep neural network for timely classification of large number of RNAs sequence into piRNAs and non-piRNAs, taking advantages of parallel and distributed computing platform. The performance of the proposed model was extensively evaluated using two-fold performance metrics. In the first fold, the performance of the proposed model was assessed using accuracy-based metrics such as accuracy, specificity, sensitivity and Matthews’s correlation coefficient. In the second fold, computational-based metrics such as computation times, speedup and scalability were observed. Moreover, initially the performance of the proposed model was assessed using real benchmark dataset and subsequently the performance was assessed using replicated benchmark dataset. The evaluation results in both cases showed that the proposed model improved computation speedup in order of magnitude in comparison with sequential approach without affected accuracy level.

Highlights

  • RNA is an important molecule in computational biology that stores genetic information embedded along a nucleic acid chain in the form of nucleotide bases series

  • The piRNA molecule belongs to a larger class of small noncoding RNA (ncRNA) which is found in animal germline cells and human somatic cell

  • DNN PARAMETERS OPTIMIZATION Deep learning network topology are usually involved a number of parameters that greatly impact on the performance of a model

Read more

Summary

Introduction

RNA is an important molecule in computational biology that stores genetic information embedded along a nucleic acid chain in the form of nucleotide bases series. The cRNA molecules are included mRNA (messenger RNA) which carries genetic information and actively involved in the process of translation of genes (DNA) into proteins. The piRNA molecules having a sequence length of 21 ∼ 35 nucleotides [1]–[4] It is important molecule from the perspective of reproduction and development of germline cells and act as a guardian by protecting the germline cells from attack of transposable elements through transcriptional or post-transcriptional mechanisms [5] [6].

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call