Abstract

With rapid advances in bioinformatics and computational biology, the volume of collected DNA data is growing exponentially, doubling every 18 months. Because DNA datasets are large and structurally complex, analyzing DNA sequences has become a computationally challenging problem in both fields, and fast algorithms capable of analyzing large-scale DNA sequences are now required. This paper presents a novel Parallel Vector Space Model (PVSM) approach that supports the analysis of large-scale DNA sequences by taking advantage of multi-core systems. The proposed approach is built on top of a modified Vector Space Model (VSM). PVSM is evaluated extensively on DNA sequences of varying size in terms of computational efficiency and accuracy, and its performance is compared with that of the sequential modified VSM. The sequential VSM is implemented on a single processor, whereas the proposed method is parallelized first on 4 processors and subsequently on 12 processors. The experimental results show that PVSM outperforms the sequential VSM, achieving approximately 2× speedup over the sequential approach without affecting accuracy. Moreover, the proposed PVSM scales well as the number of processing cores increases and supports the analysis of large-scale DNA sequences.
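
The abstract does not describe how the modified VSM represents a sequence or how the work is divided among cores. As a rough illustration only, the sketch below assumes a common vector space formulation: each sequence becomes a vector of overlapping k-mer counts, and two sequences are compared by cosine similarity. The function names, the choice of k-mers, the chunking scheme, and the `executor_cls` parameter are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (an assumed 'term vector')."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_with_overlap(seq, k, parts):
    """Split seq into chunks overlapping by k-1 bases, so that each k-mer
    of the full sequence falls into exactly one chunk."""
    n_kmers = max(len(seq) - k + 1, 0)
    step = max(-(-n_kmers // parts), 1)  # ceiling division, at least 1
    return [seq[start:start + step + k - 1] for start in range(0, n_kmers, step)]


def parallel_kmer_counts(seq, k, workers=4, executor_cls=ThreadPoolExecutor):
    """Count k-mers chunk by chunk and merge the partial counts.

    Threads are the default here only for portability; in CPython they do
    not give a CPU speedup. Passing ProcessPoolExecutor as executor_cls
    (assumption: run under a __main__ guard) is what would use several cores.
    """
    chunks = split_with_overlap(seq, k, workers)
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(kmer_counts, chunks, [k] * len(chunks)):
            total.update(partial)  # Counter.update adds counts together
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Because the chunks overlap by k-1 bases, the merged counts match a single sequential pass over the whole sequence, which is the property that lets a parallel version leave accuracy unchanged in a scheme like this one.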

