Abstract
Sequence assembly is an important topic in bioinformatics research. Sequence assembly algorithms have long suffered from poor assembly precision and low efficiency. To address these two problems, this paper designs and implements a precise assembly algorithm, SA-BR-MR, which follows a strategy of finding the source of reads and is based on MapReduce and the Eulerian path algorithm. Computational results show that SA-BR-MR is more accurate than other algorithms. SA-BR-MR is also applied to 54 sequences randomly selected from NCBI, covering animals, plants and microorganisms, with lengths ranging from hundreds to tens of thousands of bases. The matching rate is 100% for all 54 sequences. For each species, the range of [Formula: see text] that yields a 100% matching rate is summarized. To verify the range of [Formula: see text] for hepatitis C virus (HCV) and related variants, eight randomly selected HCV variants are assembled. The results confirm the [Formula: see text] range for HCV and related variants from NCBI, and provide a basis for sequencing other HCV variants. In addition, Spark is a newer computing platform based on in-memory computation that is highly efficient and well suited to iterative calculation. This paper therefore also designs and implements a sequence assembly algorithm on the Spark platform under the same strategy of finding the source of reads (SA-BR-Spark). Compared with SA-BR-MR, SA-BR-Spark achieves a higher computational speed.
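To make the Eulerian path idea mentioned in the abstract concrete, the following is a minimal single-machine sketch of textbook Eulerian-path assembly over a de Bruijn graph. It is not the SA-BR-MR or SA-BR-Spark implementation: it omits the read-source strategy and any MapReduce or Spark distribution, and the function names (`kmers`, `de_bruijn`, `eulerian_path`, `assemble`) and the k-mer length parameter `k` are assumptions made for illustration only. It also assumes error-free reads and a graph that admits a single Eulerian path.

```python
from collections import defaultdict


def kmers(reads, k):
    """Collect the distinct k-mers that occur in the reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn(kmer_set):
    """Build a de Bruijn graph: one edge per k-mer, from its
    (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in sorted(kmer_set):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists."""
    if not graph:
        return []
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge,
    # or at an arbitrary node if the graph has an Eulerian cycle.
    start = next(iter(graph))
    for u, successors in graph.items():
        if len(successors) - in_deg[u] == 1:
            start = u
            break
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn(kmers(reads, k)))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGGCGT", "GCGTGC", "GTGCA"], 4)` reconstructs `"ATGGCGTGCA"` from three overlapping reads. With repeated (k-1)-mers the Eulerian path is generally not unique, which is one reason the choice of k-mer length matters in this family of methods.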
More From: International Journal of Pattern Recognition and Artificial Intelligence