ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

Taeho Kim, Hyun Joo

doi:10.1186/1471-2105-11-467

Abstract

Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity.

Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets.

Conclusions: ClustalXeed can now compute large volumes of biological sequence data that were not tractable with any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than was possible with previous systems, through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single PC and distributed cluster systems.
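The abstract names INSTA but does not describe its internals. As a rough illustration of why handing work to idle nodes helps when sequence lengths are non-uniform, the sketch below compares a fixed up-front split of pairwise alignment tasks with a generic greedy scheme in which each pending task goes to whichever worker becomes idle first. This is not the authors' implementation: the function names, the cost model (cost of a pair taken as the product of the two sequence lengths), and the example lengths are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def pair_costs(lengths):
    """Assumed cost model: a pairwise alignment costs len(a) * len(b)."""
    return [a * b for a, b in combinations(lengths, 2)]


def static_makespan(costs, n_workers):
    """Split tasks into contiguous equal-count blocks, one per worker, up front."""
    block = -(-len(costs) // n_workers)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, n_workers):
    """Give each pending task to whichever worker becomes idle first."""
    finish_times = [0] * n_workers  # time at which each worker is next idle
    heapq.heapify(finish_times)
    for cost in costs:
        idle_at = heapq.heappop(finish_times)  # earliest-idle worker
        heapq.heappush(finish_times, idle_at + cost)
    return max(finish_times)


if __name__ == "__main__":
    # Hypothetical input: two long sequences and four short ones.
    costs = pair_costs([900, 800, 100, 100, 100, 100])
    print("static :", static_makespan(costs, 3))
    print("dynamic:", dynamic_makespan(costs, 3))
```

With these hypothetical lengths, the static split leaves one worker holding all of the expensive pairs while the others finish early; the idle-first scheme is bounded only by the single most expensive pair.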

Highlights

There is an increasing demand to assemble and align large-scale biological sequence data sets
Most of the work currently being done in computational biology involves searching for inter- and intra-sequence homology in massive volumes of genetic and protein sequence data, which is commonly based on multiple sequence alignments (MSAs) [1]
ClustalXeed allows the alignment of large numbers of nucleic acid or protein sequences



Journal: BMC Bioinformatics
Publication Date: Sep 17, 2010
Citations: 33
License type: CC BY 2.0
