Abstract

Background

The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually EST clustering, the process of grouping the original fragments according to their annotation, their similarity to known genomic DNA, or their similarity to each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices, have proven to be crucial in research areas from gene discovery to regulation of gene expression.

Results

We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions such as poly-tracts and short tandem repeats.

Conclusion

CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded from

Highlights

  • The continuous flow of expressed sequence tag (EST) data remains one of the richest sources for discoveries in modern biology

  • Expressed sequence tags (ESTs) represent a significant advancement in modern biology. Introduced in the early 1990s, they were the first truly high-throughput technology, one that deluged the databases and made the advent of advanced computer technologies in biology inevitable. The flood of these short, error-prone messages represents another important, not immediately obvious revolution: it heralded the transition of modern biology from the genetics era to the genomics era

  • Each cluster in the dynamic list is compared against the rest of the list in a cycle
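
The cycle over a dynamic cluster list mentioned in the last highlight can be illustrated with a small sketch. This is not the CLU implementation: the match test `shares_word` (two sequences match if they share any exact word of length `k`), the function names, and the value `k = 8` are all assumptions made for illustration. The sketch also omits the low-complexity filtering that CLU performs. It only shows one plausible way to compare each cluster against the rest of a list that shrinks as clusters merge.

```python
from typing import Callable, List


def shares_word(a: str, b: str, k: int = 8) -> bool:
    """Hypothetical match test (an assumption, not CLU's match detection):
    True if sequences a and b share at least one exact word of length k."""
    words = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in words for i in range(len(b) - k + 1))


def cluster_sequences(
    seqs: List[str],
    match: Callable[[str, str], bool] = shares_word,
) -> List[List[str]]:
    """Single-linkage clustering over a dynamic list of clusters."""
    # Start with one singleton cluster per sequence.
    clusters = [[s] for s in seqs]
    i = 0
    while i < len(clusters):
        merged = False
        j = i + 1
        # Compare cluster i against every cluster after it in the list.
        while j < len(clusters):
            if any(match(a, b) for a in clusters[i] for b in clusters[j]):
                # Absorb cluster j; the list shrinks, so j is not advanced.
                clusters[i].extend(clusters.pop(j))
                merged = True
            else:
                j += 1
        # A grown cluster may now match clusters it skipped earlier,
        # so repeat the pass for i until nothing more merges.
        if not merged:
            i += 1
    return clusters


if __name__ == "__main__":
    ests = ["ACGTACGTAA", "TTACGTACGTCC", "TTGACCATGCAGT",
            "GACCATGCAGTAA", "CATCATTTGGCA"]
    for members in cluster_sequences(ests):
        print(members)
```

Repeating the pass after a merge makes the grouping transitive: if A matches B and B matches C, all three end up in one cluster even when A and C share no word. The pairwise comparison is quadratic in the number of sequences, so a sketch like this is only suitable for small inputs.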


Introduction

The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices, have proven to be crucial in research areas from gene discovery to regulation of gene expression. Expressed sequence tags (ESTs) represent a significant advancement in modern biology. Introduced in the early 1990s, they were the first truly high-throughput technology, one that deluged the databases and made the advent of advanced computer technologies in biology inevitable. The flood of these short, error-prone messages represents another important, not immediately obvious revolution: it heralded the transition of modern biology from the genetics era to the genomics era. More accurate and advanced technologies for analyzing the function of genomes now exist, but EST sequencing was one of the first approaches and is still in extensive use today.

