Abstract

BackgroundExpressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms. For organisms with no full genome sequences available, ESTs are normally assembled into longer consensus sequences for further downstream analysis. However current de novo EST assembly programs often generate large number of assembly errors that will negatively affect the downstream analysis. In order to generate more accurate consensus sequences from ESTs, tools are needed to reduce or eliminate errors from de novo assemblies.ResultsWe present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than current existing assemblers. iAssembler employs MIRA and CAP3 assemblers to generate initial assemblies, followed by identifying and correcting two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into same contigs; and 2) ESTs from same transcripts fail to be assembled together. iAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massive parallel pyrosequencing technology.ConclusionWe compared performances of iAssembler and several other de novo EST assembly programs using both Roche-454 and Sanger EST datasets. It demonstrated that iAssembler generated significantly more accurate consensus sequences than other assembly programs.

Highlights

  • Expressed Sequence Tags (ESTs) have played significant roles in gene discovery and gene functional analysis, especially for non-model organisms

  • The assembly error rates are high for Roche-454 ESTs as we have constantly observed that large portion of Roche-454 unigenes contain assembly errors after reanalyzing several published datasets

  • The three assemblers in iAssembler, MIRA, CAP3 and megablast assemblers, are all base on the overlap-layout-consensus strategy iAssembler is applicable for ESTs with relative long sequences, such as those generated using Sanger and/or Roche-454 platforms

Read more

Summary

Results

We present iAssembler, a pipeline that can assemble large-scale ESTs into consensus sequences with significantly higher accuracy than current existing assemblers. IAssembler employs MIRA and CAP3 assemblers to generate initial assemblies, followed by identifying and correcting two common types of transcriptome assembly errors: 1) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into same contigs; and 2) ESTs from same transcripts fail to be assembled together. IAssembler can be used to assemble ESTs generated using the traditional Sanger method and/or the Roche-454 massive parallel pyrosequencing technology

Background
Conclusion
Rudd S
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call