Abstract

Background: Second-generation sequencing technologies have advantages over Sanger sequencing; however, they have introduced new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated.

Findings: We developed the Quality Assessment software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and on the estimated coverage after applying the quality filter, while still providing acceptable sequence coverage for genome construction from short reads.

Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Applying quality filters to sequence data with the Quality Assessment software, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.

Highlights

  • Second-generation sequencing technologies have advantages over Sanger sequencing, but they have introduced new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage

  • Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies

  • Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction

Summary

Conclusions

Applying a quality filter to raw sequencing data is required to reduce sequence construction errors, because the available methodologies for constructing genomes are based on sequence alignment, in which a single wrong base can cause a mismatch and make alignment impossible. The Quality Assessment software allows the operator to visualize quality graphs of the bases in the reads and to estimate coverage based on means or medians. This makes it possible to select more precise cutoff parameters and reduces the risk of eliminating high-quality reads or including low-quality reads, which increases the accuracy of genome construction from second-generation sequencing data.

Figure caption: Genomic sequencing coverage of Cp162 and B7 (tag F3 and R3) for different Phred quality value cutoffs based on the mean and median.

Operating system(s): Platform independent
Programming language: Java
Other requirements: Java JDK 1.6 or higher
License: GNU GPL
Restrictions for use: Permission must be obtained from the author for non-academic/non-public use
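
The read filtering and coverage estimation described in the Conclusions can be illustrated with a minimal Java sketch. This is not the source code of Quality Assessment: the class name `QualityFilterSketch`, its method names, the representation of a read as an `int[]` of per-base Phred values, and the coverage formula (retained bases divided by genome size) are all assumptions made for illustration. Reads are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Quality Assessment implementation.
class QualityFilterSketch {

    /** Arithmetic mean of the per-base Phred values of one non-empty read. */
    static double mean(int[] phred) {
        long sum = 0;
        for (int q : phred) {
            sum += q;
        }
        return (double) sum / phred.length;
    }

    /** Median of the per-base Phred values of one non-empty read. */
    static double median(int[] phred) {
        int[] sorted = phred.clone(); // sort a copy, leave the read untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Keeps the reads whose mean (or median) quality reaches the cutoff. */
    static List<int[]> filter(List<int[]> reads, int cutoff, boolean useMedian) {
        List<int[]> kept = new ArrayList<int[]>();
        for (int[] read : reads) {
            double score = useMedian ? median(read) : mean(read);
            if (score >= cutoff) {
                kept.add(read);
            }
        }
        return kept;
    }

    /** Assumed coverage estimate: total retained bases / genome size. */
    static double coverage(List<int[]> reads, long genomeSize) {
        long bases = 0;
        for (int[] read : reads) {
            bases += read.length;
        }
        return (double) bases / genomeSize;
    }

    public static void main(String[] args) {
        List<int[]> reads = new ArrayList<int[]>();
        reads.add(new int[] {30, 30, 30, 30});
        reads.add(new int[] {2, 2, 40, 40});
        reads.add(new int[] {5, 30, 30});

        long toyGenomeSize = 2; // toy value, chosen only for the example
        for (int cutoff = 10; cutoff <= 30; cutoff += 10) {
            double byMean = coverage(filter(reads, cutoff, false), toyGenomeSize);
            double byMedian = coverage(filter(reads, cutoff, true), toyGenomeSize);
            System.out.println("cutoff " + cutoff
                    + ": coverage by mean = " + byMean
                    + ", by median = " + byMedian);
        }
    }
}
```

Comparing the two columns across cutoffs mirrors the kind of decision the text describes: a read with one very poor base can fail a mean-based cutoff yet pass a median-based one, so the two criteria can leave different amounts of coverage for assembly.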
