Abstract

Many assemblers, built on different algorithms, are available for de novo transcriptome assembly. Read filtering, one of the key stages of the workflow, can likewise be performed with several approaches and algorithms. To date, however, only a few studies have examined how the degree of filtering affects de novo transcriptome assembly, especially for single-end reads. In this paper, we analyzed transcriptomes assembled with two of the most common software tools (rnaSPAdes and Trinity) and applied various approaches at the read-filtering stage. We show the key differences between the two assemblies and identify the parameters that are sensitive to the degree of filtering and to the length of the input reads. We also propose an efficient two-stage filtering algorithm that preserves as much of the input data as possible while ensuring that all reads meet the required quality after filtering and trimming.

Highlights

  • For a deeper understanding of the physiology of processes occurring in organisms under various conditions, it is necessary to study the expression of genes or entire complexes of genes, as one of the aspects of the body's response to a stimulus

  • The common pipeline for de novo assembly consists of several steps: quality control, filtering and trimming, assembly, quality assessment of the assemblies, and annotation

  • The read lengths ranged from 35 to 151, and more than 90% of the reads were longer than 149


Summary

Introduction

A deeper understanding of the physiological processes that occur in organisms under various conditions requires studying the expression of genes or entire gene complexes as one aspect of the organism's response to a stimulus. Because most metrics for evaluating assembly quality are relative (comparative), assessing the quality of a transcriptome without a high-quality reference is difficult. The situation is further complicated by the fact that the choice of an effective and optimal algorithm for pre- and post-processing of the reads is still a subject of discussion. The common pipeline for de novo assembly consists of several steps: quality control, filtering and trimming, assembly (with or without a reference genome), quality assessment of the assemblies, and annotation. The algorithms, and their dependence on the input data, differ among transcriptome assembly programs [5].
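
To make the filtering and trimming step of this pipeline more concrete, here is a minimal Python sketch for single-end reads. It is only an illustration under stated assumptions and is not the two-stage algorithm proposed in the paper: the function names, the Phred+33 quality encoding, and the thresholds (`min_q`, `min_len`, `min_mean_q`) are all hypothetical choices made for the example.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end until the last base has a score >= min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_and_trim(records, min_q=20, min_len=30, min_mean_q=20):
    """Yield trimmed (name, seq, qual) records that pass both filters.

    The thresholds are illustrative defaults, not values taken from the paper.
    """
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) < min_len:
            continue  # read became too short after trimming
        if mean(phred_scores(qual)) < min_mean_q:
            continue  # remaining bases are low quality on average
        yield name, seq, qual


# Toy example: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
reads = [
    ("r1", "ACGTACGT", "IIIIII##"),
    ("r2", "ACGT", "####"),
]
kept = list(filter_and_trim(reads, min_q=20, min_len=5))
print(kept)
```

In this toy run the two low-quality bases at the end of `r1` are trimmed and the read is kept, while `r2` is trimmed to zero length and discarded. Tightening or relaxing the thresholds changes how much of the input survives, which is the trade-off between data volume and read quality that the degree of filtering controls.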

