Abstract
Circular RNAs (circRNAs) are long noncoding RNAs that play a significant role in various biological processes, including embryonic development and stress responses. These regulatory molecules can modulate microRNA activity and are involved in different molecular pathways as indirect regulators of gene expression. Thousands of circRNAs have been described in diverse taxa thanks to recent advances in high-throughput sequencing technologies, which have made a large volume of total RNA sequencing data publicly available. A number of de novo circRNA and host gene prediction tools are available to date, but their ability to accurately predict circRNA host genes is limited in the case of low-quality genome assemblies or annotations. Here, we present CircParser, a simple and fast Unix/Linux pipeline that uses the outputs of the most common in silico circular RNA prediction tools (CIRI, CIRI2, CircExplorer2, find_circ, and circFinder) to annotate circular RNAs, assigning presumptive host genes from local or public databases such as the National Center for Biotechnology Information (NCBI). The pipeline can also discriminate circular RNAs based on their structural components (exonic, intronic, exon-intronic or intergenic) using a genome annotation file.
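The structural classification mentioned above can be illustrated with a minimal Python sketch. It is not CircParser's implementation: the function names, the interval representation, and the decision rule (the fraction of the circRNA's genomic span covered by annotated exons) are assumptions made for illustration. It also ignores strand and chromosome, and it would label a circRNA that extends beyond a gene boundary as exon-intronic.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF/GTF


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def covered_length(region: Interval, features: List[Interval]) -> int:
    """Number of bases of `region` covered by the union of `features`."""
    clipped = sorted(
        (max(s, region[0]), min(e, region[1]))
        for s, e in features
        if overlaps((s, e), region)
    )
    total, cursor = 0, region[0] - 1  # cursor = last base already counted
    for s, e in clipped:
        if e > cursor:
            total += e - max(s, cursor + 1) + 1
            cursor = e
    return total


def classify_circrna(circ: Interval, genes: List[Interval],
                     exons: List[Interval]) -> str:
    """Hypothetical rule: classify by how much of the span lies in exons."""
    if not any(overlaps(circ, g) for g in genes):
        return "intergenic"
    circ_len = circ[1] - circ[0] + 1
    exon_bases = covered_length(circ, exons)
    if exon_bases == 0:
        return "intronic"
    if exon_bases == circ_len:
        return "exonic"
    return "exon-intronic"


# Toy annotation: one gene with three exons on a single chromosome.
genes = [(100, 1000)]
exons = [(100, 200), (300, 400), (900, 1000)]
for circ in [(120, 180), (210, 290), (150, 350), (2000, 2100)]:
    print(circ, classify_circrna(circ, genes, exons))
```

In practice the gene and exon intervals would be parsed from the genome annotation file, grouped by chromosome and strand, before any classification is attempted.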
Highlights
De novo genome sequencing has become a routine procedure, due to a decrease in sequencing costs, diversification of high-throughput sequencing platforms and improvement of bioinformatic tools (Ekblom & Wolf, 2014).
The five different algorithms predicted on average ∼131 (CircExplorer2); ∼501 (CIRI); ∼706 (CIRI2); ∼257, and ∼398 circRNAs per sample, with a small overlap of ∼37 circRNAs (Fig. 2; Table S1), similar to previously published comparisons (Hansen, 2018; Hansen et al, 2016).
To assess the host genes of circular RNAs and to reduce false-positive rates, only overlapping circRNAs (Fig. 2) were used in CircParser. This pipeline eliminates non-informative outputs while keeping the most relevant BLAST results and retrieving the likely host gene name for each circular RNA; when no identical sequence can be found in the database, the tool marks the sequence as NOT ASSIGNED.
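The overlap filter and the NOT ASSIGNED fallback described in the highlights can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the pipeline's actual code: the function names, the (chromosome, start, end) key, the toy coordinates, and the `best_hits` mapping (standing in for parsed BLAST results) are all hypothetical. Real prediction tools may also use different coordinate conventions, so their outputs would need to be normalized before being compared.

```python
from typing import Dict, Set, Tuple

CircKey = Tuple[str, int, int]  # (chromosome, start, end); assumed normalized


def consensus_circrnas(calls_by_tool: Dict[str, Set[CircKey]]) -> Set[CircKey]:
    """Keep only circRNAs reported by every tool (set intersection)."""
    call_sets = list(calls_by_tool.values())
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


def assign_host_genes(circs: Set[CircKey],
                      best_hits: Dict[CircKey, str]) -> Dict[CircKey, str]:
    """Attach a host gene name, or NOT ASSIGNED when no hit is available."""
    return {c: best_hits.get(c, "NOT ASSIGNED") for c in sorted(circs)}


# Toy predictions from three of the tools (coordinates are made up).
calls = {
    "CIRI2": {("chr1", 100, 900), ("chr1", 5000, 5600), ("chr2", 40, 700)},
    "CircExplorer2": {("chr1", 100, 900), ("chr2", 40, 700)},
    "find_circ": {("chr1", 100, 900), ("chr2", 40, 700), ("chr3", 10, 90)},
}
# Hypothetical best-hit table; "geneA" is a placeholder name.
best_hits = {("chr1", 100, 900): "geneA"}

shared = consensus_circrnas(calls)
annotated = assign_host_genes(shared, best_hits)
for key, gene in annotated.items():
    print(key, gene)
```

Requiring agreement among all tools is the strictest possible filter; a looser rule, such as support from at least two tools, would retain more candidates at the cost of more false positives.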
Summary
De novo genome sequencing has become a routine procedure, due to a decrease in sequencing costs, diversification of high-throughput sequencing platforms and improvement of bioinformatic tools (Ekblom & Wolf, 2014). CircParser is a novel streamlined pipeline for circular RNA structure and host gene prediction in non-model organisms. CircRNAs are relatively poorly studied members of the non-coding RNA family. These unique single-stranded molecules are generated through back-splicing of pre-mRNAs in a wide range of eukaryotic and prokaryotic taxa (Danan et al, 2012; Holdt, Kohlmaier & Teupser, 2018), and even viruses (Huang et al, 2019). CircRNAs play a significant role in the regulation of molecular pathways by modulating microRNA and protein activity, and by affecting transcription or splicing (Holdt, Kohlmaier & Teupser, 2018).