Abstract

Current bioinformatics workflows for PIWI-interacting RNA (piRNA) analysis focus primarily on germline-derived piRNAs and piRNA-clusters. Frequently, they suffer from outdated piRNA databases, questionable quantification methods, and lack of reproducibility. Often, pipelines specific to miRNA analysis are used for the piRNA research in silico. Furthermore, the absence of a well-established database for piRNA annotation, as for miRNA, leads to uniformity issues between studies and generates confusion for data analysts and biologists. For these reasons, we have developed WIND (Workflow for pIRNAs aNd beyonD), a bioinformatics workflow that addresses the crucial issue of piRNA annotation, thereby allowing a reliable analysis of small RNA sequencing data for the identification of piRNAs and other small non-coding RNAs (sncRNAs) that in the past have been incorrectly classified as piRNAs. WIND allows the creation of a comprehensive annotation track of sncRNAs combining information available in RNAcentral, with piRNA sequences from piRNABank, the first database dedicated to piRNA annotation. WIND was built with Docker containers for reproducibility and integrates widely used bioinformatics tools for sequence alignment and quantification. In addition, it includes Bioconductor packages for exploratory data and differential expression analysis. Moreover, WIND implements a "dual" approach for the evaluation of sncRNAs expression level quantifying the aligned reads to the annotated genome and carrying out an alignment-free transcript quantification using reads mapped to the transcriptome. Therefore, a broader range of piRNAs can be annotated, improving their quantification and easing the subsequent downstream analysis. WIND performance has been tested with several small RNA-seq datasets, demonstrating how our approach can be a useful and comprehensive resource to analyse piRNAs and other classes of sncRNAs.

Highlights

  • The GTF file was created as described in the Methods obtaining 149,549 different genomic locations corresponding to 39,812 sequences in human and 925,759 distinct genomic locations corresponding to 95,205 sequences in mouse for all small RNA types

  • We found that the PIWI-interacting RNA (piRNA) population identified in adult cardiac myocyte (aCM) represents 12% of all reads assigned to small RNAs, and the top 100 expressed molecules are listed in Supplementary Table 3 (Extended data)

  • We tried to solve many issues related to small RNA sequencing data analysis and, in particular, piRNA identification and quantification

Read more

Summary

Objectives

The goal of this study was to create a robust workflow for the identification and quantification of piRNA sequences in small RNA sequencing data. The aim of this article was to develop a workflow for sncRNA identification, focused on piRNAs, with the intention of detecting piRNAs independently of the tissue of origin

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call