Abstract
With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3´ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can be also erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for down stream analysis. Tested on 539 publicly available small RNA libraries accompanied with 3´ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. the 3´ adapters were removed. DNApi perfectly judged that given another batch of datasets, 192 publicly available processed libraries were “ready-to-map” small RNA sequence. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies.
Highlights
Small RNA sequencing profiles the abundance of small RNAs such as small interfering RNAs, microRNAs, and PIWI-interacting RNAs [1]
Tested on 539 publicly available small RNA libraries accompanied with 3 ́ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average)
We evaluated our algorithm using the 539 human small RNA libraries in the Gene Expression Omnibus (GEO) with 3 ́ adapter sequences (S1 Table)
Summary
Small RNA sequencing profiles the abundance of small RNAs (typically shorter than 35 nucleotides) such as small interfering RNAs, microRNAs, and PIWI-interacting RNAs [1]. The number of small RNA sequencing datasets in public repositories has been increasing rapidly [2], and meta-analysis of multiple datasets can provide novel insights, for example, [3] and [4]. Adapter removal is an essential step in computational preprocessing of small RNA sequencing libraries, because only those reads that contain the 3 ́ adapter are full-length small RNAs and kept for downstream analysis [5]. Many software packages can perform adapter removal with a given 3 ́ adapter sequence [6,7,8,9]. The adapter information is often not available and can be erroneous in the metadata. The 3 ́ adapters of multiplexed small RNA
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.