IREAD: a tool for intron retention detection from RNA-seq data

Hong-Dong Li, Nathan D Price, Cory C Funk

doi:10.1186/s12864-020-6541-0


Open Access


Journal: BMC Genomics	Publication Date: Feb 6, 2020
Citations: 41	License type: open-access

Affiliation: Institute for Systems Biology, Seattle University

Abstract

Background: Intron retention (IR) has traditionally been overlooked as ‘noise’ and has received negligible attention in gene expression analysis. In recent years, IR has become an emerging area for interrogating transcriptomes, because it has been recognized to carry out important biological functions such as gene expression regulation and has been found to be associated with complex diseases such as cancers. However, methods for detecting IR today are limited. Thus, there is a need to develop novel methods to improve IR detection.

Results: Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input a BAM file, representing the transcriptome, and a text file containing the intron coordinates of a genome. It then (1) counts all reads that overlap intron regions, (2) detects IR events by analyzing features of the reads such as depth and distribution patterns, and (3) outputs a list of retained introns to a tab-delimited text file. iREAD provides significant added value in detecting IR compared with IRFinder, achieving a higher AUC on all datasets tested. Both methods showed low false positive rates and high false negative rates in different regimes, indicating that using them together is generally beneficial. The output from iREAD can be used directly for further exploratory analysis such as differential intron expression and functional enrichment. The software is freely available at https://github.com/genemine/iread.

Conclusion: Being complementary to existing tools, iREAD provides a new and generic tool to interrogate intron regions in poly-A enriched transcriptomic data. Intron retention analysis provides a complementary approach for understanding the transcriptome.
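The three processing steps listed in the abstract can be sketched in Python, the language iREAD itself is written in. This is a minimal illustration under stated assumptions, not iREAD's code: reads are assumed to be already reduced to half-open (start, end) alignment intervals on one chromosome, strand and spliced alignments are ignored, and a normalized entropy of read start positions stands in for the "distribution pattern" feature. The function names and the thresholds `min_reads`, `min_fpkm` and `min_entropy` are hypothetical placeholders, not iREAD's defaults.

```python
import math
from collections import Counter

# Hypothetical inputs: reads as half-open (start, end) intervals on one chromosome,
# introns as (intron_id, start, end). Real iREAD parses a BAM file and an intron
# annotation file; this sketch skips file parsing entirely.


def count_intron_reads(reads, intron):
    """Step 1: count reads whose aligned interval overlaps the intron."""
    _, i_start, i_end = intron
    return sum(1 for r_start, r_end in reads if r_start < i_end and r_end > i_start)


def normalized_entropy(reads, intron, n_bins=10):
    """Step 2 helper (assumed metric): evenness of read starts across the intron, 0..1."""
    _, i_start, i_end = intron
    length = i_end - i_start
    # Integer binning keeps every in-intron start in bins 0..n_bins-1.
    bins = Counter(
        (r_start - i_start) * n_bins // length
        for r_start, _ in reads
        if i_start <= r_start < i_end
    )
    total = sum(bins.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in bins.values())
    return entropy / math.log(n_bins)


def detect_ir(reads, introns, total_reads, min_reads=20, min_fpkm=2.0, min_entropy=0.9):
    """Step 2: flag introns whose depth and read spread pass hypothetical thresholds."""
    rows = []
    for intron in introns:
        intron_id, start, end = intron
        count = count_intron_reads(reads, intron)
        # FPKM = reads * 1e9 / (intron length in bp * total mapped reads)
        fpkm = count * 1e9 / ((end - start) * total_reads)
        ne = normalized_entropy(reads, intron)
        retained = count >= min_reads and fpkm >= min_fpkm and ne >= min_entropy
        rows.append((intron_id, count, round(fpkm, 2), round(ne, 3), retained))
    return rows


def write_tsv(rows, path):
    """Step 3: write one tab-delimited line per intron."""
    with open(path, "w") as fh:
        fh.write("intron_id\treads\tfpkm\tnormalized_entropy\tretained\n")
        for row in rows:
            fh.write("\t".join(map(str, row)) + "\n")


if __name__ == "__main__":
    introns = [("intronA", 1000, 2000), ("intronB", 5000, 6000)]
    reads = [(s, s + 100) for s in range(1000, 2000, 25)]  # toy reads spread evenly over intronA
    for row in detect_ir(reads, introns, total_reads=1_000_000):
        print(*row, sep="\t")
```

In this toy run, intronA collects 40 evenly spread reads and passes all three checks, while intronB has no reads and is not flagged. Combining a depth measure (count, FPKM) with a spread measure is meant to reject introns whose reads pile up in one spot, for example from an unannotated exon, rather than covering the intron as retention would.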

Highlights

Intron retention (IR) has been traditionally overlooked as ‘noise’ and received negligible attention in the field of gene expression analysis
Next-generation sequencing has resulted in a vast amount of RNA sequencing (RNA-seq) data, which, in combination with bioinformatics tools, provides a rich resource for the detection of IR
We present iREAD for the identification of IR from poly-A enriched RNA-seq data. iREAD takes as input an existing BAM (binary version of a Sequence Alignment/Map) file and an annotation file that contains a list of introns that do not overlap any exons of any other splice isoforms or genes
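An intron list of the kind described in the last highlight, introns that overlap no exon of any isoform or gene, can be derived from exon coordinates roughly as follows. This is a hedged sketch, not how iREAD's annotation file was actually built: it assumes exons are given per transcript as half-open (chrom, start, end) tuples, treats gaps between consecutive exons as introns, ignores strand, and uses a brute-force overlap check. `introns_from_transcript` and `independent_introns` are hypothetical names.

```python
from collections import defaultdict


def introns_from_transcript(exons):
    """Introns are the gaps between consecutive exons of one transcript (half-open)."""
    ordered = sorted(exons)
    return [
        (chrom, prev_end, next_start)
        for (chrom, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def independent_introns(transcripts):
    """Keep introns that overlap no exon of any transcript (any isoform or gene)."""
    # Pool every exon from every transcript, grouped by chromosome.
    exons_by_chrom = defaultdict(list)
    for exons in transcripts.values():
        for chrom, start, end in exons:
            exons_by_chrom[chrom].append((start, end))

    # Collect candidate introns from all transcripts, deduplicated.
    candidates = set()
    for exons in transcripts.values():
        candidates.update(introns_from_transcript(exons))

    kept = []
    for chrom, i_start, i_end in sorted(candidates):
        overlaps_exon = any(
            e_start < i_end and e_end > i_start
            for e_start, e_end in exons_by_chrom[chrom]
        )
        if not overlaps_exon:
            kept.append((chrom, i_start, i_end))
    return kept
```

With two hypothetical isoforms whose middle exons differ, say exons at chr1:100-200, 300-400, 500-600 and chr1:100-200, 350-450, 500-600, only chr1:200-300 and chr1:450-500 survive: the other candidate gaps each cover part of the alternative middle exon. A genome-scale version would replace the linear scan with an interval tree or a sorted sweep.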

Summary

Results

User interface and usage

iREAD is implemented in Python with a command line interface and runs on Linux and macOS.

Without applying the default thresholds for detecting IR, we pulled out all introns analyzed by iREAD and IRFinder and sorted them by FPKM and IRratio, respectively. Of the top-ranked 2498 IR events, 287 (11.5%) were shared by both methods (Fig. 3a), which indicates that the intron annotations and/or criteria used by iREAD and IRFinder capture different features of intron retention events. Using simulated data with different sequencing depths and gold standards of retained introns of different quality, we found that both iREAD and IRFinder are accurate, and that many of their detected retention events are not shared, presumably because of differences in intron annotation and in the criteria used to evaluate retention.

Speed

We compared the speed of IRFinder and iREAD for intron retention detection using the above-mentioned high-sequencing-depth mouse sample with 133 million reads (9.9 GB BAM file).
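The overlap comparison in the Results amounts to ranking each tool's introns by its own score (FPKM for iREAD, IRratio for IRFinder) and intersecting the two top-N lists. A minimal sketch, assuming both score tables are keyed by the same intron identifiers; in practice the two tools use different intron annotations, so matching introns across them is a step this sketch skips. `top_n` and `shared_top_events` are hypothetical helper names.

```python
def top_n(scores, n):
    """Return the ids of the n highest-scoring introns (ties broken by id)."""
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return set(ranked[:n])


def shared_top_events(iread_fpkm, irfinder_irratio, n):
    """Count introns in both methods' top-n lists; percentage is relative to n."""
    shared = top_n(iread_fpkm, n) & top_n(irfinder_irratio, n)
    return len(shared), 100 * len(shared) / n


if __name__ == "__main__":
    # Toy scores keyed by a shared, hypothetical intron id.
    iread = {"i1": 10.0, "i2": 8.0, "i3": 1.0, "i4": 0.5}
    irfinder = {"i1": 0.9, "i3": 0.8, "i2": 0.1, "i4": 0.05}
    print(shared_top_events(iread, irfinder, 2))
    # The reported figure uses the same ratio: 287 shared out of a top 2498.
    print(round(100 * 287 / 2498, 1))
```

Taking the percentage relative to N reproduces the reported number, since 287 / 2498 ≈ 0.115. Because each tool ranks by a different statistic, a low overlap can reflect differing criteria as much as differing accuracy, which is the interpretation the Results give.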
