Abstract

Associations between proteins and RNA–RNA duplexes are important in post-transcriptional regulation of gene expression. The CLASH (Cross-linking, Ligation and Sequencing of Hybrids) technique captures RNA–RNA interactions by physically joining two RNA molecules associated with a protein complex into a single chimeric RNA molecule. These events are relatively rare and considerable effort is needed to detect a small number of chimeric sequences amongst millions of non-chimeric cDNA reads resulting from a CLASH experiment. We present the “hyb” bioinformatics pipeline, which we developed to analyse high-throughput cDNA sequencing data from CLASH experiments. Although primarily designed for use with AGO CLASH data, hyb can also be used for the detection and annotation of chimeric reads in other high-throughput sequencing datasets. We examined the sensitivity and specificity of chimera detection in a test dataset using the BLAST, BLAST+, BLAT, pBLAT and Bowtie2 read alignment programs. We obtained the most reliable results in the shortest time using a combination of preprocessing with Flexbar and subsequent read-mapping using Bowtie2. The “hyb” software is distributed under the GNU GPL (General Public License) and can be downloaded from https://github.com/gkudla/hyb.

Highlights

  • RNA molecules are abundant in all living cells, but, like football fans around the world, they never walk alone

  • Inter- or intra-molecular RNA–RNA interactions are fundamental to many processes including splicing, translation, and gene regulation

  • Post-transcriptional regulation of gene expression mediated by miRNA molecules that base-pair with their RNA targets has been the focus of intense research efforts in recent years [1]

Introduction

RNA molecules are abundant in all living cells, but, like football fans around the world, they never walk alone. The development of CLIP (Crosslinking and Immunoprecipitation) [5,6] has allowed mapping of the RNA interactomes for a variety of proteins. Transcriptome-wide analysis of AGO–RNA interactions in human and mouse cells has led to the discovery of many putative miRNA binding sites, a number of which could be confirmed experimentally [7,8,9]. We have recently described CLASH (crosslinking, ligation and sequencing of hybrids), a method for transcriptome-wide analysis of RNA–RNA interactions [11,12]. Analysis of chimeric reads in RNA-Seq experiments has recently been used to identify a novel class of RNAs with regulatory potential, the circRNAs [13,14], and most of these circRNAs can be recovered by hyb.
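
To make the notion of a chimeric read concrete, the following is a minimal Python sketch of how two local alignments of one read might be paired into a candidate chimera. It is an illustration only and not the algorithm implemented in hyb: the `Hit` record, the `call_chimera` function, the reference names, and the thresholds (`max_overlap`, `max_gap`, `min_len`) are all assumptions made for this example. The idea is that a chimera is supported when two hits cover adjacent, largely non-overlapping parts of the same read.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One local alignment of part of a read to a reference (hypothetical record)."""
    ref: str      # reference name, e.g. a miRNA or mRNA identifier
    start: int    # 0-based start of the aligned segment within the read
    end: int      # exclusive end of the aligned segment within the read
    score: float  # aligner score; higher is better


def call_chimera(
    hits: List[Hit],
    max_overlap: int = 4,   # assumed tolerance for overlap at the junction
    max_gap: int = 4,       # assumed tolerance for unaligned bases at the junction
    min_len: int = 16,      # assumed minimum length of each aligned segment
) -> Optional[Tuple[Hit, Hit]]:
    """Return the best-scoring pair of hits that tile the read, or None."""
    best = None
    for a, b in combinations(hits, 2):
        # Order the two hits by their position in the read.
        left, right = (a, b) if a.start <= b.start else (b, a)
        if min(left.end - left.start, right.end - right.start) < min_len:
            continue
        # Positive junction = gap between segments, negative = overlap.
        junction = right.start - left.end
        if junction > max_gap or -junction > max_overlap:
            continue
        # Skip pairs where one segment is contained in the other.
        if right.end <= left.end:
            continue
        total = left.score + right.score
        if best is None or total > best[0]:
            best = (total, left, right)
    return None if best is None else (best[1], best[2])


# Example with made-up alignments for a single 60-nt read.
example_hits = [
    Hit("miR-92a", 0, 22, 44.0),
    Hit("mRNA_X", 21, 60, 78.0),
    Hit("mRNA_Y", 5, 30, 40.0),
]
result = call_chimera(example_hits)
single = call_chimera([Hit("mRNA_X", 0, 60, 120.0)])
```

In this sketch, pairs of hits on the same reference are not excluded, since intramolecular interactions would also produce two adjacent segments. A read covered by a single alignment yields no candidate pair and is treated as non-chimeric.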

Overview of the method
CLASH data analysis
Preprocessing reads
Mapping reads and calling chimeras
In silico folding and merging of chimeras
Running the entire analysis as a single command
Further analysis of chimeras
Configuring environment variables
Implementation
General considerations
Recovery of simulated fusion and nonfusion reads
Optimization of preprocessing parameters
Optimization of mapping parameters
Optimization of chimera calling parameters
Influence of read and insert lengths
Findings
Mapping chimeras with TopHat2 fusion