Abstract

A vast number of public RNA-sequencing (RNA-Seq) datasets have been generated and widely used to study transcriptome mechanisms. These data offer a valuable opportunity to advance transcriptome research, including the study of alternative splicing. We report the first large-scale integrated analysis of RNA-Seq data of splicing factors for systematically identifying key factors in diseases and biological processes. We analyzed 1,321 RNA-Seq libraries from various mouse tissues and cell lines, comprising more than 6.6 TB of sequences from 75 independent studies that experimentally manipulated 56 splicing factors. From these data we computed RNA splicing signatures and gene expression signatures, and signature comparison analysis identified a list of key splicing factors in Rett syndrome and cold-induced thermogenesis. Using neuronal morphology analysis, we show that cold-induced RNA-binding proteins rescue the neurite outgrowth defects in Rett syndrome; using metabolic flux analysis, we show that SRSF1 and PTBP1 are required for energy expenditure in adipocytes. Our study provides an integrated analysis for identifying key factors in diseases and biological processes and highlights the importance of public data resources for generating hypotheses for experimental testing.

Highlights

  • High-throughput expression profiling has been used to identify transcriptional changes associated with many diseases and biological processes (BPs)

  • Standard RNA-Seq analysis with data limited to a specific biological context is unable to identify key factors in a disease or BP

  • We filled this void by generating a comprehensive compendium of RNA-Seq data for 56 splicing factors (SFs); these expression profiles were used in an integrated analysis to reveal key factors in diseases and BPs


Summary

Introduction

High-throughput expression profiling has been used to identify transcriptional changes associated with many diseases and biological processes (BPs). Given the large volume of high-throughput expression profiling data that is publicly available, any method that can use these data to identify upstream factors of transcription in diseases and BPs will be of great value. As a popular method for transcriptome analysis, RNA-sequencing (RNA-Seq)[4] has enabled genome-wide analyses of RNA molecules at a high sequencing depth with high accuracy. It has been used successfully on many mouse models[5,6], and thousands of RNA-Seq datasets have been generated and released to the public. This massive amount of biological data offers a great opportunity to generate promising biological hypotheses[7,8]. We developed an integrated analysis to reveal upstream factors of post-transcriptional changes and transcriptional changes in diseases and BPs using these public RNA-Seq data.

