Abstract

BackgroundThe accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA.ResultsTo fill this need, we developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE’s use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature.ConclusionsSeparating input sample modeling from library preparation/sequencing offers added flexibility for both users and developers to mix-and-match different sample and sequencing simulators to suit their specific needs. Furthermore, the ability to maintain sample and sequencing simulators independently provides greater agility to incorporate new biological findings about transcriptomics and new developments in sequencing technologies. Additionally, by simulating at the level of individual molecules, CAMPAREE has the potential to model molecules transcribed from the same genes as a heterogeneous population of transcripts with different states of degradation and processing (splicing, editing, etc.). CAMPAREE was developed in Python, is open source, and freely available at https://github.com/itmat/CAMPAREE.

Highlights

  • The accurate interpretation of High-throughput sequencing of RNA (RNA-Seq) data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms

  • Despite being developed as the basis for molecule-based RNA-Seq simulations, CAMPAREE is capable of Conclusions Arguably one of the most difficult aspects of simulating RNA-Seq data is generating a realistic RNA population to serve as the basis for the input sample

  • This problem has been largely overlooked by the community, where efforts tend to focus on simulating the biases introduced by library preparation and sequencing

Read more

Summary

Results

We developed the Configurable And Modular Program Allowing RNA Expression Emulation (CAMPAREE), a simulator using empirical data to simulate diploid RNA samples at the level of individual molecules. We demonstrated CAMPAREE’s use for generating idealized coverage plots from real data, and for adding the ability to generate allele-specific data to existing RNA-Seq simulators that do not natively support this feature

Conclusions
Background
Results and discussion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call