Abstract

Bioinformatics techniques for analyzing time-course bulk and single-cell omics data are advancing. However, the dynamics of molecular changes in real data have no known ground truth, which makes benchmarking these techniques on real data difficult. Realistic simulated time-course datasets are therefore essential for assessing the performance of time-course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single-cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. The package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use the package to simulate a gene expression data set and then benchmark analysis methods on that data set against its known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/

Summary

Introduction

The purpose of this supplement is to describe the underlying machinery of the CancerInSilico package. Whereas the main manuscript focused on the high-level design of the package and demonstrated some use cases, this document gives a detailed explanation of the parameters and models implemented in the package. The main feature of CancerInSilico is the simulation of gene expression data. This simulation has three primary components: the cellular growth model, the pathway activity simulation, and the platform-specific error model. Each of these components contains distinct parameters that the user can control, so understanding how the components work together is critical for assessing parameter sensitivity. Given the number of parameters available to the user, a parameter sensitivity assessment is a critical part of any analysis done with CancerInSilico.
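To make the three-component structure concrete, the following toy sketch chains a growth model, a pathway activity score, and a platform-specific noise step. It is not the package's implementation and does not use the CancerInSilico API: every function name, parameter, and default value here is hypothetical, and the logistic growth curve, the growth-based activity score, and the Gaussian and Poisson noise models are stand-ins chosen only for illustration.

```python
import math
import random


def simulate_cell_counts(initial_cells, growth_rate, capacity, hours):
    """Toy growth model (assumption): discrete logistic growth, one step per hour."""
    counts = [float(initial_cells)]
    for _ in range(hours):
        n = counts[-1]
        counts.append(n + growth_rate * n * (1.0 - n / capacity))
    return counts


def pathway_activity(counts):
    """Toy pathway activity in [0, 1]: relative growth per step, scaled by its peak."""
    relative_growth = [(b - a) / a for a, b in zip(counts, counts[1:])]
    peak = max(relative_growth)
    return [r / peak if peak > 0 else 0.0 for r in relative_growth]


def expected_log2_expression(activity, baseline, maximum):
    """Map activity linearly onto a log2 expression range for one pathway gene."""
    return [baseline + a * (maximum - baseline) for a in activity]


def poisson_draw(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def add_platform_noise(log2_expr, platform, rng, sd=0.2):
    """Toy platform-specific error model (assumption, not the package's model)."""
    if platform == "microarray":
        # Additive Gaussian noise on the log2 intensity scale.
        return [x + rng.gauss(0.0, sd) for x in log2_expr]
    if platform == "rnaseq":
        # Integer read counts drawn around the expected count 2**x.
        return [poisson_draw(2.0 ** x, rng) for x in log2_expr]
    raise ValueError("unknown platform: " + platform)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    counts = simulate_cell_counts(100, 0.1, 1000, 48)
    activity = pathway_activity(counts)
    truth = expected_log2_expression(activity, baseline=4.0, maximum=8.0)
    observed = add_platform_noise(truth, "rnaseq", rng)
    print(truth[:3], observed[:3])
```

Because each stage has its own parameters (here `growth_rate` and `capacity`, `baseline` and `maximum`, `platform` and `sd`), varying one stage while holding the others fixed is a simple way to see how a sensitivity assessment would be organized. The noise-free `truth` values play the role of the known ground truth against which an analysis method can be scored.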

General Model Framework
CellModel Parameters Description
Pathway Activity Simulation
Statistical Error Model for Gene Expression Data
Error Model Specific Parameters Description