Abstract

Background: LINCS L1000 is a high-throughput technology that allows gene expression measurement in a large number of assays. However, to fit the measurements of ~1000 genes into the ~500 color channels of LINCS L1000, the landmark genes are paired so that every two genes share a single channel. Thus, a deconvolution step is required to infer the expression value of each gene. Any error in this step can propagate to the downstream analyses.

Results: We present l1kdeconv, an R package for peak calling in LINCS L1000 data, based on a new outlier detection method and an aggregate Gaussian mixture model (AGMM). By removing outliers and borrowing information among similar samples, l1kdeconv showed more stable and better performance than the methods commonly used for LINCS L1000 data deconvolution.

Conclusions: Based on a benchmark using both simulated data and real data, the l1kdeconv package achieved more stable results than the commonly used LINCS L1000 data deconvolution methods.

Highlights

  • Library of Integrated Network-based Cellular Signatures (LINCS) L1000 is a high-throughput technology that allows gene expression measurement in a large number of assays

  • Aggregate Gaussian mixture model: To accurately detect the expression values of a pair of genes, we introduced a novel peak calling method that borrows information from the samples under the same condition

  • Construction of the simulation dataset: To make a realistic evaluation of the performance of l1kdeconv, a simulated dataset was created with the key characteristics of LINCS L1000 data using a hierarchical model as described below


Summary

Results

Construction of the simulation dataset: To make a realistic evaluation of the performance of l1kdeconv, a simulated dataset was created with the key characteristics of LINCS L1000 data using a hierarchical model as described below. Williams's test [2, 5] between the correlations shows that the PCC of AGMM with outlier detection is significantly higher than those of k-medians and naïve GMM, with p-values less than 1 × 10^−20. AGMM with outlier detection resulted in 131% and 36% improvements compared with k-medians and naïve GMM, respectively. Unlike k-medians and naïve GMM, which call the peaks of each replicate separately, AGMM filters the outliers of each sample and deconvolutes the samples as one group. −0.15 k-medians and naïve GMM are not flipped, AGMM with the outlier detection can fit the real data better.
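
To illustrate the idea of filtering outliers per sample and then deconvoluting the replicates as one group, here is a minimal Python sketch. It is not the l1kdeconv implementation: the function names (`drop_isolated`, `fit_two_peaks`), the neighbor-count outlier rule, and all simulated values (peak locations, the 2:1 bead ratio, the injected outliers) are assumptions chosen only for this example. The mixture is fitted with a plain two-component expectation-maximization loop.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def drop_isolated(values, radius=1.0, min_neighbors=3):
    """Hypothetical outlier rule: keep a value only if enough other
    values from the same sample lie within `radius` of it."""
    kept = []
    for i, x in enumerate(values):
        neighbors = sum(
            1 for j, y in enumerate(values) if j != i and abs(x - y) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(x)
    return kept


def fit_two_peaks(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM.
    Returns [(mean, sd, weight), ...] sorted by mean."""
    xs = sorted(values)
    n = len(xs)
    # Initialise the means at the lower and upper quartiles.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean = sum(xs) / n
    spread = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    sd = [max(spread / 2.0, 1e-3)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each value.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0.0 else 0.5)
        # M-step: update mean, sd and weight of each component.
        for k in range(2):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = max(sum(r), 1e-9)
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
            w[k] = nk / n
    return sorted(zip(mu, sd, w))


# Simulated data (all values are assumptions): three replicates of one
# channel shared by two genes, plus two artificial outliers per replicate.
random.seed(0)
replicates = []
for _ in range(3):
    beads = [random.gauss(6.0, 0.3) for _ in range(40)]   # first gene
    beads += [random.gauss(9.0, 0.3) for _ in range(20)]  # second gene
    beads += [1.0, 14.0]                                  # outliers
    replicates.append(beads)

# Filter each replicate separately, then pool and fit one mixture.
pooled = [x for rep in replicates for x in drop_isolated(rep)]
peaks = fit_two_peaks(pooled)
for mu, sd, w in peaks:
    print(f"peak at {mu:.2f} (sd {sd:.2f}, weight {w:.2f})")
```

Pooling gives the mixture fit three times as many values as a single replicate would, which is one plausible reason a grouped fit can be more stable than calling peaks on each replicate separately. In this sketch the component weights, not the order of the means, indicate which peak belongs to the more abundant bead population.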

