Next-generation Sequencing Experiments Research Articles

Background: Analyses of molecular high-throughput data often lack robustness, i.e. results are very sensitive to the addition or removal of a single observation. Therefore, identifying extreme observations is an important quality-control step before further data analysis. Standard outlier detection methods for univariate data are, however, not applicable, because the data are high-dimensional, i.e. hundreds or thousands of features are observed in small samples. Usually, outliers in high-dimensional data are detected solely by the analyst's visual inspection of a graphical representation of the data. Typical graphical representations of high-dimensional data are hierarchical cluster trees or principal component plots. Purely visual approaches depend, however, on the individual judgement of the analyst and are hard to automate. Existing methods for automated outlier detection are dedicated only to data from a single experimental group.

Results: In this work we propose to use bagplots, the 2-dimensional extension of the boxplot, to automatically identify outliers in the subspace of the first two principal components of the data. Furthermore, we present for the first time the gemplot, the 3-dimensional extension of the boxplot and bagplot, which can be used in the subspace of the first three principal components. Bagplot and gemplot surround the regular observations with convex hulls, and observations outside these hulls are regarded as outliers. The convex hulls are determined separately for the observations of each experimental group, while the observations of all groups can be displayed in the same subspace of principal components. We demonstrate the usefulness of this approach on multiple sets of artificial data as well as one set of gene expression data from a next-generation sequencing experiment, and compare the new method to other common approaches. Furthermore, we provide an implementation of the gemplot in the package ‘gemPlot’ for the R programming environment.

Conclusions: Bagplots and gemplots in subspaces of principal components are useful for automated and objective outlier identification in high-dimensional data from molecular high-throughput experiments. A clear advantage over other methods is that multiple experimental groups can be displayed in the same figure although outlier detection is performed for each individual group.
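The overall workflow described in this abstract (project all samples into a shared principal component subspace, then judge outliers separately within each experimental group) can be illustrated with a minimal Python sketch. This is not the bagplot or gemplot, and it does not reflect the interface of the ‘gemPlot’ package, which is not shown here. As a loudly simplified stand-in for the depth-based convex hull, the sketch flags a sample when its distance to its group's coordinate-wise median exceeds a multiple of the group's median distance. The function names, the `factor` threshold, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X (samples x features) onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def flag_outliers_per_group(scores, groups, factor=3.0):
    """Flag outliers separately within each group.

    Stand-in rule (NOT the bagplot): a sample is flagged when its Euclidean
    distance to its group's coordinate-wise median exceeds `factor` times
    the median of those distances within the group.
    """
    groups = np.asarray(groups)
    flags = np.zeros(len(scores), dtype=bool)
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        center = np.median(scores[idx], axis=0)
        dist = np.linalg.norm(scores[idx] - center, axis=1)
        flags[idx] = dist > factor * np.median(dist)
    return flags


# Hypothetical data: 20 samples, 200 features, two experimental groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
X[10:, :50] += 2.0       # group B differs from group A in 50 features
X[0, 100:150] += 10.0    # sample 0 (group A) is a planted outlier
groups = ["A"] * 10 + ["B"] * 10

scores = pca_scores(X, n_components=2)   # shared 2-D subspace for all groups
flags = flag_outliers_per_group(scores, groups)
print("flagged sample indices:", np.flatnonzero(flags))
```

Because the projection is computed once for all samples while the outlier rule runs per group, a sample is compared only against members of its own group even though all groups share the same plotting coordinates. Setting `n_components=3` would give the three-dimensional analogue of the same workflow.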

Background: RNA editing is a co-transcriptional modification that increases molecular diversity and alters secondary structure and protein-coding sequences by changing the sequence of transcripts. The most common RNA editing modification is the single base substitution (A→I) that is catalyzed by members of the adenosine deaminases that act on RNA (ADAR) family. Typically, editing sites are identified as RNA-DNA differences (RDDs) in a comparison of genome and transcriptome data from next-generation sequencing experiments. However, a method for robust detection of site-specific editing events from replicate RNA-seq data has not been published so far. Even more surprisingly, condition-specific editing events, which would show up as differences in RNA-RNA comparisons (RRDs) and depend on particular cellular states, are rarely discussed in the literature.

Results: We present JACUSA, a versatile one-stop solution to detect single nucleotide variant positions by comparing RNA-DNA and/or RNA-RNA sequencing samples. The performance of JACUSA has been carefully evaluated and compared to other variant callers in an in silico benchmark. JACUSA outperforms other algorithms in terms of the F measure, which combines precision and recall, in all benchmark scenarios. This performance margin is highest for the RNA-RNA comparison scenario. We further validated JACUSA’s performance by testing its ability to detect A→I events using sequencing data from a human cell culture experiment and publicly available RNA-seq data from Drosophila melanogaster heads. To this end, we performed whole genome and RNA sequencing of HEK-293 cells on samples with lowered activity of candidate RNA editing enzymes. JACUSA has a higher recall and comparable precision for detecting true editing sites in RDD comparisons of HEK-293 data. Intriguingly, JACUSA captures most A→I events from RRD comparisons of RNA sequencing data derived from the Drosophila and HEK-293 data sets.

Conclusion: Our software JACUSA detects single nucleotide variants by comparing data from next-generation sequencing experiments (RNA-DNA or RNA-RNA). In practice, JACUSA shows higher recall and comparable precision in detecting A→I sites from RNA-DNA comparisons, while showing higher precision and recall in RNA-RNA comparisons.
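For readers unfamiliar with the F measure used in the benchmark, the following minimal Python sketch shows how precision, recall, and the F measure can be computed from a set of called sites and a set of true sites. It assumes the balanced form (beta = 1, the harmonic mean of precision and recall); the abstract does not state which weighting was used. The function name and the site coordinates are hypothetical and are not taken from JACUSA or its benchmark.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # beta = 1 gives the harmonic mean of precision and recall (assumed here).
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical example: sites are (chromosome, position) pairs.
called = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 77)}
truth = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 90), ("chr3", 5)}

tp = len(called & truth)   # called and true
fp = len(called - truth)   # called but not true
fn = len(truth - called)   # true but missed

precision, recall, f1 = precision_recall_f(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

A caller with higher recall but only comparable precision, as reported for the RNA-DNA comparisons, still obtains a higher F measure, because the harmonic mean rises when either component improves while the other stays fixed.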

Articles published on Next-generation Sequencing Experiments

Tick-borne pathogen detection: what's new?

Longrange PCR-based next-generation sequencing in pharmacokinetics and pharmacodynamics study of propofol among patients under general anaesthesia

Don't just dump your data and run: Authors should submit as much experimental information as possible when uploading sequence data.

Parallel mapping with site-directed hydroxyl radicals and micrococcal nuclease reveals structural features of positioned nucleosomes in vivo.

New insights into HCV replication in original cells from Aedes mosquitoes

SG-ADVISER mtDNA: a web server for mitochondrial DNA annotation with data from 200 samples of a healthy aging cohort

EccCL: parallelized GPU implementation of Ensemble Classifier Chains

Abstract 170: Differential Regulation of miRNA and mRNA Expression in the Myocardium of Nrf2 Knockout Mice

Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data.

Detecting KRAS and NRAS resistance mutations in plasma of lung cancer patients.

Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots

JACUSA: site-specific identification of RNA editing events from replicate sequencing data

Identifying Cryptic Relationships.

Biomarker Detection and Categorization in Ribonucleic Acid Sequencing Meta-Analysis Using Bayesian Hierarchical Models

Combined Action of Histone Reader Modules Regulates NuA4 Local Acetyltransferase Function but Not Its Recruitment on the Genome.

SystemPipeR: NGS workflow and report generation environment

Beta-Binomial Model for the Detection of Rare Mutations in Pooled Next-Generation Sequencing Experiments.

Tcf4 Regulates Synaptic Plasticity, DNA Methylation, and Memory Function

Implementation of a Reliable Next-Generation Sequencing Strategy for Molecular Diagnosis of Dystrophinopathies

Recommendations on e-infrastructures for next-generation sequencing.

