Bioinformatics Challenges Research Articles

Abstract: Growing interest in cancer classification and progression has accelerated the discovery of novel gene fusions, with increasing recognition of their roles as biomarkers. RNA-Seq is an attractive method for discovering and detecting expressed fusions because it provides unbiased fusion sequencing information. Detecting low-expressing fusion transcripts, however, requires high sequencing depth, which represents a significant financial barrier, and identifying clinically relevant fusion sequences in a large data set can be a bioinformatics challenge. To address these challenges, we tested the Ovation® Fusion Panel Target Enrichment System V2, a targeted RNA sequencing method using the Single Primer Enrichment Technology (SPET), with a number of control and clinical samples. Initial studies were performed using a comprehensive target enrichment panel targeting 502 genes with three samples from Horizon DX containing known fusions. Target-enriched libraries were constructed with 10 ng and 100 ng inputs, and the data were analyzed using the NuFuseD pipeline (available as a point-and-click BaseSpace application or a downloadable Linux package), which has been optimized for fusion analysis from this data. Expected fusions were identified at both input levels, even when down-sampled to 500K reads, with fewer fusion calls than other publicly available fusion detection software (Chimerascan and SOAPFuse), suggesting a lower false positive rate. NuFuseD fusion calls are provided with a P-value to help prioritize the identified fusions for subsequent validation. Additionally, NuFuseD detected novel fusions in the control samples, demonstrating the advantage of a comprehensive panel over more restricted panels. We further validated the target panel using control RNA (UHR and Human Brain) and fresh or FFPE cell lines (NCI-H2228, HCC1937) to demonstrate our ability to identify known fusions.
Finally, the system was evaluated at an external site using patient FFPE samples. These samples (N=8) came from a set of breast, liver and ovarian cancers, and 4 of them contained a unique fusion according to DNA-based sequencing. Only 1 of the 4 expected fusions was identified using whole transcriptome data (100 million reads), while 3 of the 4 fusions were detected with this assay (10 million reads), demonstrating its ability to generate targeted RNA sequencing libraries with increased sensitivity of gene fusion detection and reduced sequencing costs compared to standard RNA-Seq methods.

Citation Format: Ashesh Saraiya, Brandon Young, Tobias Meißner, Brian L. Jones, Stephanie C. Huelga, Doug A. Amorese. A comprehensive target enrichment panel for fusion detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 487. doi:10.1158/1538-7445.AM2017-487
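
The abstract above reports that expected fusions were still identified after libraries were down-sampled to 500K reads. It does not say how the down-sampling was done, so the following is only a minimal sketch of one common approach, reservoir sampling over FASTQ records. The function names `read_fastq` and `downsample` are hypothetical and are not part of NuFuseD or any tool named in the abstract.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator ('+'), quality string.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of FASTQ lines into 4-line records (hypothetical helper)."""
    it = (line.rstrip("\n") for line in lines)
    # Zipping one iterator with itself four times yields consecutive groups of four;
    # an incomplete trailing record is silently dropped.
    yield from zip(it, it, it, it)


def downsample(records: Iterable[FastqRecord], n: int, seed: int = 0) -> List[FastqRecord]:
    """Return up to n records chosen uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(rec)
        else:
            # Record i replaces a reservoir slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return reservoir
```

For a real library, the records would be streamed from a file, for example `downsample(read_fastq(open(path)), 500_000)`, where `path` is a placeholder. A single pass keeps memory bounded by the target read count rather than by the size of the library.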

Background: A major challenge of bioinformatics in the era of precision medicine is to identify molecular biomarkers for complex diseases. These biomarkers or signatures are generally expected to have not only strong discriminative ability but also a readable interpretation in a biological sense. Conventional expression-based or network-based methods mainly capture differential genes or differential networks as biomarkers; however, such biomarkers focus only on phenotypic discrimination and usually offer less biological or functional interpretation. Meanwhile, conventional function-based methods can consider biomarkers corresponding to certain biological functions or pathways, but they ignore the differential information of genes, i.e., they disregard how active particular genes are within particular functions, which results in weaker discriminative ability on phenotypes. Hence, there is strong demand for computational methods that directly identify functional network biomarkers with both discriminative power on disease states and readable interpretation on biological functions.

Results: In this paper, we present a new computational framework based on an integer programming model, named Comparative Network Stratification (CNS), to extract functional or interpretable network biomarkers that have strong discriminative power on disease states and readable interpretation on biological functions. In addition, CNS can not only recognize the pathogenic biological functions disregarded by traditional expression-based/network-based methods, but also uncover the active network structures underlying such dysregulated functions, which are underestimated by traditional function-based methods. To validate its effectiveness, we compared CNS with five state-of-the-art methods, i.e., GSVA, Pathifier, stSVM, frSVM and AEP, on four datasets of different complex diseases. The results show that CNS can enhance the discriminative power of network biomarkers and further provide biologically interpretable information on, or the disease pathogenic mechanism of, these biomarkers. A case study on type 1 diabetes (T1D) demonstrates that CNS can identify many dysfunctional genes and networks previously disregarded by conventional approaches.

Conclusion: CNS is therefore a powerful bioinformatics tool that can identify functional or interpretable network biomarkers with both discriminative power on disease states and readable interpretation on biological functions. CNS was implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/CNSpackage_0.1.rar.

Articles published on Bioinformatics Challenges

Proteomics and phosphoproteomics in precision medicine: applications and challenges.

Evaluation of tools for long read RNA-seq splice-aware alignment.

MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes.

Unified Deep Learning Architecture for Modeling Biology Sequence.

Current and Emerging Technologies for Probing Molecular Signatures of Traumatic Brain Injury.

PCSF: An R-package for network-based interpretation of high-throughput data.

Comparative analysis of targeted long read sequencing approaches for characterization of a plant’s immune receptor repertoire

Abstract 487: A comprehensive target enrichment panel for fusion detection

Bioinformatics challenges in molecular epidemiology of cancers

MicroTaboo: a general and practical solution to the k-disjoint problem

Hybrid Wrapper/Filter Gene Selection Using an Ensemble of Classifiers and PSO Algorithm

MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility-mass spectrometry based metabolomics.

Comparative network stratification analysis for identifying functional interpretable network biomarkers

Ant colony optimization based hierarchical multi-label classification algorithm

FocusHeuristics – expression-data-driven network optimization and disease gene prediction

Critical Issues in Mycobiota Analysis.

An end-to-end software solution for the analysis of high-throughput single-cell migration data

An Efficient Exact Method For Identifying Mutated Driver Pathways In Cancer

10th Workshop on Biomedical and Bioinformatics Challenges for Computer Science – BBC2017

Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms.
