Unmapped Sequences Research Articles

Abstract

INTRODUCTION: The tumor microenvironment consists of a complicated interplay between a heterogeneous mixture of tumor cells and the surrounding tissue, and there is burgeoning interest in the role of the tissue microbiota within this microenvironment. Our group has investigated the breast tumor microbiome using the unaligned (non-human) reads from 796 RNA-sequencing samples obtained from The Cancer Genome Atlas (TCGA) breast cancer repository.

METHODS: Unmapped sequences for 720 TCGA tumor samples and 76 paired normal samples were analyzed with Kraken v0.10.5 using the RefSeq 16S and standard genomes. To minimize false discoveries, Kraken results were filtered to species that have both bacterial genomic and 16S read support. Cluster analysis was performed on the t-distributed stochastic neighbor embedding (t-SNE), and consensus voting of 26 maximum scoring metrics identified 3 distinct clusters. Differential abundance analysis was performed with the edgeR package. Covariate association analysis was performed for tumor clinical characteristics, as well as sample heterogeneity characteristics.

RESULTS: We identified three microbiome clusters among the TCGA breast samples. The average silhouette width of the third cluster (0.071) was deemed statistically insignificant, suggesting that further stratification of this cluster is necessary. The remaining two TCGA clusters (467) shared an average silhouette width of 0.39, using Bray's dissimilarity index. Differential abundance analysis of the two clusters after Bonferroni correction identified 450 significant species. The biomes were represented by 63 triple negative, 93 HER2+, 258 Luminal, and 50 tumor adjacent normals. Concordant cluster assignment was observed in 93.7 +/- 7.3% of the normal-tumor pairings, averaged across clinical subtypes. We observed clear delineation of biome populations at the order taxonomic level, which was not associated with the clinical subtype or the tumor/tumor adjacent pairings. We found enrichment of Burkholderiales in the first biome and enrichment of Bacillales and Lactobacillales within the second biome. The second biome appears to be depleted of representatives from the Asian population (p = 0.005), and survival was observed to be significantly reduced for HER2 patients associated with biome 1 (p = 0.025). Gene set enrichment analysis of the expression signatures indicated that drug metabolism and chemical carcinogenesis pathways are associated with biome 2.

CONCLUSIONS: In our study, we observed two significantly different microbiomes among TCGA breast cancer samples that were not associated with the clinical subtypes. Current efforts include validating these biomes using 16S sequencing on matching fresh frozen paraffin embedded samples. Additionally, we are evaluating the influence of the biomes in independent adjuvant and neo-adjuvant studies with next generation sequencing data.

Citation Format: Kevin J. Thompson, Jason Sinnwell, Xiaojia Tang, James N. Ingle, Matthew P. Goetz, Krishna R. Kalari. Distinct microbiome populations within breast cancer microenvironments. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3298.
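The abstract reports per-cluster average silhouette widths computed on a dissimilarity index. As a minimal sketch of how such a figure can be derived, the Python below computes a dissimilarity between abundance profiles and then the silhouette width of each sample. It assumes that "Bray's dissimilarity index" refers to the Bray-Curtis dissimilarity; the function names, the toy counts, and the cluster labels are hypothetical and are not taken from the study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def silhouette_widths(samples, labels, dist=bray_curtis):
    """Silhouette width s(i) = (b - a) / max(a, b) for every sample."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    n = len(samples)
    d = [[dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    widths = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the sample's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in clusters
            if other != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def average_width_by_cluster(samples, labels):
    """Average silhouette width per cluster label."""
    widths = silhouette_widths(samples, labels)
    result = {}
    for cluster in set(labels):
        values = [w for w, lab in zip(widths, labels) if lab == cluster]
        result[cluster] = sum(values) / len(values)
    return result


# Hypothetical read counts for three taxa in four samples (not study data).
toy_samples = [[90, 5, 5], [80, 10, 10], [5, 90, 5], [10, 80, 10]]
toy_labels = [1, 1, 2, 2]
print(average_width_by_cluster(toy_samples, toy_labels))
```

Widths near 1 indicate samples that sit well inside their cluster, while widths near 0 indicate samples on the boundary between clusters, which is the usual reading of a low value such as the 0.071 reported for the third cluster.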

Projects to obtain whole-genome sequences for 10,000 vertebrate species [1] and for 5,000 insect and related arthropod species [2] are expected to take place over the next 5 years. For example, the genomes of 15 malaria mosquito species are currently being sequenced on an Illumina platform [3,4]. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies [5]. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented [4,6]. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations [7,8].

Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila [9], allows the user to visualize more details on chromosomes than the regular squashing technique [10]. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time [11]. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH [12]. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.

Articles published on Unmapped Sequences

Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity.

Presence of periodontal pathogenic bacteria in blood of patients with coronary artery disease

Transcriptomic and Metabolomic Analyses Provide Insights Into an Aberrant Tissue of Tea Plant (Camellia sinensis).

Genome-wide structural and functional variant discovery of rice landraces using genotyping by sequencing.

A novel microRNA boosts hyper-β-oxidation of fatty acids in liver by impeding CEP350-mediated sequestration of PPARα and thus restricts chronic hepatitis C

DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks.

P53 and STAT3 Dependent Expression of a Novel microRNA Restricts Hepatitis C Virus Infection Through Hyper-β-Oxidation of Fatty Acids by Impeding Cytosolic CAP350

From trash to treasure: detecting unexpected contamination in unmapped NGS data

Comparative whole genome re-sequencing analysis in upland New Rice for Africa: insights into the breeding history and respective genome compositions

Unmapped sequencing reads identify additional candidate genes linked to magnetoreception in rainbow trout

Abstract 3298: Distinct microbiome populations within breast cancer microenvironments

Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads.

Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers.

Whole-genome re-sequencing of non-model organisms: lessons from unmapped reads.

Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

High-throughput Physical Mapping of Chromosomes using Automated in situ Hybridization

Mapping the pericentric heterochromatin by comparative genomic hybridization analysis and chromosome deletions in Drosophila melanogaster

Structural Rules and Complex Regulatory Circuitry Constrain Expression of a Notch- and EGFR-Regulated Eye Enhancer

Update of the Anopheles gambiae PEST genome assembly
