Abstract
Background: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate whether publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome.
Results: Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify some read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, there was a correlation between the presence of Ralstonia spp. and two specific plates of acute myeloid leukemia (AML) samples. In the end, associations remained between Pseudomonas-like and Acinetobacter-like read pairs and AML, and between Pseudomonas-like read pairs and stomach adenocarcinoma (STAD), that could not be explained through batch effects or the systematic contamination seen in other samples.
Conclusions: This approach suggests that it is possible to identify bacteria that may be present in human tumor samples from public genome sequencing data, and that these candidates can then be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected.
Highlights
A variety of bacteria are known to influence carcinogenesis
While most cancer-related bacteria are the dominant member of the microbiome, it is possible that rare members could cause driver mutations and/or that dominant members might be more abundant in tumors due to a favorable tumor microenvironment
Bacterial presence in The Cancer Genome Atlas (TCGA) data: to determine the microbial component of various cancers, we analyzed the relative abundance of bacteria-derived paired-end Illumina sequencing reads in TCGA data that were made available in the Sequence Read Archive (SRA), as previously described [30]
Summary
We sought to investigate whether publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome. Some viruses are known to integrate into the human genome [2, 3], while viruses, parasites, and bacteria can all promote cancer through other mechanisms [4]. Of the bacteria known to be associated with carcinogenesis, the mechanisms linking H. pylori to gastric carcinoma and gastric mucosa-associated lymphoid tissue (MALT) lymphoma are best understood [5]. While most cancer-related bacteria are the dominant member of the microbiome, it is possible that rare members could cause driver mutations and/or that dominant members might be more abundant in tumors due to a favorable tumor microenvironment.
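The contamination argument in the abstract rests on checking whether a taxon's presence tracks a batch variable, such as sequencing center or plate, rather than the cancer type. The sketch below illustrates one way such a check could be made, using a two-sided Fisher's exact test on a 2x2 presence-by-batch table. The choice of test, the function names, and the toy sample records are all assumptions made for illustration; the source does not state which statistic the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def presence_by_batch(samples, taxon, batch):
    """Count taxon present/absent inside the given batch versus all other batches."""
    a = b = c = d = 0
    for s in samples:
        present = taxon in s["taxa"]
        if s["center"] == batch:
            a, b = a + present, b + (not present)
        else:
            c, d = c + present, d + (not present)
    return a, b, c, d


# Hypothetical toy records: sequencing center plus taxa with bacterial read pairs.
samples = [
    {"center": "A", "taxa": {"Mycobacterium"}},
    {"center": "A", "taxa": {"Mycobacterium", "Pseudomonas"}},
    {"center": "A", "taxa": {"Mycobacterium"}},
    {"center": "B", "taxa": {"Pseudomonas"}},
    {"center": "B", "taxa": set()},
    {"center": "B", "taxa": {"Pseudomonas"}},
]

for taxon in ("Mycobacterium", "Pseudomonas"):
    table = presence_by_batch(samples, taxon, "A")
    print(taxon, table, fisher_exact_two_sided(*table))
```

In this toy data the hypothetical Mycobacterium calls occur only in samples from center A, the pattern that would suggest a batch effect, while the Pseudomonas calls are spread across both centers. With real data, many more samples per batch would be needed before a p-value of this kind carried any weight.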