Abstract

A necessary step in many metagenomic studies is determining which organisms are present in a sample. Knowing how similar the genomes of those organisms are allows high-throughput sequencing reads to be mapped more accurately to the correct genome for expression quantification. This study investigates current metrics of genome similarity as they relate to cross mapping percentage, defined as the percentage of sequence reads from one organism that map to another organism's genome, and aims to establish a new metric for genome similarity that incorporates cross mapping percentage. Paired-end reads were generated with Artificial FASTQ Generator (AFG) for 10 organisms falling into two categories: host and pathogen. The reads were mapped to reference genomes using Bowtie2, and the cross mapping percentage was calculated. Bowtie2 yielded higher cross mapping percentages for organisms with a lower calculated genomic distance, which led to the conclusion that hosts and pathogens could easily be distinguished, while pathogens and other microbial genomes were harder to separate from one another. The genomes were also aligned using MUMmer, and an overall percent similarity between the sequences was determined. A metric for genome similarity was established by modifying the formulas used in DSMZ's Genome-to-Genome Distance Calculator (GGDC) to incorporate cross mapping percentages. Modifying the formulas did not change the trend in genomic distance values, which supports the conclusion that cross mapping percentage, distance calculated with the original formulas, and distance calculated with the new formulas are interchangeable. This work helps establish the resolution at which organisms in a sample can be distinguished using whole-genome sequence information, that is, how similar organisms can be and still be distinguished in a metagenomic study for the purposes of computing expression values. These findings allow organisms in metagenomic studies to be better identified and expression to be quantified more accurately in metatranscriptomic studies.
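
The definition of cross mapping percentage given above can be illustrated with a minimal Python sketch. It assumes that per-pair counts of aligned reads have already been extracted from the aligner output; the function names, organism labels, and read counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def cross_mapping_percentage(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads from one organism that map to another organism's genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


def cross_mapping_table(
    mapped: Dict[Tuple[str, str], int],
    totals: Dict[str, int],
) -> Dict[Tuple[str, str], float]:
    """Compute the percentage for every (source organism, target genome) pair.

    mapped[(source, target)] is the number of reads simulated from `source`
    that aligned to the genome of `target`; totals[source] is the number of
    reads simulated from `source`.
    """
    return {
        (source, target): cross_mapping_percentage(count, totals[source])
        for (source, target), count in mapped.items()
    }


# Hypothetical counts, for illustration only (not data from the study).
totals = {"host": 1000, "pathogen": 1000}
mapped = {("pathogen", "host"): 5, ("host", "pathogen"): 2}
table = cross_mapping_table(mapped, totals)
```

Keeping the table keyed by (source, target) makes the asymmetry explicit: the share of pathogen reads that map to the host genome need not equal the share of host reads that map to the pathogen genome.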

