Abstract

Rapid advances in DNA sequencing technology have produced large amounts of genomic data. Combined with efficient computational platforms, these data offer many opportunities to improve our knowledge of the evolution and genetic makeup of species. The general aim of this thesis is to facilitate studies of important biological questions by extracting the relevant information from transcriptomic and genomic data. The specific aims were i) to develop a pan-genome-based RNA-Seq data analysis pipeline in order to analyze ex vivo gene expression profiles of uropathogenic Escherichia coli isolates and ii) to create a consensus sequence of the Pseudomonas aeruginosa core genome in order to identify single nucleotide polymorphisms (SNPs) with high accuracy and to find patho-adaptive mutations in P. aeruginosa clinical isolates. To address the first aim, I developed a pan-genome of E. coli and used it to map and analyze RNA-Seq reads associated with acute urinary tract infection. Whereas the in vivo expression profiles of the majority of genes were conserved among the 21 E. coli strains, the expression profiles of the accessory genome were diverse and reflected phylogenetic relationships. To address the second aim, whole genome sequencing data were used to gain insights into the genetic variation of 99 clinical P. aeruginosa isolates. I created a consensus sequence for every core gene, based on the most frequent nucleotide at each position, and used it as the reference for identifying SNPs across all clinical isolates. The identified SNPs were classified into clonal-specific, single, and phylogenetically independent SNPs. The majority were clonal-specific or single SNPs. However, I identified a large set of 2,252 genes that had one or more phylogenetically independent non-synonymous mutations. Moreover, the dN/dS ratio of 3,814 genes revealed that the core genome is not under selection pressure.
In summary, this thesis explores pan-genome-based and consensus sequence-based approaches to transcriptomic and genomic sequencing data of clinical isolates of E. coli and P. aeruginosa, respectively. The results contribute to the understanding of sequence variations that are selected in the environment of the human host and lead to bacterial adaptation and pathogenicity. This is important not only for basic scientific research but also for understanding the link between diversity and community structure and function.
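
The consensus-based SNP identification summarized above can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis. It assumes that the sequences of one core gene are already aligned to equal length, that gaps and ambiguous bases are simply ignored, and that ties between equally frequent nucleotides are broken by order of first appearance. The function names and the toy isolate sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent nucleotide per alignment column (gaps and N ignored)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(b for b in column if b in "ACGT")
        # Columns without any called base are marked N (an assumption).
        out.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(out)


def call_snps(reference, sequence):
    """Return (0-based position, reference base, isolate base) per mismatch."""
    return [
        (i, r, b)
        for i, (r, b) in enumerate(zip(reference, sequence))
        if r in "ACGT" and b in "ACGT" and r != b
    ]


# Toy alignment of one hypothetical core gene across four isolates.
isolates = {
    "iso1": "ATGACC",
    "iso2": "ATGACC",
    "iso3": "ATGTCC",
    "iso4": "ACGACC",
}
ref = consensus(list(isolates.values()))
snps = {name: call_snps(ref, seq) for name, seq in isolates.items()}
```

In this toy example the consensus equals the majority sequence, and each SNP is reported relative to it rather than relative to any single reference strain. Classifying SNPs as clonal-specific, single, or phylogenetically independent would additionally require the phylogeny of the isolates, which this sketch does not cover.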
