Abstract

Two important unsolved problems in bacterial genome research are the identification of horizontally transferred genes and the prediction of gene expression levels. Both problems can be addressed by multivariate analysis of codon usage data. In particular, dimensionality reduction methods for visualizing multivariate data have been shown to be effective tools for codon usage analysis. Here we propose a multidimensional scaling approach using a novel similarity measure for codon usage tables. Our probabilistic similarity measure is based on P-values derived from the well-known chi-square test for comparison of two distributions. Experimental results on four microbial genomes indicate that the new method is well suited for the analysis of horizontal gene transfer and translational selection. Compared with the widely used correspondence analysis, our method did not suffer from outlier sensitivity and showed a better clustering of putative alien genes in most cases.

Highlights

  • The standard genetic code of protein coding DNA sequences shows a redundancy, since different triplet codons may be used to code for the same amino acid

  • Clusters of genes in codon usage visualizations can, for instance, provide evidence for horizontal gene transfer, in the form of groups of putative alien genes [1,2], or for translational selection, in the form of groups of highly expressed genes [3,4]

  • To evaluate our multidimensional scaling (MDS) approach, we focused on visualizations of ribosomal protein genes and putative alien genes for different microbial genomes
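
As a minimal sketch of how the codon counts behind such analyses might be tabulated, the following Python snippet counts in-frame triplets in a coding sequence. The function name `codon_counts` and the toy sequence are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count triplet codons in a coding sequence, reading in frame from position 0."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy sequence: GCT, GCC and GCA are synonymous codons for alanine,
# which illustrates the redundancy of the genetic code.
toy_gene = "ATGGCTGCCGCAGCTTAA"
counts = codon_counts(toy_gene)
```

Collecting such counts for every gene of a genome gives one codon usage table per gene, which is the kind of input the similarity measure operates on.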


Summary

Background

The standard genetic code of protein-coding DNA sequences is redundant, since different triplet codons may code for the same amino acid. Our visualization method is based on multidimensional scaling and a new similarity measure for codon usage data. In the following, we first introduce our probabilistic similarity measure for codon usage tables and outline the corresponding algorithm for multidimensional scaling based on P-values. For the analysis of codon usage tables, we developed a similarity measure derived from the well-known chi-square test for the comparison of two distributions. Unlike the classical chi-square test, we do not decide whether two distributions are equal; instead, we use only the corresponding P-values to compute a similarity measure for the underlying codon usage tables. Note that n_ij corresponds to the number of occurrences of amino acid a_i in gene j. With these counts we compute the chi-square statistic for each pair (j, k) of genes. The M components of x1 and x2 provide the x1 and x2 coordinates for the M genes, which are used for scatter plot visualization.
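
The pipeline described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the summary does not reproduce the paper's formula, so the sketch uses the standard chi-square statistic for a two-column contingency table; it converts P-values to dissimilarities as 1 - P; and it uses classical (Torgerson) MDS as a stand-in for the paper's P-value based MDS algorithm. The function names and the toy count matrix are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def pairwise_pvalues(counts: np.ndarray) -> np.ndarray:
    """counts[i, j] = occurrences of category i in gene j; returns an M x M P-value matrix."""
    counts = np.asarray(counts, dtype=float)
    n_genes = counts.shape[1]
    pvals = np.ones((n_genes, n_genes))  # a gene compared with itself gets P = 1
    for j in range(n_genes):
        for k in range(j + 1, n_genes):
            table = counts[:, [j, k]]
            table = table[table.sum(axis=1) > 0]  # drop categories absent from both genes
            dof = table.shape[0] - 1
            col_tot = table.sum(axis=0)
            if dof < 1 or col_tot.min() == 0:
                continue  # nothing to compare; P stays at 1 (assumption of this sketch)
            row_tot = table.sum(axis=1)
            expected = np.outer(row_tot, col_tot) / table.sum()
            stat = ((table - expected) ** 2 / expected).sum()
            pvals[j, k] = pvals[k, j] = chi2.sf(stat, dof)
    return pvals


def classical_mds(dissim: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed M objects from an M x M dissimilarity matrix."""
    m = dissim.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ (dissim ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against small negative values
    return eigvecs[:, top] * scale


# Hypothetical toy data: 4 categories (rows) x 4 genes (columns).
toy_counts = np.array([
    [10, 20, 2, 1],
    [10, 20, 2, 1],
    [2, 4, 10, 12],
    [2, 4, 10, 11],
])
pvals = pairwise_pvalues(toy_counts)
coords = classical_mds(1.0 - pvals)  # column 0 and column 1 are the scatter plot axes
```

In this toy example, genes 0 and 1 have identical relative frequencies, so their P-value is 1 and they land on nearly the same point, while genes with clearly different usage receive small P-values and are placed far apart. With short genes the expected counts can be small, in which case the chi-square approximation behind the P-values becomes less reliable.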

Experimental results
Conclusion
Hill MO
McInerney JO