Abstract

With the rapid growth of microbial genomic sequence data, novel tools are needed for comprehensive analysis of the large volume of sequences now available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we developed the batch-learning SOM (BLSOM), which classifies sequence fragments (e.g., 1 kb) according to phylotype solely on the basis of oligonucleotide composition. Metagenomic studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in the life sciences. BLSOM is well suited to phylogenetic assignment of metagenomic sequences because fragmentary sequences can be clustered by phylotype from oligonucleotide composition alone. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species; by mapping metagenomic sequences onto these large-scale BLSOMs, we can predict the phylotype of each metagenomic sequence and thereby reveal the community structure of uncultured microorganisms, including viruses. BLSOM has also shown that influenza viruses isolated from humans and from birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes in virus sequences and for surveillance of potentially hazardous strains introduced into humans from non-human sources.
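To make "oligonucleotide composition" concrete, the following is a minimal sketch of how a sequence could be cut into 1 kb fragments and each fragment turned into a normalized k-mer frequency vector, the kind of high-dimensional input a SOM clusters. The function names `oligo_composition` and `fragments`, the default `k=4`, and the choice to skip windows containing ambiguous bases are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product


def oligo_composition(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries ordered lexicographically over A, C, G, T.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        # Assumption: windows containing N or other symbols are skipped.
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def fragments(seq, size=1000):
    """Split a sequence into non-overlapping fragments of `size` bases.

    A trailing piece shorter than `size` is dropped.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]
```

Each fragment then becomes one point in a 4^k-dimensional space (256 dimensions for tetranucleotides), and fragments from genomes with similar composition lie close together in that space.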

Highlights

  • Phylogenetic analysis based on sequence homology searches is a well-established and irreplaceable method for studying gene and protein sequences [1,2,3]

  • As an example of applying the batch-learning SOM (BLSOM) to a large-scale metagenome study, we introduce our previous analysis [10] of metagenomic sequences obtained from the Sargasso Sea near Bermuda, reported by Venter et al. [29]

  • This indicates that most sequence fragments derived from one genome in a metagenome library can be reassociated in silico, and it provides the rationale for phylogenetic classification of sequences, even those derived from a high-complexity library
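The reassociation described above relies on fragments with similar composition landing on the same or neighboring map nodes. The following is a minimal sketch of a generic batch-mode SOM, in which every input is first assigned to its best-matching node and all node weights are then recomputed together, so the result does not depend on input order. It is not the authors' BLSOM implementation: the initialization from evenly spaced data points, the Gaussian neighborhood, the linear radius schedule, and the names `bmu` and `batch_som` are all assumptions made for illustration.

```python
import math


def bmu(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)),
    )


def batch_som(data, rows, cols, epochs=20):
    """Train a rows x cols map on `data` (a list of equal-length vectors)."""
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: deterministic start from evenly spaced data points.
    weights = [list(data[(i * len(data)) // len(coords)]) for i in range(len(coords))]
    for epoch in range(epochs):
        # Assumption: neighborhood radius shrinks linearly over the epochs.
        sigma = max(rows, cols) / 2 * (1 - epoch / epochs) + 0.5
        # Batch step 1: assign every input to its best-matching node.
        hits = [bmu(weights, x) for x in data]
        # Batch step 2: recompute all weights as neighborhood-weighted means.
        new_weights = []
        for n, (r, c) in enumerate(coords):
            num = [0.0] * dim
            den = 0.0
            for x, h in zip(data, hits):
                hr, hc = coords[h]
                g = math.exp(-((r - hr) ** 2 + (c - hc) ** 2) / (2 * sigma ** 2))
                den += g
                for d in range(dim):
                    num[d] += g * x[d]
            # Keep the old weight if the neighborhood sum underflows to zero.
            new_weights.append([v / den for v in num] if den > 0 else weights[n])
        weights = new_weights
    return weights, coords
```

After training, calling `bmu(weights, vector)` on a new fragment's composition vector gives the node it maps to; fragments sharing a node, or adjacent nodes, would be candidates for belonging to the same phylotype.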


Summary

A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

Yuki Iwasaki 1,2, Takashi Abe 1,3,*, Kennosuke Wada 1, Yoshiko Wada 1,4 and Toshimichi Ikemura 1

Faculty of Medicine, Shiga University of Medical Science, Shiga-ken 520-2121, Japan

Received: 26 September 2013; in revised form: 5 November 2013 / Accepted: 8 November 2013

Introduction
An Alignment-Free Clustering Method “BLSOM” Developed for Genome Informatics
Basic Characteristics of BLSOM Separation
Application of BLSOM to Metagenome Studies
A General Strategy of Phylogenetic Assignments of Metagenomic Sequences
Oligonucleotide-BLSOM Applied to Studies of the Influenza Virus Genomes
Host-Dependent Clustering of Influenza Virus Genome Sequences
Retrospective Time Series Changes Visualized for Human Viruses
Diagnostic Oligonucleotides Responsible for Host-D
A Strategy for Finding Potentially Hazardous Strains
BLSOM Analyses of Individual Segments
Other Applications of BLSOM and Future Prospects
Findings
Addition of Computer-Generated Random Sequences