Abstract
The biological implications of bioinformatics can already be seen in various implementations. Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities[1], but it is not so simple. By applying data mining techniques to a gene sequence database, we can cluster the data to find interesting similarities in the gene expression data. One application of this kind of clustering is grouping organisms taxonomically based on their gene sequences. In this study we outline a method for taxonomical clustering of species based on their genetic profile, using Principal Component Analysis and Self Organizing Neural Networks. We implemented the idea in Matlab and used it to cluster gene sequences taken from the PAUP version of the ML5/ML6 database. The taxa used were some of the basidiomycetous fungi from the database. To study scalability issues, another large gene sequence database was used. The proposed method clustered the species correctly in almost all cases. The obtained results were significant and promising.
Highlights
Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities, but it is not so simple.
Dogs have a different morphology than coyotes, and dogs and coyotes are more similar to one another than either is to foxes.
Mammals come in neat morphological packages, but morphology is an inadequate marker for classifying many other organisms, especially insects, molds, fungi and bacteria.
Summary
Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities, but it is not so simple. The biological implications of bioinformatics can already be seen in the existence and use of databases and search engines; these tools have sped up scientific research. For example, starting with a nucleotide sequence for a human gene, alignment algorithms can be used to locate a similar gene in another organism. In Principal Component Analysis, a data set x_i (i = 1, ..., n) is summarized as a linear combination of orthonormal vectors (called principal components). This work used Principal Component Analysis for feature vector selection from the gene sequence information. To cluster the feature vectors of the gene sequence data, we used Self Organizing Feature Maps. Naïve k-means algorithm: one of the most popular heuristics for solving the k-means problem is based on a simple iterative scheme for finding a locally optimal solution[7]. The following operations are performed in the steps: Gene Sequences of Organisms to be Clustered
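The simple iterative scheme behind naive k-means can be sketched in a few lines. The code below is only an illustration, not the authors' Matlab implementation: it is written in plain Python, the function name naive_kmeans is hypothetical, the centroids are seeded with the first k points (an assumption made to keep the run deterministic), and the toy 2-D vectors merely stand in for PCA-reduced gene sequence features.

```python
from math import dist


def naive_kmeans(points, k, max_iter=100):
    """Lloyd-style k-means on a list of equal-length numeric tuples."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so a local optimum was reached
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Toy 2-D feature vectors (illustrative data, not taken from the study).
features = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels, centroids = naive_kmeans(features, k=2)
```

On this toy input the loop stops once the assignments no longer change, with the first three vectors in one cluster and the last three in the other. Because the scheme only finds a locally optimal solution, a different seeding can give a different partition on less well-separated data.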