Abstract
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analysis of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures. These attributes were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with a 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same search scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will become increasingly valuable as 3D morphology and phenomics databases grow.
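To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of wrapper-style feature selection in which a best-first search over feature subsets is scored by a 1-nearest neighbor classifier. It is not the authors' implementation: the function names, the leave-one-out scoring, the `max_stale` stopping rule and the toy data are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import heapq
from itertools import count


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest neighbor classifier
    that only looks at the given feature indices (assumed scoring scheme)."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out the sample being classified
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def best_first_selection(X, y, max_stale=5):
    """Forward best-first search over feature subsets.

    Always expands the highest-scoring unexpanded subset and stops after
    `max_stale` consecutive expansions without improvement (hypothetical
    stopping rule) or when no subsets remain.
    """
    n_features = len(X[0])
    tie = count()  # tie-breaker so the heap never compares frozensets
    start = frozenset()
    open_heap = [(0.0, next(tie), start)]
    visited = {start}  # each subset is evaluated at most once
    best_subset, best_score = start, 0.0
    stale = 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)
        improved = False
        for f in range(n_features):
            if f in subset:
                continue
            child = subset | {f}
            if child in visited:
                continue
            visited.add(child)
            score = loo_1nn_accuracy(X, y, sorted(child))
            heapq.heappush(open_heap, (-score, next(tie), child))
            if score > best_score:
                best_subset, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_score


# Toy data (made up): feature 0 separates the two labels, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = ["a", "a", "a", "b", "b", "b"]

selected, score = best_first_selection(X, y)
print(selected, score)
```

On this toy data the search keeps only the informative feature, because adding the noisy one lowers the leave-one-out accuracy. With real data, the rows of `X` would be the extracted shape attributes of each tooth and `y` the dietary or dental-type labels.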
Highlights
Statistical analysis of shape is a fundamental problem that is frequently encountered in biology
Our results showed that, for all five training sets, the same basic set of features combined with an appropriate feature selection scheme yielded accurate classifiers in a very short time
This demonstrates that relationships between shape and categorical factors of interest can be extrapolated from a given training set to new data in an automated fashion
Summary
Statistical analysis of shape is a fundamental problem that is frequently encountered in biology. Automatic detection of phenotypic features is hindered by the facts that 3D morphology is difficult to measure and that the theory behind the analysis of complex structures such as 3D surfaces or density maps is still in its infancy [1]. This makes shape an unfavorable source of information when compared to other variables that can be assessed through linear or sequential measurements (e.g. gene sequences). To make effective use of these data, there will be a need for fast automated methods for conducting searches and building descriptive and predictive models on selected data sets.