Abstract

Microbial communities perform many important ecological functions across a wide range of natural and man-made environments. Recently, trait-based approaches have been recognized as useful for studying microbial communities. The increasing availability of whole-genome sequences provides an opportunity to explore the genetic basis of a variety of functional traits. In this paper, we propose a machine learning framework that quantitatively links genotype to functional traits. Genes from bacterial genomes belonging to different functional trait groups were assigned to Clusters of Orthologous Groups (COGs), which were used as features. The TF-IDF (term frequency-inverse document frequency) technique from text mining was then applied to transform the data so that it reflects both the abundance of each COG within a genome and its importance across genomes. After TF-IDF processing, COGs were ranked with feature selection methods according to their relevance to the functional trait of interest. This paper focuses on a binary functional trait, and we plan to extend the approach to continuous functional traits in future work. Experimental results demonstrate that genes related to a functional trait can be detected using our method.
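To make the TF-IDF weighting and ranking steps concrete, the following Python sketch treats each genome as a "document" and each COG as a "term". It assumes TF is a COG's share of a genome's COG-assigned genes and IDF is log(N/df), which is one of several common variants. It ranks COGs by the absolute difference in mean TF-IDF weight between trait-positive and trait-negative genomes. That simple filter score is a stand-in chosen for illustration and is not necessarily one of the feature selection methods used in the paper. The function names tfidf and rank_cogs, the input layout, and the COG IDs COG_A, COG_B and COG_C are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input layout: one mapping per genome, COG ID -> number of its genes in that COG.
Genome = Dict[str, int]


def tfidf(genomes: List[Genome]) -> List[Dict[str, float]]:
    """Return TF-IDF weights per genome, treating genomes as documents and COGs as terms."""
    n = len(genomes)
    # Document frequency: number of genomes in which each COG occurs at least once.
    df: Dict[str, int] = {}
    for genome in genomes:
        for cog, count in genome.items():
            if count > 0:
                df[cog] = df.get(cog, 0) + 1
    # IDF = log(N / df): a COG found in every genome gets weight 0.
    idf = {cog: math.log(n / d) for cog, d in df.items()}
    weighted = []
    for genome in genomes:
        total = sum(genome.values())
        # TF = share of the genome's COG-assigned genes that fall in this COG.
        weighted.append({cog: (count / total) * idf[cog]
                         for cog, count in genome.items() if count > 0})
    return weighted


def rank_cogs(weighted: List[Dict[str, float]], labels: List[bool]) -> List[Tuple[str, float]]:
    """Rank COGs by |mean weight in trait-positive genomes - mean weight in trait-negative genomes|.

    This mean-difference score is an illustrative assumption, not the paper's selection method.
    """
    pos = [w for w, y in zip(weighted, labels) if y]
    neg = [w for w, y in zip(weighted, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one trait-positive and one trait-negative genome")

    def mean(group: List[Dict[str, float]], cog: str) -> float:
        # A COG absent from a genome contributes a weight of 0.
        return sum(w.get(cog, 0.0) for w in group) / len(group)

    cogs = sorted({cog for w in weighted for cog in w})
    scores = [(cog, abs(mean(pos, cog) - mean(neg, cog))) for cog in cogs]
    # Highest score first; ties broken by COG ID so the order is deterministic.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Made-up toy data: COG_A occurs in every genome, COG_C only in trait-positive ones.
    genomes = [
        {"COG_A": 3, "COG_B": 1, "COG_C": 2},
        {"COG_A": 2, "COG_C": 4},
        {"COG_A": 1, "COG_B": 2},
        {"COG_A": 4, "COG_B": 3},
    ]
    labels = [True, True, False, False]
    for cog, score in rank_cogs(tfidf(genomes), labels):
        print(f"{cog}\t{score:.4f}")
```

When run as a script on the toy data, this lists COG_C first, then COG_B, then COG_A. IDF drives COG_A's weight to zero because it occurs in every genome. COG_C scores highest because it occurs only in trait-positive genomes. A real analysis would more likely rank the TF-IDF matrix with an established criterion such as chi-squared or mutual information.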
