Inter-functional analysis of high-throughput phenotype data by non-parametric clustering and its application to photosynthesis.

Qiaozi Gao, Elisabeth Ostendorf, Jeffrey A Cruz, Rong Jin, David M Kramer, Jin Chen

doi:10.1093/bioinformatics/btv515

Abstract

Phenomics is the study of the properties and behaviors of organisms (i.e. their phenotypes) on a high-throughput scale. New computational tools are needed to analyze complex phenomics data, which consist of multiple traits/behaviors that interact with each other and depend on external factors, such as genotype and environmental conditions, in ways that have not been well studied. We deployed an efficient framework for partitioning complex, high-dimensional phenotype data into distinct functional groups. To achieve this, we represented the measured phenotype data from each genotype as a cloud of points and developed a novel non-parametric clustering algorithm to cluster all the genotypes. Compared with conventional clustering approaches, the new method makes no assumption about the parametric form of the underlying data distribution, which makes it particularly suitable for phenotype data analysis. We demonstrated the utility of the new clustering technique by distinguishing novel phenotypic patterns in both synthetic data and a high-throughput plant photosynthetic phenotype dataset. We biologically verified the clustering results using four Arabidopsis chloroplast mutant lines. Software is available at www.msu.edu/~jinchen/NPM. Supplementary data are available at Bioinformatics online. Contact: jinchen@msu.edu, kramerd8@cns.msu.edu or rongjin@cse.msu.edu.
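The abstract does not spell out the clustering algorithm itself. As a hedged illustration of the general idea only (each genotype becomes a cloud of points, clouds are compared without assuming a parametric distribution, and genotypes are then grouped), the sketch below uses two standard, swapped-in techniques: the kernel maximum mean discrepancy (MMD) as a distribution-free distance between clouds, and average-linkage agglomerative clustering. This is not necessarily the authors' NPM algorithm; the Gaussian bandwidth, the function names (`mmd2`, `cluster_clouds`) and the toy genotypes (`WT`, `mutA`, and so on) are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

# A "cloud" is the set of phenotype measurements for one genotype:
# each point is a tuple of trait values (hypothetical representation).
Cloud = Sequence[Sequence[float]]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Gaussian (RBF) kernel; the bandwidth is an assumed tuning parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2(cloud_a: Cloud, cloud_b: Cloud, bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: small when two clouds look alike,
    with no parametric model of either distribution."""
    def mean_k(p: Cloud, q: Cloud) -> float:
        total = sum(gaussian_kernel(x, y, bandwidth) for x in p for y in q)
        return total / (len(p) * len(q))

    return mean_k(cloud_a, cloud_a) + mean_k(cloud_b, cloud_b) - 2.0 * mean_k(cloud_a, cloud_b)


def cluster_clouds(clouds: Dict[str, Cloud], n_clusters: int,
                   bandwidth: float = 1.0) -> List[List[str]]:
    """Average-linkage agglomerative clustering of genotypes by MMD distance."""
    names = list(clouds)
    dist = {}
    for a, b in combinations(names, 2):
        d = mmd2(clouds[a], clouds[b], bandwidth)
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [[n] for n in names]  # start with one cluster per genotype
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # average pairwise distance between members of the two clusters
            avg = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            avg /= len(clusters[i]) * len(clusters[j])
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair (i < j)
        del clusters[j]
    return clusters


# Toy usage with hypothetical two-trait measurements for four genotypes.
clouds = {
    "WT":   [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)],
    "mutA": [(0.05, 0.1), (0.15, 0.0), (0.1, 0.15)],
    "mutB": [(3.0, 3.0), (3.1, 2.9), (2.9, 3.2)],
    "mutC": [(3.05, 3.1), (2.95, 3.0), (3.2, 3.05)],
}
groups = cluster_clouds(clouds, n_clusters=2)
```

On this toy input the two genotypes whose measurements overlap end up together, giving the groups {WT, mutA} and {mutB, mutC}. MMD was chosen here only because it compares whole distributions through a kernel rather than through fitted means and variances, which mirrors the "no parametric form" property the abstract emphasizes.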

Full Text