Abstract

Identifying gene expression patterns is critical when studying complex and dynamic biological processes such as gene regulatory functions. Gene expression is a continuous biological phenomenon and can be represented by a continuous function (curve). Genes whose expression follows such continuous functions often share similar functional forms. However, the number and shape of these forms, and the identities of the genes sharing each of them, remain unknown. To identify such functional forms, we introduce a clustering model for time-course gene expression patterns. The method uses an S-spline approach to model the functional curves and a penalized log-likelihood approach to fit the model. In addition, a rejection-controlled EM algorithm is designed to minimize the error and computational cost of mean curve estimation. Furthermore, the method uses generalized cross-validation to select smoothing parameters and the Bayesian information criterion to measure clustering uncertainty. The method is illustrated by its application to D. melanogaster life cycle datasets. Simulation results indicate that our method accurately recovers the true functional forms by assigning each gene to a cluster, predicting the mean curve, and providing the associated 95% confidence bands for each cluster. Based on Gene Ontology term descriptions, the estimated mean curve in each cluster reflects true gene functional annotations with biologically meaningful gene expression patterns. Finally, a comparison of clustering performance indicates that our method outperforms Fuzzy-cMeans and K-Means, with a misclassification rate of 0.1289 and an overall success rate of 98.71%.
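
The rejection-control idea behind the EM algorithm can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the version below assumes the commonly used rejection-control scheme: a posterior membership probability below a threshold is either raised to the threshold (with probability proportional to its size) or set to zero, which keeps its expected value unchanged before renormalization while letting the M-step skip genes with zero weight. The function name `rejection_control`, the threshold `c`, and the toy probabilities are hypothetical and not taken from the paper.

```python
import numpy as np


def rejection_control(post, c, rng):
    """Sparsify posterior cluster probabilities (assumed scheme, see lead-in).

    post : array of shape (n_genes, n_clusters), rows sum to 1
    c    : threshold in (0, 1]; entries >= c are left unchanged
    rng  : numpy.random.Generator
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    post = np.asarray(post, dtype=float)

    small = post < c
    # A small entry p survives with probability p / c and is then raised to c,
    # so its expected value before renormalization is still p.
    survives = rng.random(post.shape) < post / c
    out = np.where(small, np.where(survives, c, 0.0), post)

    # Renormalize each row; if a whole row was rejected, keep the original row.
    row_sums = out.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, out / safe_sums, post)


# Toy example: two genes, three clusters (made-up probabilities).
rng = np.random.default_rng(0)
posterior = np.array([[0.97, 0.02, 0.01],
                      [0.50, 0.50, 0.00]])
sparse_posterior = rejection_control(posterior, c=0.05, rng=rng)
print(sparse_posterior)
```

Entries that end up as zero can be dropped from the weighted fit of the corresponding cluster's mean curve, which is where the computational saving would come from under this assumption.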
