Abstract
Computational analysis of gene expression data has been an essential approach for understanding cellular activities and identifying gene function. However, expression profiles generated by high-throughput microarray experiments often contain missing values, which significantly affect the performance of subsequent statistical analysis and machine learning algorithms. There is therefore a great need to estimate these missing values as accurately as possible. Although many estimation algorithms have been proposed, each has its flaws. This paper proposes an estimation method for missing values based on principal curves, a nonlinear generalization of the first linear principal component. By finding self-consistent, smooth, one-dimensional curves that pass through the "middle" of a multidimensional data set, principal curves can integrate the linear and nonlinear relationships between genes and reveal the distribution of genes. Using this framework built from all the expression profiles, missing values can be estimated more accurately. To assess the performance of the method, comparisons with recently proposed estimation algorithms are carried out on several microarray data sets. The results show that our method provides a better solution for the estimation of missing values in DNA microarray gene expression data.
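To make the idea concrete, the sketch below illustrates only the linear special case that principal curves generalize: a straight line (the first principal component) is fitted to the complete profiles, each incomplete profile is projected onto that line using its observed coordinates, and the missing coordinates are read off the line. This is not the paper's principal-curve algorithm. The function names `first_principal_axis` and `impute_with_principal_axis` and the toy data are assumptions made for illustration.

```python
import math


def first_principal_axis(rows, iterations=200):
    """Return (mean, direction) of the first principal component.

    `rows` must contain complete profiles only. The direction is found by
    power iteration on the (unnormalised) covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Start from the centered profile with the largest norm, so the start
    # vector lies in the span of the data.
    v = max(centered, key=lambda x: sum(xj * xj for xj in x))
    for _ in range(iterations):
        w = [0.0] * d
        for x in centered:
            s = sum(xj * vj for xj, vj in zip(x, v))
            for j in range(d):
                w[j] += s * x[j]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:  # constant data: no direction to find
            break
        v = [wj / norm for wj in w]
    return mean, v


def impute_with_principal_axis(data):
    """Fill `None` entries using a line fitted to the complete profiles.

    Linear stand-in for a principal curve (assumption of this sketch):
    each incomplete profile is placed on the line by least squares over
    its observed coordinates, and missing coordinates are taken from
    that point on the line.
    """
    complete = [r for r in data if None not in r]
    mean, v = first_principal_axis(complete)
    filled = []
    for r in data:
        obs = [j for j, x in enumerate(r) if x is not None]
        denom = sum(v[j] ** 2 for j in obs)
        # Position along the axis, estimated from observed coordinates only.
        t = sum((r[j] - mean[j]) * v[j] for j in obs) / denom if denom > 0 else 0.0
        filled.append(
            [x if x is not None else mean[j] + t * v[j] for j, x in enumerate(r)]
        )
    return filled


# Toy data (hypothetical): six profiles over three conditions that lie
# exactly on a line, with one value removed from the fifth profile.
profiles = [[float(t), 2.0 * t + 1.0, 3.0 - t] for t in range(6)]
profiles[4][1] = None
completed = impute_with_principal_axis(profiles)
```

Because the toy profiles are exactly collinear, the line recovers the removed value up to floating-point error. Real expression data are noisy and often nonlinearly related, which is the situation the principal-curve approach is meant to handle.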