Abstract

Missing data estimation is an important task in data pre-processing, performed before data mining algorithms are applied. Data mining and machine learning algorithms generally assume complete data with no missing entries. In practice, however, datasets often contain missing values for technical or other reasons. The performance of these algorithms depends on the form of the data and the values it contains. If a dataset has a large number of missing entries, the performance of the algorithms suffers and their accuracy degrades. It is therefore important to estimate and impute the missing values before computation. Several data imputation algorithms have been proposed in the literature. In this paper, we propose a new methodology that first reduces the dimensionality of the data using fuzzy C-means clustering and then uses a generalized additive model to estimate the missing values in the dataset. Three real-world microarray gene expression datasets are used, and the performance of the proposed method is compared with that of existing algorithms.
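
The abstract does not specify how the clustering step is configured, so the following is only a minimal sketch of standard fuzzy C-means on toy data, not the authors' implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the toy expression values, and the choice of initial centers are all assumptions made for illustration. The generalized additive model step is not shown.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy C-means (hypothetical helper, not from the paper).

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i. The memberships are the ones
    computed in the last iteration, just before the final center update.
    """
    centers = [list(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # The point coincides with one or more centers:
                # split full membership among those centers.
                zeros = dists.count(0.0)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by membership**m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Toy data (assumed): six "genes" measured under two conditions.
points = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
          [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]]
# Assumed initialization: one starting center taken from each visible group.
centers, memberships = fuzzy_c_means(points, [points[0], points[3]])
```

Unlike hard clustering, each point keeps a graded membership in every cluster, and each row of `memberships` sums to 1.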
