Abstract

Missing values are a major obstacle to identifying disease-causing genes with microarray technologies. Estimating missing values is a problem that field experts cannot avoid. Imputation is an effective way to fill in suitable values so that the subsequent steps of microarray analysis can proceed. Missing value imputation methods may also increase classification accuracy: although these methods can predict the missing values, it is the classification accuracy rates that demonstrate how well they recover the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. By combining Majority Vote (MV) with optimisation through Particle Swarm Optimisation (PSO), this study predicted the missing values to produce more informative and reliable data. To verify the effectiveness of opt-FCMMV, several experiments were carried out with a Support Vector Machine (SVM) classifier on two publicly available biomedical microarray datasets (Ovary Cancer and Lung Cancer), under three missing value mechanisms, each at five different percentages of missing values. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for the Ovary Cancer data with 5% missing values were 64.0% with no imputation, 81.8% with FCM, 90.0% with FCMMV, and 93.7% with opt-FCMMV. This outcome suggests that opt-FCMMV may also be applied in other domains to prepare datasets for various data mining tasks.
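The evaluation protocol masks values at several percentages under different missing value mechanisms. The abstract names neither the three mechanisms nor the percentages other than 5%. The sketch below therefore assumes missing completely at random (MCAR), which is one of the commonly distinguished mechanisms alongside MAR and MNAR, and uses placeholder percentages. `inject_mcar` is a hypothetical helper for illustration, not code from the study.

```python
import random

def inject_mcar(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of all entries set to None.

    Cells are chosen uniformly at random, i.e. missing completely at random
    (MCAR). This is an assumed mechanism for illustration only.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n) for j in range(d)]
    k = round(rate * n * d)          # number of entries to mask
    out = [list(r) for r in rows]    # leave the input untouched
    for i, j in rng.sample(cells, k):
        out[i][j] = None
    return out

# Hypothetical usage: one masked copy per missing-value percentage.
# Only the 5% level appears in the abstract; the other four are placeholders.
levels = [0.05, 0.10, 0.15, 0.20, 0.25]
data = [[float(i * 10 + j) for j in range(10)] for i in range(20)]  # toy 20 x 10 matrix
masked = {p: inject_mcar(data, p) for p in levels}
```

Each masked copy would then be imputed and classified. Sampling exact cell counts, rather than flipping a coin per cell, makes every copy hit its target percentage precisely.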

Highlights

  • In many areas, data quality is a serious problem in today's fast-moving world, which produces millions of data records each day, many of them noisy and incomplete

  • Experiments were conducted on a total of fifteen datasets in the biomedical domain. Different levels of missing values were used to examine the efficiency of the proposed method. The SVM classifier provided in the LibSVM software package was used with its default parameter values and a radial basis function (RBF) kernel (Wahyudi et al., 2010)

  • Accuracy rates increased progressively from Fuzzy C-Means (FCM) to Fuzzy C-Means with Majority Vote (FCMMV) to opt-FCMMV
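The comparison chain starts from plain FCM imputation. The following is a rough illustration of that baseline only. It does not cover FCMMV or opt-FCMMV, because their majority-vote and PSO steps are not described here. A common FCM imputation scheme fills each missing entry with the membership-weighted average of the cluster centroids and repeats the FCM updates so the estimates settle. The function name, the fixed iteration count, the fuzzifier `m = 2`, and the column-mean initialisation are all assumptions of this sketch.

```python
import random

def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Generic fuzzy c-means imputation (illustrative sketch).

    Missing entries (None) are re-estimated as membership-weighted
    averages of the cluster centroids after every FCM iteration.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    # Initialise missing entries with the column mean of observed values.
    data = [list(r) for r in rows]
    for j in range(d):
        obs = [r[j] for r in rows if r[j] is not None]
        mean = sum(obs) / len(obs) if obs else 0.0
        for r in data:
            if r[j] is None:
                r[j] = mean
    missing = [(k, j) for k in range(n) for j in range(d) if rows[k][j] is None]

    # Random initial memberships u[i][k]; each point's memberships sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(n_clusters)]
    for k in range(n):
        s = sum(u[i][k] for i in range(n_clusters))
        for i in range(n_clusters):
            u[i][k] /= s

    for _ in range(n_iter):
        # Centroid update: mean of all points weighted by u^m.
        centroids = []
        for i in range(n_clusters):
            w = [u[i][k] ** m for k in range(n)]
            tw = sum(w)
            centroids.append([sum(w[k] * data[k][j] for k in range(n)) / tw
                              for j in range(d)])
        # Membership update from squared Euclidean distances.
        for k in range(n):
            dist = [max(sum((data[k][j] - c[j]) ** 2 for j in range(d)), 1e-12)
                    for c in centroids]
            for i in range(n_clusters):
                u[i][k] = 1.0 / sum((dist[i] / dist[l]) ** (1.0 / (m - 1))
                                    for l in range(n_clusters))
        # Re-estimate only the originally missing entries.
        for k, j in missing:
            data[k][j] = sum(u[i][k] * centroids[i][j] for i in range(n_clusters))
    return data

# Toy matrix with two well-separated groups and two gaps (hypothetical values).
rows = [
    [1.0, 1.0, 1.0],
    [1.2, 0.9, None],
    [0.8, 1.1, 1.0],
    [9.0, 9.0, 9.0],
    [None, 8.8, 9.1],
    [9.1, 9.2, 8.9],
]
filled = fcm_impute(rows)
```

On this toy matrix, the gap in the second row should settle near the first group's level (about 1), and the gap in the fifth row near 9. This happens because each row's memberships are dominated by the cluster that its observed features fall into. Plain lists keep the sketch dependency-free; a NumPy version would vectorise the distance and centroid updates.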

Introduction

Data quality is a serious problem in today's fast-moving world, which produces millions of data records each day, many of them noisy and incomplete. For real-world healthcare research centres, the consequences, such as biased data and invalid inferences, undermine the purpose of the data (Suphanchaimat et al., 2017). In microarray experiments, missing values arise from experimental errors, insufficient resolution, and scratches or dust on the slides during laboratory processing (Yaraghi et al., 2012). As noted by Ouyang et al. (2004), virtually every microarray experiment contains missing expression values, affecting more than 90% of the genes. Missing values must therefore be treated before the microarray data are analysed.
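The figure attributed to Ouyang et al. (2004) is a per-gene measure: a gene counts as affected if any of its expression values is missing. The following is a minimal sketch of that computation. It assumes a genes × arrays matrix stored as nested lists with `None` for missing entries; the function names and toy values are hypothetical. It also shows why a modest per-entry missing rate can still affect most genes.

```python
def fraction_genes_with_missing(expr):
    """Share of genes (rows) with at least one missing (None) expression value."""
    if not expr:
        return 0.0
    affected = sum(1 for gene in expr if any(v is None for v in gene))
    return affected / len(expr)

def expected_affected_share(p, n_arrays):
    """Expected share of genes with >= 1 missing value, assuming each entry
    is missing independently with probability p (illustrative assumption)."""
    return 1.0 - (1.0 - p) ** n_arrays

# Toy matrix: 4 genes x 3 arrays (hypothetical values).
expr = [
    [0.5, None, 1.2],
    [0.7, 0.8, 0.9],
    [None, None, 0.1],
    [1.5, 1.4, None],
]
print(fraction_genes_with_missing(expr))             # 0.75
print(round(expected_affected_share(0.05, 50), 3))   # 0.923
```

Under the illustrative assumption that entries go missing independently at 5% and that each gene is measured on 50 arrays, 1 − 0.95^50 ≈ 0.92. In that case, roughly 92% of genes would have at least one gap, even though only 5% of individual entries are missing.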
