Evolutionary k-nearest neighbor imputation algorithm for gene expression data

Hiroshi De Silva, A Shehan Perera

doi:10.4038/icter.v10i1.7183

Abstract

DNA microarray technology produces large gene expression data sets, and these data sets commonly contain missing expression values. In this paper, we present a genetic algorithm optimized k-nearest neighbor algorithm (Evolutionary kNNImputation) for missing data imputation. In contrast to the common imputation methods, this paper examines the effectiveness of supervised learning algorithms for missing data imputation. Missing data imputation approaches fall into four main categories; our focus is on the local approach, to which the proposed Evolutionary k-Nearest Neighbor Imputation algorithm belongs. The proposed algorithm extends the common k-Nearest Neighbor Imputation algorithm by using a genetic algorithm to optimize some of its parameters. The selection of the similarity matrix and the selection of the parameter value k are treated as the optimization problem. We compared the proposed Evolutionary k-Nearest Neighbor Imputation algorithm with the k-Nearest Neighbor Imputation algorithm and the mean imputation method. The three algorithms were tested on gene expression datasets: certain percentages of values were randomly deleted from the datasets, and the missing values were then recovered using each of the three algorithms. Results show that Evolutionary kNNImputation outperforms kNNImputation and mean imputation, and they demonstrate the importance of using a supervised learning algorithm in missing data estimation. Although mean imputation showed a low mean error at a few missing rates, the supervised learning algorithms were more effective at higher missing rates, which is the most common situation among datasets.
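The evaluation protocol described in the abstract (delete a percentage of values at random, impute them, measure the error) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the plain unweighted kNN variant, the Euclidean distance, the RMSE error measure, the 10% missing rate, and k = 3 are all assumptions chosen for the example, and the genetic algorithm step is left out.

```python
import math
import random

def mask_random(data, rate, rng):
    """Copy `data`, set a fraction `rate` of its cells to None, and return the hidden cells."""
    masked = [row[:] for row in data]
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    hidden = rng.sample(cells, int(rate * len(cells)))
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def mean_impute(masked):
    """Fill each missing cell with the mean of the observed values in its column."""
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]

def distance(a, b):
    """Euclidean distance over columns observed in both rows, normalised by their count."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

def knn_impute(masked, k):
    """Fill each missing cell with the average of that column over the k nearest rows."""
    fallback = mean_impute(masked)  # used only when no usable neighbor exists
    filled = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(masked)
                if m != i and other[j] is not None
            )
            donors = [(d, val) for d, val in donors if d != math.inf][:k]
            filled[i][j] = (sum(val for _, val in donors) / len(donors)
                            if donors else fallback[i][j])
    return filled

def rmse(truth, imputed, hidden):
    """Root mean squared error over the cells that were hidden."""
    return math.sqrt(sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden) / len(hidden))

rng = random.Random(0)
# Synthetic "expression" matrix (an assumption): 30 rows in 3 well-separated groups, 8 columns.
truth = [[10.0 * (i % 3) + j + rng.gauss(0, 0.1) for j in range(8)] for i in range(30)]
masked, hidden = mask_random(truth, rate=0.10, rng=rng)
mean_err = rmse(truth, mean_impute(masked), hidden)
knn_err = rmse(truth, knn_impute(masked, k=3), hidden)
print(f"mean imputation RMSE: {mean_err:.3f}")
print(f"kNN imputation RMSE:  {knn_err:.3f}")
```

On data like this, where rows form tight groups, the neighbor-based estimate should track the true values much more closely than the column mean. Real expression data will be far less clean, so the gap seen here says nothing about the results reported in the paper.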

Highlights

DNA microarray technology is widely used to analyze gene expression data
We present a genetic algorithm optimized k-nearest neighbor algorithm for imputing missing data and compare it with the k-Nearest Neighbor Imputation algorithm and several other common imputation methods
The training dataset should not have missing values, because the evolutionary k-nearest neighbor algorithm runs an optimization process before the imputation, in which a weight value is assigned to each attribute of the dataset based on the importance of that attribute for predicting the missing value
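The optimization step described in the last highlight can be illustrated with a small sketch. Everything in it is an assumption made for illustration, since the text here does not give these details: the chromosome layout (a value of k plus one weight per attribute), the weighted Euclidean distance, the fitness function (negative RMSE on probe cells of a complete training set), the operators (tournament selection, uniform crossover, Gaussian mutation, two-member elitism), and all names and parameter values.

```python
import math
import random

def weighted_knn_predict(train, row, target, k, weights):
    """Predict row[target] as the mean of that column over the k closest training rows,
    using a weighted Euclidean distance over the remaining columns."""
    def dist(other):
        return math.sqrt(sum(w * (a - b) ** 2
                             for c, (w, a, b) in enumerate(zip(weights, row, other))
                             if c != target))
    nearest = sorted(train, key=dist)[:k]
    return sum(r[target] for r in nearest) / len(nearest)

def fitness(chrom, train, probes):
    """Negative RMSE of leave-one-out predictions for the probe cells (higher is better)."""
    k, weights = chrom
    sq = 0.0
    for i, j in probes:
        others = train[:i] + train[i + 1:]
        sq += (train[i][j] - weighted_knn_predict(others, train[i], j, k, weights)) ** 2
    return -math.sqrt(sq / len(probes))

def crossover(a, b, rng):
    """Uniform crossover on the weights; k is inherited from one parent at random."""
    return (rng.choice([a[0], b[0]]), [rng.choice(pair) for pair in zip(a[1], b[1])])

def mutate(chrom, k_max, rng, rate=0.2):
    """Nudge k by one step and add Gaussian noise to some weights, clipped to [0, 1]."""
    k, weights = chrom
    if rng.random() < rate:
        k = min(k_max, max(1, k + rng.choice([-1, 1])))
    weights = [min(1.0, max(0.0, w + rng.gauss(0, 0.1))) if rng.random() < rate else w
               for w in weights]
    return (k, weights)

def evolve(train, probes, k_max=7, pop_size=12, generations=10, seed=0):
    """Return the best (k, weights) found and the best fitness recorded per generation."""
    rng = random.Random(seed)
    n_cols = len(train[0])
    pop = [(rng.randint(1, k_max), [rng.random() for _ in range(n_cols)])
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, train, probes), reverse=True)
        history.append(fitness(ranked[0], train, probes))
        children = []
        while len(children) < pop_size - 2:
            # Tournament of three: the lowest index in `ranked` is the fittest entrant.
            p1 = ranked[min(rng.sample(range(pop_size), 3))]
            p2 = ranked[min(rng.sample(range(pop_size), 3))]
            children.append(mutate(crossover(p1, p2, rng), k_max, rng))
        pop = ranked[:2] + children  # elitism: the two best survive unchanged
    best = max(pop, key=lambda c: fitness(c, train, probes))
    history.append(fitness(best, train, probes))
    return best, history

rng = random.Random(1)
# Complete synthetic training set (an assumption): columns 0-2 are correlated, 3-4 are noise.
train = []
for _ in range(30):
    x = rng.uniform(0, 10)
    train.append([x, x + rng.gauss(0, 0.2), 2 * x + rng.gauss(0, 0.2),
                  rng.uniform(0, 10), rng.uniform(0, 10)])
cells = [(i, j) for i in range(len(train)) for j in range(3)]
probes = rng.sample(cells, 20)
best, history = evolve(train, probes)
print("best k:", best[0])
print("best weights:", [round(w, 2) for w in best[1]])
print("fitness, first and last generation:", round(history[0], 3), round(history[-1], 3))
```

Because the fitness is computed by hiding values that are actually known, this kind of search needs a training set without missing values, which matches the requirement stated in the highlight. The elitism step keeps the best recorded fitness from getting worse between generations.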

Summary

Introduction

DNA microarray technology is widely used to analyze gene expression data. These expression data sets are large and frequently contain missing values. Given the expense of collecting data, we cannot afford to start over or to wait until we develop foolproof methods of gathering information [3]. Because repeating the process is very time consuming and expensive, scientists are turning to missing data imputation as a solution [4]. This paper discusses the importance of using a machine learning algorithm, since most of the common imputation methods, such as case deletion and mean imputation, give less effective results because they do not consider the correlation of data.

Journal: International Journal on Advances in ICT for Emerging Regions (ICTer)
Publication Date: May 4, 2017
Citations: 6
License type: cc-by
