Summarising multiple clustering-centric estimates with OWA operators for improved KNN imputation on microarray data

Phimmarin Keerin, Natthakan Iam-On, Jing Jing Liu, Tossapon Boongoen, Qiang Shen

doi:10.1016/j.fss.2023.108718

Abstract

As part of celebrating the success of OWA operators and their contributions over the past decades, this work presents an original investigation into exploiting OWA for missing value imputation in microarray experimental data. This task is significant for life science and for realising its benefits to humanity. Both argument-independent and argument-dependent variants of such operators are applied to summarise a collection of estimates, determined through the concept of clustering-centric KNN imputation. This provides an innovative alternative to the state-of-the-art model, which uses a single clustering to identify the neighbours of a particular instance of interest. Instead of requiring a manually specified data partition, the proposed approach selects a subset of diverse clusterings or committees from a candidate pool, which has been prepared using k-means and different (and popular) generation strategies invented for ensemble clustering. The selection is automated through a greedy forward search for a desired number of committee members. Based on published gene expression datasets and different experimental settings, the resulting model generally outperforms its baselines and is competitive with related methods found in the literature. Further extensions to iterative refinement and supervised imputation are also discussed, in addition to an analysis of algorithmic parameters.
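To make the summarisation step concrete, the following is a minimal sketch of an argument-independent OWA operator applied to several estimates of one missing value. It is illustrative only and is not the authors' implementation: the function name `owa`, the example estimates, and the weight vectors are all assumptions, and the clustering-centric KNN step that would produce the estimates is not shown.

```python
import math


def owa(values, weights):
    """Ordered weighted average: weights attach to rank positions, not to sources.

    Hypothetical helper for illustration; weights must be non-negative and sum to 1.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort the arguments in descending order, then take the weighted sum.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Assumed example: four estimates of the same missing entry,
# one per selected clustering (made-up numbers).
estimates = [2.0, 3.5, 1.0, 3.0]

mean_like = owa(estimates, [0.25, 0.25, 0.25, 0.25])  # uniform weights give the mean
max_like = owa(estimates, [1.0, 0.0, 0.0, 0.0])       # all weight on the largest value
min_like = owa(estimates, [0.0, 0.0, 0.0, 1.0])       # all weight on the smallest value
```

Because the weights are tied to positions in the sorted list, the same operator ranges from the minimum through the mean to the maximum depending on the weight vector chosen. An argument-dependent variant would instead derive the weights from the estimates themselves.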

Full Text