Identification of Biomarkers for Arsenicosis Employing Multiple Kernel Learning Embedded Multiobjective Swarm Intelligence

Anirban Dey, Kaushik Das Sharma, Pritha Bhattacharjee, Tamalika Sanyal

doi:10.1109/tnb.2022.3194091

Abstract

Arsenic is a carcinogen, and long-term exposure to it may lead to multi-organ disease. Understanding the intricate molecular network underlying its toxicity and carcinogenicity is crucial for identifying a small set of differentially expressed biomarker genes that can predict risk in the exposed population. In this paper, a multiple kernel learning (MKL) embedded multi-objective swarm intelligence technique is proposed to identify candidate biomarker genes from the transcriptomic profile of arsenicosis samples. To achieve optimal classification accuracy with the minimum number of genes, a multi-objective random spatial local best particle swarm optimization (MO-RSplbestPSO) is utilized. The proposed MO-RSplbestPSO also guides the multiple kernel learning mechanism, which provides data-specific classification. The proposed computational framework is applied to whole-genome DNA microarray data developed from blood samples collected from a specific arsenic-exposed area of the Indian state of West Bengal. A set of twelve biomarker genes, including four novel genes, is identified for the classification of arsenic exposure and its subcategories; these genes can be used as future prognostic biomarkers for screening arsenic-exposed populations. The biological significance of each gene is also detailed to delineate the complex molecular networking and mode of toxicity.
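
The two ideas named in the abstract, a weighted combination of kernels and a trade-off between classification error and gene count, can be illustrated with a minimal sketch. This is not the authors' MO-RSplbestPSO or their MKL formulation: the choice of base kernels, the gamma value, and all function names (combined_kernel, dominates, pareto_front) are assumptions made purely for illustration, and the swarm update itself is omitted.

```python
import math

def linear_kernel(x, y):
    # Plain dot product between two expression vectors.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian kernel; gamma is an assumed, illustrative value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, y, weights, mask):
    """Convex combination of two base kernels on the selected genes.

    weights: two non-negative kernel weights (hypothetical search variables).
    mask:    0/1 flags marking which genes are kept (hypothetical encoding).
    """
    xs = [v for v, m in zip(x, mask) if m]
    ys = [v for v, m in zip(y, mask) if m]
    total = sum(weights)
    w_lin, w_rbf = (w / total for w in weights)
    return w_lin * linear_kernel(xs, ys) + w_rbf * rbf_kernel(xs, ys)

def dominates(a, b):
    """True if a is no worse than b in both objectives and better in one.

    Each candidate is an (error, n_genes) tuple; both are minimized.
    """
    no_worse = all(p <= q for p, q in zip(a, b))
    strictly_better = any(p < q for p, q in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    # Keep only the candidates that no other candidate dominates.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Made-up (error, n_genes) values for five candidate gene subsets.
candidates = [(0.10, 30), (0.10, 12), (0.05, 40), (0.20, 12), (0.05, 45)]
front = pareto_front(candidates)
```

In a multi-objective search of this kind, each particle would encode a gene mask together with kernel weights, and only the non-dominated candidates would be retained as the trade-off set from which a final gene panel is chosen.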

Full Text