Abstract
Outlier detection is imperative in biomedical data analysis to achieve reliable knowledge discovery. In this paper, a new outlier detection method based on Kullback-Leibler (KL) divergence is presented. KL divergence was originally designed as a measure of the distance between two distributions. Building on that concept, we extend it to detecting outlying biological samples by forming sample sets composed of nearest neighbors. To handle non-linearity in the KL divergence calculation and to tackle the singularity problem caused by small sample sizes, we map the original data into a higher-dimensional feature space and apply kernel functions, without resorting to an explicit mapping function. The sample with the largest KL divergence is flagged as an outlier. The proposed method is evaluated on one synthetic data set, two public gene expression data sets, and our own mass spectrometry data generated for a prostate cancer study.
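To make the idea concrete, the following is a minimal sketch of the neighborhood-based KL scoring the abstract describes, under simplifying assumptions: neighbor sets are modeled as multivariate Gaussians (for which KL divergence has a closed form), and a small ridge term replaces the paper's kernel mapping as the remedy for singular covariance matrices. The parameters k and ridge, and the choice to compare each sample's neighbor set against the remaining samples, are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def kl_outlier_scores(X, k=10, ridge=1e-6):
    """Score each sample by the KL divergence between a Gaussian fitted to
    its k nearest neighbors and a Gaussian fitted to all remaining samples.
    The ridge term regularizes covariances that would otherwise be singular
    for small sample sizes (the paper addresses this via the kernel trick)."""
    n, d = X.shape
    dists = cdist(X, X)
    scores = np.empty(n)
    for i in range(n):
        nbr_idx = np.argsort(dists[i])[1:k + 1]   # k nearest neighbors, excluding the sample itself
        rest_idx = np.setdiff1d(np.arange(n), np.append(nbr_idx, i))
        mu0 = X[nbr_idx].mean(axis=0)
        cov0 = np.cov(X[nbr_idx], rowvar=False) + ridge * np.eye(d)
        mu1 = X[rest_idx].mean(axis=0)
        cov1 = np.cov(X[rest_idx], rowvar=False) + ridge * np.eye(d)
        scores[i] = gaussian_kl(mu0, cov0, mu1, cov1)
    return scores  # the sample with the largest score is flagged as the outlier
```

A neighborhood drawn from the bulk of the data yields a distribution close to that of the remaining samples (small KL score), while an outlier's neighborhood is forced to span atypical points, inflating its divergence.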