Abstract
Outlier detection is imperative in biomedical data analysis to achieve reliable knowledge discovery. In this paper, a new outlier detection method based on Kullback-Leibler (KL) divergence is presented. KL divergence was originally designed as a measure of the distance between two distributions. Building on that concept, we extend it to detecting outlying biological samples by forming sample sets composed of nearest neighbors. To handle non-linearity in the KL divergence calculation and to tackle the singularity problem caused by small sample sizes, we map the original data into a higher-dimensional feature space and apply kernel functions, without resorting to an explicit mapping function. The sample with the largest KL divergence is flagged as an outlier. The proposed method is evaluated on one synthetic data set, two public gene expression data sets, and our own mass spectrometry data generated for a prostate cancer study.
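To make the idea concrete, the following is a minimal sketch of the neighborhood-based KL scoring the abstract describes, under simplifying assumptions: neighbor sets are modeled as multivariate Gaussians (for which KL divergence has a closed form), and a small ridge term replaces the paper's kernel mapping as the remedy for singular covariance matrices. The parameters k and ridge, and the choice to compare each sample's neighbor set against the remaining samples, are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def kl_outlier_scores(X, k=10, ridge=1e-6):
    """Score each sample by the KL divergence between a Gaussian fitted to
    its k nearest neighbors and a Gaussian fitted to all remaining samples.
    The ridge term regularizes covariances that would otherwise be singular
    for small sample sizes (the paper addresses this via the kernel trick)."""
    n, d = X.shape
    dists = cdist(X, X)
    scores = np.empty(n)
    for i in range(n):
        nbr_idx = np.argsort(dists[i])[1:k + 1]   # k nearest neighbors, excluding the sample itself
        rest_idx = np.setdiff1d(np.arange(n), np.append(nbr_idx, i))
        mu0 = X[nbr_idx].mean(axis=0)
        cov0 = np.cov(X[nbr_idx], rowvar=False) + ridge * np.eye(d)
        mu1 = X[rest_idx].mean(axis=0)
        cov1 = np.cov(X[rest_idx], rowvar=False) + ridge * np.eye(d)
        scores[i] = gaussian_kl(mu0, cov0, mu1, cov1)
    return scores  # the sample with the largest score is flagged as the outlier
```

A neighborhood drawn from the bulk of the data yields a distribution close to that of the remaining samples (small KL score), while an outlier's neighborhood is forced to span atypical points, inflating its divergence.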