Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

Roy Thomas*, J.E. Judith
Noorul Islam Centre For Higher Education, Kumaracoil, India

doi:10.35940/ijitee.c9053.019320

Open Access

https://doi.org/10.35940/ijitee.c9053.019320

Abstract

Determining the similarity or distance between data objects is an important part of many research fields, such as statistics, data mining, and machine learning. Many measures are available in the literature to define the distance between two numerical data objects. Defining such a metric for two categorical data objects is harder because categorical values are not ordered, and only a few distance measures for categorical data are available in the literature. This paper presents a comparative evaluation of various similarity measures for categorical data and introduces a novel similarity measure for categorical data based on occurrence frequency and correlation. We evaluated the performance of these similarity measures on the outlier detection task in data mining using real-world data sets. Experimental results show that the proposed similarity measure outperforms the existing similarity measures at detecting outliers in categorical data sets.
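
The abstract does not give the formula for the proposed measure, so the following Python sketch does not reproduce it. It only illustrates the general setting, under stated assumptions: a plain overlap similarity, a frequency-aware variant modeled on the occurrence-frequency idea from the categorical-similarity literature (the exact weighting here is an assumption), and an outlier score defined as one minus the mean similarity to the k most similar other records. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def overlap(x, y):
    """Fraction of attributes on which two categorical records agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def column_counts(data):
    """Per-attribute value counts, one Counter per column."""
    return [Counter(col) for col in zip(*data)]

def frequency_aware(x, y, counts, n):
    """Matches score 1; mismatches score less when the values are rare.

    Assumed weighting (illustrative only, not the paper's measure):
    1 / (1 + log(n / f(a)) * log(n / f(b))) for mismatching values a, b.
    """
    total = 0.0
    for a, b, c in zip(x, y, counts):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (1.0 + math.log(n / c[a]) * math.log(n / c[b]))
    return total / len(x)

def outlier_scores(data, sim, k=2):
    """Score each record as 1 - mean similarity to its k most similar peers."""
    scores = []
    for i, x in enumerate(data):
        sims = sorted(
            (sim(x, y) for j, y in enumerate(data) if j != i), reverse=True
        )
        scores.append(1.0 - sum(sims[:k]) / k)
    return scores

# Toy data set (hypothetical): the last record shares almost nothing with the rest.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
]

counts = column_counts(data)
n = len(data)

overlap_scores = outlier_scores(data, overlap)
freq_scores = outlier_scores(
    data, lambda x, y: frequency_aware(x, y, counts, n)
)

# Index of the record with the highest outlier score under each measure.
overlap_top = max(range(n), key=lambda i: overlap_scores[i])
freq_top = max(range(n), key=lambda i: freq_scores[i])
print(overlap_top, freq_top)
```

Swapping the `sim` argument is the only change needed to compare measures on the same outlier-scoring procedure, which mirrors the kind of comparative evaluation the abstract describes.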

Journal: The International Journal of Innovative Technology and Exploring Engineering
Publication Date: Jan 30, 2020
Citations: 1

Similar Papers

Similarity Measures for Categorical Data: A Comparative Evaluation
Shyam Boriah ... Varun Chandola
24 Apr 2008

Outlier Analysis of Categorical Data using NAVF
D Lakshmi Sreenivasa Reddy ... B Raveendra Babu
Informatica Economica | VOL. 17
30 Mar 2013

A hybrid algorithm for mining local outliers in categorical data
Meiling Liu ... Weidong Tang
International Journal of Wireless and Mobile Computing | VOL. 13
01 Jan 2017

A New Approach for Calculating Similarity of Categorical Data
Cheng Hao Jin ... Yang Koo Lee
01 Jan 2010