A Model for Improving Classifier Accuracy using Outlier Analysis

Lakshmi Sreenivasa Reddy.D, Dr B Raveendrababu, Dr A Govardhan

doi:10.24297/ijct.v7i1.3480

Open Access

https://doi.org/10.24297/ijct.v7i1.3480

Abstract

Anomalies are records that behave differently from, and do not conform to, the remaining records in a dataset. Outlier analysis is the task of finding such anomalies. Detecting outliers efficiently is an important issue in many fields of science, medicine and technology. Many methods are available to detect anomalies in numerical datasets, but only a limited number are available for categorical datasets. In this work, a novel entropy-based method to detect outliers in categorical data is proposed. The algorithm finds anomalies based on a score computed for each record, called the BAD score, and has great intuitive appeal. It utilizes the frequency of each value in the dataset. The Greedy method needs k scans of the dataset to find 'k' outliers, whereas the proposed method needs only one scan and calculates the BAD score of each record directly. It avoids the problem of supplying 'k' as an input and can find any number of outliers directly from the dataset. The AVF method has lower time complexity than other methods such as Greedy, FPOF and FDOD. Greedy has good accuracy compared with AVF, FPOF and FDOD (the latter two are based on frequency patterns of all combinations of values in each record). Our algorithm shows better accuracy than both the AVF algorithm and Greedy, while its time complexity comes close to that of AVF. The algorithm has been applied to the Nursery dataset and the Bank dataset taken from the "UCI Machine Learning Repository". In this work, it is proposed to extend the Normal distribution [11] and the Fuzzy concept [12] to the BAD score [13]; that is, NAVF combined with Fuzzy AVF is applied to the BAD score. Numerical attributes are excluded from the datasets for our analysis. The experimental results show that the method is efficient for outlier detection in categorical datasets.
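The abstract says the method avoids supplying 'k' by extending the Normal distribution to the record scores, but it does not give the cutoff rule. The following is only a minimal sketch of one way such a rule could look, assuming that a record is flagged when its score falls more than `c` standard deviations below the mean score. The function name `outliers_below_cutoff`, the parameter `c`, and its default of 3 are illustrative assumptions, not the authors' definition.

```python
from statistics import mean, pstdev


def outliers_below_cutoff(scores, c=3.0):
    """Return indices of records whose score is below mean - c * std.

    Assumption: low scores mean "rare", as with frequency-based scores.
    The cutoff rule itself is hypothetical; the source does not state it.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the scores
    cutoff = mu - c * sigma
    return [i for i, s in enumerate(scores) if s < cutoff]


# Twenty typical records and one record with a much lower score.
scores = [0.6] * 20 + [0.1]
flagged = outliers_below_cutoff(scores)
```

With a rule of this kind the number of reported outliers follows from the score distribution rather than from a user-chosen 'k'.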

Highlights

Outlier analysis is an important research area in many fields, such as networks, medicine and business decisions
Most existing systems concentrate on numerical or ordinal attributes; sometimes categorical attribute values are converted into ordinal values
The Attribute Value Frequency (AVF) method is one of the efficient methods for detecting outliers in categorical data in terms of time complexity, while Greedy is strong in accuracy. AVF calculates the frequency of each value in each attribute and converts it to a probability, computes the AVF score of each record by averaging the probabilities of its values, and selects as the top k outliers the records with the lowest AVF scores
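The AVF mechanism described in the last highlight can be sketched as follows. This is a minimal illustration written from that description alone, assuming records are equal-length tuples of categorical values. The names `avf_scores` and `top_k_outliers` and the toy data are hypothetical.

```python
from collections import Counter


def avf_scores(records):
    """AVF score per record: mean relative frequency of its attribute values."""
    n = len(records)
    m = len(records[0])
    # Count how often each value occurs within each attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(m)]
    # Average the per-attribute probabilities of the values in each record.
    return [sum(counts[j][row[j]] / n for j in range(m)) / m for row in records]


def top_k_outliers(records, k):
    """Indices of the k records with the lowest AVF scores."""
    scores = avf_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i])[:k]


records = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]
print(avf_scores(records))         # [0.625, 0.625, 0.625, 0.375]
print(top_k_outliers(records, 1))  # [3]
```

The record `("blue", "large")` gets the lowest score because "blue" occurs in only one of the four records, so it is returned as the single top outlier.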

Summary

Introduction

Outlier analysis is an important research area in many fields, such as networks, medicine and business decisions. The parameters used in FPOF and FDOD are σ, a threshold value that decides the frequent item sets in each data object, and 'k', the number of outliers. There are many drawbacks in this method, such as the difficulty of finding a correct model for different datasets, and the efficiency of these models decreases as the number of dimensions increases [4]. The remedy for this problem is to apply Principal Component Analysis. Knorr et al. [5] achieved some improvements in the distance-based algorithms. They explained that the portion of dataset records belonging to each outlier must be less than some threshold value. These density-based methods have the advantage that they can detect outliers missed by techniques that use a single, global criterion. These methods find characteristics of objects instead of finding distances, densities and statistical parameters.

TERMINOLOGY

Experimental results

Sample Method

Conclusion and Future work

Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY
Publication Date: May 21, 2013
Citations: 4
License type: CC BY 4.0

Similar Papers

Knowledge Discovery from Earth Science Data
Sangram Panigrahi ... Priyanka Tripathi
01 Apr 2014

Outlier Analysis of Categorical Data using NAVF
D Lakshmi Sreenivasa Reddy ... A Govardhan
Informatica Economica | VOL. 17
30 Mar 2013

Outlier analysis of categorical data using FuzzyAVF
D Lakshmi Sreenivasa Reddy ... B Raveendra Babu
01 Mar 2013

Comparision of Classifiers Accuracies from FAVF and NOFI for Categorical Data
D Lakshmi Sreenivasa Reddy ... Mudimbi Krishna Murthy
01 Jan 2015
