Detecting Data Outliers with Machine Learning

Ghalia Nassreddine, Joumana Younis, Thaer Falahi

doi:10.55145/ajest.2023.02.02.018

Abstract

Anomalies are instances or collections of data that occur very rarely in a dataset and whose features differ significantly from those of most of the data. Data is now widely used in all sectors, so undetected anomalies can cause problems. Anomaly detection involves examining individual data points and detecting rare occurrences that seem suspicious because they deviate from the established pattern of behavior. In this study, an approach to anomaly detection is built using a machine learning technique: the distance-based clustering method k-means. First, the existence of anomalies is tested using a p-value. Then the anomalous data points are detected using the clustering method. The proposed method was tested on real data collected from Kaggle. The results show that the k-means algorithm performs well in detecting outlier data.
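
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the distance threshold, the number of clusters, or the statistical test behind the p-value, so the p-value step is left out and everything below is an assumption chosen for illustration. In particular, the function names `kmeans` and `find_outliers`, the parameter `z_cutoff`, the toy data, and the rule "flag a point whose distance to its nearest centroid exceeds the mean distance plus `z_cutoff` standard deviations" are all hypothetical.

```python
import math
import statistics


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(statistics.fmean(axis) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids


def find_outliers(points, initial_centroids, z_cutoff=2.0):
    """Flag points unusually far from their nearest centroid.

    The cutoff rule (mean + z_cutoff * std of the distances) is an
    assumption made for this sketch, not a rule taken from the paper.
    """
    centroids = kmeans(points, initial_centroids)
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = statistics.fmean(distances) + z_cutoff * statistics.pstdev(distances)
    return [p for p, d in zip(points, distances) if d > cutoff]


# Toy data (made up): two tight groups and one point far from both.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8),
    (9.0, 1.0),
]
flagged = find_outliers(points, initial_centroids=[(1.0, 1.0), (5.0, 5.0)])
print(flagged)
```

The starting centroids are passed in explicitly only to keep the run deterministic; practical implementations usually rely on randomized seeding such as k-means++. Note also that an extreme point can pull a centroid toward itself or, with an unlucky start, end up as its own cluster, so the choice of k and of the cutoff affects what gets flagged.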


Journal: Al-Salam Journal for Engineering and Technology
Publication Date: May 16, 2023
License type: CC BY 4.0

Similar Papers

Detecting Anomalies in Financial Data Using Machine Learning Algorithms
Alexander Bakumenko ... Ahmed Elragal
Systems | VOL. 10
25 Aug 2022

A survey of machine learning methods applied to anomaly detection on drinking-water quality data
Eustace M Dogo ... Clinton Aigbavboa
Urban Water Journal | VOL. 16
16 Mar 2019

Review of Machine and Deep Learning Techniques in Epileptic Seizure Detection using Physiological Signals and Sentiment Analysis
Deba Prasad Dash ... Mohammad R Khosravi
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 23
15 Jan 2024

Tree crop yield estimation and prediction using remote sensing and machine learning: A systematic review
Carolina Trentin ... Luciano Shiratsuchi
Smart Agricultural Technology | VOL. 9
01 Sep 2024
