Abstract

Anomalies are instances or collections of data that occur very rarely in a dataset and whose features differ significantly from those of most of the data. Data is now widely used in all sectors, so anomalies that go undetected can cause problems. Anomaly detection involves examining individual data points and identifying rare occurrences that look suspicious because they deviate from the established pattern of behavior. In this study, an approach to anomaly detection is built using a machine learning technique: the distance-based clustering method k-means. First, the existence of anomalies is tested using a p-value. The anomalous data is then detected using the clustering method. The proposed method was tested on real data collected from Kaggle. The results show that the k-means algorithm performs well in detecting outlier data.
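To make the clustering step concrete, the following is a minimal sketch of distance-based anomaly detection with k-means. It is not the study's implementation. The abstract does not specify the number of clusters, the distance threshold, or the statistical test behind the p-value, so everything below is an assumption made for illustration: the function names `kmeans` and `flag_anomalies`, the rule that flags points farther from their centroid than the mean distance plus `n_sigmas` standard deviations, the deterministic seeding from the first `k` points, and the toy data. The p-value test is left out because the abstract does not name it.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means. Returns (centroids, labels).

    Assumption: the first k points seed the centroids. This keeps the
    sketch deterministic; real implementations usually use random or
    k-means++ seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def flag_anomalies(points, k=2, n_sigmas=2.0):
    """Return indices of points unusually far from their own centroid.

    Assumed rule: distance > mean(distance) + n_sigmas * std(distance).
    """
    centroids, labels = kmeans(points, k)
    distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    threshold = mean + n_sigmas * std
    return [i for i, d in enumerate(distances) if d > threshold]


# Toy data (hypothetical): two tight groups plus one distant point at index 8.
points = [
    (0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
    (10, 11), (11, 10), (11, 11), (10, 25),
]
print(flag_anomalies(points, k=2))  # -> [8]
```

A known limitation of this kind of rule is that a sufficiently extreme point can capture a centroid of its own, at which point its distance drops to zero and it is no longer flagged. The choice of `k` and of the threshold therefore matters in practice.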
