A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data

Upasana Gupta et al.

doi:10.17762/ijritcc.v11i10.8810

Abstract

This paper presents a comprehensive method for dimensionality reduction and anomaly detection in high-dimensional data, applied to healthcare datasets and implemented in R. Because traditional linear methods such as Principal Component Analysis (PCA) often ignore the non-linear manifold structure of the data, our approach exploits iterative learning, on the assumption that high-dimensional data lies largely on a low-dimensional manifold. The methodology starts by preparing the data with R libraries such as Keras, dplyr, and ggplot2, handling missing values and visualizing meaningful information. Using the Mahalanobis distance, the paper identifies and removes country-specific outliers. The pipelined model integrates PCA for data transformation and combines an Autoencoder with t-SNE for dimensionality reduction. The refined dataset is then used to train a Multi-Layer Perceptron (MLP) artificial neural network, which supports anomaly detection based on reconstruction errors, illustrated by a point cloud. Additionally, the paper explores metric multidimensional scaling using artificial neural networks, tests the approach on large datasets such as healthcare and wine, and compares the results with those of conventional techniques. The study highlights the effectiveness of integrating pre-processing, visualization, and artificial neural network strategies in R for anomaly detection.
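The Mahalanobis-distance outlier step mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, not the R code used in the paper, and it is not the authors' implementation. The function names (`mahalanobis_distances`, `flag_outliers`) and the cutoff passed as `threshold` are hypothetical choices made for illustration; the abstract does not state which cutoff or grouping the paper applies.

```python
from math import sqrt


def column_means(rows):
    """Mean of each feature (column) across all observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, means):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(means)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_distances(rows):
    """Distance of each row from the sample mean, scaled by the covariance."""
    means = column_means(rows)
    cov = covariance_matrix(rows, means)
    distances = []
    for row in rows:
        diff = [x - m for x, m in zip(row, means)]
        z = solve(cov, diff)  # z = cov^{-1} @ diff without forming the inverse
        distances.append(sqrt(sum(d * zi for d, zi in zip(diff, z))))
    return distances


def flag_outliers(rows, threshold):
    """True for rows whose distance exceeds the (assumed) cutoff."""
    return [dist > threshold for dist in mahalanobis_distances(rows)]
```

To mirror the country-specific filtering described above, one would presumably call `flag_outliers` separately on the rows of each country and drop the flagged rows. With very few observations per group the largest attainable distance is small, so any cutoff should be chosen with the group size in mind.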

Journal: International Journal on Recent and Innovation Trends in Computing and Communication