Abstract

Extensive data analysis is a vital part of biomedical applications and of medical practitioners' interpretations, because it safeguards the integrity of multidimensional datasets and improves classification accuracy. In machine learning, however, that integrity is compromised when the acquired data contain noisy and biased samples, which pose a significant threat to diagnosing and analysing such information. Removing noisy samples from dirty datasets is therefore crucial in biomedical applications, such as classification and prediction problems that use artificial neural networks (ANNs) to analyse the body's physiological signals. In this study, we developed a methodology that identifies and removes noisy data from a dataset before the ANN addresses the classification problem, proposing the principal component analysis–sample reduction process (PCA–SRP) as a data-cleaning agent to improve the network's performance. We first discuss the theoretical background of this data-cleansing methodology for ANN classification. Then, we describe how PCA is used for data cleansing through a sample reduction process (SRP), using various publicly available biomedical datasets with different sample and feature sizes. Lastly, the cleaned datasets were evaluated with four tests: PCA–SRP + ANN accuracy comparison testing, sensitivity vs. specificity testing, receiver operating characteristic (ROC) curve testing, and accuracy vs. additional random sample testing. The results show a significant improvement in ANN classification using the developed methodology and suggest a recommended range of selectivity (Sc) factors for typical cleaning and ANN applications.
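The abstract does not spell out the SRP rule, so the following is only a minimal NumPy sketch of one plausible PCA-based sample reduction, under stated assumptions: features are standardised, samples are projected onto the top principal components, and, within each class, a sample whose distance from the class centroid exceeds the mean distance plus `sc` standard deviations is treated as noisy and removed. The function name `pca_sample_reduction`, this distance rule, and the use of `sc` as a standard-deviation multiplier are hypothetical stand-ins for the paper's selectivity (Sc) factor, not its actual definition.

```python
import numpy as np


def pca_sample_reduction(X, y, n_components=2, sc=1.5):
    """Hypothetical PCA-based sample reduction (not the paper's exact rule).

    Returns the kept features, the kept labels, and a boolean keep-mask.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Standardise each feature; guard against zero-variance columns.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Principal axes are the right singular vectors of the centred data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ vt[:n_components].T  # shape (n_samples, n_components)

    keep = np.ones(len(X), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        # Distance of each sample from its own class centroid in PC space.
        centroid = scores[idx].mean(axis=0)
        dist = np.linalg.norm(scores[idx] - centroid, axis=1)
        # Assumed selectivity rule: drop samples beyond mean + sc * std.
        threshold = dist.mean() + sc * dist.std()
        keep[idx[dist > threshold]] = False

    return X[keep], y[keep], keep
```

A larger `sc` removes fewer samples and a smaller one cleans more aggressively, which mirrors the trade-off the paper explores when it recommends a range of Sc values.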
Our approach, implemented in Python, successfully cleaned the noisy multidimensional biomedical datasets and yielded up to an 8% increase in accuracy.
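The evaluation described in the abstract relies on standard binary-classification metrics: accuracy, sensitivity (true-positive rate), specificity (true-negative rate), and the ROC curve. A minimal, dependency-free sketch of these is given below; the function names are illustrative, and the AUC uses the pairwise (Mann–Whitney) formulation rather than any particular library's routine.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP))."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_points(y_true, scores, positive=1):
    """(FPR, TPR) pairs obtained by sweeping the threshold over distinct scores."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == positive)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # prints 0.75
```

Plotting the pairs from `roc_points` gives the ROC curve; the area under that curve equals the value returned by `roc_auc`.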

Highlights

  • The material presented in this paper shows a significant improvement in the accuracy of artificial neural networks (ANNs) in classification problems with the aid of the principal component analysis–sample reduction process (PCA–SRP).

  • The ANN used a learning rate of 0.1 (10%), two hidden layers with 32 and 16 neurons, ReLU activations in the hidden layers, and a SoftMax output activation, trained for 100 epochs (iterations), as shown in Table 3, for the combined principal component analysis (PCA)–SRP and ANN approach.

  • These datasets were used in PCA–SRP + ANN accuracy comparison testing, sensitivity vs. specificity testing, receiver operating characteristic (ROC) curve testing, and accuracy vs. additional random sample testing; the results show significant improvements.
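The network configuration listed in the highlights can be sketched in NumPy as below. This is an illustration under assumptions, not the authors' code: the 10% learning rate is read as 0.1, the optimiser is assumed to be plain full-batch gradient descent on softmax cross-entropy with He initialisation, and the names `train_mlp` and `predict` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp(X, y, n_classes, hidden=(32, 16), lr=0.1, epochs=100, seed=0):
    """Two ReLU hidden layers (32, 16) and a softmax output, trained by
    full-batch gradient descent on mean cross-entropy (assumed optimiser)."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, n_classes]
    # He initialisation, a common choice for ReLU layers.
    W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
         for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    Y = np.eye(n_classes)[y]  # one-hot targets

    for _ in range(epochs):
        # Forward pass: keep each layer's input for backpropagation.
        acts = [X]
        for Wi, bi in zip(W[:-1], b[:-1]):
            acts.append(relu(acts[-1] @ Wi + bi))
        probs = softmax(acts[-1] @ W[-1] + b[-1])

        # Gradient of mean cross-entropy w.r.t. the output logits.
        delta = (probs - Y) / len(X)
        for i in range(len(W) - 1, -1, -1):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:
                # Propagate through W[i] (before updating it) and the ReLU.
                delta = (delta @ W[i].T) * (acts[i] > 0)
            W[i] -= lr * gW
            b[i] -= lr * gb
    return W, b


def predict(W, b, X):
    """Class index with the highest softmax probability."""
    h = X
    for Wi, bi in zip(W[:-1], b[:-1]):
        h = relu(h @ Wi + bi)
    return softmax(h @ W[-1] + b[-1]).argmax(axis=1)
```

To compare the effect of cleaning, the same function can be trained once on the raw arrays and once on the arrays left after sample reduction, with accuracy measured on a held-out split in both cases.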


Summary

Introduction

Sensor technology has made outstanding contributions to human health, entertainment, the military, security, sports, and leisure, and to the analysis and interpretation of patients' physiological data. These sensors are an integral part of biomedical devices, and this innovation continues to evolve along with the challenge of making such devices perform intelligently [1]. An artificial neural network (ANN) is a computing system that simulates the ability of human neurons to learn the complex characteristics of their environment, recognise patterns, and generalise the inter-relationships between features, including those in multidimensional datasets. ANNs can identify the complex inter-relationships between the features of a given dataset and have been widely adopted in many applications, including biomedical and signal processing, where data are readily and publicly available from wearable sensors [4].

Data-Cleaning Applications
Methods
Principal Component Analysis–Sample Reduction Process
PCA–SRP Implementation in ANN
Sc Range Identification
Multidimensional Datasets
Heart Disease Dataset
Gender Voice Recognition Dataset
Breast Cancer Classification Dataset
Cancer Patients Dataset
Discussion and Results
Conclusions and Future Research