Abstract

Robust predictive modeling is the process of creating, validating, and testing models to obtain better prediction outcomes. Datasets usually contain outliers whose trend deviates from that of most data points. Conventionally, outliers are removed from the training dataset during preprocessing before building predictive models. Such models, however, may have poor predictive performance on unseen testing data that contains outliers. In modern machine learning, outliers are regarded as complex signals because of their significant role, and removing them from the training dataset is not recommended. Models trained in modern regimes are interpolated (overtrained) by increasing their complexity to treat outliers locally. However, such models become inefficient because including outliers requires more training, and this also compromises the models’ accuracy. This work proposes a novel complex signal balancing technique that may be used during preprocessing to incorporate the maximum number of complex signals (outliers) in the training dataset. The proposed approach determines the optimal value for the maximum possible inclusion of complex signals for training with the highest model performance in terms of accuracy, time, and complexity. The experimental results show that models trained after preprocessing with the proposed technique achieve higher predictive accuracy with improved execution time and lower complexity than traditional predictive modeling.

Highlights

  • Data mining is the process of extracting interesting patterns from structured and unstructured data

  • In the modern interpolation regime, models are overtrained after the interpolation point while their complexity is increased to overcome the effect of the outliers [2]. These models consider outliers during the training process, but even at high complexity levels, they usually fail to match the correctness of the classical models

  • The classifier interpolated in this way, with the outliers dealt with locally, results in the minimum possible effect of outliers on prediction [2]. Hyperparameter tuning [14,15,16,17,18] is used to improve the predictive accuracy of machine learning algorithms

Summary

Introduction

Data mining is the process of extracting interesting patterns from structured and unstructured data. In the modern interpolation regime, models are overtrained after the interpolation point while their complexity is increased to overcome the effect of the outliers [2]. These models consider outliers during the training process, but even at high complexity levels, they usually fail to match the correctness of the classical models. Motivated by this, this work proposes a novel technique to identify and suggest an optimal point at which the maximum number of outliers (complex signals) may be included in the training set with minimal deterioration in model performance.

Basic Concepts

This section discusses the basic concepts regarding classical and modern machine learning.

Classical Supervised Machine Learning
Proposed Approach
11. Return OptPerfλ
Prioritization
Identification of Complex Signals
Experiment Design
Comparison of Results on University of Peshawar Dataset
Results and Comparison on MNIST Dataset
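
The paper's own algorithm is not reproduced on this page, so the following is only a minimal sketch of the general idea stated in the Introduction: sweep the share of flagged outliers that is kept in the training set and pick the share that gives the best validation accuracy, preferring the larger share on ties. Everything specific here is an assumption made for illustration: the per-class z-score detector, the nearest-centroid classifier, the synthetic one-dimensional data, and all function names (`flag_outliers`, `sweep_inclusion`, and so on) are hypothetical and are not taken from the paper.

```python
import random
import statistics


def flag_outliers(values, z_thresh=2.0):
    """Flag values whose z-score exceeds z_thresh (assumed stand-in detector)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero spread
    return [abs(v - mean) / std > z_thresh for v in values]


def fit_centroids(xs, ys):
    """Nearest-centroid classifier: store one mean per class label."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = statistics.fmean(members)
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, xs, ys):
    hits = sum(predict(centroids, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def sweep_inclusion(train_x, train_y, valid_x, valid_y, fractions):
    """Try each outlier-inclusion fraction; return (best_fraction, scores)."""
    # Flag outliers separately within each class.
    flags = [False] * len(train_x)
    for label in set(train_y):
        idx = [i for i, y in enumerate(train_y) if y == label]
        class_flags = flag_outliers([train_x[i] for i in idx])
        for i, is_outlier in zip(idx, class_flags):
            flags[i] = is_outlier
    inliers = [i for i, f in enumerate(flags) if not f]
    outliers = [i for i, f in enumerate(flags) if f]

    scores = {}
    for frac in fractions:
        # Keep all inliers plus the first `frac` share of flagged outliers.
        keep = inliers + outliers[:round(frac * len(outliers))]
        model = fit_centroids([train_x[i] for i in keep],
                              [train_y[i] for i in keep])
        scores[frac] = accuracy(model, valid_x, valid_y)

    # Highest accuracy wins; ties go to the larger inclusion fraction.
    best = max(fractions, key=lambda f: (scores[f], f))
    return best, scores


def make_data(n, rng):
    """Two 1-D classes; roughly 10% of points get a much wider spread."""
    xs, ys = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 4.0
        spread = 5.0 if rng.random() < 0.1 else 1.0
        xs.append(rng.gauss(centre, spread))
        ys.append(label)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(300, rng)
valid_x, valid_y = make_data(300, rng)  # validation data also contains outliers
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
best_fraction, scores = sweep_inclusion(train_x, train_y,
                                        valid_x, valid_y, fractions)
print("best inclusion fraction:", best_fraction)
print("validation accuracy:", scores[best_fraction])
```

The sketch optimizes accuracy only. The abstract also names training time and model complexity as criteria, which would need to be added to the selection key to reflect that trade-off.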