Abstract
Robust predictive modeling is the process of creating, validating, and testing models to obtain better prediction outcomes. Datasets usually contain outliers, whose trend deviates from that of most data points. Conventionally, outliers are removed from the training dataset during preprocessing, before predictive models are built. Such models, however, may perform poorly on unseen test data that contain outliers. In modern machine learning, outliers are regarded as complex signals because of their significant role, and removing them from the training dataset is not recommended. Models trained in modern regimes are interpolated (overtrained) by increasing their complexity so that outliers are treated locally. However, such models become inefficient because the inclusion of outliers demands more training, which also compromises their accuracy. This work proposes a novel complex signal balancing technique that may be used during preprocessing to incorporate the maximum number of complex signals (outliers) in the training dataset. The proposed approach determines the optimal value for the maximum possible inclusion of complex signals in training at which the model performs best in terms of accuracy, time, and complexity. The experimental results show that models trained after preprocessing with the proposed technique achieve higher predictive accuracy, shorter execution time, and lower complexity than traditional predictive modeling.
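The idea of balancing how many outliers enter the training set can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a toy illustration under stated assumptions: a one-dimensional least-squares model, a residual-based outlier rule, a validation set that itself contains outliers, and a 5% tolerance on validation error. All function names (`fit_line`, `split_outliers`, `balance_outliers`, `make_data`) and all numeric settings are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(model, points):
    """Mean squared error of a fitted line on a list of (x, y) pairs."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def split_outliers(points, threshold=3.0):
    """Assumed rule: a point is an outlier if its absolute residual from a
    first-pass fit exceeds `threshold` times the median absolute residual."""
    a, b = fit_line(points)
    scored = [(abs(y - (a * x + b)), (x, y)) for x, y in points]
    cutoff = threshold * statistics.median(r for r, _ in scored)
    inliers = [p for r, p in scored if r <= cutoff]
    # Mildest outliers first, so they are the first to be re-admitted.
    outliers = [p for r, p in sorted(scored) if r > cutoff]
    return inliers, outliers


def balance_outliers(train, valid, tolerance=0.05):
    """Return the largest number of outliers that can be kept in training
    while validation error stays within `tolerance` of the best observed."""
    inliers, outliers = split_outliers(train)
    errors = []
    for k in range(len(outliers) + 1):
        model = fit_line(inliers + outliers[:k])
        errors.append(mse(model, valid))
    best = min(errors)
    chosen = max(k for k, e in enumerate(errors) if e <= best * (1 + tolerance))
    return chosen, len(outliers), errors


def make_data(n, outlier_rate, rng):
    """Synthetic data: a linear trend with noise and occasional large jumps."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 1)
        if rng.random() < outlier_rate:
            y += rng.choice([-1, 1]) * rng.uniform(8, 20)
        points.append((x, y))
    return points


rng = random.Random(0)
train = make_data(200, 0.1, rng)
valid = make_data(100, 0.1, rng)  # unseen data that also contains outliers
chosen, total, errors = balance_outliers(train, valid)
print(f"include {chosen} of {total} flagged outliers")
```

The selection rule prefers the largest inclusion count that does not noticeably hurt validation error, which mirrors the stated goal of including as many complex signals as possible with minimal loss of performance. Training time and model complexity, which the abstract also weighs, are not modeled in this sketch.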
Highlights
Data mining is the process of extracting interesting patterns from structured and unstructured data
In the modern interpolation regime, models are overtrained after the interpolation point while their complexity is increased to overcome the effect of the outliers [2]. These models consider outliers during the training process but, even at high complexity levels, usually fail to match the correctness of the classical models
The classifier interpolated in this way, with the outliers dealt with locally, results in the minimum possible effect of outliers on prediction [2]. Hyperparameter tuning [14,15,16,17,18] is used to improve the predictive accuracy of machine learning algorithms
Summary
Data mining is the process of extracting interesting patterns from structured and unstructured data. In the modern interpolation regime, models are overtrained after the interpolation point while their complexity is increased to overcome the effect of the outliers [2]. These models consider outliers during the training process but, even at high complexity levels, usually fail to match the correctness of the classical models. Motivated by this, the work proposes a novel technique to identify and suggest an optimal point at which the maximum number of outliers (complex signals) may be included in the training set with minimal deterioration of the model's performance. The Basic Concepts section discusses the basic concepts of classical and modern machine learning.