Abstract
With the advent of big data and the popularity of black-box deep learning methods, it is imperative to address the robustness of neural networks to noise and outliers. We propose the use of Winsorization to recover model performance when the data may contain outliers and other aberrant observations. We provide a comparative analysis of several probabilistic artificial intelligence and machine learning techniques on supervised learning case studies. Broadly, Winsorization is a versatile technique for accounting for outliers in data. However, probabilistic machine learning techniques differ in how efficiently they handle outlier-prone data, with or without Winsorization. We observe that Gaussian processes are extremely vulnerable to outliers, while deep learning techniques are in general more robust.
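For readers unfamiliar with the term, Winsorization replaces values beyond chosen lower and upper percentiles with the values at those percentiles, rather than dropping them. The following is a minimal sketch using NumPy, not the authors' implementation; the function names `winsor_limits` and `winsorize`, the 10th/90th percentile cut-offs, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def winsor_limits(values, lower_pct=10.0, upper_pct=90.0):
    """Return the (lower, upper) clipping limits at the given percentiles.

    The percentile defaults are illustrative assumptions, not values
    taken from the paper.
    """
    lo, hi = np.percentile(np.asarray(values, dtype=float), [lower_pct, upper_pct])
    return float(lo), float(hi)


def winsorize(values, limits):
    """Clip values to the given (lower, upper) limits; nothing is dropped."""
    lo, hi = limits
    return np.clip(np.asarray(values, dtype=float), lo, hi)


# Toy training responses with one gross outlier (hypothetical data).
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0])

# Estimate the limits on the training data, then apply them.
limits = winsor_limits(y_train)
y_train_w = winsorize(y_train, limits)

# The same limits can be reused on a validation set.
y_val = np.array([-50.0, 4.5, 500.0])
y_val_w = winsorize(y_val, limits)
```

One design choice in this sketch is to estimate the limits once on the training data and reuse them for the validation data, so that both sets are clipped on the same scale. Whether the paper does this is not stated here.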
Highlights
Machine learning (ML) and artificial intelligence (AI) techniques have met astounding success in different industries and research problems
In the presence of noise, applying some degree of Winsorization to the training and validation sets generally improves model performance compared with using no Winsorization
We compare different probabilistic neural networks in terms of model performance and time taken for training
Summary
Machine learning (ML) and artificial intelligence (AI) techniques have met astounding success in different industries and research problems. These techniques have the singular focus of improving prediction accuracy in complex data analysis problems. Unlike classical statistical frameworks, which involve relatively small datasets with few features, big data does not allow observations to be carefully selected and then dropped or modified in a pre-processing step before the main data analysis. In any case, such ad hoc pre-processing steps can violate the standard regularity conditions required for a proper probabilistic analysis [1,2]. Similar issues have been noted in the context of model selection and other problems; see [3] and related literature for deep theoretical discussions and results.