Abstract

The paper presents results of the research related to influence of the level of outliers in the data (train and test data considered separately) on the quality of a model prediction in a classification task. The set of 100 semi–artificial time series was taken into consideration, which independent variables was close to real ones, observed in a underground coal mining environment and dependent variable was generated with the decision tree. For every considered method (decision trees, naive bayes, logistic regression and kNN) a reference model was built (no outliers in the data) which quality was compared with the quality of two models: Out–Out (outliers in train and test data) and Non-out–Out (outliers only in test data). 50 levels of outliers in the data were considered, from 1 % to 50 %. Statistical comparison of models was done on the basis of sign test.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.