Abstract

Text data streams have widely appeared in real-world applications, in which, concept drifts owe a significant challenge for classification. Compared with relational data streams, concept drifts hidden in text streams usually reflect in the relationship between the feature vector and the instance labels. Meanwhile, existing concept drifting detection methods are mainly based on error rates of classification. When applying these methods in text streams, they perform poorly in the evaluations of false alarms and missing detections, etc. Motivated by this, we firstly give a systematic analysis of the concept drifts in text data streams. Then, we propose a three-layer concept drifting detection approach, where the three layers indicate the layer of label space, the layer of feature space and the layer of the mapping relationships between labels and features, respectively. In this approach, the latter two layers are based on the values of WoE (Weight of Evidence) and the IV (Information Value) index. Experimental results show that our approach can improve the performance of concept drifting detection and the accuracy of classification, especially when concept drifts in text data streams are frequent.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.