Abstract

Detecting changes in the dynamics of a classification scheme is an important task when working with textual data streams. A change in the volume of documents assigned to one category can signal a newly emerging structure, and thus indicate that the classification scheme needs updating. In this paper, we present a method that combines forecasting, change detection, and time series monitoring to raise an alert as soon as the volume of a given category changes. We build features from the textual content alone, which enable us to accurately predict the expected temporal evolution of the category. We then use statistical process control to determine whether the current volume deviates too far from the expected one. We test our method on the New York Times Annotated Corpus and on an industrial data set from Electricité de France (EDF), and we observe that it raises alerts at the right time compared with other techniques from the literature.
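
To make the monitoring step concrete, the following is a minimal sketch of a Shewhart-style control chart applied to forecast residuals. It is an illustration only and not necessarily the procedure used in the paper: the function name `raise_alerts`, the `calibration` window, the threshold `k`, and the toy volumes are all assumptions, and the forecast is taken as given rather than derived from textual features.

```python
from statistics import mean, stdev


def raise_alerts(observed, expected, calibration=8, k=3.0):
    """Return the time indices at which the observed volume is out of control.

    Hypothetical helper: control limits are estimated from the residuals
    (observed - expected) of the first `calibration` periods (at least 2),
    and later periods are flagged when their residual lies more than
    `k` standard deviations from the calibration mean.
    """
    residuals = [o - e for o, e in zip(observed, expected)]
    baseline = residuals[:calibration]
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline

    alerts = []
    for t in range(calibration, len(residuals)):
        if abs(residuals[t] - center) > k * sigma:
            alerts.append(t)
    return alerts


# Toy data (assumed): weekly document counts for one category,
# with a constant forecast of 100 documents per week.
observed = [100, 104, 98, 101, 97, 103, 99, 102, 100, 140, 101]
expected = [100] * len(observed)

print(raise_alerts(observed, expected))
```

In this toy series only the jump to 140 documents falls outside the control limits, so a single alert is raised for that period. A larger `k` trades fewer false alarms for slower detection.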
