When an event occurs, it attracts attention of information sources to publish related documents along its lifespan. The task of event detection is to automatically identify events and their related documents from a document stream, which is a set of chronologically ordered documents collected from various information sources. Generally, each event has a distinct activeness development so that its status changes continuously during its lifespan. When an event is active, there are a lot of related documents from various information sources. In contrast when it is inactive, there are very few documents, but they are focused. Previous works on event detection did not consider the characteristics of the event's activeness, and used rigid thresholds for event detection. We propose a concept called life profile, modeled by a hidden Markov model, to model the activeness trends of events. In addition, a general event detection framework, LIPED, which utilizes the learned life profiles and the burst-and-diverse characteristic to adjust the event detection thresholds adaptively, can be incorporated into existing event detection methods. Based on the official TDT corpus and contest rules, the evaluation results show that existing detection methods that incorporate LIPED achieve better performance in the cost and F1 metrics, than without.
Read full abstract