Abstract

Although timely access to information is becoming increasingly important and gaining such access is no longer a problem, the human capacity to assimilate such huge amounts of information is limited. Topic Detection (TD) is therefore a promising research area that addresses speedy access to desired information. Ironically, however, the time complexity of existing TD algorithms is usually O(n^3), or even exponential (on the order of e^x). The linear-performance requirement of real-world topic detection has not been significantly addressed. This paper presents a new patented topic detection algorithm, called RMIR, that combines the relevance model with information retrieval techniques to improve time efficiency. The Relevance Model (RM) is a theoretical extension of statistical language modeling that was developed for the task of document retrieval. To reduce the cost of obtaining RM, we reduce the number of story comparisons with a query-based approach that places similar stories in the top-k query results...
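The query-based idea can be sketched as follows. Each story is issued as a query against an inverted index and is compared only with its top-k hits rather than with every other story, so at most n·k candidate pairs are examined instead of n(n-1)/2. This is a minimal sketch under stated assumptions, not the patented RMIR algorithm, whose details are not given here. The TF-IDF scoring, the whitespace tokenizer, and the helper names build_inverted_index, top_k_candidates, and candidate_pairs are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_inverted_index(stories):
    """Map each term to a postings dict {story_id: term frequency}."""
    index = defaultdict(dict)
    for sid, text in enumerate(stories):
        for term, tf in Counter(text.lower().split()).items():
            index[term][sid] = tf
    return index


def top_k_candidates(query_id, stories, index, k):
    """Use story `query_id` as a query and return up to k other story ids.

    Only stories sharing at least one term with the query are touched,
    because scores are accumulated by walking the query terms' postings.
    Scoring is a plain TF-IDF dot product (an assumption for illustration).
    """
    n = len(stories)
    scores = defaultdict(float)
    for term, qtf in Counter(stories[query_id].lower().split()).items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n / len(postings))
        for sid, tf in postings.items():
            if sid != query_id:
                scores[sid] += qtf * tf * idf * idf
    # Highest score first; ties broken by lower story id.
    best = heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], -kv[0]))
    return [sid for sid, _ in best]


def candidate_pairs(stories, k):
    """Collect the unordered story pairs that will actually be compared.

    At most len(stories) * k pairs, versus n(n-1)/2 for all-pairs comparison.
    """
    index = build_inverted_index(stories)
    pairs = set()
    for qid in range(len(stories)):
        for sid in top_k_candidates(qid, stories, index, k):
            pairs.add((min(qid, sid), max(qid, sid)))
    return pairs
```

For example, with the four stories "earthquake hits city center", "strong earthquake damages city buildings", "election results announced today", and "voters react to election results", candidate_pairs(stories, k=1) yields {(0, 1), (2, 3)}: each story is compared only with the one story that shares the most weighted terms with it.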

Highlights

  • Timely access to information is becoming increasingly important in today’s knowledge-based economy, and gaining such access is no longer a problem because of the widespread availability of broadband in both homes and businesses

  • From Fig. 5, we can see that the time complexity of the RMIR algorithm is almost linear, which is consistent with our theoretical models and analysis

  • The linear-performance requirement of real-world topic detection has not been seriously addressed in the literature


Introduction

Timely access to information is becoming increasingly important in today’s knowledge-based economy, and gaining such access is no longer a problem because of the widespread availability of broadband in both homes and businesses. In Topic Detection and Tracking (TDT), an event is defined as something that happens at a specific time and place, along with all the necessary preconditions and unavoidable consequences [1]. Such an event might be a car accident, a meeting, or a court hearing. Based on our previous research [16,17,18,19], we approach TD from the angle of information retrieval (IR), since many well-established, high-performance IR models and algorithms already exist. Inspired by achievements in the IR field, we adopt the Relevance Model (RM) as our language model and inverted indices for document retrieval.
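The excerpt does not give the paper's RM estimator. As an assumption, the sketch below uses the standard RM1 form of the relevance model with a uniform document prior and Dirichlet-smoothed document models: P(w|R) is proportional to the sum over feedback documents D of P(w|D) · Π_i P(q_i|D), where the feedback documents could be, for instance, the top-k results retrieved for a story. The function name relevance_model and the smoothing parameter mu are hypothetical, and the background model is simplified to the feedback set itself.

```python
import math
from collections import Counter


def relevance_model(query, feedback_docs, mu=10.0):
    """Estimate P(w|R) with RM1: sum over D of P(w|D) * P(Q|D), normalized.

    Assumes a uniform prior over the feedback documents and Dirichlet-smoothed
    document models. Simplification: the background (collection) model is
    estimated from the feedback documents themselves.
    """
    tokenized = [doc.lower().split() for doc in feedback_docs]
    coll = Counter(t for toks in tokenized for t in toks)
    total = sum(coll.values())
    p_coll = {w: c / total for w, c in coll.items()}
    q_terms = query.lower().split()

    weights = Counter()
    for toks in tokenized:
        tf, dlen = Counter(toks), len(toks)

        def p_w_d(w):
            # Dirichlet smoothing: (tf(w, D) + mu * P(w|C)) / (|D| + mu)
            return (tf[w] + mu * p_coll.get(w, 0.0)) / (dlen + mu)

        # Query likelihood P(Q|D): product of smoothed term probabilities.
        q_lik = math.prod(p_w_d(q) for q in q_terms)
        for w in coll:
            weights[w] += p_w_d(w) * q_lik

    z = sum(weights.values())
    if z == 0:  # no query term occurs anywhere in the feedback set
        return {}
    return {w: v / z for w, v in weights.items()}
```

With feedback documents "earthquake city damage", "earthquake rescue teams", and "election results" and the query "earthquake", the query term receives the most probability mass, and "city" (from a well-matching document) outweighs "election" (from a poorly matching one). This weighting by query likelihood is what lets the estimated model favor vocabulary from stories on the same topic.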

Formal Representation
Document Representation
Unigram Language Model
Relevance Model
Description of Model Design
Fast Inverted Indices
Experimental Analysis
TDT Evaluation Metrics
Retrospective Event Detection
Conclusion
Findings
TDT 2004