Abstract

Time is an important dimension of any information space and can be very useful in information retrieval tasks such as document exploration, similarity search, and clustering. As search applications keep gathering new and diverse information sources, presenting relevant information anchored in time becomes more important for exploration purposes. Temporal information is available in every document, either explicitly, e.g., in the form of temporal expressions, or implicitly in the form of metadata. Recognizing such information and exploiting it for document retrieval and presentation can significantly improve the functionality of search applications. In this research, we introduce a temporal document analysis framework for analyzing document collections from a temporal perspective in support of diverse information retrieval and exploration tasks. Our analysis is based not on document creation and/or modification timestamps but on time extracted from the content itself. Examining a document with an emphasis on its temporal data can lead to interesting discoveries. We study models and techniques for inspecting a document based on its temporal content, which can serve as the basis for comparison, clustering, and categorization. A core part of the framework is a system, implemented using existing tools, for annotating documents with temporal information. We present an add-on to traditional information retrieval applications that exploits the temporal information associated with documents to present and cluster them along timelines. Using temporal entity extraction techniques, we show how temporal expressions are made explicit and used in the construction of multiple-granularity timelines. We discuss how hit-list-based search results can be clustered according to temporal aspects, anchored in the constructed timelines, and how time-based document clusters can be used to explore search results.
We examine how temporal information can be used for ranking purposes. We also propose an alternative document snippet technique that leverages new time-aggregated measures to help highlight the most salient events. Finally, we present Pacha, an exploratory search system that combines the methods introduced earlier for exploring search results in timelines. Pacha is one of the many applications that can be developed using the framework.
