Abstract

An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document FrequencyTempo (TF×IDFTempo) and TF×Enhanced‐IDFTempo, and develop a temporal‐based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF‐based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced‐IDFTempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDFTempo.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.