Topic Detection and Tracking Based on Windowed DBSCAN and Parallel KNN

Chuanzhen Li, Juanjuan Cai, Yang Yu, Minqiao Liu, Hui Wang

doi:10.1109/access.2020.3047458

Open Access

https://doi.org/10.1109/access.2020.3047458

Journal: IEEE Access
Publication Date: Dec 25, 2020
Citations: 10
License type: CC BY 4.0

Affiliation: Communication University of China, iQIYI (China)

Abstract

Topic Detection and Tracking (TDT) is commonly used to identify hot topics in the huge volume of Internet news and to keep up with them. However, traditional topic detection and tracking methods suffer from low accuracy and low efficiency. In this paper, a big-data-driven topic detection system is built on the Spark platform, with the aim of improving the efficiency of collecting news from the Internet and the accuracy and efficiency of topic detection and tracking. The system can be easily deployed in a distributed architecture, where it works as a parallelized news collection and topic detection system. An improved density-based spatial clustering of applications with noise (DBSCAN) algorithm based on a time window is proposed to achieve accurate topic detection, with the added advantage of reducing time complexity. A parallel KNN-based topic tracking algorithm is proposed for the topic tracking task. Experiments, including comparisons with baseline algorithms and quantitative and qualitative analyses, are conducted on a pseudo-distributed Spark platform and demonstrate the effectiveness and efficiency of the parallelized topic detection system.

Highlights

With the rapid paradigm shift in information access, news and information can be provided by online news websites, mainstream media, and individual users alike.
The data collection layer is mainly responsible for extracting and preprocessing the news data, which serves as the data source of the hot topic detection system.
To address the high time complexity of text clustering, the proposed model is based on the time-windowed density-based spatial clustering of applications with noise (DBSCAN) algorithm and a big data platform.

Summary

INTRODUCTION

With the rapid paradigm shift in information access, news and information can be provided by online news websites, mainstream media, and individual users alike. To address the high time complexity of text clustering, the proposed model is based on the time-windowed density-based spatial clustering of applications with noise (DBSCAN) algorithm and a big data platform. In this setting, the time complexity can be reduced from O(n^2) to O(n), as analyzed in a later section. The main contributions of this paper are as follows: 1) An improved DBSCAN clustering algorithm based on the time window is proposed, together with a parallel implementation for processing a huge data stream. The timestamp feature is taken into account to reduce the computation cost.
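
To make the time-window idea concrete, the following is a minimal single-machine sketch, not the paper's Spark implementation. It assumes documents are sparse term-weight dictionaries sorted by timestamp and that cosine similarity is the similarity measure; the names `windowed_dbscan`, `window`, `min_sim`, and `min_pts` are hypothetical. The only change from plain DBSCAN is that the neighborhood query looks at documents whose timestamps fall within the window, so if each window holds a bounded number of documents, the number of similarity computations grows roughly linearly with n rather than quadratically.

```python
from bisect import bisect_left, bisect_right
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def windowed_dbscan(docs, window, min_sim, min_pts):
    """Cluster docs, a list of (timestamp, vector) sorted by timestamp.

    Returns one label per document; -1 marks noise.
    All parameter names are illustrative assumptions.
    """
    times = [t for t, _ in docs]

    def neighbors(i):
        # Candidates are restricted to the time window around doc i,
        # so each query scans a bounded slice instead of all n docs.
        lo = bisect_left(times, times[i] - window)
        hi = bisect_right(times, times[i] + window)
        return [j for j in range(lo, hi)
                if cosine(docs[i][1], docs[j][1]) >= min_sim]

    labels = [None] * len(docs)
    cluster = -1
    for i in range(len(docs)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


docs = [
    (0, {"quake": 1.0, "tokyo": 1.0}),
    (1, {"quake": 1.0, "tokyo": 1.0, "rescue": 1.0}),
    (2, {"quake": 1.0, "rescue": 1.0}),
    (100, {"quake": 1.0, "tokyo": 1.0}),
    (101, {"election": 1.0}),
]
labels = windowed_dbscan(docs, window=10, min_sim=0.4, min_pts=2)
```

In this toy input the fourth document has the same terms as the first but lies far outside the window, so it is never compared with the early cluster and ends up as noise.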

RELATED WORK

TIME WINDOW BASED SIMILARITY COMPUTATION

THE PARALLEL KNN BASED TOPIC TRACKING ALGORITHM
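
As a rough illustration of KNN-based topic tracking, the sketch below assigns each incoming story the majority topic among its k most similar labeled stories and processes the incoming stories in parallel. It is written under several assumptions that are not taken from the paper: cosine similarity over sparse term-weight dictionaries, a plain majority vote, and a local thread pool standing in for Spark. The names `knn_topic` and `track` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_topic(story, labeled, k):
    """Return the majority topic among the k labeled stories most similar to story."""
    ranked = sorted(labeled, key=lambda item: cosine(story, item[0]), reverse=True)
    votes = Counter(topic for _, topic in ranked[:k])
    return votes.most_common(1)[0][0]


def track(stories, labeled, k=3, workers=4):
    """Classify each story independently; the pool is a stand-in for a cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: knn_topic(s, labeled, k), stories))


labeled = [
    ({"quake": 1.0, "tokyo": 1.0}, "earthquake"),
    ({"quake": 1.0, "rescue": 1.0}, "earthquake"),
    ({"election": 1.0, "vote": 1.0}, "election"),
    ({"election": 1.0, "poll": 1.0}, "election"),
    ({"vote": 1.0, "poll": 1.0}, "election"),
]
stories = [
    {"quake": 1.0, "aftershock": 1.0},
    {"vote": 1.0, "count": 1.0},
]
assigned = track(stories, labeled, k=3)
```

Each story is classified independently of the others, which is what makes the task easy to distribute: the same per-story function could be mapped over a partitioned dataset instead of a thread pool.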

EXPERIMENTS AND EVALUATIONS

3) EVALUATION METRICS
