nTreeClus: A tree-based sequence encoder for clustering categorical series

Hadi Jahanshahi, Mustafa Gokce Baydogan

doi:10.1016/j.neucom.2022.04.076

Open Access

Journal: Neurocomputing
Publication Date: Apr 22, 2022
Citations: 8
License type: elsevier-specific: oa user license

Affiliation: Toronto Metropolitan University, Boğaziçi University

Abstract

The overwhelming presence of categorical/sequential data in diverse domains emphasizes the importance of sequence mining. The challenging nature of sequences calls for continued research into more accurate and faster approaches that give a better understanding of their (dis)similarities. This paper proposes nTreeClus, a new model-based approach for clustering sequence data. The proposed method combines tree-based learners, k-mers, and autoregressive models for categorical time series, culminating in a novel numerical representation of categorical sequences. Using this representation, we cluster sequences while taking into account the inherent patterns in categorical time series. The model also proved robust to the choice of its parameter. Under different simulated scenarios, nTreeClus improved on the baseline methods by up to 10.7% on internal and up to 2.7% on external cluster validation metrics. The empirical evaluation on synthetic and real datasets, protein sequences, and categorical time series showed that nTreeClus is competitive with or superior to most state-of-the-art algorithms.
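
The encoding idea in the abstract (k-mers as autoregressive training rows, a tree-based learner, and a numerical representation derived from the fitted tree) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the window length `n`, the hand-rolled Gini tree with `position == symbol` tests, the use of leaf-visit counts as the representation, and the cosine distance are all assumptions made for illustration, and every function name below is hypothetical.

```python
from collections import Counter
from math import sqrt


def windows(seq, n):
    """Slide a window of length n (a k-mer with k = n) over seq. The first n-1
    symbols are the autoregressive features; the last symbol is the target."""
    return [(tuple(seq[i:i + n - 1]), seq[i + n - 1])
            for i in range(len(seq) - n + 1)]


def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def build_tree(rows, depth=0, max_depth=4, min_size=2):
    """Grow a binary classification tree (assumed stand-in for the paper's
    tree-based learner) with tests 'features[pos] == sym', picking the split
    with the largest Gini impurity decrease."""
    labels = [y for _, y in rows]
    if depth >= max_depth or len(rows) < min_size or len(set(labels)) == 1:
        return {"leaf": None}
    parent = gini(labels)
    best = None
    for pos, sym in sorted({(p, x[p]) for x, _ in rows for p in range(len(x))}):
        yes = [r for r in rows if r[0][pos] == sym]
        no = [r for r in rows if r[0][pos] != sym]
        if not yes or not no:
            continue
        child = (len(yes) * gini([y for _, y in yes])
                 + len(no) * gini([y for _, y in no])) / len(rows)
        if best is None or parent - child > best[0]:
            best = (parent - child, pos, sym, yes, no)
    if best is None or best[0] <= 0:
        return {"leaf": None}
    _, pos, sym, yes, no = best
    return {"pos": pos, "sym": sym,
            "yes": build_tree(yes, depth + 1, max_depth, min_size),
            "no": build_tree(no, depth + 1, max_depth, min_size)}


def number_leaves(node, next_id=0):
    """Assign consecutive integer ids to the leaves; return the next free id."""
    if "leaf" in node:
        node["leaf"] = next_id
        return next_id + 1
    next_id = number_leaves(node["yes"], next_id)
    return number_leaves(node["no"], next_id)


def leaf_of(node, features):
    """Route one window's features down the tree to its leaf id."""
    while "leaf" not in node:
        node = node["yes"] if features[node["pos"]] == node["sym"] else node["no"]
    return node["leaf"]


def encode(seqs, n=3, max_depth=4):
    """Fit one tree on the windows of all sequences, then describe each
    sequence by how many of its windows land in each leaf."""
    rows = [w for s in seqs for w in windows(s, n)]
    tree = build_tree(rows, max_depth=max_depth)
    n_leaves = number_leaves(tree)
    vectors = []
    for s in seqs:
        counts = [0] * n_leaves
        for features, _ in windows(s, n):
            counts[leaf_of(tree, features)] += 1
        vectors.append(counts)
    return vectors


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


seqs = ["ABABABAB", "ABABABBA", "CCDCCDCC", "CCDCCDCD"]
vecs = encode(seqs, n=3)
dist = [[cosine_distance(a, b) for b in vecs] for a in vecs]
for row in dist:
    print(" ".join(f"{d:.3f}" for d in row))
```

Sequences built from the same local patterns send their windows to the same leaves, so their count vectors are nearly parallel and their cosine distance is small; the resulting distance matrix can then be handed to any hierarchical clustering routine. A single depth-limited tree keeps the sketch short; an ensemble of trees would follow the same counting idea.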

Similar Papers

Comparative Studies on Some Metrics for External Validation of QSPR Models
Kunal Roy ... Supratik Kar
Journal of Chemical Information and Modeling | VOL. 52 | 17 Jan 2012

A Novel Cosine-based Internal and External Validation metrics to assess twitter Data Clustering using Hybrid Topic Models
R M Noorullah ... Moulana Mohammed
IOP Conference Series: Materials Science and Engineering | VOL. 981 | 01 Dec 2020

Jerry A Nathanson Basic Environmental Technology — Water Supply, Waste Management and Pollution Control 4th ed 2002 Published by Prentice Hall of India Pvt Ltd. New Delhi 81-203-2228-2 Pages 532 Price Rs 325 (Indian Reprint)
Mp Cariappa
Medical Journal Armed Forces India | VOL. 60 | 01 Apr 2004

Computational modeling of aquatic toxicity of polychlorinated naphthalenes (PCNs) employing 2D-QSAR and chemical read-across
Aniket Nath ... Kunal Roy
Aquatic Toxicology | VOL. 257 | 25 Feb 2023
