Abstract

Vector space representation of web services plays a prominent role in enhancing the performance of different web service-based processes such as clustering, recommendation, ranking, and discovery. Generally, Term Frequency - Inverse Document Frequency (TF-IDF) and topic modeling methods are widely used for service representation. In recent years, word embedding techniques have attracted considerable research attention because they can represent services or documents based on semantic similarity. This paper provides a comparative analysis of two topic modeling techniques, i.e., Latent Dirichlet Allocation (LDA) and the Gibbs Sampling algorithm for Dirichlet Multinomial Mixture (GSDMM), and two word embedding techniques, i.e., word2vec and fastText. These topic modeling and word embedding techniques are applied to a dataset of web service documents for vector space representation. K-Means clustering is used to analyze the performance, and results are evaluated against standard evaluation criteria. Results demonstrate that the word2vec model outperforms the other techniques and yields a clear improvement in clustering quality.
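A minimal sketch of the embedding-then-cluster pipeline the abstract describes, assuming gensim for word2vec and scikit-learn for K-Means; the toy corpus, averaging scheme, and all parameters are illustrative assumptions rather than the paper's actual configuration or evaluation criteria.

```python
# Sketch only: word2vec document vectors clustered with K-Means.
# Corpus, hyperparameters, and the silhouette metric are assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical preprocessed web service descriptions (token lists).
corpus = [
    ["weather", "forecast", "temperature", "api"],
    ["currency", "exchange", "rate", "conversion"],
    ["hotel", "booking", "reservation", "travel"],
    ["flight", "booking", "travel", "airline"],
]

# Train a word2vec model on the service descriptions.
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5,
               min_count=1, sg=1, epochs=50)

def document_vector(tokens, model):
    # Represent a service document as the mean of its word vectors.
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

doc_vectors = np.array([document_vector(doc, w2v) for doc in corpus])

# Cluster the document vectors and report one internal quality score.
k = 2  # illustrative number of clusters
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42).fit(doc_vectors)
print("cluster labels:", kmeans.labels_)
print("silhouette score:", silhouette_score(doc_vectors, kmeans.labels_))
```

The same clustering step could be reused with LDA, GSDMM, or fastText representations by swapping out how `doc_vectors` is built, which is the comparison the paper carries out.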
