Drugs clustering based on their compositions using Word2Vec and K-means clustering

Rahmat Hidayat, Nur Aini Rakhmawati

doi:10.1063/5.0104981

Abstract

With the rapid growth in medicine, a method for clustering drug composition data is needed so that industry can more easily define medicine compositions. K-means clustering is one way to cluster drug compositions. In this paper, we use a Word2Vec model to convert each drug's composition into a vector, cluster the resulting vectors with K-means, and visualize the clustering results. For Word2Vec, we use two training methods: Continuous Bag-of-Words (CBOW) and Skip-gram (SG). For K-means, we determine the number of centroids using the Elbow Criterion and the Silhouette Coefficient. The dataset consists of more than 250 drug product names from Farmaku and K24. The experimental results show Silhouette Coefficient values of 0.901 with CBOW and 0.877 with SG, and both methods yield an optimal number of clusters of three.
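The clustering stage summarized in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and several parts are assumptions. The token vectors in TOY_EMBEDDINGS and the compositions in COMPOSITIONS are hypothetical toy values standing in for vectors learned by a trained Word2Vec model (for example, gensim's Word2Vec with sg=0 for CBOW or sg=1 for SG). Averaging ingredient vectors into one composition vector is an assumed aggregation, since the abstract does not say how this step is done. The deterministic farthest-first seeding is a stand-in for random or k-means++ initialization. The helper names (composition_vector, farthest_first, kmeans, silhouette) are hypothetical. For each candidate k, the sketch records the within-cluster sum of squared errors (SSE, the quantity plotted for the Elbow Criterion) and the mean Silhouette Coefficient, then picks the k with the highest silhouette.

```python
import math

# Hypothetical toy 2-D token vectors standing in for Word2Vec (CBOW or SG) embeddings.
TOY_EMBEDDINGS = {
    "paracetamol": [0.9, 0.1], "caffeine": [1.0, 0.2], "ibuprofen": [0.8, 0.0],
    "amoxicillin": [0.1, 0.9], "clavulanate": [0.0, 1.0], "cefadroxil": [0.2, 0.8],
    "vitamin_c": [-0.9, -0.8], "zinc": [-1.0, -1.0], "vitamin_d3": [-0.8, -0.9],
}

# Hypothetical drug compositions, already tokenized into ingredient names.
COMPOSITIONS = [
    ["paracetamol"], ["paracetamol", "caffeine"], ["ibuprofen"],
    ["amoxicillin"], ["amoxicillin", "clavulanate"], ["cefadroxil"],
    ["vitamin_c"], ["vitamin_c", "zinc"], ["vitamin_d3"],
]


def composition_vector(tokens, embeddings):
    """Average the vectors of known tokens (an assumed aggregation choice)."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point whose distance to its nearest chosen centroid is largest."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; returns (labels, SSE)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


def silhouette(points, labels):
    """Mean Silhouette Coefficient; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == own]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) - {own}
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [composition_vector(t, TOY_EMBEDDINGS) for t in COMPOSITIONS]
sse, sil = {}, {}
for k in range(2, 6):
    labels, sse[k] = kmeans(points, k)
    sil[k] = silhouette(points, labels)
    print(f"k={k}  SSE={sse[k]:.4f}  silhouette={sil[k]:.3f}")

best_k = max(sil, key=sil.get)
print("best k by silhouette:", best_k)
```

On these toy vectors, which form three well-separated groups, the sketch selects k = 3; the printed values are illustrative only and are not the paper's reported results. With real embeddings, scikit-learn's KMeans and silhouette_score would be the usual replacements for the hand-written helpers.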
