In Indonesia, many local websites, such as Irama Nusantara, hold valuable information about music and culture, yet this information remains underused. This research applies query expansion techniques, supported by data mining methods, to data from the Irama Nusantara website. The data were collected by crawling the site, which yielded 5,404 entries covering audio, images, and text. The analysis used Natural Language Processing (NLP) techniques, beginning with a preprocessing stage. Next, the K-Means algorithm was applied for clustering, and the Term Frequency-Inverse Document Frequency (TF-IDF) method was used for term weighting. Classification models were built with Support Vector Machine (SVM) and Naive Bayes for comparison. The results show that query expansion significantly improves the accuracy of information retrieval on the Irama Nusantara website. In the evaluation, SVM achieved better accuracy and precision than Naive Bayes. In addition, Principal Component Analysis (PCA) shows that the resulting principal components explain 70-95% of the variance in the data, which indicates the efficiency of the applied method. This research provides deeper insight into the patterns and trends in the analyzed data, contributes to the development of information technology for the cultural sector in Indonesia, and yields an effective analysis model for utilizing data from the Irama Nusantara website.
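The abstract does not specify how the TF-IDF weights feed into query expansion, so the following is only a minimal sketch of one common approach: weight terms with TF-IDF, then append the highest-weighted terms from documents that match the original query. The function names (`tokenize`, `tfidf`, `expand_query`), the parameter `k`, and the three toy documents are hypothetical illustrations and are not taken from the Irama Nusantara data or from the study's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace. A real preprocessing stage would
    # also remove punctuation and stop words (assumption, not from the source).
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def expand_query(query, docs, weights, k=2):
    """Append the k highest-weighted terms from documents matching the query."""
    q_terms = tokenize(query)
    scores = Counter()
    for doc, w in zip(docs, weights):
        if any(t in tokenize(doc) for t in q_terms):
            for term, weight in w.items():
                if term not in q_terms:
                    scores[term] += weight
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return q_terms + [term for term, _ in ranked[:k]]


# Toy documents, invented purely for illustration.
docs = [
    "keroncong orchestra recording jakarta",
    "keroncong singer recording",
    "pop band jakarta",
]
weights = tfidf(docs)
expanded = expand_query("keroncong", docs, weights, k=2)
print(expanded)
```

In this sketch the expanded query keeps the original terms first, so a downstream retrieval step could still give them priority over the added ones.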