This research classifies the Hulu (Upper) and Kuala (Lower) Banjarese dialects in the prose text “Datu Kandangan and Datu Kartamina”. These dialects are linguistic variations that arise from geographical, social, and cultural differences among language communities in South Kalimantan, Indonesia. The analysis used Python tools, namely the Natural Language Toolkit (NLTK), NumPy, and pyLDAvis for Latent Dirichlet Allocation (LDA) visualization, with data preprocessing steps including tokenization, punctuation removal, stop word normalization, and stemming. The findings show that the Naive Bayes method outperforms the Boolean Query method at identifying positive examples and classifying texts into the Upper and Lower Banjar dialects: Naive Bayes achieves precision and recall of 0.955563 and 0.956098, whereas Boolean Query reaches only 0.021416 and 0.146341. The study contributes to the understanding of linguistic and cultural diversity in South Kalimantan and opens opportunities for further work on Natural Language Processing (NLP) technology for Indonesian regional languages.
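To make the pipeline summarized above concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing, together with precision and recall for one class. It is not the study's implementation: it uses only the Python standard library rather than NLTK, it omits stop word handling and stemming, and the function names, the `hulu`/`kuala` labels, and the toy documents are illustrative assumptions (the tokens are placeholders, not real Banjarese words).

```python
import math
import string
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class token counts, and the vocabulary."""
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = preprocess(text)
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, token_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, token_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [tok for tok in preprocess(text) if tok in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for a single class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus: placeholder tokens standing in for dialect vocabulary.
train_docs = ["alpha beta, alpha!", "beta alpha gamma",
              "delta epsilon.", "epsilon delta zeta"]
train_labels = ["hulu", "hulu", "kuala", "kuala"]
model = train_naive_bayes(train_docs, train_labels)

test_docs = ["alpha gamma", "zeta delta"]
test_labels = ["hulu", "kuala"]
preds = [predict(model, doc) for doc in test_docs]
scores = precision_recall(test_labels, preds, positive="hulu")
```

With real data, the same evaluation would be run on a held-out set of labelled sentences. The abstract does not say how its precision and recall were averaged across the two classes, so the per-class definition used here is only one possible reading.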