Abstract

Slang words are used every day by social media users, especially in Indonesia. Slang generally consists of abbreviations or unusual words that bear no apparent relation to the word they stand for, which often leads to ambiguity in analyses such as sentiment analysis and Information Retrieval (IR). Slang dictionaries have recently become available that list the slang words frequently used by internet users and map them to their formal Indonesian equivalents. Nevertheless, many studies, especially in text mining, still ignore slang or do not filter it out, which is likely to affect their results. The Nazief & Adriani algorithm is a stemming algorithm that reduces a word to its base form using Indonesian morphology by removing prefixes and suffixes. The Ratcliff/Obershelp algorithm determines the similarity between two strings by matching them. The purpose of this study is to compare slang words with formal Indonesian words and to measure the percentage similarity between them, in order to find out whether the formal base word can indirectly represent the slang word at that degree of similarity. The datasets were obtained from Kaggle and GitHub. The results show that text preprocessing and stemming with the Nazief & Adriani algorithm affect the similarity between slang and formal words: the most frequent similarity values fall in the range 0.8-0.89 (80%-89.99%) in three different tests on the two datasets. Preprocessing and stemming also affect the number of exact matches: in the Kaggle dataset, the number of pairs with similarity 1 (100%) rose from 40 to 927, and in the GitHub dataset it rose from none to 409.
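
The similarity scoring and the 0.1-wide ranges described above can be illustrated with a minimal Python sketch. It assumes that the standard library's difflib.SequenceMatcher, which implements a matching scheme closely related to Ratcliff/Obershelp and scores pairs as 2*M/T (M matched characters, T total characters), is an acceptable stand-in; the abstract does not say which implementation the study used. The function names and the word pairs are hypothetical and are not taken from the Kaggle or GitHub datasets, and the Nazief & Adriani stemming step is not shown.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(slang: str, formal: str) -> float:
    """Return a 0..1 similarity score (2*M/T) for two lowercased strings."""
    return SequenceMatcher(None, slang.lower(), formal.lower()).ratio()


def bucket(score: float) -> str:
    """Map a score to a 0.1-wide range label; exact matches get their own label."""
    if score == 1.0:
        return "1.0"
    low = int(score * 10) / 10
    return f"{low:.1f}-{low + 0.09:.2f}"


# Hypothetical slang/formal pairs, for illustration only (not from the study's data).
pairs = [("gak", "tidak"), ("udah", "sudah"), ("makan", "makan")]

# Count how many pairs fall into each similarity range.
distribution = Counter(bucket(similarity(s, f)) for s, f in pairs)

for label, count in sorted(distribution.items()):
    print(label, count)
```

In this sketch "udah" and "sudah" share the four-character block "udah", giving 2*4/9, which falls in the 0.8-0.89 range, while an identical pair lands in the separate 1.0 bucket that the abstract reports as exact matches.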
