Using Big Data Approach to Create a Treebank of Informal and Formal Indonesian

Danang Satria Nugraha

doi:10.36346/sarjet.2024.v06i01.001

Abstract

This study presents a Big Data approach to building the Treebank of Informal and Formal Indonesian (TINTA). It addresses the wide spectrum of language use in Indonesia by combining large-scale data collection, preprocessing, and annotation to construct a two-tier corpus that covers both formal and informal registers. Using computational techniques, TINTA aims to capture variation in Indonesian language structures across diverse contexts. The annotated treebank is intended as a resource for Natural Language Processing (NLP) applications and linguistic research, supporting closer analysis of the grammatical and semantic features of informal and formal Indonesian.
