Abstract

This research presents a big-data approach to building a Treebank of Informal and Formal Indonesian (TINTA). The study focuses on the spectrum of language use in Indonesia, combining large-scale data collection, preprocessing, and annotation to construct a two-tier corpus that covers both formal and informal Indonesian. Using computational techniques, TINTA aims to capture variation in Indonesian language structures across diverse contexts. The annotated treebank is a resource for Natural Language Processing (NLP) applications and linguistic research, supporting closer study of the grammatical and semantic characteristics of informal and formal Indonesian.

