Abstract

In Indonesia, waste pollution poses pressing environmental and health challenges, making accurate classification vital for targeted mitigation efforts. DistilBERT is a streamlined counterpart to the BERT architecture, designed to retain BERT's linguistic comprehension at a reduced computational cost. Through transfer learning, DistilBERT draws on knowledge acquired from extensive textual corpora, making it well suited to scenarios with limited data availability. In this research, we adopted DistilBERT to classify waste types using a constrained dataset of Indonesian-language Twitter conversations, a medium known for its concise and often ambiguous content. Despite the dataset's restricted scope and the noise inherent to Twitter, DistilBERT proved highly effective, achieving a precision of 98%. This outcome underscores DistilBERT's ability to discern complex textual nuances even in data-restricted environments and highlights the value of transfer learning for contemporary natural language processing tasks, especially in contexts as critical as Indonesia's waste management efforts.
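The abstract reports results only; no implementation is given. As an illustration of the general approach it describes, fine-tuning a pretrained DistilBERT on a small labeled tweet corpus via transfer learning, a minimal sketch using the Hugging Face `transformers` and `datasets` libraries follows. The checkpoint name, the two example tweets, and the binary label scheme are placeholder assumptions for illustration, not the authors' data or configuration.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical Indonesian waste-related tweets and labels; the paper's
# actual Twitter dataset is not included in the abstract.
texts = [
    "Sampah plastik menumpuk di sungai dekat rumah saya",   # "Plastic waste piles up in the river near my house"
    "Banyak sampah organik di pasar tradisional pagi ini",  # "Lots of organic waste at the traditional market this morning"
]
labels = [0, 1]  # assumed label scheme, e.g. 0 = plastic, 1 = organic

# Multilingual DistilBERT checkpoint as a stand-in; the exact pretrained
# model used in the paper is not specified in the abstract.
checkpoint = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Tweets are short, so 128 tokens comfortably covers them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="waste-distilbert",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=dataset,
)
trainer.train()
```

Because the pretrained checkpoint already encodes broad linguistic knowledge, only the small classification head and light weight updates need to be learned from the limited tweet corpus, which is the transfer-learning advantage the abstract emphasizes.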
