Application of Convolutional Neural Networks for an Image-Based Machine Translator for the Minangkabau Regional Language

Mayanda Mega Santoni, Nurul Chamidah, Helena Nurramdhani Irmanda, Reza Amarta Prayoga, Ria Astriratma, Desta Sandya Prasvita

doi:10.29207/resti.v5i6.3614

Abstract

One way Indonesian people can defend their country is by preserving and maintaining regional languages. In the modern era, regional languages have come to be seen as old-fashioned, so most of them are no longer spoken. If this is ignored, a cultural identity crisis will follow that leaves regional languages vulnerable to extinction. Technological developments offer a way to preserve them: image-based artificial intelligence, using machine learning for tasks such as machine translation, can help address the problem. This research uses a Deep Learning method, namely Convolutional Neural Networks (CNN). The data consist of 1,300 alphabet images, 5,000 text images and 200 vocabulary entries of the Minangkabau regional language. The alphabet images are used to build the CNN classification model, which is then used to recognize text in text images; the recognized text is translated into the regional language. The CNN model achieves an accuracy of 98.97%, while text image recognition (OCR) achieves 50.72%. This low accuracy is due to segmentation failures on the letters i and j. However, translation accuracy rises to 75.78% after applying the Levenshtein Distance algorithm, which can correct text classification errors. This research therefore succeeded in applying Convolutional Neural Networks (CNN) to identify text in text images, and the Levenshtein Distance method to translate Indonesian text into regional-language text.
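The Levenshtein correction step can be sketched as follows, assuming that each OCR'd word is replaced by the vocabulary entry with the smallest Levenshtein distance and then translated by a dictionary lookup. The function names and the three Indonesian-to-Minangkabau entries are illustrative assumptions, not the paper's code or its 200-word vocabulary; the inputs `tldak` and `alr` mimic OCR errors on the letter i.

```python
# Minimal sketch (hypothetical names): correct OCR'd words against a known
# vocabulary with Levenshtein distance, then translate by dictionary lookup.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


# Illustrative Indonesian -> Minangkabau entries, NOT the paper's vocabulary.
DICTIONARY = {
    "apa": "apo",
    "tidak": "indak",
    "air": "aia",
}


def correct_word(word: str, vocabulary) -> str:
    """Return the vocabulary entry closest to `word` (ties: first in order)."""
    return min(vocabulary, key=lambda entry: levenshtein(word, entry))


def translate(ocr_text: str) -> str:
    """Snap each OCR'd word to its nearest Indonesian entry, then translate."""
    words = ocr_text.lower().split()
    return " ".join(DICTIONARY[correct_word(w, DICTIONARY)] for w in words)


print(translate("tldak alr"))  # prints "indak aia"
```

Snapping every word to its nearest entry always yields some translation, even for words outside the vocabulary; a practical version might reject matches whose distance exceeds a threshold.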

Highlights

One way Indonesian people can defend their country is by preserving and maintaining regional languages
If this is ignored, there will be a cultural identity crisis that leaves regional languages vulnerable to extinction
Technological developments can be used as a way to preserve regional languages

Summary

Research Method

The research consists of several stages: literature study, data collection, construction of the CNN OCR classification model, and evaluation of the classification model. An overview of the research stages can be seen in …

… used Levenshtein Distance to automatically correct Bengali-to-English translation output, with an accuracy of 78.13%. Wint, Ducros, and Aritsugi [20] used Levenshtein Distance to correct spelling in a social media dataset, with an accuracy of 90%. Arifudin and Alamsyah [21] also used Levenshtein Distance for autocomplete and spell checking when searching library data.
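For reference, the Levenshtein distance used in these studies is usually computed with the standard dynamic-programming recurrence below (the textbook formulation; the cited works may use variants). Here $a$ and $b$ are strings of lengths $m$ and $n$, $D_{i,j}$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$, and $[a_i \neq b_j]$ is 1 if the characters differ and 0 otherwise:

$$
D_{i,j} =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0,\\[4pt]
\min\bigl(D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + [a_i \neq b_j]\bigr) & \text{otherwise,}
\end{cases}
\qquad d(a, b) = D_{m,n}.
$$

The three terms correspond to deleting $a_i$, inserting $b_j$, and substituting $a_i$ with $b_j$. For example, $d(\texttt{tldak}, \texttt{tidak}) = 1$, because the strings differ only by one substitution in the second position.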

Literature Study

CNN Classification Model Construction
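A CNN letter classifier of the kind described in the abstract typically stacks convolution, activation and pooling layers before a classification layer. The sketch below shows only those three core operations in plain Python on a toy 5x5 glyph; the image, the hand-made kernel and all function names are illustrative assumptions, not the paper's architecture, data or trained weights.

```python
# Minimal sketch of three operations a CNN letter classifier stacks:
# 2-D convolution (cross-correlation, as in most deep-learning libraries),
# ReLU activation, and 2x2 max pooling. Toy values only, not the paper's.
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' cross-correlation: slide the kernel over the image, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map: Matrix) -> Matrix:
    """Keep the strongest response in each non-overlapping 2x2 window
    (a trailing odd row or column is dropped)."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5x5 binarized glyph: a vertical stroke, roughly the stem of "l".
glyph = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
# Hand-made vertical-stroke detector (a trained CNN learns its kernels).
vertical = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]
features = max_pool2x2(relu(conv2d(glyph, vertical)))
print(features)  # prints [[6]]
```

In a trained model the kernels would be learned from the labelled alphabet images, and the pooled feature maps would typically be flattened and passed to a fully connected layer with one output per letter class.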

Dataset

Findings

Experimental Results and Evaluation