Automatic Idiom Identification Model for Amharic Language

Anduamlak Abebe Fenta, Seffi Gebeyehu

doi:10.1145/3606864

Abstract

Idiomatic expressions are a natural part of every language and a prominent part of daily speech. The meaning of an idiom cannot be derived directly from the words that form it, so people may not understand it. Past literature notes that idioms affect Natural Language Processing tasks such as machine translation, semantic analysis, and sentiment analysis. Idioms in other languages, such as English, Chinese, and Indian languages, have been identified with a variety of methods in prior research. Because there is no standard method or prior research for identifying Amharic idioms, this study aims to build a model that identifies idioms in the Amharic language using a supervised machine learning approach. The study used 800 labeled expressions for training and 200 expressions for testing, drawn from the Amharic idiom book “የአማረኛ ፈሊጦች” and from various Amharic documents. To measure the performance of the model, we used accuracy, precision, recall, and F-score. The model achieved 97.5% accuracy on the testing dataset, a promising result. The study contributes to the information systems discourse by improving researchers' awareness and knowledge of Amharic idioms.
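As an illustration of the four evaluation measures named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F-score for a binary idiom/non-idiom classifier. The function name `classification_metrics` and the confusion counts are assumptions made for this example; the counts are hypothetical and were chosen only so that they sum to 200 test expressions with 195 correct (97.5% accuracy). They are not the study's reported confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-score from confusion counts.

    tp: idioms correctly labeled as idioms
    fp: non-idioms incorrectly labeled as idioms
    fn: idioms incorrectly labeled as non-idioms
    tn: non-idioms correctly labeled as non-idioms
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-score here is the balanced F1: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts (NOT from the paper): 200 test expressions, 195 correct.
metrics = classification_metrics(tp=98, fp=3, fn=2, tn=97)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can look strong when one class dominates the test set, which is one reason to report precision, recall, and F-score alongside it.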

Full Text