Named-Entity Recognition on Indonesian Tweets using Bidirectional LSTM-CRF

Deni Cahya Wintaka, Moch Arif Bijaksana, Ibnu Asror

doi:10.1016/j.procs.2019.08.161

Open Access

https://doi.org/10.1016/j.procs.2019.08.161

Journal: Procedia Computer Science
Publication Date: Jan 1, 2019
Citations: 17
License type: cc-by-nc-nd

Affiliation: Telkom University

Abstract

The massive volume of Twitter data makes it a natural target for analysis with Named-Entity Recognition. Named-Entity Recognition (NER) is a sub-task of Information Extraction that recognizes entities in text. Most NER systems are trained on formal text such as news articles, and they perform poorly when applied to informal text such as tweets. The limited number of words and the informal, messy grammar of tweets make it difficult to classify the entities of interest. In this study, a model was built that combines deep learning and machine learning approaches, Bidirectional Long Short-Term Memory (BLSTM) and Conditional Random Field (CRF), as the solution. The entities identified are Person, Location, and Organization. The test corpus comprised 600 Indonesian tweets: 250 formal tweets and 350 informal tweets. The model achieved its best F1 scores when FastText word embeddings were added: 86.13% for formal tweets, 81.17% for informal tweets, and 84.11% for combined tweets.
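As an illustration of how an F1 score for Person, Location, and Organization entities can be computed, the sketch below scores predicted tag sequences against gold ones at the entity level. The abstract does not state the tagging scheme or the scoring protocol, so the BIO tags (`B-PER`, `I-PER`, `O`, ...), the exact-match rule, the helper names `extract_spans` and `span_f1`, and the toy tag sequences are all assumptions made for this example, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1: a prediction counts only if type and boundaries match."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in extract_spans(g)}
        pred_spans |= {(idx,) + s for s in extract_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy tag sequences, invented for illustration only.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(span_f1(gold, pred))
```

In this toy case the Person span is matched and the Location span is mislabeled as Organization, so precision and recall are both one half. Token-level scoring would give different numbers, which is why the scoring rule needs to be stated alongside any reported F1.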

Full Text