Abstract

The massive volume of Twitter data makes it a useful source for analysis with Named-Entity Recognition. Named-Entity Recognition (NER) is a sub-task of Information Extraction that recognizes entities in text. Most NER systems are trained on formal text such as news articles and perform poorly when applied to informal text such as tweets. The limited number of words and the informal, messy grammar of tweets make it difficult to classify the required entities. In this study, a model was built that combines deep learning and machine learning approaches: Bidirectional Long Short-Term Memory (BLSTM) and Conditional Random Field (CRF). The entities identified are Person, Location, and Organization. The test corpus consisted of 600 Indonesian tweets, comprising 250 formal tweets and 350 informal tweets. The model achieved its best F1 scores when FastText word embeddings were added: 86.13% for formal tweets, 81.17% for informal tweets, and 84.11% for the combined tweets.
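To make the reported F1 figures concrete, the sketch below shows one common way to score an NER tagger. It is an illustration only: the abstract does not say whether F1 was computed per entity or per token, nor which tagging scheme was used. The code assumes BIO tags (for example `B-PER`, `I-PER`, `O`) and exact-match, entity-level scoring; the function names `extract_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1; a span counts only if type and boundaries match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = extract_spans(g), extract_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example: the person is found, the location is mislabeled as ORG.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))
```

Under these assumptions, a prediction with the right boundaries but the wrong type counts as both a false positive and a false negative, which is why short, noisy tweets can lower the score quickly.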
