Defined entity extraction based on Indonesian text document

Tito Mangasi, Heru Purnomo Ipung, Alva Erwin

doi:10.1109/ictss.2014.7013152

Abstract

Entity extraction is part of the process of extracting structured metadata from unstructured text documents. It is important to know whether the words in a document are useful and contain important information. With the growth of technologies such as websites and the Internet, there are semantic and technical challenges in making entity extraction more efficient. Several existing tools already provide name finder extraction, and OpenNLP is a suitable one to apply. The extraction of entities such as person names, locations, and organizations defines the field of entity extraction. To generate a model from a training set, the Indonesian articles and documents used must be plentiful and diverse, so that the model can reliably distinguish one entity type from another. Problems affecting accuracy and efficiency need to be minimized, and the training set also needs a higher proportion of custom, unique sentences. The results presented are based on the training set and the generated model. Almost all of the articles are in Indonesian, a language for which no OpenNLP model has yet been created.
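OpenNLP's name finder is trained from a plain-text file in which each line is one whitespace-tokenized sentence and each entity is wrapped as `<START:type> ... <END>`. The sketch below is a hypothetical helper, not part of OpenNLP (which is a Java library); Python is used only to illustrate how such annotated Indonesian sentences can be parsed and how a training set could be checked for entity-type balance and for the share of tokens that fall inside entities. The two Indonesian sentences, the function names, and the restriction to typed `<START:...>` tags are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical helper: matches a typed opening tag such as "<START:person>".
# Untyped "<START>" tags are deliberately not handled in this sketch.
START_RE = re.compile(r"^<START:([^>]+)>$")


def parse_name_sample(line):
    """Parse one annotated sentence into (tokens, spans).

    Each span is (start, end, type) with `end` exclusive, indexing into tokens.
    """
    tokens, spans = [], []
    open_type, open_start = None, None
    for piece in line.split():
        match = START_RE.match(piece)
        if match:
            if open_type is not None:
                raise ValueError("nested <START> tag")
            open_type, open_start = match.group(1), len(tokens)
        elif piece == "<END>":
            if open_type is None:
                raise ValueError("<END> without matching <START>")
            spans.append((open_start, len(tokens), open_type))
            open_type = None
        else:
            tokens.append(piece)
    if open_type is not None:
        raise ValueError("unclosed <START> tag")
    return tokens, spans


def corpus_stats(lines):
    """Count entities per type and the fraction of tokens inside any entity."""
    per_type = Counter()
    total_tokens = entity_tokens = 0
    for line in lines:
        tokens, spans = parse_name_sample(line)
        total_tokens += len(tokens)
        for start, end, entity_type in spans:
            per_type[entity_type] += 1
            entity_tokens += end - start
    share = entity_tokens / total_tokens if total_tokens else 0.0
    return per_type, share


if __name__ == "__main__":
    # Hypothetical Indonesian training sentences in the annotated format.
    training_lines = [
        "<START:person> Budi Santoso <END> bekerja di <START:organization> Bank Indonesia <END> .",
        "Ia tinggal di <START:location> Bandung <END> .",
    ]
    per_type, share = corpus_stats(training_lines)
    print(dict(per_type))  # {'person': 1, 'organization': 1, 'location': 1}
    print(round(share, 3))  # 0.417 (5 of 12 tokens are inside entities)
```

A check like this only summarizes the annotated data; training the actual model would still be done with OpenNLP's own tools.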
