Abstract

Automatic multi-document summarization in the Indonesian language can help people obtain more comprehensive information from online news. Latent Dirichlet Allocation (LDA) is a clustering algorithm that has been widely developed for text data over the past decade, and it performs well in text classification and information retrieval. One application of LDA is document summarization, because LDA can capture the underlying topic structure of a document. Research on multi-document summarization in Indonesian using unsupervised learning approaches, especially LDA, is still limited. Combining LDA with the Significance Sentence method makes it possible to select representative sentences from the source documents. The model was tested using combinations of alpha values 0.1 and 0.001 and beta values 0.001 and 0.1. Each combination was paired with compression rates of 10%, 30%, and 50% in the sentence-ranking process for each document. The best result was obtained with alpha = 0.01, beta = 0.1, and a compression rate of 50%, which yielded a cosine similarity of 0.931.
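The sentence-ranking and evaluation steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It makes four assumptions:

- The topic-word probabilities have already been estimated by an LDA model. For example, gensim's `LdaModel` exposes `alpha` and `eta` arguments that play the roles of alpha and beta.
- Each sentence gets a hypothetical significance score equal to the mean topic probability of its words. The paper's exact Significance Sentence formula may differ.
- The compression rate is the fraction of sentences kept in the summary.
- Summaries are compared with a simple bag-of-words cosine similarity.

All function names and the toy Indonesian data are hypothetical.

```python
import math
from collections import Counter


def sentence_significance(sentence, topic_word):
    """Hypothetical significance score: mean probability of the sentence's
    words under one topic's word distribution (assumed to come from LDA)."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(topic_word.get(t, 0.0) for t in tokens) / len(tokens)


def summarize(sentences, topic_word, compression_rate):
    """Keep the top-scoring sentences; compression_rate is assumed to be the
    fraction of sentences retained (e.g. 0.1, 0.3, 0.5)."""
    # Number of sentences to keep, rounded to the nearest integer, at least one.
    n_keep = max(1, round(len(sentences) * compression_rate))
    # Rank sentence indices by score, highest first (stable for ties).
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_significance(sentences[i], topic_word),
        reverse=True,
    )
    # Restore the original document order of the selected sentences.
    keep = sorted(ranked[:n_keep])
    return [sentences[i] for i in keep]


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy news sentences and a toy topic-word distribution (hypothetical).
    sentences = [
        "banjir melanda jakarta",
        "harga cabai naik",
        "hujan deras menyebabkan banjir",
        "warga mengungsi",
    ]
    topic_word = {"banjir": 0.3, "jakarta": 0.2, "hujan": 0.15}
    reference = "banjir melanda jakarta setelah hujan deras"

    for rate in (0.1, 0.3, 0.5):
        summary = summarize(sentences, topic_word, rate)
        score = cosine_similarity(" ".join(summary), reference)
        print(rate, summary, round(score, 3))
```

Under these assumptions, a larger compression rate keeps more sentences. The summary then shares more vocabulary with a reference text, which is one plausible reason a 50% rate scores highest on cosine similarity.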