Deep sequential pattern mining for readability enhancement of Indonesian summarization

Dian Sa'Adillah Maylawati, Yogan Jaya Kumar, Muhammad Ali Ramdhani, Fauziah Kasmin

doi:10.11591/ijece.v14i1.pp782-795

Abstract

In text summarization research, readability is a major issue that must be addressed. Our hypothesis is that readability can be achieved by using text representations that keep the meaning of the text documents intact. Therefore, this study combines sequential pattern mining (SPM), which produces sequences of words as a text representation, with unsupervised deep learning to generate Indonesian text summaries; the resulting method is called DeepSPM. This research uses PrefixSpan as the SPM algorithm and a deep belief network (DBN) as the unsupervised deep learning method. The dataset consists of 18,774 Indonesian news texts from IndoSum. Readability is evaluated with recall-oriented understudy for gisting evaluation (ROUGE) as a co-selection-based analysis; the Dwiyanto Djoko Pranowo metrics, the Gunning fog index (GFI), and the Flesch-Kincaid grade level (FKGL) as content-based analyses; and a human readability evaluation by two experts. The experimental results show that DeepSPM performs better than DBN, with the F-measure of ROUGE-1 rising to 0.462, and ROUGE-2 and ROUGE-L reaching 0.37 and 0.41, respectively. The significance of the ROUGE results was also tested using a t-test. The findings of the content-based analysis and the human readability evaluation are consistent with those of the co-selection-based analysis: the generated summaries are only partially readable, that is, they have a medium level of readability.
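To illustrate what "a sequence of words as text representation" produced by PrefixSpan can look like, the following is a minimal sketch, not the authors' implementation. It assumes that each sentence has already been tokenized into a list of words, that every sequence element is a single word (so PrefixSpan needs no itemset extensions), and that minimum support is an absolute count of sentences. The function name `prefixspan` and the Indonesian example tokens are hypothetical illustrations, not data from IndoSum.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Mine frequent (possibly non-contiguous) word subsequences.

    Simplified PrefixSpan sketch: each sequence element is a single word,
    and support counts how many sequences contain the pattern.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # For each word, record the first position where it occurs in each
        # projected suffix (each sequence id appears at most once here).
        first_pos: Dict[str, Dict[int, int]] = {}
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                word = seq[pos]
                if word not in seen:
                    seen.add(word)
                    first_pos.setdefault(word, {})[seq_id] = pos

        for word, occurrences in first_pos.items():
            if len(occurrences) >= min_support:
                new_prefix = prefix + (word,)
                results[new_prefix] = len(occurrences)
                # Projected database: suffixes starting right after the
                # first occurrence of `word` in each supporting sequence.
                new_projected = [(sid, pos + 1) for sid, pos in occurrences.items()]
                mine(new_prefix, new_projected)

    # Start with the whole database projected on the empty prefix.
    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical tokenized sentences (illustration only).
sentences = [
    ["presiden", "jokowi", "meresmikan", "jalan", "tol"],
    ["presiden", "meresmikan", "bandara"],
    ["jokowi", "meresmikan", "jalan", "baru"],
]
patterns = prefixspan(sentences, min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(pattern), support)
```

With a minimum support of 2, word-order patterns such as `jokowi meresmikan jalan` survive while infrequent words such as `tol` are pruned. Because the order of words is preserved in each pattern, this kind of representation keeps more of the sentence meaning than a bag of words, which is the motivation stated in the abstract; how DeepSPM feeds such patterns into the DBN is not described in this section.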

Full Text