Abstract

The vast availability of information sources has created a need for research on automatic summarization. Current methods work either by extraction or by abstraction. Extraction methods are of particular interest because they are robust and largely independent of the language used. An extractive summary is obtained by selecting sentences from the original source according to their information content. This selection can be automated with a classification function induced by a machine learning algorithm, which classifies sentences into two groups, important and non-important; the important sentences then form the summary. However, the quality of this function depends directly on the training set used to induce it. This paper proposes an original way of optimizing this training set by inserting lexemes obtained from ontological knowledge bases, so that the optimized training set is reinforced with ontological knowledge. An experiment with four machine learning algorithms was carried out to validate this proposal, and the improvement achieved is clearly significant for each of them.
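
As a rough illustration of this pipeline, the sketch below (in Python) enriches labeled training sentences with related lexemes and induces a binary sentence classifier. WordNet accessed through NLTK stands in for the ontological knowledge base, and TF-IDF features with Naive Bayes stand in for one of the learning algorithms; the paper's actual resources, features, and classifiers are not specified in this excerpt, so all of these choices are assumptions.

```python
# Minimal sketch, not the paper's implementation: enrich labeled sentences
# with lexemes from an ontological resource (WordNet here, as an assumption),
# then induce a binary "important / non-important" sentence classifier.
# Assumes nltk.download("wordnet") has already been run.
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def enrich_with_lexemes(sentence):
    """Append lexemes related to each word (first WordNet sense only)."""
    extra = []
    for word in sentence.split():
        for synset in wn.synsets(word)[:1]:
            extra.extend(name.replace("_", " ") for name in synset.lemma_names())
    return sentence + " " + " ".join(extra)

# Toy training set: 1 = important sentence, 0 = non-important sentence.
train_sentences = ["The study proposes a new summarization method",
                   "The weather was pleasant that day"]
train_labels = [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(enrich_with_lexemes(s) for s in train_sentences)
classifier = MultinomialNB().fit(X_train, train_labels)

# The extractive summary keeps the sentences predicted as important.
candidates = ["The new method improves classification accuracy",
              "Lunch was served at noon"]
X_cand = vectorizer.transform(enrich_with_lexemes(s) for s in candidates)
summary = [s for s, label in zip(candidates, classifier.predict(X_cand)) if label == 1]
```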

Highlights

  • Research on automatic summarization has greatly increased in recent years

  • This paper proposes an original way of optimizing the training set used to induce the classification function, by inserting lexemes obtained from ontological knowledge bases

  • When the classification process is finished, every analyzed sentence falls into one of four categories: True Positive (TP), the function correctly predicts an important sentence as important; True Negative (TN), the function correctly predicts a non-important sentence as non-important; False Positive (FP), the function incorrectly predicts a non-important sentence as important; False Negative (FN), the function incorrectly predicts an important sentence as non-important (these counts drive the evaluation sketch below)
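
Since classifier quality is later summarized with ROC curves (see the section outline below), the short Python sketch that follows counts these four categories from hypothetical label vectors and derives the two ROC-curve axes; the variable names and example labels are illustrative only.

```python
# Count the four categories from the highlight above, assuming label 1
# marks an important sentence and label 0 a non-important one.
def confusion_counts(true_labels, predicted_labels):
    tp = sum(t == 1 and p == 1 for t, p in zip(true_labels, predicted_labels))
    tn = sum(t == 0 and p == 0 for t, p in zip(true_labels, predicted_labels))
    fp = sum(t == 0 and p == 1 for t, p in zip(true_labels, predicted_labels))
    fn = sum(t == 1 and p == 0 for t, p in zip(true_labels, predicted_labels))
    return tp, tn, fp, fn

# Hypothetical gold labels and classifier predictions for five sentences.
tp, tn, fp, fn = confusion_counts([1, 0, 1, 0, 1], [1, 1, 0, 0, 1])

true_positive_rate = tp / (tp + fn)    # y-axis of a ROC curve (recall)
false_positive_rate = fp / (fp + tn)   # x-axis of a ROC curve
```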

Summary

Introduction

Research on automatic summarization has greatly increased in recent years, as digital sources of information have become ever more widely available. A summary obtained by extraction is composed of a set of sentences selected from the source document(s) using statistical or heuristic methods based on the information entropy of the sentences. Automatic summarization by abstraction is usually decomposed into three steps: interpretation of the source document(s) to obtain a representation, transformation of this representation, and production of a textual synthesis [5]. Both approaches have their advantages and drawbacks. Data that are too scattered make it difficult to obtain a good estimate or good classification models. This problem is tackled with heuristic methods based on linear approximations, which optimize the training set by reducing it or by constructing a new, smaller set from another series of attributes [9].
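
To make the extraction idea concrete, the sketch below scores sentences with a simple information-content heuristic (average negative log word frequency) and keeps the top-ranked ones. This is an illustrative stand-in for the statistical methods mentioned above, not the scoring used in the paper.

```python
# Illustrative entropy-style sentence scoring: rare words carry more
# information (-log p), so a sentence's score is the average information
# content of its words, and the top-scoring sentences form the summary.
import math
from collections import Counter

def extractive_summary(sentences, k=2):
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(words)
    total = sum(freq.values())

    def score(sentence):
        tokens = [w.lower() for w in sentence.split()]
        return sum(-math.log(freq[w] / total) for w in tokens) / len(tokens)

    return sorted(sentences, key=score, reverse=True)[:k]

docs = ["Automatic summarization selects informative sentences.",
        "It was a sunny afternoon.",
        "Extractive methods rank sentences by information content."]
print(extractive_summary(docs, k=2))
```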

Inserting Ontological Knowledge in the Summary Extraction Process
Summarization Process Considered
Insertion of Ontological Knowledge
Evaluation Method
Experiment and Results
Results for ROC Curves
Conclusions