Quantifying the informativeness for biomedical literature summarization: An itemset mining method

Milad Moradi, Nasser Ghadiri

doi:10.1016/j.cmpb.2017.05.011

Abstract

Objective

Automatic text summarization tools can help users in the biomedical domain to access information efficiently from a large volume of scientific literature and other sources of text documents. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our summarizer quantifies the informativeness of each sentence using the support values of itemsets appearing in the sentence.

Methods

To address the concept-level analysis of text, our method initially maps the original document to biomedical concepts using the Unified Medical Language System (UMLS). Then, it discovers the essential subtopics of the text using a data mining technique, namely itemset mining, and constructs the summarization model. The employed itemset mining algorithm extracts a set of frequent itemsets containing correlated and recurrent concepts of the input document. The summarizer selects the most related and informative sentences and generates the final summary.

Results

We evaluate the performance of our itemset-based summarizer using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, performing a set of experiments. We compare the proposed method with GraphSum, TexLexAn, SweSum, SUMMA, AutoSummarize, the term-based version of the itemset-based summarizer, and two baselines. The results show that the itemset-based summarizer performs better than the compared methods. The itemset-based summarizer achieves the best scores for all the assessed ROUGE metrics (R-1: 0.7583, R-2: 0.3381, R-W-1.2: 0.0934, and R-SU4: 0.3889). We also perform a set of preliminary experiments to specify the best value for the minimum support threshold used in the itemset mining algorithm. The results demonstrate that the value of this threshold directly affects the accuracy of the summarization model, such that a significant decrease can be observed in the performance of summarization due to assigning extreme thresholds.

Conclusion

Compared to the statistical, similarity, and word frequency methods, the proposed method demonstrates that the summarization model obtained from the concept extraction and itemset mining provides the summarizer with an effective metric for measuring the informative content of sentences. This can lead to an improvement in the performance of biomedical literature summarization.
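The idea of scoring a sentence by the support values of the itemsets it contains can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact mining algorithm or scoring formula, so the sketch assumes a brute-force enumeration of small itemsets (rather than a dedicated algorithm such as Apriori) and a simple sum of supports as the sentence score. The concept sets are hand-written stand-ins for what a UMLS mapping step would produce, and all function names and the `max_size` cap are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for itemsets with support >= min_support.

    Each transaction is the set of concepts found in one sentence.
    Support is the fraction of sentences containing the whole itemset.
    Brute-force enumeration, capped at max_size items (an assumption).
    """
    n = len(transactions)
    counts = Counter()
    for concepts in transactions:
        items = sorted(concepts)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def score_sentences(transactions, itemsets):
    """Score each sentence as the sum of supports of the itemsets it covers."""
    return [
        sum(sup for s, sup in itemsets.items() if s <= concepts)
        for concepts in transactions
    ]


def summarize(transactions, min_support=0.5, k=2):
    """Return indices of the k highest-scoring sentences, in document order."""
    itemsets = frequent_itemsets(transactions, min_support)
    scores = score_sentences(transactions, itemsets)
    ranked = sorted(range(len(transactions)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Hypothetical concept sets, one per sentence (stand-ins for UMLS concepts).
sentences = [
    {"gene", "cancer"},
    {"gene", "cancer", "therapy"},
    {"therapy"},
    {"protein"},
]
itemsets = frequent_itemsets(sentences, min_support=0.5)
scores = score_sentences(sentences, itemsets)
selected = summarize(sentences, min_support=0.5, k=2)
```

In this toy input, a very high `min_support` leaves no frequent itemsets and every sentence scores zero, while a very low one admits every itemset, including those that occur only once. This loosely mirrors the threshold sensitivity the abstract reports, though the sketch says nothing about the magnitude of that effect in the paper's experiments.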

Journal: Computer Methods and Programs in Biomedicine
Publication Date: May 27, 2017
Citations: 37

Similar Papers

Graph-based biomedical text summarization: An itemset mining and sentence clustering approach
Mozhgan Nasr Azadani ... Ensieh Davoodijam
Journal of Biomedical Informatics | VOL. 84
15 Jun 2018

Extractive Text Summarization via Graph Entropy (Çizge Entropi ile Çıkarıcı Metin Özetleme)
Cengiz Hark ... Ebubekir Seyyarer
01 Sep 2019

Ar-CM-ViMETA: Arabic Image Captioning based on Concept Model and Vision-based Multi-Encoder Transformer Architecture
Asmaa Osman ... Mohamed Shalaby
The International Arab Journal of Information Technology | VOL. 21
01 Jan 2024

Decomposition–based multi-objective differential evolution for extractive multi-document automatic text summarization
Muhammad Hafizul Hazmi Wahab ... Mohamed Othman
Applied Soft Computing | VOL. 151
31 Oct 2023
