Generating Document Summary using Data Mining and Clustering Techniques

Deepak Singh Rana

doi:10.17762/msea.v70i1.2310

Abstract

This paper presents a novel approach to generating document summaries using data mining and clustering techniques, specifically the K-means and bisecting K-means clustering algorithms. With the exponential growth of textual data, users increasingly need efficient and accurate summarization techniques to identify the key information in large document collections. This study explores how well data mining and clustering methods can extract salient features from textual data and produce high-quality summaries. The proposed approach applies K-means and bisecting K-means clustering to preprocessed text, groups similar sentences together, and selects the most representative sentence from each cluster to form the final summary. The method is evaluated using standard metrics (precision, recall, and F1-score) and compared with existing summarization techniques. The results indicate that combining data mining and clustering techniques is a promising way to generate accurate and concise document summaries, with potential applications in domains such as news aggregation, scientific literature summarization, and social media content analysis.
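The pipeline described in the abstract (vectorize sentences, cluster them, keep the most representative sentence of each cluster) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the raw term-frequency vectors, the farthest-first centroid initialisation, and the choice of "sentence closest to the centroid" as the representative are all assumptions made for illustration, and only plain K-means is shown, not the bisecting variant.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentences):
    """Map each sentence to a term-frequency vector over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centroids(vectors, k):
    """Deterministic farthest-first initialisation (an assumption, not from the paper)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=50):
    """Plain K-means: alternate assignment and centroid update until stable."""
    centroids = init_centroids(vectors, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def summarize(text, k=2):
    """Return one representative sentence per cluster, in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = min(k, len(sentences))
    vectors = vectorize(sentences)
    assignment, centroids = kmeans(vectors, k)
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            # Representative = member sentence closest to the cluster centroid.
            chosen.add(min(members, key=lambda i: distance(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(text, k=3)` on a document returns at most three sentences, one for each non-empty cluster. A realistic system would likely add the preprocessing the abstract mentions (for example stop-word removal and a weighting scheme such as TF-IDF), which this sketch omits.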
