Abstract
Efficient full-text search is achieved by indexing the raw data, at an additional storage cost of 20 to 30 percent. In the context of Big Data, this additional storage space is huge and makes it challenging to serve full-text search queries with good performance. It also incurs overhead to store, manage, and update the large index. In this paper, we propose and evaluate a method to minimize the index size needed to offer full-text search over Big Data, using an automatic extractive-based text summarization method. To evaluate the effectiveness of the proposed approach, we used two real-world datasets. We indexed the original and summarized datasets using Apache Lucene and studied the average simple overlap, Spearman's rho correlation, and average ranking score of the search results obtained using different search queries. Our experimental evaluation shows that automatic text summarization is an effective method to reduce the index size significantly. With the proposed solution, we obtained a maximum reduction of 82% in index size, with 42% higher relevance of the search results.
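To make the overlap and rank-correlation measures concrete, the following is a minimal sketch of how two ranked result lists for the same query could be compared. The function names, the document IDs, and the exact definitions (overlap as the fraction of baseline results recovered, Spearman's rho computed over the documents common to both lists, no tied ranks) are assumptions for illustration; the paper's own formulas may differ.

```python
def simple_overlap(baseline, candidate):
    """Fraction of baseline results that also appear in the candidate list."""
    if not baseline:
        return 0.0
    return len(set(baseline) & set(candidate)) / len(baseline)


def spearman_rho(baseline, candidate):
    """Spearman's rho over the documents that occur in both ranked lists.

    Assumes each list holds unique document IDs in rank order (no ties).
    Returns None when fewer than two documents are shared.
    """
    candidate_set = set(candidate)
    common = [doc for doc in baseline if doc in candidate_set]
    n = len(common)
    if n < 2:
        return None
    common_set = set(common)
    candidate_order = [doc for doc in candidate if doc in common_set]
    # Re-rank the shared documents within each list (0-based positions).
    rank_base = {doc: i for i, doc in enumerate(common)}
    rank_cand = {doc: i for i, doc in enumerate(candidate_order)}
    d_squared = sum((rank_base[doc] - rank_cand[doc]) ** 2 for doc in common)
    return 1 - (6 * d_squared) / (n * (n * n - 1))


# Hypothetical top-4 results for one query on the full and summarized index.
full_index_hits = ["d1", "d2", "d3", "d4"]
summary_index_hits = ["d2", "d1", "d3", "d9"]
overlap = simple_overlap(full_index_hits, summary_index_hits)
rho = spearman_rho(full_index_hits, summary_index_hits)
```

Averaging these two values over a set of queries would give per-dataset figures comparable in spirit to the averages reported in the abstract.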
Highlights
As the size of the datasets increases, the performance of Lucene decreases significantly (Information Technology and Control)
We propose and evaluate a method to minimize the index size to offer full-text search over Big Data using an automatic extractive-based text summarization method
The main contributions of this paper are: (1) we propose an automatic extractive-based text summarization for Big Data index minimization for the full-text search problem; (2) we evaluate the effectiveness of the proposed method by studying the relevance and overlap of the search query results against the baseline datasets; and (3) we study the effect of different text summarization threshold levels on data index minimization and search results
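As an illustration of what an extractive summarizer with a threshold level can look like, here is a minimal sketch that scores sentences by average term frequency and keeps the top fraction in their original order. This is a generic frequency-based technique, not necessarily the summarizer used in the paper; the `summarize` function, the `threshold` parameter (interpreted here as the fraction of sentences to keep), and the small stopword list are assumptions.

```python
import math
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "with", "for"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text, threshold=0.5):
    """Keep the highest-scoring fraction of sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, math.ceil(len(sentences) * threshold))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Lucene builds an inverted index. The index maps terms to documents. "
    "Cats sleep a lot. Index size grows with the data."
)
summary = summarize(text, threshold=0.5)
```

Under this interpretation, lowering `threshold` yields shorter summaries and therefore a smaller index, at the risk of dropping terms that some queries would have matched.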
Summary
Recent advancements and adoption of technology are contributing to exponentially growing digital data. The data can be reduced to a smaller representative dataset for indexing to offer full-text search. The expected line is plotted by fitting a line to small datasets varying from 1 GB to 10 GB and is shown with the actual datasets. This shows that as the size of the datasets increases, the performance of Lucene decreases significantly. The main contributions of this paper are: (1) we propose an automatic extractive-based text summarization for Big Data index minimization for the full-text search problem; and (2) we study the effect of different text summarization threshold levels on data index minimization and search results.