Semantic key phrase-based model for document management

Prafulla Bafna, Dhanya Pramod, Shailaja Shirwaikar, Atiya Hassan

doi:10.1108/bij-04-2018-0102

Abstract

Purpose

Document management is growing in importance in proportion to the growth of unstructured data, and its applications now range from process benchmarking to customer relationship management. The purpose of this paper is to improve two important components of document management, namely keyword extraction and document clustering. This is achieved through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and to achieve refined clusters. The study achieves consistent cluster quality despite the increasing size of the data set. The domain independence of the proposed method is tested and compared with other methods.

Design/methodology/approach

In this paper, a synset-based phrase document matrix construction method is proposed in which semantically similar phrases are grouped to reduce the curse of dimensionality. A large collection of documents to be processed includes some documents that are closely related to the topic of interest, known as model documents, as well as documents that deviate from the topic of interest. These non-relevant documents may affect cluster quality. The first step in knowledge extraction from unstructured textual data is converting it into a structured form, either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once the data are in structured form, a range of mining algorithms, from classification to clustering, can be applied.

Findings

In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.

Research limitations/implications

Various applications that require managing unstructured documents can use this approach by incorporating domain knowledge through a thesaurus.

Practical implications

An experiment in the academic domain is presented that categorizes research papers according to context and topic, which will help academicians organize and build knowledge more effectively. Grouping and feature extraction for resume data can facilitate the candidate selection process.

Social implications

Applications such as knowledge management, clustering of search engine results and recommender systems (for example, hotel and task recommenders) will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest for users from various strata of society.

Originality/value

The study proposes an improvement to the document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement is validated on three different data sets of well-articulated documents: biographies, resumes and research papers. These results can be used for benchmarking further work carried out in these areas.
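
As an illustration of the synset-based phrase document matrix described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The `SYNSET_GROUPS` thesaurus, the function name `build_phrase_document_matrix` and the naive substring counting are all assumptions made for this example; the paper extracts key phrases and synset groups from model documents, a step this sketch does not reproduce.

```python
from collections import Counter

# Hypothetical synset groups (an assumption for illustration): each canonical
# key phrase maps to the set of phrases treated as semantically equivalent.
SYNSET_GROUPS = {
    "document clustering": {"document clustering", "text clustering", "document grouping"},
    "keyword extraction": {"keyword extraction", "key phrase extraction"},
}


def build_phrase_document_matrix(documents, synset_groups):
    """Return (columns, matrix) where matrix[i][j] counts how often any
    phrase from synset group columns[j] occurs in documents[i]."""
    # Invert the thesaurus so every variant points to its canonical phrase.
    variant_to_canonical = {
        variant: canonical
        for canonical, variants in synset_groups.items()
        for variant in variants
    }
    columns = sorted(synset_groups)
    matrix = []
    for text in documents:
        lowered = text.lower()
        counts = Counter()
        for variant, canonical in variant_to_canonical.items():
            # Naive substring counting; a real system would tokenize first.
            counts[canonical] += lowered.count(variant)
        matrix.append([counts[column] for column in columns])
    return columns, matrix
```

Because all variants in a group share one column, the matrix has one dimension per synset group rather than one per surface phrase, which is the dimension reduction the abstract refers to. The resulting rows could then be passed to any clustering algorithm.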

Journal: Benchmarking: An International Journal
Publication Date: Jun 19, 2019
Citations: 1

Similar Papers

Task recommender system using semantic clustering to identify the right personnel
Prafulla Bafna ... Shailaja Shirwaikar
VINE Journal of Information and Knowledge Management Systems | VOL. 49
12 Mar 2019

Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data.
Franck Jaotombo ... Badih Ghattas
PLOS ONE | VOL. 18
30 Nov 2023

Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data
Mohammad Reza Davahli ... Edgar Gutierrez
Symmetry | VOL. 12
19 Nov 2020

Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data
Irfan Ajmal Khan ... Junghyun Woo
International Journal of Multimedia and Ubiquitous Engineering | VOL. 10
30 Nov 2015
