A novel cluster based feature selection and document classification model on high dimension TREC data

Lalitha Kumari, Ch Satyanarayana

doi:10.14419/ijet.v7i1.1.10146

Abstract

With traditional document similarity measures, it is difficult to analyze the features of TREC text documents and to find the documents relevant or similar to them. As TREC repositories grow, finding relevant clustered documents in a large collection of unstructured documents becomes a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find the essential features of document entities that are similar to the TREC documents. Moreover, most traditional models are applicable only to limited text document sets. The main issues with traditional text mining models on TREC repositories are: 1) each document is represented as a vector that is mostly sparse; 2) they fail to capture the semantic similarity of documents within and between clusters; and 3) they have a high mean squared error rate. In this paper, a novel feature selection based clustering and classification model is proposed for a large number of different TREC repositories. Traditional latent semantic indexing and document clustering models fail to find topic relevance on large TREC clinical text document sets because of their computational memory and time requirements. The proposed document feature selection and cluster-based classification model is applied to TREC clinical benchmark datasets. The experimental results show that the proposed model is more efficient than the existing models in terms of computational memory, accuracy and error rate.
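The first two issues listed in the abstract (sparse document vectors, and similarity within versus between clusters) can be illustrated with a small sketch. This is not the authors' model: the toy documents, the cluster labels, the smoothed TF-IDF weighting, and the helper names `tfidf_vectors`, `sparsity`, `cosine` and `mean_similarity` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (one common variant) keeps every weight positive.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors, sorted(df)


def sparsity(vectors, vocabulary):
    """Fraction of zero entries if the vectors were stored densely."""
    total = len(vectors) * len(vocabulary)
    nonzero = sum(len(v) for v in vectors)
    return 1.0 - nonzero / total


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(vectors, labels, same_cluster):
    """Mean pairwise cosine over pairs inside (or across) clusters."""
    pairs = [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if (labels[i] == labels[j]) == same_cluster
    ]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Toy documents and hypothetical cluster assignments (not TREC data).
docs = [
    "patient treated with insulin for diabetes",
    "insulin dosage in diabetes patient care",
    "gene expression in tumor cells",
    "tumor cells show altered gene expression",
]
labels = [0, 0, 1, 1]

vectors, vocabulary = tfidf_vectors(docs)
print(f"sparsity: {sparsity(vectors, vocabulary):.2f}")
print(f"intra-cluster similarity: {mean_similarity(vectors, labels, True):.2f}")
print(f"inter-cluster similarity: {mean_similarity(vectors, labels, False):.2f}")
```

Even on four short documents most vector entries are zero, and the gap between intra-cluster and inter-cluster similarity gives a rough sense of how well a clustering separates topics. Both effects become far more pronounced on a large repository.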

More From: International Journal of Engineering & Technology

Similar Papers

Comparing Feature Selection Techniques for Software Quality Estimation Using Data-Sampling-Based Boosting Algorithms
Taghi M Khoshgoftaar ... Amri Napolitano
International Journal of Reliability, Quality and Safety Engineering | VOL. 22
01 Jun 2015

A Novel Feature Selection Based Classification Algorithm for Real-Time Medical Disease Prediction
Satuluri Naganjaneyulu ... Buraga Srinivasa Rao
01 Jul 2018

A Hybrid Document Features Extraction with Clustering based Classification Framework on Large Document Sets
S Anjali Devi ... S Siva
International Journal of Advanced Computer Science and Applications | VOL. 11
01 Jan 2020

A Novel Hierarchical Document Clustering Framework on Large TREC Biomedical Documents
Pilli Lalitha Kumari ... Ch Satyanarayana
International Journal of Information Technology and Computer Science | VOL. 14
08 Jun 2022
