Role of Pre-processing Phase in Document Clustering Technique for Gurmukhi Script

Mukesh Kumar*, Amandeep Verma

doi:10.35940/ijitee.c9105.019320

Abstract

Document clustering plays a central role in knowledge discovery and data mining by organizing large data sets into a number of groups of data objects called clusters. Each cluster consists of similar data objects, such that objects in the same cluster are similar to one another and dissimilar to the objects in other clusters. The document clustering technique for Gurmukhi script consists of two phases: 1) the pre-processing phase and 2) the processing phase. This paper concentrates on the pre-processing phase, whose purpose is to convert unstructured text into a structured text format. The sub-phases of the pre-processing phase are segmentation, tokenization, removal of stop words, stemming, and normalization. The purpose of this paper is to present the significant role of the pre-processing phase in the overall performance of the document clustering technique for Gurmukhi script. The experimental results demonstrate this role both in the assignment of data objects to the relevant clusters and in the creation of a meaningful cluster title list.
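The five sub-phases named in the abstract can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: the stop-word set and the suffix list are hypothetical, deliberately tiny stand-ins for curated Punjabi resources, the function names are invented for this example, and normalization is approximated by Unicode NFC normalization, which may differ from what the paper means by the term.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny resources (assumptions for illustration);
# a real system would rely on curated Punjabi stop-word and suffix lists.
STOP_WORDS = {"ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਹਨ", "ਨੂੰ", "ਤੇ", "ਅਤੇ"}
SUFFIXES = ("ਾਂ",)  # a single illustrative plural ending


def segment(text):
    """Segmentation: split text into sentences on the danda, '?' and '!'."""
    return [s.strip() for s in re.split("[\u0964?!]", text) if s.strip()]


def tokenize(sentence):
    """Tokenization: keep runs of Gurmukhi code points (U+0A00 to U+0A7F)."""
    return re.findall("[\u0A00-\u0A7F]+", sentence)


def remove_stop_words(tokens):
    """Stop-word removal: drop tokens found in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Stemming: strip the first matching suffix, keeping a non-trivial stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Normalization (assumed here to mean Unicode NFC normalization)."""
    return unicodedata.normalize("NFC", token)


def preprocess(text):
    """Run all sub-phases; return one list of processed tokens per sentence."""
    result = []
    for sentence in segment(text):
        tokens = remove_stop_words(tokenize(sentence))
        result.append([normalize(stem(t)) for t in tokens])
    return result


if __name__ == "__main__":
    sample = "ਕਿਤਾਬਾਂ ਘਰ ਦੇ ਅੰਦਰ ਹਨ\u0964 ਇਹ ਪੰਜਾਬੀ ਹੈ\u0964"
    for sentence_tokens in preprocess(sample):
        print(sentence_tokens)
```

The output is a list of token lists, one per sentence, which is the kind of structured representation a later processing (clustering) phase could consume. The order of the steps follows the order in which the abstract lists them; other orderings are possible.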

More From: International Journal of Innovative Technology and Exploring Engineering
