Category cluster discovery from distributed WWW directories

Mei-Ling Shyu, Choochart Haruechaiyasak, Shu-Ching Chen

doi:10.1016/s0020-0255(03)00169-5

Abstract

Due to the inherently distributed nature of many networks, including the Internet, information and knowledge are generated and organized independently by different groups of people. Discovering and exploiting all the knowledge held by these different sources usually requires some form of knowledge integration. Treating document category sets as information sources, we define an information-integration problem called category merging. The purpose of category merging is to automatically construct a unified category set that represents and exploits the document information from several different sources. The merging process is based on clustering: categories with similar characteristics are merged into the same cluster under certain distributed constraints. To evaluate the quality of the merged category set, we measure precision and recall under three classification methods: Naive Bayes, the Vector Space Model, and K-Nearest Neighbor. In addition, we propose a performance measure called cluster entropy, which determines how well the categories from different sources are distributed over the resulting clusters. We perform the merging process on real data sets collected from three different Web directories. The results show that our merging process improves classification performance over the non-merged approach and also provides a better representation of all categories from the distributed directories.
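The abstract does not give the merging algorithm or the cluster-entropy formula, so the Python sketch below is illustrative only and rests on assumptions that are not taken from the paper. It assumes that each category is a sparse term-weight vector (for example, the centroid of its documents), that categories are merged greedily by cosine similarity of their centroids, that the "distributed constraint" means a cluster never holds two categories from the same directory, and that cluster entropy is the size-weighted, normalized Shannon entropy of directory labels inside each cluster. The function names `merge_categories` and `cluster_entropy`, the `threshold` parameter, and the directories "A", "B", "C" are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical representation: each category is (directory, name, term_weights),
# where term_weights is a sparse dict, e.g. the centroid of the category's documents.


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Unweighted average of several sparse term-weight dicts."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def merge_categories(categories, threshold=0.5):
    """Greedy agglomerative merging of categories from several directories.

    Assumed constraint: a cluster never holds two categories from the same
    directory. Merging stops when no allowed pair reaches `threshold`.
    Returns a list of clusters, each a list of (directory, name) pairs.
    """
    clusters = [
        {"members": [(d, n)], "dirs": {d}, "vectors": [vec], "centroid": vec}
        for d, n, vec in categories
    ]
    while True:
        best = None  # (similarity, i, j) of the most similar allowed pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if a["dirs"] & b["dirs"]:
                    continue  # would put two categories of one directory together
                sim = cosine(a["centroid"], b["centroid"])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            break
        _, i, j = best
        a, b = clusters[i], clusters[j]
        vectors = a["vectors"] + b["vectors"]
        merged = {
            "members": a["members"] + b["members"],
            "dirs": a["dirs"] | b["dirs"],
            "vectors": vectors,
            "centroid": centroid(vectors),
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [c["members"] for c in clusters]


def cluster_entropy(clusters, num_directories):
    """Assumed definition: size-weighted Shannon entropy of the directory labels
    in each cluster, divided by log(num_directories) so the result lies in [0, 1]
    (requires num_directories >= 2). Higher means better mixing across directories."""
    total = sum(len(members) for members in clusters)
    h = 0.0
    for members in clusters:
        n = len(members)
        counts = Counter(d for d, _ in members)
        h_cluster = -sum((k / n) * math.log(k / n) for k in counts.values())
        h += (n / total) * h_cluster
    return h / math.log(num_directories)


if __name__ == "__main__":
    # Toy input from three hypothetical directories "A", "B", "C".
    cats = [
        ("A", "Sports", {"ball": 1.0, "team": 1.0}),
        ("B", "Recreation/Sports", {"ball": 1.0, "team": 0.8}),
        ("C", "Sport", {"ball": 0.9, "team": 1.0}),
        ("A", "Computers", {"software": 1.0, "cpu": 1.0}),
        ("B", "Computing", {"software": 1.0, "cpu": 0.7}),
    ]
    merged = merge_categories(cats, threshold=0.5)
    print(merged)
    print(round(cluster_entropy(merged, num_directories=3), 3))
```

With this toy input the three sports categories end up in one cluster and the two computing categories in another. The same-directory check is what would keep, say, A/Sports and A/Computers apart regardless of their similarity; raising `threshold` allows fewer merges and therefore leaves more, smaller clusters.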

Journal: Information Sciences
Publication Date: Jul 25, 2003
Citations: 39

Similar Papers

COMPARISON OF VSM, GVSM, AND LSI IN INFORMATION RETRIEVAL FOR INDONESIAN TEXT
Jasman Pardede ... Milda Gustiana Husada
Jurnal Teknologi | VOL. 78
16 May 2016

The Comparative Early Prediction Model for Cardiovascular Disease Using Machine Learning
Sri Sumarlinda ... Azizah Binti Rahmat
International Journal of Scientific Research in Computer Science, Engineering and Information Technology | VOL. -
01 Jan 2024

Classification of Gases and Concentration Levels Obtained from Sensor Array Detection as Electronic Nose
Ferry Wahyu Wibowo ... Wihayati
29 Jul 2021

Comparison of Data Mining Classification Methods for Food Crop Recommendation (original title: Perbandingan Metode Klasifikasi Data Mining Untuk Rekomendasi Tanaman Pangan)
Merlinda Wibowo ... Rafian Ramadhani
JURNAL MEDIA INFORMATIKA BUDIDARMA | VOL. 5
31 Jul 2021