Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Noha Negm, Abdel Badeeh, Mohamed Amin, Passent Elkafrawy

doi:10.14569/ijacsa.2013.040820

Abstract

The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering motivate this research. We have devised an efficient approach for Web document clustering based on association rules mining (ARWDC). An efficient Multi-Tire Hashing Frequent Termsets algorithm (MTHFT) improves the efficiency of association rule mining by speeding up the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resultant clusters. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, the performance of our approach is evaluated using Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show significant improvements in the efficiency, scalability and accuracy of the ARWDC approach on the Reuters datasets.

The internet has become the largest data repository, and its users face the problem of information overload. The abundance of information, combined with the dynamic and heterogeneous nature of the Web, makes information retrieval a tedious process for the average user. Search engines, meta-search engines and Web directories have been developed to help users satisfy their information needs quickly and easily. A search engine performs exact matching between the query terms and the keywords that characterize each web page and presents the results to the user.
These results are long lists of URLs, which are very hard to search. Furthermore, users without domain expertise are not familiar with the appropriate terminology and therefore do not submit the right query terms, which leads to the retrieval of more irrelevant pages. This has created a need for new techniques that help users effectively navigate, trace and organize the available web documents, with the ultimate goal of finding those that best match their needs. Document clustering is one of the techniques that can play an important role in achieving this objective. It has become an increasingly important task in analyzing huge numbers of documents distributed among various sites and organizing them into different groups, called clusters, where the documents in each cluster share some common properties according to a defined similarity measure. Fast, high-quality document clustering algorithms play an important role in helping users effectively navigate, summarize, and organize the information. Document clustering has been studied intensively because of its wide applicability in areas such as Web Mining, Search Engines, Information Retrieval, and Topological Analysis. Document clustering differs from document classification. In document classification, the classes (and their properties) are known a priori, and documents are assigned to these classes, whereas in document clustering, the number, properties, or membership (composition) of classes is not known in advance. Thus, classification is an example of supervised machine learning and clustering that of unsupervised machine learning.
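The partitioning idea summarized in the abstract (mine frequent termsets, form overlapping initial partitions, then make them disjoint) can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. The MTHFT algorithm is not described here, so a naive enumeration of termsets stands in for it. Partitions are built directly from frequent termsets rather than from association rules. The rule used to make partitions disjoint (prefer the longest termset, then the largest cover) is hypothetical. The function names and the toy documents are invented for illustration, and the final grouping by descriptive keywords is omitted.

```python
from itertools import combinations


def frequent_termsets(docs, min_support, max_size=2):
    """Map each frequent termset to the set of documents containing it.

    Naive enumeration used as a stand-in for MTHFT (an assumption):
    every combination of up to max_size terms is checked against all documents.
    """
    vocab = sorted(set().union(*docs.values()))
    result = {}
    for size in range(1, max_size + 1):
        for termset in combinations(vocab, size):
            cover = {d for d, terms in docs.items() if set(termset) <= terms}
            if len(cover) >= min_support:
                result[termset] = cover
    return result


def make_disjoint(partitions):
    """Assign each document to exactly one partition.

    Hypothetical rule: prefer the longest termset, then the largest cover,
    then the lexicographically greatest termset as a deterministic tie-break.
    """
    best = {}
    for termset, cover in partitions.items():
        key = (len(termset), len(cover), termset)
        for d in cover:
            if d not in best or key > best[d][0]:
                best[d] = (key, termset)
    clusters = {}
    for d, (_, termset) in best.items():
        clusters.setdefault(termset, set()).add(d)
    return clusters


# Toy corpus (invented): each document is reduced to a set of terms.
docs = {
    "d1": {"oil", "price", "opec"},
    "d2": {"oil", "price"},
    "d3": {"wheat", "grain"},
    "d4": {"wheat", "grain", "price"},
}

# Overlapping initial partitions: d4 appears under "price" and under "grain"/"wheat".
partitions = frequent_termsets(docs, min_support=2)

# Disjoint partitions: every document ends up in exactly one group.
clusters = make_disjoint(partitions)
print(clusters)
```

In this toy corpus the document `d4` initially falls into several partitions because it contains more than one frequent termset, which is the overlap the abstract describes. After the disjointness step each document belongs to a single partition.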

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2013
Citations: 17
License type: cc-by

Similar Papers

An Integrated Approach and Framework for Document Clustering Using Graph Based Association Rule Mining
D S Rajput ... R S Thakur
01 Jan 2014

Organizing Web Documents Resulting from an Information Retrieval System Using Formal Concept Analysis
N.N Myat ... Khin Haymar Saw Hla
01 Jan 2004

Effective algorithm of mining frequent itemsets for association rules
Pei-Qi Liu ... Zeng-Zhi Li
26 Aug 2004

An integration of fuzzy association rules and WordNet for document clustering
... Tyne Liang
Knowledge and Information Systems | VOL. 28
27 Nov 2010
