A clustering technique using single pass clustering algorithm for search engine

Zul Indra, Jafreezal Jaafar, Norshuani Zamin

doi:10.1109/wict.2014.7077325

Abstract

Internet users rely heavily on search engines to explore and find useful information buried in websites. To date, the results returned by search engines are still far from satisfactory: they arrive as a long list that, in practice, mixes relevant and irrelevant information. Filtering out the irrelevant items by hand is daunting and time-consuming. Clustering is one of the popular solutions for this cumbersome task. However, our literature study revealed that research on document clustering for Asian languages is relatively limited compared to English, and that document clustering techniques are seldom applied in search engines. In this research, a clustering technique for search engines using the Single Pass Clustering (SPC) algorithm is proposed. The technique is evaluated on a set of Indonesian news documents to help address the limited research on document clustering for the Indonesian language. An experiment on 200 Indonesian news documents produced a number of satisfactory labelled clusters, and the application of the algorithm is demonstrated on a simulated search engine.
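The abstract names Single Pass Clustering but does not describe it, so a minimal sketch of the general technique may help. This is not the authors' implementation: the preprocessing (lowercasing and whitespace tokenisation), raw term-frequency vectors, cosine similarity, the threshold value of 0.3, and the top-term cluster labels are all assumptions, and the names `tf_vector`, `cosine`, `single_pass_cluster` and `label` are hypothetical. The core idea is that each document is read once, in order, and either joins the most similar existing cluster (if the similarity reaches the threshold) or starts a new one.

```python
import math
from collections import Counter


def tf_vector(text):
    """Assumed preprocessing: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Single pass clustering: each document is visited exactly once.

    A document joins the most similar existing cluster when the cosine
    similarity to that cluster's centroid is >= threshold (hypothetical
    value); otherwise it seeds a new cluster.
    """
    clusters = []  # each cluster: {"members": [doc indices], "centroid": Counter}
    for i, doc in enumerate(docs):
        vec = tf_vector(doc)
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # The centroid is stored as summed counts; cosine ignores scale,
            # so this points in the same direction as the mean vector.
            best["centroid"].update(vec)
        else:
            clusters.append({"members": [i], "centroid": Counter(vec)})
    return clusters


def label(cluster, k=2):
    """Assumed labelling scheme: the k most frequent terms in the centroid."""
    return [term for term, _ in cluster["centroid"].most_common(k)]


# Toy Indonesian headlines (illustrative only, not the paper's dataset).
docs = [
    "banjir melanda jakarta",
    "banjir besar di jakarta",
    "timnas sepak bola menang",
    "sepak bola timnas indonesia",
]
clusters = single_pass_cluster(docs, threshold=0.3)
for c in clusters:
    print(c["members"], label(c))
```

Here the two flood headlines end up in one cluster and the two football headlines in another. Because each document is compared only against clusters formed so far, single pass clustering is fast and needs no preset number of clusters, but its output depends on document order and on the chosen threshold.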
