Abstract

Internet users rely heavily on search engines to explore and find useful information buried in websites. The results returned by search engines remain far from satisfactory, however, because the long result lists in practice mix relevant and irrelevant information. Filtering out the irrelevant information manually is daunting and time-consuming. Clustering is one of the popular solutions to this cumbersome task. However, our literature study revealed that research on document clustering for Asian languages is relatively limited compared with English, and that applications of document clustering techniques in search engines are less commonly available. In this research, a clustering technique for search engines using the Single Pass Clustering (SPC) algorithm is proposed. The technique is tested on a set of Indonesian news documents, to contribute to the limited research on document clustering for the Indonesian language. An experiment on 200 Indonesian news documents produced a number of satisfactory labelled clusters, and the application of the algorithm is demonstrated on a simulated search engine.
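
As an illustration of the general single-pass idea, the sketch below processes documents once, in order: each document joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. This is a minimal sketch, not the authors' implementation. The abstract does not specify the term weighting, similarity measure, threshold, or labelling method, so the raw term-frequency vectors, cosine similarity, the `threshold=0.3` default, the top-terms labelling, and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector from lowercased whitespace tokens (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(docs, threshold=0.3):
    """Cluster documents in one sequential pass.

    threshold is a hypothetical value; a real system would tune it.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        vec = tf_vector(doc)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Summed term counts act as the centroid; cosine ignores the scale.
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def label(cluster, k=3):
    """Label a cluster with its k most frequent terms (assumed labelling scheme)."""
    return [term for term, _ in cluster["centroid"].most_common(k)]
```

Because each document is compared only against the clusters formed so far, the result depends on the order in which documents arrive and on the chosen threshold. A practical pipeline for Indonesian text would likely also add stop-word removal and stemming before building the vectors.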


