Abstract
Stochastic simulation has been very effective in many domains but has not previously been applied to the World Wide Web (WWW). This study is the first to use neural networks in a stochastic simulation of the number of rejected web pages per search query. The evaluation of the quality of search engines should involve not only the resulting set of web pages but also an estimate of the rejected set of web pages. The iterative Radial Basis Function (RBF) neural network developed by Meghabghab and Nasr [Iterative RBF neural networks as meta-models for stochastic simulations, in: The Second International Conference on Intelligent Processing and Manufacturing of Materials, 1999, p. 729] was applied to evaluate the number of rejected web pages on four search engines: Yahoo, Alta Vista, Google, and Northern Light. Nine input variables were selected for the simulation: (1) precision, (2) overlap, (3) response time, (4) coverage, (5) update frequency, (6) Boolean logic, (7) truncation, (8) word and multi-word searching, and (9) portion of the web pages indexed. Typical stochastic simulation meta-modeling uses regression models in Response Surface Methods (RSM) to test the N training data or patterns collected. RBF neural networks are a natural fit for RSM because they use a family of surfaces, each of which divides the input space into two regions, Z+ and Z−, so that each of the N test patterns is assigned to either class Z+ or class Z−. This technique divides the resulting set of responses to a query into accepted and rejected web pages. To test the hypothesis that the evaluation of any search engine query should include an estimate of the number of rejected web pages, an RBF meta-model was trained on a set of 9000 different simulation runs over the nine input variables. Results show that two of the variables, response time and portion of the web indexed, can be eliminated without affecting the evaluation results.
Results also show that the number of rejected web pages for a specific set of search queries on these four engines is very high. In addition, a goodness measure of a search engine for a given set of queries can be designed as a function of the coverage of the search engine and the normalized age of a new document in the result set for the query. This study concludes that unless search engine designers address the issues of rejected web pages, indexing, and crawling, the use of the Web as a research tool for academic and educational purposes will remain hindered.
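
The two-region idea described above, in which an RBF surface assigns each pattern to Z+ (accepted) or Z− (rejected), can be illustrated with a minimal sketch. This is not the iterative RBF network of Meghabghab and Nasr. It is a plain one-pass Gaussian RBF classifier with randomly chosen centers and least-squares output weights. The function names (`rbf_features`, `fit_rbf`, `classify`), the basis width, the number of centers, and the synthetic nine-variable data are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian basis values exp(-||x - c||^2 / (2 * width^2)), plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

def fit_rbf(X, y, n_centers=20, width=1.0, seed=0):
    """Pick centers at random from the training patterns, then solve for
    the output weights by linear least squares (a single pass, not iterative)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    weights, *_ = np.linalg.lstsq(rbf_features(X, centers, width), y, rcond=None)
    return centers, weights

def classify(X, centers, weights, width=1.0):
    """The surface score(x) = 0 splits the input space:
    score >= 0 -> Z+ (label +1), score < 0 -> Z- (label -1)."""
    scores = rbf_features(X, centers, width) @ weights
    return np.where(scores >= 0.0, 1, -1)

# Hypothetical synthetic data: 9 input variables per pattern, two
# well-separated clusters standing in for accepted and rejected pages.
# These values are NOT the study's simulation runs.
rng = np.random.default_rng(42)
n_per_class = 100
accepted = 0.2 + 0.05 * rng.standard_normal((n_per_class, 9))
rejected = 0.8 + 0.05 * rng.standard_normal((n_per_class, 9))
X = np.vstack([accepted, rejected])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

centers, weights = fit_rbf(X, y)
pred = classify(X, centers, weights)

n_accepted = int((pred == 1).sum())
n_rejected = int((pred == -1).sum())
accuracy = float((pred == y).mean())
print(f"Z+ (accepted): {n_accepted}, Z- (rejected): {n_rejected}, "
      f"training accuracy: {accuracy:.2f}")
```

Counting the patterns that fall in Z− gives the kind of rejected-page estimate the abstract refers to. A meta-model for real simulation runs would need its own choice of centers, widths, and training procedure.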