Webometrics: Some Critical Issues of WWW Size Estimation Methods

Srinivasan Mohana Arunachalam, Siegfried Handschuh, Adamantios Koumpis

doi:10.3390/mti2020012

Journal: Multimodal Technologies and Interaction
Publication Date: Apr 2, 2018
Citations: 4
License type: CC BY 4.0

Affiliation: University of Passau

Abstract

The number of webpages on the Internet has increased tremendously over the last two decades; however, only a part of it is indexed by the various search engines. This small portion is the indexable web, the part of the Internet that is usually reachable through a search engine. Search engines play a big role in making the World Wide Web accessible to the end user, and how much of the World Wide Web is accessible depends on the size of the search engine’s index. Researchers have proposed several ways to estimate the size of the indexable web using search engines, with and without privileged access to the search engine’s database. Our report provides a summary of the methods used in the last two decades to estimate the size of the World Wide Web and describes how this knowledge can be used in other aspects and tasks concerning the World Wide Web.

Highlights

The World Wide Web consists of millions of websites and billions of documents which are accessed through a search engine.
Their lexicon consisted of 2,190,702 terms, and they ran the experiment with a total of 438,141 one-term queries. Their experiments did not consider disjunctive or conjunctive queries. They estimated the size of the indexable web to be more than 11.5 billion pages, obtained by summing the individual index sizes of the four search engines they considered (Google, MSN, Ask/Teoma and Yahoo!) after accounting for their overlap. Eight years after the first experiment, AltaVista was no longer the most popular website; it was purchased by Yahoo! in 2003, and Yahoo! was in turn acquired by Verizon in 2017.
The approaches described here do not require privileged access to a search engine’s database. The results are, however, influenced by many biases, with sampling bias persisting across all the different methods.
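
To illustrate what "summing the individual index sizes after accounting for their overlap" can look like for two engines, here is a minimal Python sketch. It is not the procedure used in the cited experiments. It assumes that the index sizes of engines A and B are already known, that a sample of pages from A is available, and that a caller-supplied check (the hypothetical `in_b` function) can tell whether B also indexes a page. All names and numbers are illustrative.

```python
def estimate_union_size(size_a, size_b, sample_from_a, in_b):
    """Estimate |A union B| by inclusion-exclusion.

    size_a, size_b : assumed known index sizes of engines A and B
    sample_from_a  : pages sampled from engine A (hypothetical input)
    in_b           : function returning True if engine B indexes the page
    """
    if not sample_from_a:
        raise ValueError("sample_from_a must not be empty")
    # Fraction of the sample that B also indexes: an estimate of |A and B| / |A|.
    hits = sum(1 for page in sample_from_a if in_b(page))
    overlap_fraction = hits / len(sample_from_a)
    estimated_overlap = overlap_fraction * size_a
    # Inclusion-exclusion: count the shared pages only once.
    return size_a + size_b - estimated_overlap


# Toy usage with made-up data: half of the sampled pages are also in B.
toy_index_b = {"u1", "u2", "u3"}
toy_sample = ["u1", "u2", "u4", "u5"]
union_estimate = estimate_union_size(1000, 800, toy_sample, lambda p: p in toy_index_b)
print(union_estimate)
```

The estimate is only as good as the sample: if the pages drawn from A are not representative of A's index, the overlap fraction inherits that sampling bias, which is the bias the highlight above describes as persisting across methods.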

Summary

Introduction

The World Wide Web consists of millions of websites and billions of documents which are accessed through a search engine. Google has the biggest index size, as reported in [1] and in a more recent work [2]. One of the immediate reasons why Google dominates the other search engines is its index size, which is the number of documents it has indexed at a point in time. It is bigger than the indexes of all the other search engines combined, which gives it a tremendous competitive advantage. This means that Google covers a lot more of the Web than the other search engines and so caters to a wider audience.

On Webometrics

Study of Overlap

Graph Nature of the World Wide Web

Diameter of the Web Graph

Estimating the Size of the Indexed Web

Search Engines and WWW Size Estimation

Statistical Approach Using Web Page Sampling

Updated Experiment Setting

Size Estimation through Quadrat Sampling

Size Estimation through Extrapolation

Index Stability

Findings

Discussion and Conclusions
