Extracting Network Structure for International and Malaysia Website via Random Walk

Kar Tim Chan

doi:10.32802/asmscj.2019.400

Abstract

The World Wide Web is an information retrieval system accessible via the Internet. Because all web resources and documents are interlinked by hypertext links, they form a huge and complex information network. Besides providing information, the web is also a primary tool for commerce, entertainment and connecting people around the world. Hence, studying its network topology gives us a better understanding of the sociology of content on the web and may make it possible to predict newly emerging phenomena. In this paper, we construct networks using a random walk process that traverses the web from two popular websites, namely google.com (global) and mudah.my (local). We measure degree distribution, diameter and average path length on the networks to determine their structural properties. We also analyse the networks at the domain level to identify the top-level domains appearing in both networks, in order to understand the connectivity of the web in different regions. Using centrality analysis, we also reveal some important and popular websites and domains in the networks.
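The pipeline summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the walk follows one randomly chosen outgoing link per step, that each followed link is recorded as an undirected edge, and that the walk restarts at the seed page on a dead end. The `TOY_WEB` link table and every function name are hypothetical stand-ins; no real crawling is performed.

```python
import random
from collections import Counter, deque

# Hypothetical toy link table standing in for real hyperlinks (assumption).
TOY_WEB = {
    "a.com": ["b.com", "c.org"],
    "b.com": ["a.com", "d.my"],
    "c.org": ["d.my"],
    "d.my": ["a.com", "e.my"],
    "e.my": ["d.my"],
}


def random_walk_graph(links, start, steps, seed=0):
    """Build an undirected graph from the links a random walk follows."""
    rng = random.Random(seed)
    adj = {start: set()}
    current = start
    for _ in range(steps):
        outgoing = links.get(current, [])
        if not outgoing:
            # Dead end: restart at the seed page (assumed policy).
            current = start
            continue
        nxt = rng.choice(outgoing)
        adj.setdefault(current, set()).add(nxt)
        adj.setdefault(nxt, set()).add(current)
        current = nxt
    return adj


def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter_and_average_path_length(adj):
    """Longest and mean shortest path over all reachable ordered pairs."""
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbs) for nbs in adj.values())


def top_level_domain_counts(adj):
    """Count nodes per top-level domain (text after the last dot)."""
    return Counter(node.rsplit(".", 1)[-1] for node in adj)


if __name__ == "__main__":
    graph = random_walk_graph(TOY_WEB, "a.com", steps=200)
    print(degree_distribution(graph))
    print(diameter_and_average_path_length(graph))
    print(top_level_domain_counts(graph))
```

Because the walk only records links it actually follows, the resulting graph is a sample of the underlying link structure, so the measured diameter and path lengths describe the sampled network, not the full web.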
