TINB: a topical interaction network builder from WWW

Atul Srivastava, Anuradha Pillai, Arun Solanki, Deepika Punj, Anand Nayyar

doi:10.1007/s11276-020-02469-y

Abstract

A social network is a collection of people, generally called ‘actors’, who are connected to each other by some association criterion such as friendship, following, co-authorship, or co-working. Interaction networks are a generalization of social networks. With recent developments in data science, analytics has applications in every significant area, including the economy, general elections, epidemics, terrorism detection, clustering, and marketing. All of these areas require interaction data about various entities. Although social networks are a significant reservoir of such data, they cover only one segment of the information. A substantial amount of information is available on the web, but it is not useful for analytics in its raw form. This paper presents a framework that collects information from the WWW using a parameterized crawler and arranges the collected web pages into a social-network-like structure called an interaction network. The resulting interaction network is similar to a traditional social network in every aspect. Web pages are selected based on the contexts of their URLs, where a context is taken from the vicinity of the URL and its extent is decided by predefined parameters. The proposed crawler is tested over several topics covering thousands of pages and achieves a harvest rate of more than 50 percent. Properties of the interaction network such as degree distribution, clustering coefficient, modularity, distribution of communities, diameter, and PageRank are investigated to establish that it behaves like a traditional social network. The idea of preparing an interaction network is extensible to newer technologies such as IoT, big data, the deep web, and prediction models.
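To make the selection idea concrete, the following is a minimal Python sketch of context-based link selection, not the authors' implementation. It assumes that a URL's context is its anchor text plus a fixed number of words on either side, that relevance is the fraction of context words that are topic terms, and that harvest rate is the usual ratio of relevant pages to crawled pages. The names `link_contexts`, `relevant_links`, `build_network`, and `harvest_rate`, and the parameters `window` and `threshold`, are hypothetical and stand in for the paper's predefined parameters.

```python
import re
from collections import defaultdict

# Simplified patterns; a real crawler would use a proper HTML parser.
LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


def words(text):
    """Lowercase alphanumeric tokens of a text fragment."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_contexts(html, window=5):
    """Yield (url, context) pairs; context is the anchor text plus
    `window` words before and after the link (assumed definition)."""
    for m in LINK_RE.finditer(html):
        before = words(TAG_RE.sub(" ", html[:m.start()]))[-window:]
        anchor = words(TAG_RE.sub(" ", m.group(2)))
        after = words(TAG_RE.sub(" ", html[m.end():]))[:window]
        yield m.group(1), before + anchor + after


def relevant_links(html, topic_terms, window=5, threshold=0.2):
    """Keep URLs whose context has at least `threshold` topic-term density."""
    topic = set(topic_terms)
    selected = []
    for url, context in link_contexts(html, window):
        if not context:
            continue
        score = sum(w in topic for w in context) / len(context)
        if score >= threshold:
            selected.append(url)
    return selected


def build_network(pages, topic_terms, window=5, threshold=0.2):
    """Build a directed interaction network (adjacency sets) from a
    mapping of page URL to HTML, linking pages to their selected URLs."""
    graph = defaultdict(set)
    for src, html in pages.items():
        for dst in relevant_links(html, topic_terms, window, threshold):
            graph[src].add(dst)
    return dict(graph)


def harvest_rate(relevant_pages, crawled_pages):
    """Fraction of crawled pages judged relevant to the topic."""
    return relevant_pages / crawled_pages if crawled_pages else 0.0
```

In this sketch, widening `window` or lowering `threshold` admits more links at the cost of precision, which is the kind of trade-off a parameterized crawler exposes. The resulting adjacency structure can be passed to any graph library to compute properties such as degree distribution or clustering coefficient.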

Full Text