Abstract
With the fast development of Internet and WWW, “information overload” has become an overwhelming problem, and collective attention of users will play a more important role nowadays. As a result, knowing how collective attention distributes and flows among different websites is the first step to understand the underlying dynamics of attention on WWW. In this paper, we propose a method to embed a large number of web sites into a high dimensional Euclidean space according to the novel concept of flow distance, which both considers connection topology between sites and collective click behaviors of users. With this geometric representation, we visualize the attention flow in the data set of Indiana university clickstream over one day. It turns out that all the websites can be embedded into a 20 dimensional ball, in which, close sites are always visited by users sequentially. The distributions of websites, attention flows, and dissipations can be divided into three spherical crowns (core, interim, and periphery). 20% popular sites (Google.com, Myspace.com, Facebook.com, etc.) attracting 75% attention flows with only 55% dissipations (log off users) locate in the central layer with the radius 4.1. While 60% sites attracting only about 22% traffics with almost 38% dissipations locate in the middle area with radius between 4.1 and 6.3. Other 20% sites are far from the central area. All the cumulative distributions of variables can be well fitted by “S”-shaped curves. And the patterns are stable across different periods. Thus, the overall distribution and the dynamics of collective attention on websites can be well exhibited by this geometric representation.
Highlights
In each second, 684478 pieces of content will be shared on Facebook, 204166667 emails will be sent, 100000 tweets will be posted, 27778 new posts will be published on Tumblr, and 571 new websites will be created, data keeps growing with no signs of stopping [1]
The flow distance notion both considers the topological closeness of websites and the average real behaviors of surfing which is apparently different from the traditional shortest path distance [30] and random walk distance [31,32,33,34] on close flow networks
The geometric representation of the websites is based on a novel notion of flow distance defined on the underlying open flow network model of attention flow which integrates the topological structure of hyperlinks and the collective behavior of user traffics between sites
Summary
684478 pieces of content will be shared on Facebook, 204166667 emails will be sent, 100000 tweets will be posted, 27778 new posts will be published on Tumblr, and 571 new websites will be created, data keeps growing with no signs of stopping [1]. Only 3 billion users (2014) consume this ever-accumulating information on the Internet [2], we are drowning in the sea of information and data. As pointed out by H.A. Simon, “a wealth of information creates a poverty of attention” [3], attention will doubtlessly play more and more important roles in the near future because of its scarcity and the overload of information. PLOS ONE | DOI:10.1371/journal.pone.0136243 September 1, 2015
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.