Breathing Mankind Thoughts

Philippe Jacquet

doi:10.1145/2964791.2901899

Abstract

Mankind has never been as connected as it is today, and it will be even more connected tomorrow. Thanks to the rise of social networks such as Twitter and Facebook, we can now follow the thoughts of millions of people in real time. In fact, we can almost feel the thoughts of humanity as a whole, and perhaps put ourselves in a position to predict the major trends in its collective behavior. However, such an ambitious aim would require considerable processing and networking resources, which may be far from affordable. Indeed, trends and topics are carried by a multitude of short texts written in various languages and vocabularies, dispersed much as a hologram disperses information. Capturing and classifying them poses serious data-mining and analytics problems. Processes based on pure semantic analysis would require too much processing power and memory. We will present alternative methods based on string complexity, also inspired by geolocalization in wireless networks, which save processing power by several orders of magnitude. The ultimate goal is to detect when people are thinking about the very same topics before they become aware of it. Beyond topic detection and classification, one must also estimate the potential of an isolated topic to become a lasting trend. In other words, one must probe the foundations of the topic, for example by challenging the trustworthiness of its sources. Designing an efficient source-finder algorithm is inseparable from building realistic models of topic propagation. If we suppose that topics propagate inside communities via follower-followee links, propagation is strongly amplified by imbalances in the graph topology. It is established that dominating and semi-dominating nodes, such as CNN's Twitter account, are the main accelerators of topic propagation. The difficulty is to find the actual source of a topic behind these screening nodes, and the search is prone to false positives and false negatives.
In fact, we will show that finding the source of a topic is similar to finding a common ancestor in a Darwin channel, where spurious mutations complicate the task.
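To make "methods based on string complexity" concrete, here is a minimal sketch of one common formulation, assumed here rather than taken from the abstract: the complexity of a string is the number of its distinct substrings (factors), and the joint complexity of two short texts is the number of distinct factors they share. Texts on the same topic tend to share many factors regardless of exact wording, with no semantic analysis. The function names and the normalization in `similarity` are hypothetical illustrations, not the author's published method.

```python
def distinct_factors(text: str) -> set[str]:
    """Return every distinct non-empty substring (factor) of text.

    Quadratic in len(text); acceptable for tweet-length strings only.
    """
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def joint_complexity(x: str, y: str) -> int:
    """Count the distinct factors common to x and y."""
    return len(distinct_factors(x) & distinct_factors(y))


def similarity(x: str, y: str) -> float:
    """Hypothetical normalized score in [0, 1].

    The joint complexity can never exceed the smaller of the two
    self-complexities, so dividing by that minimum bounds the score.
    """
    denom = min(len(distinct_factors(x)), len(distinct_factors(y)))
    return joint_complexity(x, y) / denom if denom else 0.0


if __name__ == "__main__":
    # "abc" and "abd" share the factors "a", "b" and "ab".
    print(joint_complexity("abc", "abd"))  # 3
    print(similarity("abc", "abd"))        # 0.5
```

Building explicit substring sets keeps the sketch readable but costs quadratic time and memory per text; suffix trees or suffix automata count distinct factors in linear time and would be the natural choice at the scale the abstract describes.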
