Abstract
Topological data analysis is a novel approach for extracting meaningful information from high-dimensional data, and it is robust to noise. It is based on topology, which studies the geometric shape of data. To apply topological data analysis, an algorithm called mapper is adopted. The output of mapper is a simplicial complex that represents a set of connected clusters of data points. In this paper, we explore the feasibility of topological data analysis for mining social network data by addressing the problem of image popularity. We randomly crawl images from Instagram and use mapper to analyze the effects of social context and image content on an image's popularity. Mapper clusters the images using each feature, and the ratio of popular images in each cluster is computed to identify the clusters with a high or low likelihood of popularity. The popularity of images is then predicted to evaluate the accuracy of topological data analysis. This approach is further compared with traditional clustering algorithms, including k-means and hierarchical clustering, in terms of accuracy, and the results show that topological data analysis outperforms the others. Moreover, topological data analysis provides meaningful information based on the connectivity between the clusters.
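The mapper pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the filter function, cover parameters, or clustering method used, so the lens (`lens`), the interval cover (`n_intervals`, `overlap`), the single-linkage threshold (`eps`), and all function names below are assumptions chosen for illustration. The sketch covers the range of a filter function with overlapping intervals, clusters the points falling in each interval, links clusters that share points, and computes the ratio of popular items per cluster.

```python
from itertools import combinations
from math import dist


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal-length intervals, each padded by a fraction of its length."""
    step = (hi - lo) / n_intervals
    pad = step * overlap
    return [(lo + i * step - pad, lo + (i + 1) * step + pad) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Group indices into connected components, linking points at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=3, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are clusters of point indices, edges join overlapping clusters."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    # Two clusters are connected when they share at least one data point.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges


def popularity_ratio(nodes, popular):
    """Fraction of popular items (label 1) in each node."""
    return [sum(popular[i] for i in node) / len(node) for node in nodes]


# Toy data (hypothetical): six points on a line, the first three labeled popular.
pts = [(float(x), 0.0) for x in range(6)]
nodes, edges = mapper(pts, lens=lambda p: p[0], n_intervals=2, overlap=0.25, eps=1.0)
print(edges)                                           # [(0, 1)]
print(popularity_ratio(nodes, [1, 1, 1, 0, 0, 0]))     # [0.75, 0.25]
```

In this toy run the two nodes share points 2 and 3, so they are joined by an edge. That edge is the kind of connectivity between clusters that plain k-means or hierarchical clustering does not expose.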
Highlights
These days, social networks have attracted billions of users to generate, consume and propagate content every day
The same distance metrics that are used for topological data analysis are used for k-means and hierarchical clustering
The results show that when using a low-dimensional feature, i.e., social context, traditional data mining techniques perform as well as topological data analysis
Summary
These days, social networks have attracted billions of users to generate, consume and propagate content every day. In 2016, Twitter had about 313 million monthly active users, who shared more than 500 million tweets each day [1]. By the end of 2016, Facebook had an average of 1.23 billion daily active users [2]. In 2016, Instagram had 300 million daily active users, who shared more than 95 million images and videos and generated more than 4 billion likes every day [3]. The huge number of users, posts and interactions has allowed social networks to become a powerful source of information. Extracting meaningful information from such data has become more critical