Abstract

Online social networks are among the main platforms for discussing arbitrary subjects, and therefore among the main sources of data for analysing public opinion. Data from online social networks are crawled and analysed by monitoring systems, which include a data collection system. A typical system for collecting data from the Internet contains a crawler, parsers, a queue of collection tasks, a task scheduling subsystem, and a module that writes structured data to a storage system. Crawling online social networks has a number of distinctive features. The paper considers methods of accessing data from online social networks and a task scheduling subsystem. It formulates and justifies the requirements for a data collection system that crawls online social networks, namely scalability, extensibility, and availability of a data storage subsystem and a queue of collection tasks.

The paper describes the main methods of accessing information from online social networks: API-based access, processing of HTML pages, and specialised interfaces for bots. It also describes the main restrictions that an online social network imposes, namely the need to register the application, the limited number of requests, and the need to obtain a user's permission to collect his or her data. Based on the results of this analysis, anonymous download and processing of HTML pages was chosen as the data access method.

The paper formulates the requirements for the task subsystem, namely the available task types, the task hierarchy, and the task context. It describes the general architecture of the developed software system for crawling and analysing data from online social networks and justifies its compliance with the requirements stated earlier.

The problem of crawling and analysing users' ego-network graphs (sub-graphs of a social graph) is considered. The features of collecting them are described, and implementation options are proposed depending on the amount of data collected.

The results obtained can be used to build monitoring systems for online social networks and to collect test data for the experimental evaluation of social graph analysis algorithms. Further development may involve a detailed consideration of the problems of collecting other types of data from online social networks.
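
As an illustration only, the task requirements (type, hierarchy, context) and the collection of an ego-network through a task queue might be sketched as follows. This is a minimal sketch under stated assumptions, not the architecture of the system described in the paper: the names Task, collect_ego_network, TOY_GRAPH and fake_fetch are hypothetical, and the network access is replaced by an in-memory stub instead of downloading and parsing HTML pages.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


@dataclass
class Task:
    """Hypothetical collection task: a type, a context, and a parent."""
    task_type: str                    # assumed type name, e.g. "friends"
    context: Dict[str, str]           # data the handler needs (user id)
    parent: Optional["Task"] = None   # task hierarchy: who spawned this task


def collect_ego_network(
    ego: str,
    fetch_friends: Callable[[str], Iterable[str]],
) -> Set[Tuple[str, str]]:
    """Return the undirected edges among the ego and the ego's friends."""
    queue = deque([Task("friends", {"user": ego})])
    friends_of: Dict[str, Set[str]] = {}

    while queue:
        task = queue.popleft()
        user = task.context["user"]
        if user in friends_of:        # skip users that were already fetched
            continue
        friends_of[user] = set(fetch_friends(user))
        if task.parent is None:
            # Only the root task spawns child tasks, one per friend,
            # so the crawl stops at distance 1 from the ego.
            for friend in sorted(friends_of[user]):
                queue.append(Task("friends", {"user": friend}, parent=task))

    members = friends_of[ego] | {ego}
    edges: Set[Tuple[str, str]] = set()
    for user, friends in friends_of.items():
        for friend in friends & members:   # drop links leaving the ego-network
            if user != friend:
                edges.add(tuple(sorted((user, friend))))
    return edges


# Toy data standing in for a real social network (an assumption).
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}


def fake_fetch(user: str) -> Iterable[str]:
    """Stub in place of downloading and parsing a friends page."""
    return TOY_GRAPH.get(user, [])


if __name__ == "__main__":
    print(sorted(collect_ego_network("a", fake_fetch)))
```

In this sketch the queue lives in process memory, which is only suitable for small ego-networks; for larger volumes of data the queue and the storage would presumably be external services, in line with the scalability requirement named above.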
