Abstract

With the rapid development of the Internet, users actively share personal data and other information on many social networks. Information on the Internet should be analyzed to make sure that it is reliable and does not pose a threat to the public, which creates a need to collect, monitor and analyze it. Data collection is a complex task that depends on the structure of each web page, and since not all resources permit information to be collected, several methods have to be combined. This article shows effective ways of using syntactic analysis to obtain information. The method of syntactic analysis (parsing) of the contents of web pages is explained using a program written in Python and based on the BeautifulSoup library. In addition, attention is given to methods of collecting information through APIs and through tools that emulate user behavior in the browser. An algorithm for extracting information from thematic Internet resources using the BeautifulSoup and Requests libraries is presented. As a result, information was obtained from English- and Russian-speaking hacker and carding forums.
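
The BeautifulSoup and Requests approach mentioned above can be illustrated with a minimal sketch. It is not the program described in the article: the function names `fetch_html` and `extract_posts`, the `User-Agent` string, the sample markup and the CSS classes `post`, `author` and `body` are all assumptions made for illustration, and a real forum would need selectors matched to its own page structure.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(
        url,
        timeout=timeout,
        headers={"User-Agent": "research-parser-sketch"},  # hypothetical value
    )
    response.raise_for_status()
    return response.text


def extract_posts(html):
    """Parse forum-like HTML and return one dict per post.

    The selectors below (div.post, .author, .body) are hypothetical;
    they must be adapted to the markup of the resource being parsed.
    """
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for node in soup.select("div.post"):
        author = node.select_one(".author")
        body = node.select_one(".body")
        posts.append({
            "author": author.get_text(strip=True) if author else None,
            "text": body.get_text(" ", strip=True) if body else "",
        })
    return posts


# Inline sample so the parsing step can be tried without network access.
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span><p class="body">First message</p></div>
<div class="post"><p class="body">No author here</p></div>
"""

if __name__ == "__main__":
    for post in extract_posts(SAMPLE_HTML):
        print(post)
```

Keeping the download step (`fetch_html`) separate from the parsing step (`extract_posts`) lets the parser be checked against saved pages, which helps when a resource restricts automated requests.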
