Abstract

The World Wide Web is developing rapidly, and the problem of automatically collecting and analyzing information published on various web resources grows more pressing every day. In the 1990s, the World Wide Web was a vast body of poorly structured information that was difficult for a person to search. It was then that the first automated agents began to appear, easing the task of finding the necessary information on the web. The core of such systems is a search robot: a software package that navigates through web resources and collects information for a database. At Kazan (Volga Region) Federal University (KFU), a monthly rating of academic staff is compiled from data entered in employees' personal accounts in the Electronic University system. There is now a need to move away from KFU staff manually entering the Hirsch index in their personal accounts, both to avoid incorrect data entry and to remove the need for the Prospective Development Center to validate the entered information. This required a search robot that automatically collects the Hirsch indices of KFU employees from the Scopus system. This article addresses the search robot: what it is, how it works, and how to write a program that collects information. It considers the possible types of search robots and the full process of their operation. It also considers the Scopus scientometric system and the Hirsch index as a scientometric indicator, including its purpose and calculation. The implementation uses the Python programming language, and the article reviews tools for issuing HTTP requests and processing HTML pages.
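As an illustration of the Hirsch index calculation mentioned above, the following is a minimal sketch based on the standard definition (the largest h such that at least h publications have at least h citations each). It is not the article's implementation: the function name `h_index` and the sample citation counts are hypothetical, and in the described system the index would be collected from Scopus rather than computed locally.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations.

    `citations` is an iterable of per-paper citation counts (hypothetical
    input; the article collects the index from Scopus instead).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            # Counts only decrease from here, so h cannot grow further.
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

With the sample counts, four papers have at least four citations each, but there are not five papers with at least five citations, so the index is 4.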

