Abstract
This article addresses data collection in a given domain using agent-based technologies that draw on multiple Internet information sources, with the aim of obtaining reliable and up-to-date data. The agent-based approach is illustrated by collecting data on nuclear power plants operating worldwide. Three open information sources were selected for data extraction; each was analyzed, and the features of its data presentation structure were identified. The article describes the following tools for developing software agents: browser control for simulating human behavior, HTML markup analysis using the XPath query language, and data extraction from PDF documents using regular expressions. It also presents the software architecture and the database schema. As a result of running the software, data on 789 nuclear power plants was obtained.
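Two of the techniques named above, XPath queries over markup and regular expressions over PDF text, can be illustrated with a minimal sketch. It is not the article's implementation: the sample markup, the table id `reactors`, the plant names, the capacity figures, and the function names `extract_rows` and `extract_capacities` are all hypothetical. The sketch assumes the page is well-formed enough for Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath) and that the PDF text has already been extracted by some other tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a page from one source.
SAMPLE_HTML = """
<html><body>
  <table id="reactors">
    <tr><th>Name</th><th>Country</th></tr>
    <tr><td>Plant A</td><td>Country X</td></tr>
    <tr><td>Plant B</td><td>Country Y</td></tr>
  </table>
</body></html>
"""

# Hypothetical text, assumed to have been extracted from a PDF beforehand.
SAMPLE_PDF_TEXT = """
Plant A  Net capacity: 1000 MWe
Plant B  Net capacity: 440 MWe
"""

# One record per line: a name, two or more spaces, then the capacity field.
CAPACITY_RE = re.compile(
    r"^(?P<name>.+?)\s{2,}Net capacity:\s*(?P<mwe>\d+)\s*MWe$",
    re.MULTILINE,
)


def extract_rows(markup):
    """Return (name, country) tuples from the table selected by an XPath query."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ElementTree's XPath subset supports attribute predicates and child steps.
    for tr in root.findall(".//table[@id='reactors']/tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:  # the header row has only <th> cells and is skipped
            rows.append(tuple(cells))
    return rows


def extract_capacities(text):
    """Return a {plant name: capacity in MWe} mapping found by the regex."""
    return {m.group("name"): int(m.group("mwe")) for m in CAPACITY_RE.finditer(text)}


if __name__ == "__main__":
    capacities = extract_capacities(SAMPLE_PDF_TEXT)
    for name, country in extract_rows(SAMPLE_HTML):
        print(name, country, capacities.get(name))
```

Real web pages are rarely well-formed XML, so a production agent would more likely rely on an HTML-tolerant parser with full XPath support; the standard-library parser is used here only to keep the sketch self-contained.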