Abstract

The crawler scheduling system is a JEE application built on the Quartz scheduling framework. It sends crawler tasks to the crawler control system, either automatically at regular intervals or manually. The crawler control system provides visual management and control of the crawlers that target shipping websites. The crawler system, built around HttpClient, crawls those shipping websites. The message queue ActiveMQ serves as message middleware between the crawler control system and the crawler system, which enables asynchronous communication and reduces coupling. Front-end frameworks such as Bootstrap, AngularJS, and SweetAlert are used to implement the visual scheduling interface and the data visualization system. The crawler deployment environment is containerized with Docker. Selenium is added to handle dynamic pages on the shipping websites, and MongoDB is used for data storage alongside the message queue. Through this combination of techniques, together with the design, testing, and quality assurance of the system, we aim to make the crawler more stable and efficient and to keep the system secure, stable, and easy to maintain.
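The decoupling described above works like this: a scheduler publishes task messages at a fixed interval, and a separate crawler worker consumes them from a queue. Neither side calls the other directly. The sketch below shows that pattern using only the JDK, under stated assumptions. `ScheduledExecutorService` stands in for a Quartz trigger, and an in-process `LinkedBlockingQueue` stands in for ActiveMQ. The class `CrawlerPipelineSketch`, the `CrawlTask` message, and the example URL are all hypothetical. This is not the system's actual implementation, and it does not use the Quartz or JMS APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerPipelineSketch {

    /** Hypothetical task message; in the described system it would travel over ActiveMQ. */
    static final class CrawlTask {
        final int id;
        final String targetUrl;

        CrawlTask(int id, String targetUrl) {
            this.id = id;
            this.targetUrl = targetUrl;
        }
    }

    /**
     * Publishes taskCount tasks at a fixed period (stand-in for a Quartz trigger),
     * consumes them on a separate worker thread (stand-in for the crawler system),
     * and returns the task ids in the order the worker processed them.
     */
    public static List<Integer> runPipeline(String targetUrl, int taskCount, long periodMillis)
            throws InterruptedException {
        BlockingQueue<CrawlTask> queue = new LinkedBlockingQueue<>(); // stand-in for the message broker
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(taskCount);
        AtomicInteger nextId = new AtomicInteger();

        // Consumer: only reads from the queue, so it never depends on the scheduler directly.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    CrawlTask task = queue.take();
                    // A real crawler would fetch task.targetUrl here (e.g. with HttpClient).
                    processed.add(task.id);
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });
        worker.start();

        // Producer: fires at a fixed rate and only enqueues messages.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            int id = nextId.getAndIncrement();
            if (id < taskCount) {
                queue.offer(new CrawlTask(id, targetUrl));
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        done.await(); // wait until every scheduled task has been consumed
        scheduler.shutdownNow();
        worker.interrupt();
        worker.join();
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> ids = runPipeline("https://example.com/schedules", 5, 20);
        System.out.println("Processed task ids: " + ids);
    }
}
```

Because the worker sees only queue messages, the producer could be swapped for a manual trigger, or the worker moved to another process or container, without changing the other side. A broker such as ActiveMQ extends this same idea across machines.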
