Abstract

In order to generate a report for an enterprise where there is neither the API supporting from their existing website systems nor the granted database access rights approval, a daily business report generator system based on web scraping with k nearest neighbor (kNN) classification algorithm is proposed in this paper. It covers the web crawler technology that is to access existing website system and extract business data. The kNN algorithm is applied to identify the verification code on the login page, and the brief daily report generating in a spread-sheet style grid. Compared with some OCR engine for image recognition, the system in Python can automatically generate the brief daily business reports by the kNN algorithm, which is better than some library with default training set on validating the verification code.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.