Abstract

Web crawlers are programs that search engines use to collect information from the internet automatically, according to rules set by the user. Because the volume of sports news on the internet is so large, crawling it requires very fast web crawlers. Several previous studies have discussed information extraction from web documents, which needs to be considered from two aspects: the structure of the web page and the time required. Therefore, in this research a web crawler application was developed using a multi-thread approach. The multi-thread approach is used to produce a web crawler that crawls sports news faster by working on more than one news source address at a time. In addition to the multi-thread approach, the crawler is adjusted to the structure of the website pages to ensure that the intended information is extracted. In tests of the multi-thread implementation, crawling speed increased by 122.95 seconds compared with the single-thread method. However, with web update detection, speed decreased by 6.27 seconds in the crawling process with unequal data, and crawling speed also decreased by 24.76 seconds on server 1 and by 23.92 seconds on server 2.
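The idea of crawling more than one source address at a time can be illustrated with a minimal sketch. It is not the implementation evaluated in this study: the names `fetch`, `crawl`, and `changed_pages` are hypothetical, the thread pool size is arbitrary, and the hash-based update check is only one assumed way that web update detection might be done.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetcher=fetch, max_workers=4):
    """Fetch several source addresses concurrently using a pool of threads.

    Returns a dict mapping each URL to its page text, or to None if the
    fetch raised an exception.
    """
    def safe_fetch(url):
        try:
            return fetcher(url)
        except Exception:
            # A failed source should not stop the other threads.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        pages = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, pages))


def changed_pages(pages, previous_hashes):
    """Return URLs whose content hash differs from the stored one.

    Assumed update check: compare a SHA-256 digest of the page text with
    the digest recorded on the previous crawl, then record the new digest.
    """
    changed = []
    for url, text in pages.items():
        if text is None:
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if previous_hashes.get(url) != digest:
            changed.append(url)
            previous_hashes[url] = digest
    return changed
```

Passing the fetch function as a parameter keeps the sketch testable without network access. A single-thread baseline for comparison would simply call `fetcher` on each URL in a loop.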
