Extraction System Web Content Sports New Based On Web Crawler Multi Thread

Y D Pramudita, M A Rahmawanto, S S Putro, D R Anamisa

doi:10.1088/1742-6596/1569/2/022077

Open Access

https://doi.org/10.1088/1742-6596/1569/2/022077

Journal: Journal of Physics: Conference Series
Publication Date: Jul 1, 2020
Citations: 3
License type: cc-by

Affiliation: Trunojoyo University

Abstract

Web crawlers are programs that search engines use to collect information from the internet automatically, according to rules set by the user. Because so much sports news is published online, crawling it requires a very fast web crawler. Several previous studies have discussed extracting information from web documents, which must be considered in terms of both the structure of the web page and the time required. In this research, therefore, a web crawler application was developed using a multi-thread approach. The multi-thread approach is used to produce a web crawler that crawls sports news faster by processing more than one news source address at a time. In addition, the crawler is adjusted to the structure of the website pages to ensure that the intended information is extracted. In tests of the multi-thread implementation, crawling speed increased by 122.95 seconds compared with the single-thread method. With web update detection, however, the speed decreased by 6.27 seconds in the crawling process with unequal data, and the crawling speed also decreased by 24.76 seconds on server 1 and by 23.92 seconds on server 2.
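
The multi-thread idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the implementation language, libraries, or extraction rules, so everything here is an assumption made for illustration: the names `crawl`, `fetch_html`, `extract_title`, and `TitleParser` are hypothetical, and extracting only the page title stands in for the site-specific structure adjustment the authors mention.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> (a stand-in for site-specific rules)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Download one page. Assumption: UTF-8 content and a 10-second timeout."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html):
    """Parse one HTML document and return its title text, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def crawl(urls, fetch=fetch_html, max_workers=4):
    """Fetch every URL on a thread pool and return a {url: title} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch concurrently and yields results in the order of urls.
        pages = list(pool.map(fetch, urls))
    return {url: extract_title(html) for url, html in zip(urls, pages)}
```

Calling `crawl` with a list of news source addresses downloads them concurrently instead of one after another, which is where a multi-thread crawler would be expected to save time on network-bound work. The `fetch` parameter is injectable so the sketch can be exercised without network access.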

Full Text