Implementation of Web Scraping on News Sites Using the Supervised Learning Method

Dedy Rahman Prehanto, Aries Dwi Indriyanti, Ginanjar Setyo Permadi, I Gusti Lanang Eka Prismana, Edwin Hari Agus Prastyo

doi:10.17051/ilkonline.2021.03.43

Abstract

Indonesia is among the countries with the highest number of internet users in the world, and online news media are a major channel for information there. News sites, however, rarely display news alone: most also show advertisements and navigation elements that distract readers and reduce reading comfort. To address this problem, this study implements web scraping techniques with a supervised learning method, based on an analysis of the DOM tree and XPath structure of news sites. Supervised learning, one of the methods of machine learning, is the approach used in this study. Combining it with web scraping is intended to implement and optimize the gathering of news information from various sites. Basic web scraping requires knowing each site's DOM patterns and XPath structure, which serve as the data model or selector for that site. The result of the research is a web scraping application that retrieves news site content without manual copy and paste, stores the data in a database, and displays it to readers in an application free of disruptive advertisements and navigation.
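The selector idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a library or a schema, so the site key `example-news`, the XPath rules, the sample page, and the `news` table are all hypothetical. The sketch uses only the Python standard library, whose `xml.etree.ElementTree` module supports a limited XPath subset and requires well-formed markup; real news pages would likely need a more tolerant HTML parser.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical per-site selectors, written in ElementTree's limited XPath subset.
SELECTORS = {
    "example-news": {
        "title": ".//div[@class='article']/h1",
        "body": ".//div[@class='article']/p",
    },
}

# Made-up, well-formed sample page containing navigation, an ad, and an article.
SAMPLE_PAGE = """<html><body>
<div class="nav"><a href="/">Home</a></div>
<div class="ads"><p>Buy now!</p></div>
<div class="article">
  <h1>Sample headline</h1>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
</body></html>"""


def scrape(page: str, site: str) -> dict:
    """Return only the title and body text matched by the site's selectors."""
    root = ET.fromstring(page)
    rules = SELECTORS[site]
    title = root.find(rules["title"])
    paragraphs = root.findall(rules["body"])
    return {
        "title": "".join(title.itertext()).strip() if title is not None else "",
        "body": "\n".join("".join(p.itertext()).strip() for p in paragraphs),
    }


def store(conn: sqlite3.Connection, site: str, article: dict) -> None:
    """Save one scraped article in a simple (hypothetical) news table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news (site TEXT, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT INTO news VALUES (?, ?, ?)",
        (site, article["title"], article["body"]),
    )
    conn.commit()


article = scrape(SAMPLE_PAGE, "example-news")
conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
store(conn, "example-news", article)
```

Because the selectors match only the article container, the advertisement and navigation blocks never reach the stored record. Adding another site would mean adding another entry to `SELECTORS` after inspecting that site's DOM.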


Journal: İlköğretim Online	Publication Date: Jan 1, 2021
Citations: 1

Similar Papers

An Approach of Web Scraping on News Website based on Regular Expression
Achmad Maududie ... Windi Eka Yulia Retnani
01 Nov 2018

Pattern Matching-based scraping of news websites
Hamza Salem ... Manuel Mazzara
Journal of Physics: Conference Series | VOL. 1694
01 Dec 2020

Sentiment analysis using web scraping for live news data with machine learning algorithms
Parneet Kaur
Materials Today: Proceedings | VOL. 65
01 Jan 2021

Quantifying the role of online news in linking conservation research to Facebook and Twitter
S.K Papworth ... T.P.L Nghiem
Conservation Biology | VOL. 29
27 Jan 2015
