The Data Extraction Using Distributed Crawler Inside Multi-Agent System

Karel Tomala, Miroslav Voznak, Lukas Rapant, Jan Plucar, Patrik Dubec

doi:10.15598/aeee.v11i6.867

Abstract

The paper discusses the use of web crawler technology. We created an application based on a standard web crawler and intended for data extraction. The application was designed primarily to extract data by keyword from the social network Twitter. First, we created a standard crawler, which went through a predefined list of URLs and downloaded the page content of each URL in turn. The page content was then parsed, and important text and metadata were stored in a database. Recently, the application was modified into the form of a multi-agent system. The system was developed in the C# language, which is used to create web applications, websites, and similar software. The obtained data was evaluated graphically. The system was created within the Indect project at the VSB-Technical University of Ostrava.
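The crawl loop described in the abstract (walk a predefined URL list, download each page, parse it, store text and metadata) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' C# implementation. The names `PageParser`, `fetch`, `crawl`, and the `pages` table layout are assumptions made for the example.

```python
import json
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the title, named <meta> tags, and visible text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download one page; assumes UTF-8 content (an assumption of this sketch)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, db, fetch=fetch):
    """Visit each URL in order, parse the page, and store the result."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, title TEXT, meta TEXT, body TEXT)"
    )
    for url in urls:
        parser = PageParser()
        parser.feed(fetch(url))
        parser.close()
        db.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)",
            (url, parser.title.strip(), json.dumps(parser.meta),
             " ".join(parser.text)),
        )
    db.commit()
```

Passing `fetch` as a parameter keeps the loop testable without network access; a real deployment would also need error handling, politeness delays, and `robots.txt` checks, none of which are shown here.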

Highlights

Browsing the code of web pages, gathering the information found in the code, and searching for links to other websites are the most common tasks of robots.
We faced the problem of mining data from social networks such as Twitter.
The obtained results show that the tool manages to download large amounts of data.

Summary

Web Crawler

The web crawler itself is started within every agent instance. The multi-agent system can encapsulate any application that needs to run inside it. It may delegate part of the communication and management tasks to control elements at lower levels of the hierarchy. Such an architecture can be represented in the form of a tree (Fig. 4). In our experiments, we found that running about 30–40 crawlers lowers the number of requests that a single crawler processes. This is caused by the manager agent not being able to handle all requests. These requests are divided into inbound and outbound: inbound requests are data returned from a crawler, and outbound requests are URLs to be crawled. The effect is most pronounced when running about 90 crawlers, where the manager agent is overwhelmed with inbound requests and is unable to distribute new URLs to be crawled.
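The manager/crawler relationship described here can be illustrated with a small sketch. It assumes threads and in-process queues as stand-ins for agents and their messaging, which is not how the original C# multi-agent system is built; `crawler_agent`, `run_manager`, and the `None` sentinel are hypothetical names and conventions chosen for the example.

```python
import queue
import threading


def crawler_agent(outbound, inbound, fetch):
    """Take URLs from the manager and return the crawled data to it."""
    while True:
        url = outbound.get()
        if url is None:  # sentinel: the manager has no more work
            break
        try:
            data = fetch(url)
        except Exception:
            data = None  # report the failure instead of stalling the manager
        inbound.put((url, data))


def run_manager(urls, n_crawlers, fetch):
    """Distribute URLs to crawler agents and collect their results."""
    urls = list(urls)
    outbound = queue.Queue()  # manager -> crawlers: URLs to be crawled
    inbound = queue.Queue()   # crawlers -> manager: data returned

    agents = [
        threading.Thread(target=crawler_agent,
                         args=(outbound, inbound, fetch))
        for _ in range(n_crawlers)
    ]
    for agent in agents:
        agent.start()

    for url in urls:
        outbound.put(url)
    for _ in agents:
        outbound.put(None)  # one stop signal per crawler

    # The single manager drains every inbound message by itself.
    results = {}
    for _ in range(len(urls)):
        url, data = inbound.get()
        results[url] = data

    for agent in agents:
        agent.join()
    return results
```

In this arrangement every inbound message passes through one consumer, so adding crawlers increases the load on the manager rather than spreading it. That is the kind of bottleneck the paragraph describes, and it is what delegating work to lower-level control elements in the tree is meant to relieve.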

Introduction

Multi-Agent System

Used Technology and Methodology

Twitter Search API

Results

Conclusion

Journal: Advances in Electrical and Electronic Engineering
Publication Date: Dec 31, 2013
Citations: 8
License type: CC BY

Similar Papers

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)
Dongmin Seo ... Hanmin Jung
The Journal of the Korea Contents Association | VOL. 13
28 Dec 2013

Prediction of tourist traffic to Peru by using sentiment analysis in Twitter social network
Ricardo Linares ... Jose Herrera
01 Oct 2015

Crawlers in our life
Robert J Isaacson
The Angle orthodontist | VOL. 80
01 Nov 2010

Predicting Vulnerabilities in Web Applications Based on Website Security Model
Ivan Kovacevic ... Stjepan Gros
22 Sep 2022
