Algorithm for building a website model

Natalia A Huk, Stanislav V Dykhanov, Oleh D Matiushchenko

doi:10.26565/2304-6201-2020-47-03

Abstract

Approaches to modeling the structure of a website have been analyzed, including models that represent the Internet space as semantic networks, frame structures and ontologies. The web graph model has been chosen to represent a web resource: the pages of the resource are the vertices, and the hyperlinks that connect them form its internal structure. To build a website model in the form of a web graph, a method and an algorithm for scanning the pages of a web resource have been developed. The resource is scanned by depth-first search using the LIFO (Last In, First Out) method. Links are found by iterating over the lines of the page markup and extracting URLs with regular expressions. Only links to pages within the resource are taken into account; external links are ignored. The crawling procedure is implemented with the Scrapy framework in Python. To account for additional filters that select pages by criteria, the rules for selecting URLs in the HTML code have been tightened. Web resources are scanned to build their web graphs, which are stored as edge lists and adjacency matrices for further work. The Gephi software environment and the Yifan Hu graph layout algorithm have been used to visualize the obtained graphs and to calculate some of their metric characteristics. The graph diameter, the average vertex degree, the average path length and the graph density are used to analyze the structural connectivity of the graphs studied. The proposed approach can be applied during website reengineering.
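The scanning procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses Scrapy, whereas this example reads pages from a hypothetical in-memory dictionary (`PAGES`) so that it runs without network access. The `HREF_RE` pattern, the function names, and the metric formulas (density as m / (n(n-1)) for a directed graph without self-links, average degree as edges per vertex, path metrics taken over reachable ordered pairs) are assumptions made for illustration.

```python
import re
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

# Assumed pattern: matches href="..." or href='...' attributes.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_internal_links(page_url, html, host):
    """Scan the markup line by line and return same-host links in page order."""
    links = []
    for line in html.splitlines():
        for href in HREF_RE.findall(line):
            absolute, _fragment = urldefrag(urljoin(page_url, href))
            if urlparse(absolute).netloc != host:
                continue  # external (or non-HTTP) link: ignored
            if absolute != page_url and absolute not in links:
                links.append(absolute)  # self-links and repeats are skipped
    return links


def crawl_depth_first(start_url, fetch):
    """Depth-first scan driven by a LIFO stack; returns (vertices, edges)."""
    host = urlparse(start_url).netloc
    stack, visited, edges = [start_url], [], []
    while stack:
        url = stack.pop()  # last in, first out
        if url in visited:
            continue
        visited.append(url)
        for target in extract_internal_links(url, fetch(url), host):
            edges.append((url, target))
            if target not in visited:
                stack.append(target)
    return visited, edges


def adjacency_matrix(vertices, edges):
    """Build a 0/1 matrix whose rows and columns follow the vertex order."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def graph_metrics(vertices, edges):
    """Density, average degree, diameter and average path length (directed)."""
    n, m = len(vertices), len(edges)
    successors = {v: [] for v in vertices}
    for source, target in edges:
        successors[source].append(target)
    distances = []
    for start in vertices:  # breadth-first search from every vertex
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        distances.extend(d for d in seen.values() if d > 0)
    return {
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "average_degree": m / n if n else 0.0,
        "diameter": max(distances, default=0),
        "average_path_length": sum(distances) / len(distances) if distances else 0.0,
    }


# Hypothetical four-page site used only to exercise the sketch.
PAGES = {
    "https://example.org/": '<a href="/a">A</a>\n<a href="/b">B</a>\n'
                            '<a href="https://other.net/x">external</a>',
    "https://example.org/a": '<a href="/c">C</a>',
    "https://example.org/b": '<a href="/">home</a>',
    "https://example.org/c": "",
}
vertices, edges = crawl_depth_first("https://example.org/",
                                    lambda url: PAGES.get(url, ""))
matrix = adjacency_matrix(vertices, edges)
metrics = graph_metrics(vertices, edges)
```

Because the stack is LIFO, the link discovered last on a page is followed first, so the visiting order differs from the order in which links appear in the markup. The `edges` list could be written out as a CSV edge list for import into a tool such as Gephi, while `matrix` gives the equivalent adjacency-matrix form.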


Journal: Bulletin of V.N. Karazin Kharkiv National University, series «Mathematical modeling. Information technology. Automated control systems»
Publication Date: Sep 28, 2020
License type: cc-by

Similar Papers

Design of a recommendation system based on the transition graph
Olga Verba ... Vladyslav Yevlakov
Eastern-European Journal of Enterprise Technologies | VOL. 3
29 Jun 2021

Analysis of German National Electricity Grid at Risk of Random Damage - Case Study
Dominik Strzałka ... Piotr Hadaj
01 Jan 2020

Crawling on web graphs
Alan Frieze ... Colin Cooper
19 May 2002

Modelling Selected Parameters of Power Grid Network in the South-Eastern Part of Poland: The Case Study
Piotr Hadaj ... Dominik Strzałka
Energies | VOL. 13
03 Jan 2020
