A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications

Christian Mejia-Escobar, Ester Martinez-Martin, Miguel Cazorla

doi:10.1155/2023/1094823

Open Access

Abstract

The Web is the communication platform and source of information par excellence. The volume and complexity of its content have grown enormously, making the organization, retrieval, and cleaning of Web information a challenge for traditional techniques. Web intelligence is a novel research area that aims to improve Web‐based services and applications using artificial intelligence and automatic learning algorithms, for which large amounts of Web‐related data are essential. Current datasets are, however, limited and do not combine the visual representation of Web pages with their attributes. Our work provides a large dataset of 49,438 Web pages, composed of webshots along with qualitative and quantitative attributes. This dataset covers all the countries in the world and a wide range of topics, such as art, entertainment, economics, business, education, government, news, media, science, and the environment, addressing different cultural characteristics and varied design preferences. We use this dataset to develop three Web intelligence applications: knowledge extraction on Web design using statistical analysis; recognition of error Web pages using a customized convolutional neural network (CNN) to eliminate invalid pages; and Web categorization based solely on screenshots using a CNN with transfer learning to assist search engines, indexers, and Web directories.

Journal: Computational Intelligence and Neuroscience
Publication Date: Jan 1, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Comparing pre-trained models for efficient leaf disease detection: a study on custom CNN
Touhidul Seyam Alam ... Abhijit Pathak
Journal of Electrical Systems and Information Technology | VOL. 11
23 Feb 2024

One-dimensional convolutional neural network and hybrid deep-learning paradigm for classification of specific language impaired children using their speech
Yogesh Sharma ... Bikesh Kumar Singh
Computer Methods and Programs in Biomedicine | VOL. 213
22 Oct 2021

Tablext: A Combined Neural Network and Heuristic Based Table Extractor
Zach Colter ... Morteza Fayazi
SSRN Electronic Journal | VOL. -
01 Jan 2021

Tablext: A combined neural network and heuristic based table extractor
Zach Colter ... Ronald Dreslinski
Array | VOL. 15
01 Sep 2022
