Abstract

The subject of this article is machine learning models that classify web pages by quality and compliance with SEO rules. The goal is to improve the efficiency of search engines by identifying and using the factors that have the greatest impact on the degree of SEO optimization of web pages. The article addresses two tasks: (1) studying how effectively machine learning methods can build a model that automatically classifies web pages by their degree of adaptation to SEO recommendations; and (2) assessing, with the developed classification models, the influence of relevant page factors (text on the page, text in meta tags, links, images, HTML code) on the degree of SEO optimization. The methods used are machine learning, classification and statistical methods.

The following results were obtained. The effectiveness of machine learning methods for determining the degree of adaptation of a web page to SEO recommendations was analyzed. Classifiers were trained on a data set of web pages randomly selected from the DMOZ catalog and rated by three independent SEO experts in three categories: “low SEO”, “medium SEO” and “high SEO”. Five main classifiers were tested (decision trees, naive Bayes, logistic regression, KNN and SVM), and all of them achieved higher accuracy (from 54.69% to 69.67%) than the baseline (48.83%). The experimental results support the hypothesis that classification algorithms based on machine learning are effective for adapting web pages to SEO recommendations.

Conclusions. The study confirmed that classification algorithms built on machine learning and expert knowledge make it possible to adjust web pages to SEO recommendations effectively. The considered methods can be adapted to various search engines and applied to different languages, provided that a stemming or lemmatization algorithm has been developed for them. The results can be used to develop automated software that supports SEO work: in audit technologies that identify web pages in need of optimization, and in spam detection processes.
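The comparison against a baseline described above can be illustrated with a small sketch. It is not the authors' code or data: the two features, the toy pages and the helper names are hypothetical, and the baseline is assumed to be a majority-class predictor, which the abstract does not state. KNN is used because it is one of the five classifiers named and is easy to write with the standard library alone.

```python
import math
from collections import Counter

# Hypothetical toy data: ([feature_1, feature_2], expert label).
# The features stand in for page factors such as keyword density or
# meta-tag coverage; they are illustrative, not the study's features.
TRAIN = [
    ([0.1, 0.1], "low"), ([0.2, 0.1], "low"),
    ([0.1, 0.2], "low"), ([0.2, 0.2], "low"),
    ([0.5, 0.5], "medium"), ([0.6, 0.5], "medium"), ([0.5, 0.6], "medium"),
    ([0.9, 0.9], "high"), ([0.8, 0.9], "high"), ([0.9, 0.8], "high"),
]
TEST = [
    ([0.15, 0.15], "low"),
    ([0.55, 0.55], "medium"),
    ([0.85, 0.85], "high"),
    ([0.50, 0.55], "medium"),
]


def majority_label(train):
    """Return the most frequent label in the training set (assumed baseline)."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that match the expert labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


actual = [label for _, label in TEST]

# Baseline: always predict the most common training label.
baseline_label = majority_label(TRAIN)
baseline_acc = accuracy([baseline_label] * len(TEST), actual)

# KNN: classify each test page from its three nearest training pages.
knn_acc = accuracy([knn_predict(TRAIN, x) for x, _ in TEST], actual)

if __name__ == "__main__":
    print(f"baseline accuracy: {baseline_acc:.2f}")
    print(f"KNN accuracy:      {knn_acc:.2f}")
```

On this toy set the classes are cleanly separated, so the gap between the two scores is far larger than the one reported in the study; the sketch only shows how such a comparison is set up.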
