Abstract

Objectives. The aim is to compare the efficiency of the Euclidean and Mahalanobis metrics for determining the category of potential text recipients. The task is relevant because a means of identifying the recipients of electronic documents needs to be developed. This task has become more complicated with the introduction of age restrictions on the content of Internet webpages and text resources. Moreover, the issue has received little coverage in the works of Russian researchers.

Method. The relative efficiencies of the Euclidean and Mahalanobis distances were compared as part of the implementation of an intelligent system for automatic text classification based on the age category of the recipients.

Results. The main approaches to establishing proximity measures for objects represented as sets of classification characteristics are discussed, and the choice of the Euclidean and Mahalanobis metrics for numerical comparison of classification results is justified. The sample texts and the characteristics of the category designations used in the computational experiment are described. The experiment was carried out using texts included in the National Corpus of the Russian language.

Conclusion. The computational experiment makes it possible to select the most effective method for determining the age category of potential text recipients. The results showed that both the Euclidean and Mahalanobis metrics can be used to solve text classification problems; they also confirmed that the Mahalanobis metric is preferable for estimating distances between objects represented by correlated features.

Highlights

  • The main approaches to establishing proximity measures for objects represented as sets of classification characteristics are discussed, and the choice of the Euclidean and Mahalanobis metrics for numerical comparison of classification results is justified

  • The computational experiment was carried out using texts included in the National Corpus of the Russian language

  • The results of the experiment showed that both the Euclidean and Mahalanobis metrics can be used to solve text classification problems; the preference for the Mahalanobis metric when estimating distances between objects represented by correlated features was confirmed
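
The preference for the Mahalanobis metric on correlated features can be illustrated with a minimal sketch. It is not the classification system described in the paper: the two features, the covariance matrix, and the helper names `euclidean` and `mahalanobis` are assumptions chosen only for illustration. Two points lie at the same Euclidean distance from a class centre, but the Mahalanobis distance ranks the point that follows the correlation as much closer than the point that goes against it.

```python
import numpy as np


def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ diff))


def mahalanobis(x, y, cov):
    """Mahalanobis distance between two feature vectors.

    cov is the covariance matrix of the features; solving the linear
    system avoids forming the inverse matrix explicitly.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# Hypothetical covariance of two strongly correlated features (r = 0.9).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

centre = np.array([0.0, 0.0])    # hypothetical class centre
along = np.array([1.0, 1.0])     # deviates in the direction of the correlation
against = np.array([1.0, -1.0])  # deviates against the correlation

for name, point in (("along", along), ("against", against)):
    print(name,
          round(euclidean(point, centre), 3),
          round(mahalanobis(point, centre, cov), 3))
```

Under these assumed values the Euclidean distance is identical for both points, while the Mahalanobis distance is several times larger for the point that contradicts the correlation structure of the features.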
