Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Ranka Stanković, Miloš Utvić

doi:10.3390/math11224660

Abstract

Parallel natural language processing systems have previously been tested successfully on part-of-speech tagging and authorship attribution through mini-language modeling, where they achieved significantly better results than independent methods for seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in an arbitrary highly inflective and morphologically rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the methodology assessment, was created using a series of generative pre-trained transformers trained on different representations of the Serbian language corpus and a set of sentences classified into three groups (expert translations, corrupted translations, and machine translations). The paper describes a comparative analysis of the calculated perplexities in order to measure the classification capability of different models on two binary classification tasks. In the experiment, we tested three standalone language models (the baseline) and two composite language models (both based on the perplexities output by all three standalone models). The results single out a complex stacked classifier using a multitude of features extracted from perplexity vectors as the optimal composite language model architecture for both tasks.
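As a rough illustration of the idea described in the abstract, the sketch below computes sentence perplexity from per-token log-probabilities and concatenates summary features from three standalone models into one vector that a downstream classifier could consume. It is a minimal sketch under stated assumptions, not the authors' implementation: the model names (`model_a`, `model_b`, `model_c`), the chosen summary features, and the sample log-probabilities are all hypothetical, and the abstract does not specify which features or classifier the paper actually uses.

```python
import math
from statistics import pstdev


def perplexity(token_logprobs):
    """Perplexity of a sentence given natural-log probabilities of its tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def composite_features(logprobs_by_model):
    """Build one feature vector from several models' scores for a sentence.

    logprobs_by_model maps a (hypothetical) model name to the list of
    per-token log-probabilities that model assigned to the same sentence.
    The four features per model are an illustrative assumption.
    """
    features = []
    for name in sorted(logprobs_by_model):  # fixed order keeps vectors comparable
        logprobs = logprobs_by_model[name]
        surprisals = [-lp for lp in logprobs]  # per-token negative log-probability
        features.extend([
            perplexity(logprobs),
            min(surprisals),
            max(surprisals),
            pstdev(surprisals),
        ])
    return features


# Hypothetical scores for one sentence from three standalone models.
sentence_scores = {
    "model_a": [math.log(0.5), math.log(0.25), math.log(0.5)],
    "model_b": [math.log(0.1), math.log(0.2)],
    "model_c": [math.log(0.5)] * 4,
}
vector = composite_features(sentence_scores)
```

Vectors like this, computed for every sentence, could then be passed to any binary classifier; a stacked classifier of the kind the abstract mentions would combine several such learners.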

Journal: Mathematics
Publication Date: Nov 16, 2023
License type: CC BY 4.0
