Abstract

Many problems in NLP, such as machine translation and sentiment analysis, have seen substantial improvement in recent years. As simpler language problems are solved or better understood, the focus shifts to more complex problems such as semantic analysis and understanding. Unfortunately, many studies in the literature suffer from excessive specificity: their algorithms and datasets are too domain-specific. In this study, we analyze and elaborate on this notion of generality. Instead of selecting a highly specialized dataset for semantic analysis, we take a generic, possibly dry dataset and study how a plain vanilla Transformer performs in learning higher-level semantic patterns beyond what was obvious or expected. We tune our Transformer model on a classic language task to ensure correct performance. Once the model is tuned, the goal is to select sentences containing specific keywords and study whether the model has learned higher-level semantic patterns. We believe we obtained promising results: the average BLEU score for sentences of fewer than 25 words is 39.79, and our initial qualitative analysis of possible semantic content of interest shows a 17% rate of interesting semantic patterns. We conclude with a discussion of data-driven results on unexpectedness as a measure of semantic learning.
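For context, below is a minimal sketch of how a length-filtered corpus BLEU figure like the one reported might be computed. It assumes the sacrebleu library; the paper does not specify its evaluation tooling, and the variable names and toy sentences here are hypothetical placeholders for the actual model outputs and references.

```python
# Minimal sketch: corpus-level BLEU over short sentences, assuming the
# sacrebleu library is installed. The data below is hypothetical; the
# paper does not specify its evaluation tooling.
from sacrebleu.metrics import BLEU

# Hypothetical parallel data: model outputs and matching references.
hypotheses = ["the cat sat on the mat", "a transformer learns patterns"]
references = ["the cat sat on the mat", "the transformer learns patterns"]

# Keep only pairs whose reference has fewer than 25 words, mirroring
# the cutoff used for the paper's reported average BLEU of 39.79.
pairs = [(h, r) for h, r in zip(hypotheses, references)
         if len(r.split()) < 25]
short_hyps = [h for h, _ in pairs]
short_refs = [r for _, r in pairs]

# sacrebleu expects a list of hypotheses and a list of reference streams.
bleu = BLEU()
score = bleu.corpus_score(short_hyps, [short_refs])
print(f"BLEU: {score.score:.2f}")
```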
