Cue Phrases Research Articles

Much recent effort has been devoted to creating large-scale language models. Nowadays, the most prominent approaches are based on deep neural networks, such as BERT. However, they lack transparency and interpretability, and are often seen as black boxes. This affects not only their applicability in downstream tasks but also the comparability of different architectures or even of the same model trained using different corpora or hyperparameters. In this paper, we propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese. These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions (MWEs), thus allowing for an assessment of whether the model has learned different linguistic phenomena. The dataset that was developed for these tasks is composed of a series of sentences with a single masked word and a cue phrase that helps in narrowing down the context. This dataset is divided into MWEs and grammatical structures, and the latter is subdivided into 6 tasks: impersonal verbs, subject agreement, verb agreement, nominal agreement, passive and connectors. The subset for MWEs was used to test BERTimbau Large, BERTimbau Base and mBERT. For the grammatical structures, we used only BERTimbau Large, because it yielded the best results in the MWE task. In both cases, we evaluated the results considering the best candidates and the top ten candidates. The evaluation was done both automatically (for MWEs) and manually (for grammatical structures). The results obtained for MWEs show that BERTimbau Large surpassed both the other models in predicting the correct masked element. However, the average accuracy of the best model was only 52% when only the best candidates were considered for each sentence, going up to 66% when the top ten candidates were taken into account. 
As for the grammatical tasks, the predictions were better overall but varied depending on the type of morphosyntactic agreement. On the one hand, connectors and impersonal verbs, which do not require any agreement in the produced candidates, reached a precision of 100% and 98.78%, respectively, among the best candidates. On the other hand, tasks that require morphosyntactic agreement had results consistently below 90% overall precision, with the lowest scores reported for nominal agreement and verb agreement, both below 80% overall precision among the best candidates. We therefore identified that a critical and widely adopted resource for Brazilian Portuguese NLP presents issues concerning MWE vocabulary and morphosyntactic agreement, even if it is proficient in most cases. These models are a core component of many NLP systems, and our findings demonstrate the need for additional improvements in these models and the importance of widely evaluating computational representations of language.
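The distinction between scoring only the best candidate and scoring the top ten candidates can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `top_k_accuracy`, the case-insensitive matching, and the toy candidate lists are all assumptions made for illustration, and the ranked candidates would in practice come from a masked language model.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    expected: Sequence[str],
    k: int,
) -> float:
    """Share of sentences whose expected word is among the first k candidates.

    ranked_candidates[i] holds the model's candidates for sentence i,
    ordered from most to least probable (hypothetical input format).
    """
    if len(ranked_candidates) != len(expected):
        raise ValueError("need one candidate list per expected word")
    if not expected:
        return 0.0
    hits = 0
    for candidates, gold in zip(ranked_candidates, expected):
        # Assumption: compare case-insensitively, look only at the first k.
        top_k = [c.lower() for c in candidates[:k]]
        if gold.lower() in top_k:
            hits += 1
    return hits / len(expected)


# Toy data, invented for this example (not taken from the dataset).
expected_words = ["pé", "chove", "mas"]
model_output = [
    ["mão", "pé", "braço"],   # expected word ranked second
    ["chove", "neva"],        # expected word ranked first
    ["e", "ou", "então"],     # expected word missing
]

best_only = top_k_accuracy(model_output, expected_words, k=1)
top_ten = top_k_accuracy(model_output, expected_words, k=10)
```

With this toy input, one of three sentences is correct at k=1 and two of three at k=10, which mirrors how a score can rise when more candidates are taken into account.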

Discourse markers are an uninvestigated aspect of language in both old and modern Kurdish linguistics and have not been given due attention by either native or non-native researchers. The present study therefore aims to shed light on this almost entirely ignored aspect of the language by offering a systematic treatment of the group of lexical items known as Discourse Markers (henceforth, DMs), and more specifically of one category of them: Adversative DMs. DMs are words, phrases and even clauses that enhance discourse coherence and, as previous research has shown, are found in all languages. Researchers working on English and other languages use numerous terms for this group of markers, such as ‘Discourse Particles, Cue Phrases, Small Words, Pragmatic Markers, Discourse Connectives…’, and they also define them differently. It is postulated that DMs are meaningless and lie outside the domain of sentence structure. Likewise, lexical expressions that have different grammatical functions, such as ‘and, also, but, or, simultaneously, at the same moment’, etc., can also function as DMs that connect the previous utterance with the upcoming discourse segment. The current investigation seeks to answer the following questions: first, to what extent DMs are used in literary texts; second, which discourse functions DMs perform; third, which word categories DMs are derived from; and fourth, to what extent Halliday and Hasan’s (1976) framework is applicable to Kurdish DMs. To achieve these aims, the researchers analyzed a contemporary novel by a famous novelist, entitled ‘Xezlenûs w Bâxekâni Xejâł’. By applying Halliday and Hasan’s (1976) framework, and by taking insights from Fraser (2009), DMs are categorized into different classes, one of which, Adversative DMs, is the concern of the present study. To obtain the frequency of each marker, the data were examined manually, since there are no corpus analysis tools that can facilitate such measurements. The study concludes that Adversative DMs are frequently used in the selected Kurdish literary texts and that they are similar to those found in English in terms of derived grammatical categories and taxonomy, but differ in form, position and discourse functions. It also finds that Adversative DMs are of different kinds, analogous to those investigated in English by Halliday and Hasan (1976).

Articles published on Cue Phrases

Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese

Signalling conditional relations

Investigating disagreement in the scientific literature.

Phrase Embedding and Clustering for Sub-Feature Extraction From Online Data

Why-Type Question to Query Reformulation for Efficient Document Retrieval

Predictions of Citations of a Scholarly Paper

KGGCN: Knowledge-Guided Graph Convolutional Networks for Distantly Supervised Relation Extraction

Creating a Disaster Chain Diagram from Japanese Newspaper Articles Using Mechanical Methods

Signaling of Causal Relations in Spanish: Variety, Functionality, and Specificity

Contributions of Voice Expectations to Talker Selection in Younger and Older Adults With Normal Hearing.

Evaluation of Content Compaction in Assamese Language

An attention-based neural framework for uncertainty identification on social media texts

The congruent, the incongruent, and the unexpected: Event-related potentials unveil the processes involved in schematic encoding

Subjectivity in Spanish causal connectives

Causal relation extraction and network construction of web events

The linguistic marking of coherence relations

Adversative Discourse Markers in Kurdish Literary Texts

Smart Enough to Talk With Us? Foundations and Challenges for Dialogue Capable AI Systems

Cognitive complexity and the linguistic marking of coherence relations: A parallel corpus study
