Ontologies play an important role in the organization and representation of knowledge. In most cases, however, an ontology does not fully cover its domain, leaving a gap that typically manifests as missing concepts, relations, or axioms and is usually filled by domain experts in a manual, tedious process. Large language models (LLMs) can ease this process: a fine-tuned LLM could take as input natural-language text describing up-to-date, reliable domain knowledge and output a structured graph in OWL RDF/Turtle format, the standard representation for ontologies. Fine-tuning such a model, however, requires a dataset of text-OWL sentence pairs, and no such dataset exists in the literature or within the open-source community. This paper therefore introduces our LLM-assisted verbalizer, which creates such a dataset by converting OWL statements from existing ontologies into natural text. We evaluate the verbalizer on 322 classes from four ontologies using two different LLMs, achieving precision and recall of up to 0.99 and 0.96, respectively.
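To make the data-construction step concrete, the sketch below shows one plausible way to pair each class's OWL/Turtle fragment with an LLM-generated verbalization; it is illustrative only and not the authors' implementation. The ontology path and the `verbalize_with_llm` callable are assumptions standing in for whatever ontology files and LLM interface are actually used.

```python
# Illustrative sketch: build text-OWL fine-tuning pairs from an existing ontology.
# `verbalize_with_llm` is a hypothetical callable that turns a Turtle fragment
# into a natural-language description (e.g., via a prompted LLM).
from rdflib import Graph, RDF, OWL


def class_fragments(path: str):
    """Yield (class_uri, turtle_fragment) for every owl:Class in the ontology."""
    g = Graph()
    g.parse(path)  # rdflib infers the serialization from the file
    for cls in g.subjects(RDF.type, OWL.Class):
        sub = Graph()
        for triple in g.triples((cls, None, None)):  # triples with the class as subject
            sub.add(triple)
        yield cls, sub.serialize(format="turtle")


def build_dataset(path: str, verbalize_with_llm):
    """Return a list of {"text": ..., "owl": ...} pairs for fine-tuning."""
    return [
        {"text": verbalize_with_llm(turtle), "owl": turtle}
        for _, turtle in class_fragments(path)
    ]
```

Each resulting pair couples a natural-text sentence with its OWL counterpart, which is the shape of training example a text-to-OWL fine-tune would consume.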