Text Corpus Research Articles

Since the beginning of the COVID-19 pandemic, more than 1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their abstracts hold a wealth of information that remains largely unexplored and is difficult to search because it is unstructured. Keyword-based search is the standard approach: it retrieves the documents of a corpus that contain all or some of the words in a target list. This type of search, however, offers no visual support and is poorly suited to expressing complex queries or compensating for missing specifications. This study uses small graphs of concepts to express graph searches over the existing COVID-19 literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. We took the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts with terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, connecting two concepts when their mutual information is relevant. A graph query engine was built to identify the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths.
We built a large co-occurrence network consisting of 128,249 entities and 47,198,965 relationships. The GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries. It produces a globally ranked bibliography of publications, and each publication is further associated with the specific parts of the query that it explains, so the user can understand each aspect of the match. Our approach supports query formulation and evidence search over a large text corpus; it can be reapplied to any scientific domain for which document corpora and curated ontologies are available.
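
The pipeline described in this abstract (annotated abstracts, a co-occurrence network filtered by mutual information, and shortest paths for query completion) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how mutual information is computed or thresholded, so the sketch assumes pointwise mutual information (PMI) with a hypothetical cutoff `min_pmi`, and the function names and toy concept labels are invented for illustration.

```python
import math
from collections import Counter, deque
from itertools import combinations


def build_network(docs, min_pmi=0.0):
    """Build an undirected co-occurrence graph from annotated documents.

    docs: list of sets of concept labels, one set per abstract.
    An edge is kept only when the pair's PMI exceeds min_pmi
    (assumed criterion; the abstract only says "relevant").
    """
    n_docs = len(docs)
    single = Counter()  # number of documents mentioning each concept
    pair = Counter()    # number of documents mentioning both concepts
    for concepts in docs:
        single.update(concepts)
        pair.update(combinations(sorted(concepts), 2))

    graph = {concept: set() for concept in single}
    for (a, b), n_ab in pair.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        pmi = math.log2(n_ab * n_docs / (single[a] * single[b]))
        if pmi > min_pmi:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of concepts or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Toy annotated abstracts (hypothetical concept labels).
toy_docs = [
    {"fever", "cough"},
    {"fever", "cough"},
    {"cough", "pneumonia"},
    {"cough", "pneumonia"},
    {"vaccine", "antibody"},
    {"vaccine", "antibody"},
    {"cough", "vaccine"},  # a single chance co-occurrence
]
network = build_network(toy_docs)
print(shortest_path(network, "fever", "pneumonia"))
print(shortest_path(network, "fever", "vaccine"))
```

In this toy corpus the single "cough"/"vaccine" co-occurrence has negative PMI and is dropped, so no path links "fever" to "vaccine", while "fever" and "pneumonia" are connected through "cough". A query that names two unconnected concepts could be completed by suggesting the intermediate concepts on such a path, which is the role the abstract assigns to shortest paths.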


Introduction. The digital transformation of education involves using digital tools both to manage the educational process as a whole and to support learning in individual disciplines. Teaching a foreign language places specific functional demands on such tools. The purpose of the article is to explore how a corpus of student texts in a foreign language can be used to analyze the results of students’ educational work and to form an adapted learning path within a foreign language discipline. Materials and methods. The material for the study was the Petrozavodsk Annotated Corpus of Texts (PACT), which contains works in German and French written by students from 2019 to 2023. The texts are accompanied by attributive information about the author, the writing conditions, the evaluation, and error corrections. A web application has been developed for working with the corpus; it includes personal accounts for teachers and students. To support analysis of the learning process and subsequent decisions on managing it, the web application provides several tools for visualizing error statistics at different levels of detail, with the ability to select texts by different conditions. Results of the study. The study identified patterns that connect the types of errors with the genres of texts, the severity of the errors, and the emotional and physiological state of the student. Russian-speaking students studying German as a foreign language make the most mistakes in the choice of lexemes, spelling, punctuation, and verb placement in a sentence. At the same time, the distribution of errors by type differs across genres. The relative number of gross errors is 1.5 times higher for senior students than for first-year students. Errors in the choice of lexeme are critical for understanding the content of the text. The severity of the error does not depend on the genre of the text. Conclusion. The results of the study can be used to build corpora of student texts with educational analytics functions in dashboard format. They can also be useful in developing teaching materials for the “Foreign Language” discipline.



Articles published on Text Corpus

Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes.

ASPECTS OF SPORT METAPHORS TRANSFER

Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation.

Developing corpus literacy: A perspective of Latgalian language and cultural studies

The Passive Voice in the Ancient Egyptian Pyramid Texts I

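„Protestovali před a v budově řeckého ministerstva financí.“ Krátké nahlédnutí do problematiky zeugmatu ["They protested in front of and in the building of the Greek Ministry of Finance." A brief look at the problem of zeugma]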

In search of Zion: reconsidering the political category of Zionist utopias

Advancing language models through domain knowledge integration: a comprehensive approach to training, evaluation, and optimization of social scientific neural word embeddings

Abusive Language Detection in Khasi Social Media Comments

Exploring Semanticity for Content and Function Word Distinction in Catalan

Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations.

Analysis of Determinant Factors Customer Loyalty Towards Brand in The Telecommunication Industry With The Digitalization Paradigm

Exploring the evolution of research topics during the COVID-19 pandemic

Anotacja socjolingwistyczna w Korpusie dawnych polskich tekstów dramatycznych (1772–1939) [Sociolinguistic annotation in the Corpus of Old Polish Dramatic Texts (1772–1939)]

A new inscribed Aramaic potsherd from Tell al-Assara, Jordan

Visualization of educational data in a German-language corpus of student texts

Exploring corpus linguistics via computational tool analysis: key finding review

Semantic prosody, semantic transfer and semantic change

Fast, Simple, and Accurate Time Series Analysis with Large Language Models: An Example of Mean-motion Resonances Identification

Understanding lexico-semantic opposition empty/full in official business texts: Quantitative and qualitative research
