Abstract
Over the last decade, discourse relations, also referred to as rhetorical or coherence relations, have been used to improve a range of natural language processing applications. Researchers have devised several theories, including rhetorical structure theory (RST) and cross-document structure theory (CST), to examine relations between generic text units in single and multiple documents, respectively. In this paper, we propose a cross-article structure theory (CAST) that extends the benefits of discourse relations to applications involving multiple scientific articles. It builds on RST and CST. The insight that underpins CAST is to consider both intra-section and cross-section relations. First, these relations are classified based on the structural features of the article (that is, their appearance within each section type); then the relations between text portions across multiple articles are classified. The practicality of the theory is demonstrated on the problem of identifying the types of relations that hold between each pair of sentences in related sections of different articles. A CAST bank was created, and the k-nearest neighbors algorithm was used to develop two classifiers, one based on CAST and one based on CST. The performance results clearly demonstrate the role of the relations in CAST that are specific to scientific articles. Other applications of CAST could address redundancy and readability, which are major issues in tasks such as the summarization of multiple articles.
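The classification step mentioned in the abstract can be illustrated with a minimal k-nearest neighbors sketch over sentence pairs. This is not the authors' implementation: the two features (word overlap and length ratio), the toy training pairs, the label names, and the helper names `pair_features` and `knn_predict` are all assumptions made here for illustration, since the abstract does not state which features or relation labels the classifiers used.

```python
import math
from collections import Counter


def pair_features(s1, s2):
    """Map a sentence pair to (Jaccard word overlap, length ratio).

    Both features are illustrative assumptions, not the paper's feature set.
    """
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    union = w1 | w2
    jaccard = len(w1 & w2) / len(union) if union else 0.0
    ratio = min(len(w1), len(w2)) / max(len(w1), len(w2)) if w1 and w2 else 0.0
    return (jaccard, ratio)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    `train` is a list of (feature_vector, label); distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy pairs; the labels are placeholders for relation types.
TOY_PAIRS = [
    ("we use a knn classifier", "we use a knn classifier", "Identity"),
    ("the corpus has ten articles", "the corpus has ten articles", "Identity"),
    ("we use a knn classifier",
     "we use a knn classifier with k set to five", "Subsumption"),
    ("the corpus has ten articles",
     "the corpus has ten articles from two journals", "Subsumption"),
    ("results improve on the baseline",
     "participants were recruited online", "No relation"),
    ("we propose a new theory",
     "data were collected in spring", "No relation"),
]

TRAIN = [(pair_features(a, b), label) for a, b, label in TOY_PAIRS]


def classify_pair(s1, s2, k=3):
    """Predict a relation label for one sentence pair from the toy data."""
    return knn_predict(TRAIN, pair_features(s1, s2), k)
```

Calling `classify_pair` with one sentence from each of two related sections returns the majority label among the three closest training pairs. A real system would replace the two surface features with richer ones and train on an annotated relation bank.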
Highlights
As rich and reliable sources of information, scientific articles play an essential role in various fields
Cross-document structure theory (CST) and rhetorical structure theory (RST) are combined to extend the use of discourse relations to multiple scientific articles, yielding a cross-article structure theory (CAST)
Both intra-section and cross-section relations are considered in CAST
Summary
As rich and reliable sources of information, scientific articles play an essential role in various fields. To improve the information retrieval process and to help promote high-quality, efficient, and effective research, it is worthwhile to understand how text portions within or between articles relate to one another. These relations could be used to weight sentences, or even whole articles, by separating important text from less important text. The purpose of a research article is to report on original work, whether theoretical or empirical. Such articles are regularly produced in academic fields, such as the natural or social sciences. Some readers may be seeking specific information about the methodological aspects of a study, in which case they can turn directly to the corresponding section of the article.