Abstract

This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
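To make the layered representation concrete, the following is a minimal sketch of how three annotation layers for a single toy life-cycle diagram could each be held in a graph. It is an illustration under stated assumptions, not the corpus's actual file format or API: the `LayerGraph` class, the element identifiers (`T1`, `B1`, `A1`, `G1`, `R1`) and the attribute names are all hypothetical. Only the three layer types and the CYCLIC SEQUENCE relation name are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class LayerGraph:
    """A minimal directed graph: node id -> attributes, plus labelled edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, target, attrs) tuples

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, source, target, **attrs):
        # Refuse edges that point at elements missing from this layer.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.append((source, target, attrs))

    def children(self, node_id):
        """Return the targets of all edges leaving node_id, in insertion order."""
        return [t for s, t, _ in self.edges if s == node_id]


# Hypothetical layout segmentation: three text labels and three images.
stages = {"G1": ("T1", "B1"), "G2": ("T2", "B2"), "G3": ("T3", "B3")}

# Layer 1: grouping. Each group node dominates the elements it contains.
grouping = LayerGraph()
for group, (label, image) in stages.items():
    grouping.add_node(group, kind="group")
    grouping.add_node(label, kind="text")
    grouping.add_node(image, kind="image")
    grouping.add_edge(group, label)
    grouping.add_edge(group, image)

# Layer 2: connectivity. Arrows (A1-A3, assumed ids) link the groups in a cycle.
connectivity = LayerGraph()
for group in stages:
    connectivity.add_node(group, kind="group")
connectivity.add_edge("G1", "G2", arrow="A1")
connectivity.add_edge("G2", "G3", arrow="A2")
connectivity.add_edge("G3", "G1", arrow="A3")

# Layer 3: discourse structure. One relation node spans the three stages.
discourse = LayerGraph()
discourse.add_node("R1", kind="relation", relation="CYCLIC SEQUENCE")
for group in stages:
    discourse.add_node(group, kind="group")
    discourse.add_edge("R1", group, role="nucleus")
```

Keeping the layers in separate graphs that share node identifiers means each layer can be queried on its own, for example `grouping.children("G1")` for the members of a group, while still allowing the layers to be cross-referenced through the shared ids.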

Highlights

  • Diagrams are a common feature of many everyday media: they can be found everywhere from scientific publications and instruction manuals to newspapers and school textbooks

  • The results show that annotators consistently agree on common Rhetorical Structure Theory (RST) relations such as CYCLIC SEQUENCE, which is used to annotate recurring cycles formed by diagram elements, and PREPARATION, which is used to describe the relationship between a title and an entire diagram

  • Although the development of AI2D-RST revealed various challenges, we argue that the corpus is still a valuable resource for studying how the diagrammatic mode is used in the domain of primary school natural sciences and beyond

Summary

Introduction

Diagrams are a common feature of many everyday media: they can be found everywhere from scientific publications and instruction manuals to newspapers and school textbooks. Our approach to multimodality is linguistically inspired and semiotically oriented; that is, we seek to systematically describe how expressive resources such as natural language, illustrations, line art, photographs, lines, arrows and layout are combined in diagrams to make and exchange meanings. [...] (Wildfeuer et al. 2020). Despite their growing influence in various fields of study broadly concerned with human communication, many approaches to multimodality remain without adequate empirical support. The second part of the corpus name, RST, refers to Rhetorical Structure Theory, a theory of discourse structure which we use to describe how diagrams combine multiple expressive resources to fulfil their communicative goals (Mann and Thompson 1988; Taboada and Mann 2006; Hiippala and Orekhova 2018). The AI2D-RST corpus is intended to serve a dual purpose: to support empirical research on the multimodality of diagrams and their computational processing.

Developing multimodal resources for diagrams research
Layout segmentation
The AI2D-RST annotation schema
AI2D-RST grouping graph
Grouping
Macro-grouping
Connectivity
Discourse structure
AI2D-RST connectivity graph
AI2D-RST discourse structure graph
Annotators and training
The annotation tool
Measuring the reliability of the annotation
Modelling annotator reliability
On the reliability and reproducibility of the AI2D-RST annotation schema
Exploring the AI2D-RST corpus
Findings
Discussion
Concluding remarks
