Abstract

The goal of this chapter is to outline the main stages in multimodal data management, starting with the capture of multimodal raw data in instrumented spaces. Capturing multimodal corpora requires complex settings such as instrumented lecture and meeting rooms, which contain capture devices for each modality to be recorded and, most challengingly, hardware and software for digitizing and synchronizing the acquired signals. The resolution of the capture devices (mainly cameras and microphones) has a determining influence on the quality of the resulting corpus, as do apparently more trivial factors such as the position of these devices in the environment. The number of devices is also important: a larger number provides more information to help define the ground truth for a given annotation dimension. Annotations are the time-dependent information abstracted from input signals; they include low-level monomodal or multimodal features as well as higher-level phenomena, which may or may not be abstracted from the low-level features. Conversely, metadata provides static information about an entire unit of data capture: it is not involved in a time-dependent relation to the unit's content and is generally constant for the entire unit.

