Getting Along with Relational Databases

Martin Holmes

doi:10.4000/jtei.3874

Abstract

Both relational databases (RDBs) and XML have strengths and weaknesses as data storage and modeling systems. Most researchers working with historical and literary data in the humanities would argue for the superiority of XML, since it allows unlimited nesting, linking, and complexity. Relational database proponents claim superior querying and processing speed, although recent advances in XML languages and tools have eroded that advantage. Nevertheless, RDBs remain popular and are widely used, particularly in the early stages of projects where resources and metadata are being collected, and projects may end up with both an RDB and an XML document collection. Programmers must then integrate these distinct forms of data when building project outputs. This article discusses the Digital Victorian Periodical Poetry (DVPP) project, where metadata on about 15,000 poems from nineteenth-century periodicals is captured in a MySQL database, and periodically exported to create a TEI file for each poem. Many of the poems are then transcribed and encoded. The canonical source of metadata is the RDB, while the canonical source of textual data is the TEI file. Metadata in the TEI files must be periodically updated from the RDB, without disturbing the textual encoding. Changes to the RDB data may result in changes to the id and filename of the related TEI file, so any existing TEI data is migrated to a new file, and the Subversion repository must be appropriately updated. All of this is done with XSLT and Ant.

Full Text