Abstract

A common problem in TEI software development is that each project builds its own custom software stack to handle the semantic intricacies of a deeply encoded TEI corpus. This article describes the design of version 4 of the PhiloLogic corpus query engine, which handles heterogeneous TEI encoding through a redesigned abstract data model. We show that this architecture substantially benefits software reuse, allowing powerful TEI applications to be adapted to new corpora with a minimum of custom programming, and we discuss the more general and theoretical implications of abstraction as a TEI processing technique.

Highlights

  • By providing an end-to-end TEI processing library, we allow developers to use the same abstract data model at every stage of the application, and we allow formatting software to work on a wide variety of TEI constructs and schemata

  • This enables developers to leverage their knowledge of their own corpora and the TEI Guidelines rather than delving into the minutiae of web frameworks and relational databases

  • Although we do not currently include annotation modules in the standard PhiloLogic distribution, we are greatly encouraged by the emergence of complementary standards such as the W3C Open Annotation Data Model (W3C 2013), ISO MAF (ISO 2012), and LMF (ISO 2008), among others, and we look forward to providing comprehensive support for annotation features in a future release
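As a rough illustration of the end-to-end idea in the first highlight, the Python sketch below maps several TEI elements onto a small set of abstract object types, so that downstream formatting code can dispatch on a generic `para` type rather than on `<p>`, `<lg>`, or `<sp>` individually. The `OBJECT_TYPES` table, the chosen elements, and the helper names are hypothetical assumptions for illustration only, not PhiloLogic's actual configuration or API; the sketch relies only on the standard-library `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI element names to abstract object types.
# This table is an illustrative assumption, not PhiloLogic's default configuration.
OBJECT_TYPES = {
    "TEI": "doc",
    "div": "div",
    "div1": "div",
    "div2": "div",
    "p": "para",
    "lg": "para",
    "sp": "para",
    "pb": "page",  # milestone element: marks the start of a parallel (page) object
}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def abstract_objects(xml_text: str) -> list[tuple[str, str, int]]:
    """Walk a TEI document and return (object_type, element_name, depth)
    for every element that the mapping table recognizes, in document order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, depth):
        name = local_name(elem.tag)
        obj_type = OBJECT_TYPES.get(name)
        if obj_type is not None:
            found.append((obj_type, name, depth))
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return found
```

For a small document such as `<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div1><p>Prose.</p><pb n="2"/><lg><l>Verse.</l></lg></div1></body></text></TEI>`, the prose paragraph and the verse group both come back as `para` objects, and a formatter written against the abstract types never needs to know which TEI element produced them.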


Summary

Data Model

1. Document: jointly describes the contents of the TEI Header and the element itself.
2. Div-like: includes the component and division model classes, or other large container units.
3. Paragraph/Chunk: typically includes , , or similar elements.
4. Sentence/Phrase: by default, sentence objects are generated by the tokenizer.
6. Parallel objects: spans of text between milestone elements forming a "parallel hierarchy" of pages, lines, or other components.
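To make the distinction between the nested levels and the parallel objects concrete, here is a minimal Python sketch of one way such a data model could be addressed, under stated assumptions: each hierarchical object gets an integer tuple with one slot per level, so containment reduces to a prefix comparison, while parallel objects such as pages, which cut across the hierarchy, are related by byte-offset overlap instead. The `LEVELS` tuple (including a word level), the address layout, and the names `TextObject`, `Page`, `ancestor`, `contains`, and `pages_for` are all hypothetical and do not describe PhiloLogic's actual API or storage format.

```python
from dataclasses import dataclass

# Hypothetical level order; 0 in an address slot means "not at this level".
LEVELS = ("doc", "div", "para", "sent", "word")


@dataclass(frozen=True)
class TextObject:
    address: tuple[int, ...]  # e.g. (1, 2, 3, 0, 0) = doc 1, div 2, para 3
    start: int                # byte offset where the object begins
    end: int                  # byte offset where the object ends

    @property
    def depth(self) -> int:
        """Number of leading non-zero components in the address."""
        d = 0
        for part in self.address:
            if part == 0:
                break
            d += 1
        return d

    @property
    def level(self) -> str:
        if self.depth == 0:
            raise ValueError("address has no non-zero components")
        return LEVELS[self.depth - 1]


@dataclass(frozen=True)
class Page:
    """A parallel object: it spans byte offsets but does not nest in the hierarchy."""
    n: str
    start: int
    end: int


def ancestor(address: tuple[int, ...], level: str) -> tuple[int, ...]:
    """Truncate an address to the given level, zero-filling the remaining slots."""
    depth = LEVELS.index(level) + 1
    return address[:depth] + (0,) * (len(address) - depth)


def contains(outer: TextObject, inner: TextObject) -> bool:
    """Hierarchical containment is a prefix test on the addresses."""
    d = outer.depth
    return inner.depth >= d and inner.address[:d] == outer.address[:d]


def pages_for(obj: TextObject, pages: list[Page]) -> list[str]:
    """Parallel objects are related by overlapping byte ranges, not by address."""
    return [p.n for p in pages if p.start < obj.end and obj.start < p.end]
```

For example, with `word = TextObject((1, 2, 3, 1, 7), 1040, 1046)`, `para = TextObject((1, 2, 3, 0, 0), 900, 1500)` and pages `Page("12", 0, 1000)` and `Page("13", 1000, 2000)`, the word's enclosing div is `ancestor(word.address, "div") == (1, 2, 0, 0, 0)`, the paragraph contains the word, and the paragraph straddles both pages while the word falls only on page 13. Keeping pages out of the address tuple is the design point: a page break may fall mid-paragraph, so forcing pages into the nested hierarchy would break the prefix-based containment test.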

Parser Design
Query Engine
Client API
Conclusion