The traditional way of publishing in PDF makes it difficult to retrospectively convert the legacy literature into data. This presentation will discuss pre-publication tagging as an alternative solution for publishing FAIR (Findable, Accessible, Interoperable, Reusable) biodiversity data.
The MetoTaxa-MetoStem workflow
The MetoTaxa project aims to create a new digital production chain for the European Journal of Taxonomy that enables pre-publication semantic structuring of text, automatic tagging and semantic enrichment (annotation). The system is based on a single-source publishing model: one XML source file allows technical editors to enrich the text automatically and to produce multiple digital outputs from it. This makes it possible to structure both generic and domain-specific sections of articles (e.g., Introduction; Material and methods; Taxon names or Material examined). Using the GoldenGate API developed by Plazi, the Text Encoding Initiative (TEI) XML source file is automatically annotated with Journal Article Tag Suite (JATS) TaxPub tags: taxon names are labeled and each authorship can be checked against the Catalogue of Life, and each element of the material examined is parsed thanks to the preformatting of the text (Chester et al. 2019). In addition, each bibliographic reference is parsed into JATS elements (author names, title, journal, etc.), which automatically links references to their in-text citations. Pre-publication tagging will be carried out by the technical editors and then checked by the authors before publication; the tagged data will be sent to databases such as the Global Biodiversity Information Facility (GBIF) or the Biodiversity Literature Repository (BLR) as soon as the article is published. We will also briefly present MetoStem, which offers a technical solution for the digital transformation of monographs, particularly floras. The tools and methods developed by this project will enable advanced publication of interoperable, structured text and data.
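As a rough illustration of the reference step, the Python sketch below takes a reference that has already been split into fields and builds a JATS `<ref>` containing an `<element-citation>`, then wraps an in-text citation in an `<xref ref-type="bibr">` whose `rid` points back to that reference's `id`. The element names are standard JATS; the `ParsedReference` structure, the helper names and the sample values are hypothetical, and the parsing itself (done by GoldenGate in the workflow) is not shown.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ParsedReference:
    """Hypothetical container for a reference already split into fields."""
    ref_id: str                      # target of in-text <xref rid="...">
    authors: list[tuple[str, str]]   # (surname, given names)
    year: str
    title: str
    journal: str
    volume: str
    fpage: str
    lpage: str


def to_jats_ref(ref: ParsedReference) -> ET.Element:
    """Build a JATS <ref> holding an <element-citation> for a journal article."""
    ref_el = ET.Element("ref", {"id": ref.ref_id})
    cit = ET.SubElement(ref_el, "element-citation", {"publication-type": "journal"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in ref.authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    # Remaining fields map one-to-one onto JATS elements.
    for tag, value in (
        ("year", ref.year),
        ("article-title", ref.title),
        ("source", ref.journal),
        ("volume", ref.volume),
        ("fpage", ref.fpage),
        ("lpage", ref.lpage),
    ):
        ET.SubElement(cit, tag).text = value
    return ref_el


def cite_in_text(before: str, label: str, rid: str, after: str) -> ET.Element:
    """Return a <p> whose citation label is linked to <ref id=rid> via <xref>."""
    p = ET.Element("p")
    p.text = before
    xref = ET.SubElement(p, "xref", {"ref-type": "bibr", "rid": rid})
    xref.text = label
    xref.tail = after
    return p


if __name__ == "__main__":
    # Placeholder values, not a real publication.
    ref = ParsedReference("B1", [("Doe", "J.")], "2020", "A placeholder title",
                          "Example Journal", "1", "1", "10")
    print(ET.tostring(to_jats_ref(ref), encoding="unicode"))
    print(ET.tostring(cite_in_text("As noted by ", "Doe 2020", "B1", "."),
                      encoding="unicode"))
```

Because the link is just a shared identifier (`rid` on the citation, `id` on the reference), the in-text citation resolves automatically once both sides are tagged.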
ARPHA Publishing Platform
Launched in 2010 by Pensoft, ARPHA (Penev et al. 2010) is the first scholarly publishing platform to support pre-publication semantic tagging and enhancement of entities (e.g., taxon treatments, taxon names, sequences) in the JATS TaxPub XML format developed by Plazi; these tags are then embedded into the HTML version of the article. Having proved advantageous for biodiversity scientists, Pensoft's pre-publication tagging workflow has since been adopted by over 30 biodiversity journals hosted on ARPHA. The second development stage of ARPHA was marked by the launch of the ARPHA Writing Tool (AWT) and the Biodiversity Data Journal in 2013. AWT supports the import of Darwin Core structured data from GBIF, the Barcode of Life Data Systems (BOLD) and Integrated Digitized Biocollections (iDigBio) directly into manuscripts. These data are also exported automatically to GBIF as published material citations. AWT also provides several other unique tools encompassed within the ARPHA-BioDiv toolbox (Penev et al. 2017). AWT is currently being redeveloped into a standalone, freely accessible installation (AWT 2.0) based on a micro-service architecture. It enables new semantic enhancements during the authoring process, which can be confirmed by the authors before manuscript submission. These enhancements include: annotation of the context of in-text citations with the CiTO ontology; automated tagging of taxon names and linking to their identifiers in authoritative sources; an annotator tool; a nanopublication module; automated search and import of references; a treatment citation module; export/import to/from JATS TaxPub; and an internal communication tool for contributors.
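To make the Darwin Core import more concrete, here is a minimal sketch of how one occurrence record could be rendered as a single "Material examined" line. It assumes the record uses standard Darwin Core term names (`country`, `stateProvince`, `locality`, `decimalLatitude`, `decimalLongitude`, `eventDate`, `recordedBy`, `institutionCode`, `catalogNumber`). The `material_citation` helper, the output layout and the sample values are hypothetical; this is not AWT's implementation or any journal's house style.

```python
from typing import Any, Mapping


def material_citation(record: Mapping[str, Any]) -> str:
    """Render one Darwin Core occurrence record as a 'Material examined' line.

    Hypothetical layout: place; coordinates; date; collector; specimen ID.
    Terms that are missing or empty are skipped.
    """
    parts = []

    # Place: country, stateProvince and locality, most general first.
    place = ", ".join(
        str(record[t]) for t in ("country", "stateProvince", "locality") if record.get(t)
    )
    if place:
        parts.append(place)

    # Coordinates only if both are present (values may be numbers or strings).
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    if lat is not None and lon is not None:
        parts.append(f"{lat}, {lon}")

    if record.get("eventDate"):
        parts.append(str(record["eventDate"]))
    if record.get("recordedBy"):
        parts.append(f"leg. {record['recordedBy']}")

    # Specimen identifier: institution code followed by catalogue number.
    specimen = " ".join(
        str(record[t]) for t in ("institutionCode", "catalogNumber") if record.get(t)
    )
    if specimen:
        parts.append(specimen)

    return "; ".join(parts) + "." if parts else ""


if __name__ == "__main__":
    example = {  # placeholder values, not a real specimen
        "country": "France",
        "locality": "Example locality",
        "decimalLatitude": 48.85,
        "decimalLongitude": 2.35,
        "eventDate": "2019-05-12",
        "recordedBy": "J. Doe",
        "institutionCode": "XYZ",
        "catalogNumber": "0001",
    }
    print(material_citation(example))
```

Keeping the record keyed by Darwin Core terms means the same structure can travel in both directions: imported into the manuscript as text, and exported back as a material citation.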