Web Annotation Research Articles

This paper presents entity-fishing, a generic named-entity recognition and disambiguation (NERD) module offered as a stable online service, and uses it to demonstrate how sustainable technical services can be delivered within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the French national infrastructure Huma-Num, the service provides an efficient state-of-the-art implementation coupled with standardised interfaces that allow easy deployment in a variety of digital humanities contexts. Initially developed in the context of the FP7 EU project CENDARI, the software was well received by the user community and continued to be developed within the H2020 HIRMEOS project, where several open access publishers have integrated the service into their collections of published monographs as a means of enhancing retrieval and access. entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API, which allows easier and seamless integration, relies on a language-independent and stable convention, and follows a widely used service-oriented architecture (SOA) design. Input and output data are exchanged through a query data model with a defined structure, providing the flexibility to process partially annotated text or to distribute a text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation and modules for accessing and looking up concepts in the knowledge base. The API itself integrates more advanced contextual parametrisation and ranked outputs, allowing for resilient integration in various use cases. The entity-fishing API has been used as a concrete use case to draft the experimental stand-off proposal, which has been submitted for integration into the TEI guidelines. The representation is also compliant with the Web Annotation Data Model (WADM). In this paper we describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. We detail the workflow from input to output and unpack each building block in the processing flow. In addition, taking a more academic approach, we provide a transversal schema of the different components that takes non-functional requirements into account in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. We also describe the underlying knowledge base, which is built from Wikipedia and Wikidata content. We conclude the paper by presenting our solution for the service deployment: which resources were allocated and how. The service has been in production since Q3 of 2017 and has been used extensively by the H2020 HIRMEOS partners during the integration with their publishing platforms.
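The query data model described above can be pictured with a small sketch. It only assembles request payloads and does not contact any service. The field names (`text`, `language`, `entities`, `rawName`, `offsetStart`, `offsetEnd`, `nbest`) and the helper names `build_query` and `split_into_queries` are assumptions made for illustration; the abstract does not specify them, so the service documentation remains the reference for the actual structure.

```python
import json


def build_query(text, lang="en", known_entities=None, nbest=False):
    """Assemble one disambiguation query as a plain dictionary.

    Every field name below is an illustrative assumption, not a
    confirmed part of the service's query data model.
    """
    query = {
        "text": text,
        "language": {"lang": lang},
        # Mentions that are already annotated in the input text.
        "entities": [],
        # Whether ranked candidate lists are requested.
        "nbest": nbest,
    }
    for raw_name in known_entities or []:
        start = text.find(raw_name)
        if start == -1:
            continue  # skip mentions that do not occur in the text
        query["entities"].append(
            {
                "rawName": raw_name,
                "offsetStart": start,
                "offsetEnd": start + len(raw_name),
            }
        )
    return query


def split_into_queries(text, max_chars, lang="en"):
    """Distribute a long text over several queries, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [build_query(chunk, lang=lang) for chunk in chunks]


if __name__ == "__main__":
    sample = "Austria invaded Serbia in 1914."
    print(json.dumps(build_query(sample, known_entities=["Serbia"]), indent=2))
```

Keeping the payload a plain dictionary means the same structure can carry both raw text and partially annotated text, which is the flexibility the abstract attributes to the query model.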

The taxonomic literature is one of the largest resources of information on biodiversity, both present and past. Unlike the literature of many scientific disciplines, it remains perpetually relevant, as successive taxonomic work builds upon earlier foundations. Projects such as the Biodiversity Heritage Library (BHL) have greatly increased access to that literature, as have numerous independent digitisation efforts by museums, herbaria, and publishers. But the focus of this access has been human readers, with limited use of text mining tools, mostly focussed on extracting taxonomic names. This talk explores other kinds of data that can be extracted from text on BHL and elsewhere, focussing on taxonomic names, geographic localities and specimen codes in the context of the BioStor project (https://biostor.org, Page 2011). The problem of finding taxonomic names in text has been well studied (e.g., Akella et al. 2012), and new BHL content is continuously indexed by names. Despite this, there is only weak linkage between taxonomic name databases and BHL. Even projects that create these links (e.g., BioNames, Page 2013) do not enable links in the reverse direction. In other words, a BHL reader cannot tell whether the appearance of a name on a page is the first publication of that name, nor are they told of the fate of that name in subsequent research. The absence of these links reduces the value of BHL to working taxonomists. In addition to taxonomic names, a typical taxonomic paper often contains specimen codes. Extracting these from text and linking them to digital representations, such as occurrence records in GBIF, opens up the possibility of providing detailed provenance for occurrence data, as well as citation-based metrics for the utility of natural history collections. Taxonomic papers are also often rich in geographic information. A simple method for extracting locality information from text is to search for latitude and longitude coordinates, and BioStor currently does this. To date some 83,000 individual point localities have been extracted (Fig. 1). These are used to provide a simple geographic search interface in BioStor, and are also harvested by JournalMap (Karl et al. 2013). But these localities are not linked to their original location in the source text, nor are they linked to any associated specimens, so they cannot be interpreted as occurrences that could be harvested by GBIF. If the goal is to contribute to GBIF, then we need tools that can parse locality information and link it to associated specimens. A general framework for handling data on taxonomic names, specimens, and geographic localities in text is to treat them as annotations (Batista-Navarro et al. 2017). By modelling annotations using the Web Annotation Data Model (https://www.w3.org/TR/annotation-model/) we can incorporate these annotations into biodiversity knowledge graphs (Page 2016). We can also combine these annotations with new standards for describing digitised content, such as the International Image Interoperability Framework (IIIF, https://iiif.io). The implications of this approach for developing new interfaces to the biodiversity literature will be discussed.
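As a rough illustration of treating extracted localities as annotations, the sketch below searches text for decimal-degree coordinate pairs and wraps each match in a Web Annotation whose `TextPositionSelector` records where the coordinates occur in the source text. It is not BioStor's implementation: the regular expression is deliberately narrow, and the function name `extract_localities`, the `tagging` motivation, the comma-separated body value and the example source URI are all assumptions chosen for this example.

```python
import re
import uuid

# Matches decimal-degree pairs such as "12.5°S, 45.25°E".
# Real taxonomic text uses many more coordinate formats than this.
COORD_RE = re.compile(
    r"(?P<lat>\d{1,2}(?:\.\d+)?)\s*°\s*(?P<lat_hem>[NS])\s*,?\s*"
    r"(?P<lon>\d{1,3}(?:\.\d+)?)\s*°\s*(?P<lon_hem>[EW])"
)


def extract_localities(text, source_uri):
    """Return one Web Annotation (as a dict) per coordinate pair found."""
    annotations = []
    for match in COORD_RE.finditer(text):
        # Southern and western hemispheres become negative decimal degrees.
        lat = float(match["lat"]) * (-1 if match["lat_hem"] == "S" else 1)
        lon = float(match["lon"]) * (-1 if match["lon_hem"] == "W" else 1)
        annotations.append(
            {
                "@context": "http://www.w3.org/ns/anno.jsonld",
                "id": f"urn:uuid:{uuid.uuid4()}",
                "type": "Annotation",
                "motivation": "tagging",
                # Body format is an assumption: "latitude,longitude".
                "body": {"type": "TextualBody", "value": f"{lat},{lon}"},
                "target": {
                    "source": source_uri,
                    # Zero-based offsets; start inclusive, end exclusive.
                    "selector": {
                        "type": "TextPositionSelector",
                        "start": match.start(),
                        "end": match.end(),
                    },
                },
            }
        )
    return annotations


if __name__ == "__main__":
    page_text = "Collected at 12.5°S, 45.25°E near the coast."
    # The source URI is a hypothetical placeholder.
    for annotation in extract_localities(page_text, "https://example.org/page/1"):
        print(annotation["body"]["value"], annotation["target"]["selector"])
```

Because each annotation keeps the character offsets of the match, a locality stays tied to its place in the source text, which is the link the abstract notes is currently missing.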

Articles published on Web Annotation

Smart Visualization for Online Aids Image Retrieval

PANNZER-A practical tool for protein function prediction.

Entity-fishing: A DARIAH Entity Recognition and Disambiguation Service

Assessment of Annotation Needs of Botanists

Semantic Annotation of Web of Things Using Entity Linking

Reading and connecting: using social annotation in online classes

Blast2Fish: a reference-based annotation web tool for transcriptome analysis of non-model teleost fish

Things2Vec: Semantic Modeling in the Internet of Things With Graph Representation Learning

The role of social annotation in facilitating collaborative inquiry-based learning

Civic Writing on Digital Walls

Improving spaCy dependency annotation and PoS tagging web service using independent NER services.

Next generation community assessment of biomedical entity recognition web servers: metrics, performance, interoperability aspects of BeCalm

Text-mining BHL: towards new interfaces to the biodiversity literature

Open Web annotation as collaborative learning

Upgrading security and protection in ear biometrics

Integrating an Ontology of Radiology Differential Diagnosis with ICD-10-CM, RadLex, and SNOMED CT.

Semantic Web Annotation using Deep Learning with Arabic Morphology

Configurable web-services for biomedical document annotation

SIFR annotator: ontology-based semantic annotation of French biomedical text and clinical notes

OpenPVSignal: Advancing Information Search, Sharing and Reuse on Pharmacovigilance Signals via FAIR Principles and Semantic Web Technologies.
