Abstract

When scientists are searching for information, they generally have a precise objective in mind. Instead of looking for documents “about a topic T”, they try to answer specific questions such as finding the definition of a concept, finding results for a particular problem, checking whether an idea has already been tested, or comparing the scientific conclusions of two articles. Answering these precise or complex queries on a corpus of scientific documents requires precise modelling of the full content of the documents. In particular, each document element must be characterised by its discourse type (hypothesis, definition, result, method, etc.). In this paper, we present a scientific document model (the SciAnnotDoc ontology), developed from an empirical study conducted with scientists, that models these discourse types. We developed an automated process that analyses documents and identifies the discourse type of each element using syntactic rules (patterns). We evaluated the output of this process in terms of precision and recall against a previously annotated corpus in Gender Studies. We chose to annotate documents in the Humanities because these documents are well known to be less formalised than those in the “hard sciences”. The process output was used to create a SciAnnotDoc representation of the corpus, on top of which we built a faceted search interface. Experiments with users show that searches using this interface clearly outperform standard keyword searches for precise or complex queries.
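The abstract does not give the actual syntactic rules, so the following is only a minimal sketch of the general idea: regular-expression cue phrases assign discourse types to sentences, and the output is scored per type with precision and recall against a gold annotation. The pattern lists, the `tag_sentence` and `precision_recall` helpers, and the example sentences are all hypothetical illustrations, not the SciAnnotDoc rules or the Gender Studies corpus.

```python
import re

# Hypothetical cue-phrase patterns; the real SciAnnotDoc rules are not given here.
PATTERNS = {
    "definition": [r"\bis defined as\b", r"\bwe define\b", r"\brefers to\b"],
    "hypothesis": [r"\bwe hypothesi[sz]e\b", r"\bwe expect(ed)? that\b"],
    "finding": [r"\bresults? (show|indicate|suggest)\b", r"\bwe found that\b"],
    "method": [r"\bwe (interviewed|surveyed|conducted)\b", r"\bqualitative\b", r"\bquantitative\b"],
}

# Compile once, case-insensitively.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in pats]
    for label, pats in PATTERNS.items()
}


def tag_sentence(sentence: str) -> set[str]:
    """Return every discourse type whose patterns match the sentence (possibly none)."""
    return {
        label
        for label, regexes in COMPILED.items()
        if any(r.search(sentence) for r in regexes)
    }


def precision_recall(predicted: list[set[str]], gold: list[set[str]], label: str) -> tuple[float, float]:
    """Precision and recall for one discourse type over aligned per-sentence label sets."""
    tp = sum(1 for p, g in zip(predicted, gold) if label in p and label in g)
    fp = sum(1 for p, g in zip(predicted, gold) if label in p and label not in g)
    fn = sum(1 for p, g in zip(predicted, gold) if label not in p and label in g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example sentences with a hand-made gold annotation.
    sentences = [
        "Gender is defined as a social construct.",
        "Our results show that women leave academia more often after their first child.",
        "We interviewed 40 researchers.",
        "We expected that mentoring would reduce attrition.",
        "This refers to earlier work by Smith.",  # matches a definition cue but is not one
    ]
    gold = [{"definition"}, {"finding"}, {"method"}, {"hypothesis"}, set()]
    predicted = [tag_sentence(s) for s in sentences]
    print(precision_recall(predicted, gold, "definition"))  # (0.5, 1.0)
```

The last sentence shows why pattern-based tagging is usually evaluated per type: a generic cue such as "refers to" raises recall for definitions while costing precision.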

Highlights

  • In their work, scientists need to gather very specific information with different goals and tasks in mind

  • We computed the average responses for the three tasks and tested the difference between participants who evaluated the FSAD and those who evaluated the keyword search, using analysis of variance (ANOVA) tests

  • With the growth of publications in scientific domains, scientists need more precise information retrieval systems that can not only search by metadata but also let users build complex queries such as “retrieve all the findings that women have a tendency to drop their academic career after their first child more than men, using qualitative and quantitative methodologies”
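To make the example query above concrete, here is a minimal in-memory sketch of how a faceted search over annotated elements might combine a discourse-type facet, a methodology facet and keywords. The `Element` record, its fields and the `faceted_search` function are hypothetical simplifications; the actual system queries a SciAnnotDoc representation of the corpus, whose schema is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One annotated document fragment (hypothetical, simplified schema)."""
    doc_id: str
    discourse_type: str  # e.g. "finding", "definition", "method"
    text: str
    methodologies: frozenset[str] = frozenset()  # methodologies linked to this element


def faceted_search(elements, discourse_type, required_methods=frozenset(), keywords=()):
    """Keep elements of the given discourse type that use every required methodology
    and contain every keyword (case-insensitive substring match)."""
    required = frozenset(required_methods)
    hits = []
    for e in elements:
        if e.discourse_type != discourse_type:
            continue  # discourse-type facet
        if not required <= e.methodologies:
            continue  # methodology facet: all required methods must be present
        lowered = e.text.lower()
        if all(k.lower() in lowered for k in keywords):
            hits.append(e)  # keyword constraint
    return hits


if __name__ == "__main__":
    # Invented toy corpus.
    corpus = [
        Element("d1", "finding",
                "Women tend to drop their academic career after the first child more than men.",
                frozenset({"qualitative", "quantitative"})),
        Element("d2", "finding",
                "Women drop their academic career after the first child.",
                frozenset({"qualitative"})),
        Element("d3", "definition",
                "An academic career is defined as a sequence of research positions."),
    ]
    results = faceted_search(corpus, "finding", {"qualitative", "quantitative"},
                             ["academic career", "first child"])
    print([e.doc_id for e in results])  # ['d1']
```

A plain keyword search for "academic career" would return all three fragments; the facets are what restrict the answer to findings supported by both kinds of methodology.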
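The highlights mention ANOVA tests comparing the FSAD group with the keyword-search group. As a minimal sketch of what a one-way ANOVA computes, the function below derives the F statistic from between-group and within-group sums of squares. The scores are invented for illustration and are not the study's data; a p-value would additionally require the F distribution (for example via SciPy's `scipy.stats.f_oneway`), which is omitted here to keep the sketch dependency-free.

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Invented average scores per participant for two conditions.
    fsad_scores = [4, 5, 5, 4, 5]
    keyword_scores = [3, 2, 3, 4, 3]
    f, dfb, dfw = one_way_anova_f(fsad_scores, keyword_scores)
    print(round(f, 2), dfb, dfw)  # 16.0 1 8
```

With only two groups, this F equals the square of the pooled-variance independent-samples t statistic, so the ANOVA and a t-test lead to the same conclusion.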


Summary

Introduction

Scientists need to gather very specific information with different goals and tasks in mind. When reading documents for these purposes, they need to be exhaustive (they want to retrieve all the new findings in their domain, or all the definitions of a concept), but they are not necessarily interested in reading other parts of the documents (background, hypotheses, etc.). They face many challenges in accomplishing these kinds of tasks. Systems supporting electronic scientific publications have brought some improvements over simple searches of printed edition indices, but the potential benefits of electronic publishing, for example systems that could find a specific piece of information, summarise scientific documents, or combine them, have not yet been realised.

