BCS-IRSG Workshop on Corpus Profiling - Index

Anne De Roeck, Dawei Song, Udo Kruschwitz

doi:10.14236/ewic/irsg2008.0

Abstract

We aim to bring together people from different research communities interested in exploring how corpus characteristics affect the behaviour of techniques in information retrieval and natural language processing, and to set out a roadmap for a shared research agenda. It is well known in NLP and IR that the effectiveness of a technique depends on both the data on which it is deployed and its match with the task at hand. In 1973, Spärck-Jones attributed differing degrees of success at automatic classification to differences in dataset characteristics. Since Croft and Harper (1979), IR performance has repeatedly been related to collection size and other features, though no upper bound has been found. The importance of data and task dependencies has been highlighted in IR, anaphora resolution, automatic summarization and, recently, word sense disambiguation. Many web and enterprise web retrieval systems rely on URL properties, link graph properties, click streams, and so on, with performance dependent on the degree to which this evidence is present and meaningful in a particular corpus.

This conference was sponsored by BCS IRSG. The Workshop on Corpus Profiling for Information Retrieval and Natural Language Processing took place in London in October 2008, in conjunction with IIiX2008. Our aim was to bring together people from different research communities interested in exploring how specific properties of a corpus or collection affect the behaviour of techniques in Information Retrieval (IR) and Natural Language Processing (NLP), and to start mapping out a shared research agenda. These eWiCs Proceedings capture the final versions of papers presented at the workshop.

Publication Date: Jan 1, 2008
Citations: 1
License type: CC BY 4.0
