Towards a supervised rescoring system for unstructured data bases used to build specialized dictionaries

Antonio Rico-Sulayes

doi:10.19053/01211129.3161

Open Access

https://doi.org/10.19053/01211129.3161

Abstract

<p align="justify">This article proposes an architecture for a system that uses previously learned weights to sort query results from unstructured data bases when building specialized dictionaries. A common resource in dictionary construction, unstructured data bases have been especially useful in providing information about lexical item frequencies and examples of use. However, when building specialized dictionaries, whose selection of lexical items does not rely on frequency, the role of these data bases is reduced to that of a simple provider of examples. Even in this task, the information unstructured data bases provide may not be very useful when looking for specialized uses of lexical items that have several meanings and return very long lists of results. To address this problem, long lists of hits can be rescored with a supervised learning model that relies on previously helpful results. The collection of a vast set of high-quality training data for this rescoring system is reported here. Finally, the architecture of such a system, an unprecedented tool in specialized lexicography, is proposed.</p>

Highlights

The final goal of this article is to describe a route to building a system that reorganizes the results returned by unstructured data bases using information about previously helpful hits
The new materials collected here will make a two-fold contribution, as they will be used to train a supervised rescoring system that improves subsequent interaction with unstructured data bases
If the resulting system succeeds in improving the search for new lexical items in unstructured data bases, it would be an unprecedented tool and a strong contribution to specialized dictionary making

Summary

Introduction

The final goal of this article is to describe a route to building a system that reorganizes the results returned by unstructured data bases using information about previously helpful hits. The context in which such a system is proposed is the construction of a dictionary, specifically a dictionary of substandard language. Given the diverse situations in which substandard language is used, frequencies and other simple distributional information are not very helpful for identifying and working with this kind of vocabulary in large unstructured data bases. The new materials collected here will make a two-fold contribution, as they will be used to train a supervised rescoring system that improves subsequent interaction with unstructured data bases. This article describes a proposal to build such a system, which has the potential to become a strong contribution to specialized dictionary making.
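To make the idea of rescoring hits with previously learned weights more concrete, the following is a minimal sketch and not the architecture proposed in the article. It assumes that each hit is a plain text snippet, that earlier hits were labeled as helpful or unhelpful by a lexicographer, and that a simple logistic regression over word counts is an acceptable stand-in for the supervised model. The function names (`train_weights`, `score`, `rescore`) and the toy English snippets are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_weights(labeled_hits, epochs=20, lr=0.5):
    """Learn one weight per token with logistic regression trained by SGD.

    labeled_hits: list of (snippet, helpful) pairs, where helpful is a bool
    recorded when a lexicographer judged an earlier hit.
    """
    weights = Counter()
    for _ in range(epochs):
        for text, helpful in labeled_hits:
            feats = Counter(tokenize(text))
            z = sum(weights[tok] * n for tok, n in feats.items())
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = (1.0 if helpful else 0.0) - p
            for tok, n in feats.items():
                weights[tok] += lr * err * n
    return weights


def score(weights, text):
    """Sum of learned weights over the tokens of a new hit."""
    feats = Counter(tokenize(text))
    return sum(weights[tok] * n for tok, n in feats.items())


def rescore(weights, hits):
    """Sort hits so that those resembling past helpful hits come first."""
    return sorted(hits, key=lambda h: score(weights, h), reverse=True)


# Invented toy data: hits for "grass", where only the slang sense is wanted.
previous_hits = [
    ("he sold grass on the corner", True),
    ("cops found grass in his bag", True),
    ("the grass in the park is green", False),
    ("mow the grass every week", False),
]
new_hits = [
    "green grass grows in the park",
    "they sold grass from a bag",
]

weights = train_weights(previous_hits)
ranked = rescore(weights, new_hits)
for hit in ranked:
    print(round(score(weights, hit), 3), hit)
```

With this toy data, the hit that shares vocabulary with the previously helpful snippets is moved to the top of the list. A real system would need far more training data and richer features than raw word counts.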

Unstructured data bases and dictionary making

Applications of unstructured data bases in lexicography

Collecting data for a supervised rescoring system

Extraction and validation of secondary data in a dictionary project

New training data from unstructured data bases

A system architecture proposal to improve specialized dictionary building

Findings

Conclusions

Journal: REVISTA FACULTAD DE INGENIERÍA	Publication Date: Dec 28, 2014
Citations: 15	License type: cc-by
