Abstract
We present LINSPECTOR WEB, an open source multilingual inspector for analyzing word representations. Our system provides researchers working in low-resource settings with an easily accessible, web-based probing tool for gaining quick insights into their word embeddings, especially outside of the English language. To do this, we employ 16 simple linguistic probing tasks, such as gender, case marking, and tense, for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for downstream NLP tasks such as named entity recognition, natural language inference, and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at https://linspector.ukp.informatik.tu-darmstadt.de.
Highlights
Natural language processing (NLP) has seen great progress after the introduction of continuous, dense, low-dimensional vectors to represent text.
Datasets for either of those tasks do not exist for many languages, word similarity tests do not necessarily correlate well with downstream tasks, and evaluating embeddings on downstream tasks can be too computationally demanding for low-resource scenarios.
BiaffineDependencyParser and CrfTagger are highlighted as the default choices for dependency parsing and named entity recognition by Gardner et al. (2018), while ESIM was picked as one of two available natural language inference models, and SimpleTagger support was added as the entry-level AllenNLP classifier for tasks like part-of-speech tagging.
Summary
Natural language processing (NLP) has seen great progress after the introduction of continuous, dense, low-dimensional vectors to represent text. The field has witnessed the creation of various word embedding models, such as monolingual (Mikolov et al., 2013), contextualized (Peters et al., 2018), multi-sense (Pilehvar et al., 2017), and dependency-based (Levy and Goldberg, 2014), as well as the adaptation and design of neural network architectures for a wide range of NLP tasks. Despite their impressive performance, interpreting, analyzing, and evaluating such black-box models have been shown to be challenging, which even led to a workshop series (Linzen et al., 2018). Unlike most studies, Kohn (2015) introduced a set of multilingual probing tasks; however, its scope has been limited to syntactic tests and 7 languages. More importantly, it is not accessible as a web application, and the source code does not support probing pretrained downstream NLP models out of the box. To the best of our knowledge, this is the first web application that (a) performs online probing; (b) enables users to upload their pretrained downstream task models to automatically analyze different layers and epochs; and (c) supports 28 languages, some of which are extremely low-resource, such as Quechuan.
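To make the idea of a probing task concrete, the following is a minimal sketch of a probe in general, not the LINSPECTOR WEB implementation. It assumes a probe is a simple classifier trained to predict a linguistic feature (for example, tense) from a fixed word vector, with held-out accuracy read as a rough signal of how much of that feature the vectors encode. The toy embeddings, the binary label, and every function name here are hypothetical; a perceptron is used only to keep the example dependency-free.

```python
import random


def make_toy_data(n, dim=8, seed=1):
    """Hypothetical embeddings: dimension 0 encodes a binary feature (e.g. tense)."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0, 1) for _ in range(dim)]
        x[0] += 3.0 if y else -3.0  # inject the feature signal
        vectors.append(x)
        labels.append(y)
    return vectors, labels


def predict(w, b, x):
    """Linear decision rule of the probe."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_probe(vectors, labels, epochs=20, lr=0.1, seed=0):
    """Train a binary perceptron probe on frozen vectors; returns (weights, bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(vectors[0])
    b = 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = labels[i] - predict(w, b, vectors[i])
            if err:  # update only on mistakes
                w = [wj + lr * err * xj for wj, xj in zip(w, vectors[i])]
                b += lr * err
    return w, b


def probe_accuracy(w, b, vectors, labels):
    """Fraction of held-out items whose feature the probe recovers."""
    correct = sum(predict(w, b, x) == y for x, y in zip(vectors, labels))
    return correct / len(labels)


train_x, train_y = make_toy_data(200, seed=1)
test_x, test_y = make_toy_data(100, seed=2)
w, b = train_probe(train_x, train_y)
acc = probe_accuracy(w, b, test_x, test_y)
print(f"probe accuracy: {acc:.2f}")
```

In a real setting, the toy vectors would be replaced by embeddings looked up for annotated words, and one such probe would be trained per linguistic feature and language.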