Subword Representations Research Articles

Abstract: This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is also applicable to multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
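
The segment-based composition described in this abstract can be illustrated with a minimal sketch. It assumes that a word vector is formed by summing the vectors of its segments; the abstract does not state the composition function, so the summation, the function names, and the toy two-dimensional vectors below are all hypothetical. The Finnish word "talossa" ("in the house", talo + ssa) serves only as an example.

```python
from typing import Dict, List


def char_ngrams(word: str, n: int) -> List[str]:
    """Return the overlapping character n-grams of a word, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def compose_word_vector(
    segments: List[str],
    segment_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Sum the vectors of the given segments (assumed composition rule).

    Segments without a known vector are skipped.
    """
    total = [0.0] * dim
    for segment in segments:
        vector = segment_vectors.get(segment)
        if vector is None:
            continue
        for k in range(dim):
            total[k] += vector[k]
    return total


# Character 3-gram segmentation of one word form.
trigrams = char_ngrams("talossa", 3)

# Morpheme-level composition with toy (made-up) segment vectors.
toy_vectors = {"talo": [1.0, 0.0], "ssa": [0.0, 2.0]}
word_vector = compose_word_vector(["talo", "ssa"], toy_vectors, dim=2)
```

Swapping the segment list (whole word, morphemes, random splits, or character n-grams) while keeping the composition step fixed mirrors the kind of comparison across segmentations that the abstract describes.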

Objective: This article describes an ensembling system to automatically extract adverse drug events and drug-related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2.

Materials and Methods: We designed a neural model to tackle both nested entities (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model.

Results: Our method achieved a 92.78% lenient micro F1-score, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models or of a single model with different settings can improve performance.

Discussion: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities, especially nested entities.

Conclusion: The overall results have demonstrated that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.
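
As a quick consistency check on the reported scores, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall quoted in the abstract; the function name is hypothetical, and because the inputs are already rounded, the result can differ from the reported 92.78% in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (lenient precision and recall).
f1 = f1_score(0.9599, 0.8979)
print(f"Recomputed lenient micro F1: {f1:.2%}")
```

The recomputed value agrees with the reported F1-score to within rounding of the inputs.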

Articles published on Subword Representations

Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words

Transmorph: a transformer based morphological disambiguator for Turkish

Idiomatic Expression Identification using Semantic Compatibility

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

An ensemble of neural models for nested adverse drug events and medication extraction with subwords.

Restoration of temporal information in off-line Arabic handwriting

Lexical priming from partial-word previews.
