Textual Similarity Task Research Articles

Semantic representation is a way of expressing the meaning of a text so that it can be processed by a machine to serve a particular natural language processing (NLP) task that requires meaning comprehension, such as text summarisation, question answering or machine translation. In this paper, we present a semantic parsing model based on neural networks to obtain the semantic representation of a given sentence. We use the semantic representation of each sentence to generate semantically informed sentence embeddings for extrinsic evaluation of the proposed semantic parser, in particular on the semantic textual similarity task. Our neural parser uses a self-attention mechanism to learn semantic relations between the words in a sentence and generates a semantic representation of the sentence in the UCCA (Universal Conceptual Cognitive Annotation) semantic annotation framework (Abend and Rappoport, 2013), which is a cross-linguistically applicable graph-based semantic representation. The UCCA representations are fed into a Siamese Neural Network built on top of two Recursive Neural Networks (Siamese-RvNN) to derive semantically informed sentence embeddings, which are evaluated on the semantic textual similarity task. We conduct both single-lingual and cross-lingual experiments with zero-shot and few-shot learning, which show superior performance even in low-resource scenarios. The experimental results show that the proposed self-attentive neural parser outperforms the other parsers in the literature on English and German, and shows significant improvement in the cross-lingual setting for French, which has comparatively few resources. Moreover, the results obtained from other downstream tasks such as sentiment analysis confirm that semantically informed sentence embeddings are of higher quality than those produced by pre-trained models such as SBERT (Reimers et al., 2019) or SimCSE (Gao et al., 2021), which do not utilise such structured information.

Background: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval.

Objective: Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain.

Methods: We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium.

Results: Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs.

Conclusions: The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
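
The abstract above reports system quality as a Pearson correlation between system scores and gold similarity scores. As a minimal sketch of how such a figure can be computed, the Python below implements the standard Pearson formula directly. The function name `pearson_r` and the example scores are illustrative assumptions and are not taken from the shared task data or its official evaluation script.

```python
import math


def pearson_r(predicted, gold):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_g)


# Hypothetical gold and system scores for five sentence pairs (not task data).
gold_scores = [0.0, 1.5, 3.0, 4.0, 5.0]
system_scores = [0.5, 1.0, 2.5, 4.5, 4.8]

r = pearson_r(system_scores, gold_scores)
print(f"Pearson r = {r:.4f}")
```

Because Pearson correlation is invariant to linear rescaling, a system whose scores are on a different scale from the gold annotations is not penalised for that alone; only the linear agreement between the two sets of scores matters.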

Articles published on Textual Similarity Task

Slovak morphological tokenizer using the Byte-Pair Encoding algorithm

Soft cosine and extended cosine adaptation for pre-trained language model semantic vector analysis

Simple Data Transformations for Mitigating the Syntactic Similarity to Improve Sentence Embeddings at Supervised Contrastive Learning

SEBGM: Sentence Embedding Based on Generation Model with multi-task learning

DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

Contrastive sentence representation learning with adaptive false negative cancellation

Grouped Contrastive Learning of Self-Supervised Sentence Representation

Refined SBERT: Representing sentence BERT in manifold space

Contrastive Learning Models for Sentence Representations

Parameter-efficient feature-based transfer for paraphrase identification

A Siamese Neural Network for Learning Semantically-Informed Sentence Embeddings

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

A novel locality-sensitive hashing relational graph matching network for semantic textual similarity measurement

A Quantum Language-Inspired Tree Structural Text Representation for Semantic Analysis

TA-SBERT: Token Attention Sentence-BERT for Improving Sentence Representation

Sentence transition matrix: An efficient approach that preserves sentence semantics

Efficient natural language classification algorithm for detecting duplicate unsupervised features

Sense representations for Portuguese: experiments with sense embeddings and deep neural language models

Exploiting Syntactic and Semantic Information for Textual Similarity Estimation

The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview.
