Abstract

This paper focuses on Natural Language Understanding (NLU) features that capture the meaning of text, and uses them for unsupervised duplicate detection. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two datasets: Bosch bug reports and Wikipedia articles. The study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to Information Retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results report the performance of Term Frequency–Inverse Document Frequency (TF-IDF) features on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve classification.
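To make the lexical baseline concrete, the following is a minimal sketch of TF-IDF vectors compared by cosine similarity. It is not the paper's implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names `tfidf_vectors` and `cosine` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example reports, not taken from either dataset.
docs = [
    "app crashes on login",
    "app crashes on login",
    "battery drains fast",
]
vecs = tfidf_vectors(docs)
sim_duplicate = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

In this sketch, two reports with identical wording score close to 1, while reports that share no terms score 0. That second case illustrates the limitation of purely lexical features that semantic NLU features are meant to address.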

Highlights

  • Humans can exchange all kinds of information through language

  • The Receiver Operating Characteristic (ROC) curves are computed by varying the threshold value in steps of 0.01 from T = 0 to T = 1

  • The preprocessing steps, stemming and stop-word removal, aim to reduce the sparsity of the word representations
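The threshold sweep behind the ROC curves can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' evaluation code: it assumes similarity scores in [0, 1], binary labels where 1 marks a true duplicate pair, and a decision rule that predicts "duplicate" when the score is at least T. The function name `roc_points` and the sample data are hypothetical.

```python
def roc_points(scores, labels, step=0.01):
    """Return (threshold, false positive rate, true positive rate) triples.

    The threshold T runs from 0 to 1 inclusive in increments of `step`.
    """
    n_steps = round(1 / step)
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for i in range(n_steps + 1):
        # Divide by n_steps rather than accumulating `step` to limit float drift.
        t = i / n_steps
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((t, fpr, tpr))
    return points


# Hypothetical similarity scores and ground-truth duplicate labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels)
```

With the default step this yields 101 operating points. Plotting the true positive rate against the false positive rate across those points gives the ROC curve for one feature type.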


Summary

Introduction

Humans can exchange all kinds of information through language. Language is used to talk about activities, to discuss abstract concepts, and even to determine the emotional state of other human beings. It is used frequently in all kinds of communication, such as e-mails, reports, conversations, and scientific papers. This usage generates large language datasets, which contain a lot of information. 80% of all relevant corporate data is text-heavy unstructured data [1]. Most of that information cannot be used for further automatic analysis, and the lack of suitable analysis methods makes it difficult to find relevant information in large text datasets.

