Abstract

In a text matching similarity task, a model takes two text sequences as input and predicts a category or a scalar value that describes their relationship. The model developed here measures similarity, one such relationship between the two texts. It is a Siamese network built from two copies of the same CNN, which take text_1 and text_2 as their respective inputs. Each CNN outputs a feature vector for its input text, and both vectors are passed to a loss function that computes the loss value (i.e., the similarity). This research implemented two loss functions, Triplet loss and Contrastive loss, to examine how each influences the measured similarity between the two texts being compared. The metrics used for the comparison are precision, recall, and F1-score. Experiments were run on 1500 sentence pairs, with the number of epochs varied from 10 to 200 in increments of 10. The best result for the Triplet loss function was obtained at 180 epochs, with precision 0.8004, recall 0.6780, and F1-score 0.6713; the best result for the Contrastive loss function was obtained at 160 epochs, with precision 0.6463, recall 0.6440, and F1-score 0.6451. The Triplet loss function therefore performed better than the Contrastive loss function in measuring the similarity between two given sentences.
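To make the two loss functions concrete, here is a minimal sketch of one common textbook form of each, applied to plain feature vectors. The paper's exact formulation is not given in this excerpt, so the Euclidean distance, the `margin` default, and all function names below are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(f1, f2, similar, margin=1.0):
    """Contrastive loss for one pair of feature vectors (assumed form).

    similar is True for a matching pair, False otherwise.
    Matching pairs are penalized by their squared distance;
    non-matching pairs are penalized only if closer than the margin.
    """
    d = euclidean(f1, f2)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss for one (anchor, positive, negative) triple (assumed form).

    The loss is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Hypothetical 2-dimensional feature vectors standing in for CNN outputs.
    a, p, n = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
    print(contrastive_loss(a, p, similar=True))
    print(triplet_loss(a, p, n))
```

In this sketch the contrastive loss works on pairs with a similar/dissimilar label, while the triplet loss compares an anchor against one similar and one dissimilar example at once. Under these assumptions, that difference in input structure is the main distinction between the two training setups.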

Highlights

  • The very rapid growth of information today causes particular problems, such as information overload [21]. Such large collections of information very likely contain similar items, which can therefore be grouped into classes based on their similarity

  • A text similarity approach makes it easier for people to find relevant information. It strongly supports the success of text mining operations such as search and information retrieval (IR), text classification, information extraction (IE), document clustering [8], sentiment analysis [4] [10] [16] [3] [13], machine translation, text summarization, and natural language processing (NLP)

  • Text similarity can be measured by comparing texts, i.e., by text matching


Summary

Introduction

The very rapid growth of information today causes particular problems, such as information overload [21]. Text similarity measurement is a text mining approach capable of coping with this information overload. The process begins with finding similar words, which serves as the basis for sentence, paragraph, and document similarity [6]. A text similarity approach makes it easier for people to find relevant information. It strongly supports the success of text mining operations such as search and information retrieval (IR), text classification, information extraction (IE), document clustering [8], sentiment analysis [4] [10] [16] [3] [13], machine translation, text summarization, and natural language processing (NLP). To make full use of the alignment process, a model must take many external syntactic features or alignments as additional inputs at the alignment layer [5] [7], adopt a complex alignment mechanism [17], or build a large number of post-processing layers to analyze the alignment results [7].

