Building siamese attention-augmented recurrent convolutional neural networks for document similarity scoring

Sifei Han, Lingyun Shi, Russell Richie, Fuchiang R Tsui

doi:10.1016/j.ins.2022.10.032

Abstract

Automatically measuring document similarity is essential in natural language processing, with applications ranging from recommendation to duplicate document detection. State-of-the-art approaches to document similarity commonly involve deep neural networks, yet there has been little study of how different architectures may be combined. We therefore introduce the Siamese Attention-augmented Recurrent Convolutional Neural Network (S-ARCNN), which combines multiple neural network architectures. In each subnetwork of S-ARCNN, a document passes through a bidirectional Long Short-Term Memory (bi-LSTM) layer, which sends its representations to a local and a global document module. The local document module uses convolution, pooling, and attention layers, whereas the global document module uses the last states of the bi-LSTM. The local and global features are concatenated to form a single document representation. Using the Quora Question Pairs dataset, we evaluated S-ARCNN, Siamese convolutional neural networks (S-CNNs), Siamese LSTM, and two BERT models. While S-CNNs (82.02% F1) outperformed S-ARCNN (79.83% F1) overall, S-ARCNN slightly outperformed S-CNNs on duplicate question pairs with more than 50 words (39.96% vs. 39.42% accuracy). Given this potential advantage for longer documents, S-ARCNN may help researchers identify collaborators with similar research interests, help editors find potential reviewers, or match resumes with job descriptions.
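The subnetwork data flow described in the abstract (bi-LSTM feeding a local convolution/pooling/attention module and a global last-state module, then concatenation, with shared weights across the two siamese branches) can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the single convolution width, max pooling over windows of 2, dot-product attention with a learned query vector, and cosine similarity as the comparison step are illustrative choices the abstract does not specify, and `SubNetwork`, `LSTM`, and `similarity` are hypothetical names. Weights are random and untrained, so the example only shows how token embeddings become one concatenated local-plus-global vector.

```python
import math
import random

# Hypothetical sizes: 4-dim embeddings, 3 hidden units per direction, 5 filters.
def matvec(W, x):
    # Dense matrix-vector product; W has len(x) columns.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTM:
    """One-direction LSTM; gates stacked as [input, forget, candidate, output]."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = rand_matrix(4 * hid, in_dim + hid, rng)  # acts on [x; h]
        self.b = [0.0] * (4 * hid)

    def run(self, xs):
        H = self.hid
        h, c, outs = [0.0] * H, [0.0] * H, []
        for x in xs:
            z = [zv + bv for zv, bv in zip(matvec(self.W, x + h), self.b)]
            i = [sigmoid(v) for v in z[:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outs.append(h)
        return outs

class SubNetwork:
    """Hypothetical S-ARCNN-style branch; both siamese inputs reuse one instance."""
    def __init__(self, emb_dim=4, hid=3, n_filters=5, width=2, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTM(emb_dim, hid, rng)
        self.bwd = LSTM(emb_dim, hid, rng)
        self.width = width
        self.conv = rand_matrix(n_filters, width * 2 * hid, rng)  # one row per filter
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(n_filters)]  # assumed attention query

    def encode(self, tokens):
        # tokens: list of embedding vectors; assumes len(tokens) >= width.
        f = self.fwd.run(tokens)
        b = self.bwd.run(tokens[::-1])[::-1]  # realign backward states with tokens
        seq = [hf + hb for hf, hb in zip(f, b)]  # bi-LSTM output per token
        # Local module: convolution (ReLU) -> max pooling (window 2) -> attention.
        windows = [sum(seq[t:t + self.width], []) for t in range(len(seq) - self.width + 1)]
        conv = [[max(0.0, v) for v in matvec(self.conv, w)] for w in windows]
        pooled = [[max(x, y) for x, y in zip(conv[t], conv[t + 1])]
                  for t in range(0, len(conv) - 1, 2)] or conv
        scores = [sum(q * v for q, v in zip(self.query, p)) for p in pooled]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        local = [sum(w * p[k] for w, p in zip(weights, pooled)) / total
                 for k in range(len(pooled[0]))]
        # Global module: last forward state + last backward state (after token 0).
        global_feat = f[-1] + b[0]
        return local + global_feat  # concatenated document representation

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def similarity(net, doc_a, doc_b):
    # Siamese scoring: the same weights encode both documents (cosine is an assumption).
    return cosine(net.encode(doc_a), net.encode(doc_b))

rng = random.Random(1)
doc = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
other = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
net = SubNetwork()
print(len(net.encode(doc)))  # 11 (5 local + 6 global features)
print(round(similarity(net, doc, doc), 6))  # 1.0
print(similarity(net, doc, other))
```

Because both documents pass through the same `SubNetwork` instance, the two branches share weights, which is what makes the arrangement siamese. A trained model would learn these weights, and possibly a different comparison layer, from labeled pairs such as those in Quora Question Pairs.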


Journal: Information Sciences
Publication Date: Oct 7, 2022
Citations: 8
License type: publisher-specific-oa


Similar Papers

A Hybrid BLSTM-C Neural Network Proposed for Chinese Text Classification
Xutao Wang ... Pengjian Xu
01 Aug 2018

DETECTION OF NETWORK ANOMALIES WITH NEURAL NETWORKS ALGORITHMS
H I Haidur
Telecommunication and Information Technologies | VOL. 78
01 Jan 2023

INTELLIGENT MODEL FOR CLASSIFYING HEMODYNAMIC PATTERNS OF BRAIN ACTIVATION TO IDENTIFY NEUROCOGNITIVE MECHANISMS OF SPATIAL-NUMERICAL ASSOCIATIONS
R G Asadullaev ... M A Sitnikova
Vestnik komp'iuternykh i informatsionnykh tekhnologii
01 Jan 2024

ACR-SA: attention-based deep model through two-channel CNN and Bi-RNN for sentiment analysis.
Marjan Kamyab ... Abdur Rasool
PeerJ Computer Science | VOL. 8
17 Mar 2022
