Estimation of Cross-Lingual News Similarities Using Text-Mining Methods

Zhouhao Wang, Kota Tsubouchi, Kiyoshi Izumi, Tatsuo Yamashita, Tomoki Ito, Enda Liu, Hiroki Sakaji

doi:10.3390/jrfm11010008

Open Access

https://doi.org/10.3390/jrfm11010008

Abstract

In this research, we propose two machine-learning-based estimation algorithms for extracting cross-lingual news pairs from financial news articles. Every second, innumerable text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, and these are written not only in English but also in other languages such as Chinese, Japanese, and French. By taking advantage of the multilingual text resources provided by Thomson Reuters News, we developed two estimation algorithms for extracting cross-lingual news pairs from these resources. In our first method, we propose a novel structure that makes effective use of word information and machine learning in this task. We also developed a bidirectional Long Short-Term Memory (LSTM)-based method to calculate cross-lingual semantic text similarity for both long text and short text. Thus, when an important news article is published, users can use our method to read similar news articles written in their native language.

Highlights

Text similarity, as its name suggests, refers to how similar a given text query is to others.
The fundamental objective is to develop algorithms for estimating the semantic similarity of two given pieces of text written in different languages, applicable to both long text and short text, by taking advantage of the vast untapped repository of text resources in Thomson Reuters economic news reports.
We developed a new recurrent structure inspired by Manhattan LSTM (MaLSTM) by modifying the Siamese Long Short-Term Memory (LSTM) modules into “unbalanced” ones and adding a fully connected neural network layer after the output of the LSTM modules, which makes it more flexible and effective for the text similarity task.
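
For orientation, the MaLSTM model that the highlights cite as inspiration scores a pair of texts by the Manhattan (L1) distance between the final hidden states of its two encoders, mapped into (0, 1] with an exponential. The sketch below illustrates only that baseline scoring function, not the authors' modified structure, which adds a fully connected layer after the LSTM outputs. The function name and the example vectors are assumptions made for illustration, and the vectors stand in for encoder outputs that a trained model would produce.

```python
import math


def manhattan_similarity(h_a, h_b):
    """MaLSTM-style similarity: exp(-L1 distance), a value in (0, 1]."""
    if len(h_a) != len(h_b):
        raise ValueError("both vectors must have the same length")
    # Sum of absolute element-wise differences (Manhattan / L1 distance).
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1_distance)


# Hypothetical encoder outputs for an English and a Japanese article.
h_en = [0.2, -0.1, 0.5]
h_ja = [0.1, -0.1, 0.7]
print(manhattan_similarity(h_en, h_ja))
```

Identical vectors give a score of exactly 1, and the score decays toward 0 as the two representations move apart, which is why it can be read directly as a similarity.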

Summary

Introduction

Text similarity, as its name suggests, refers to how similar a given text query is to others. The text may be considered at the character, word, sentence, paragraph, or document level. We mainly discuss text in the form of sentences (i.e., short text) and documents (i.e., long text). The fundamental objective is to develop algorithms for estimating the semantic similarity of two given pieces of text written in different languages, applicable to both long text and short text, by taking advantage of the vast untapped repository of text resources in Thomson Reuters economic news reports. We mine cross-lingual resources from the enormous Thomson Reuters News database and build an effective cross-lingual system by taking advantage of this undeveloped treasure.

Journal: Journal of Risk and Financial Management
Publication Date: Jan 31, 2018
Citations: 3
License type: CC BY 4.0

Similar Papers

Multilabel Text Classification in News Articles Using Long-Term Memory with Word2Vec
Winda Kurnia Sari ... Dian Palupi Rini
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) | VOL. 4
19 Apr 2020

Multilabel Classification for News Article Using Long Short-Term Memory
Winda Kurnia Sari ... Reza Firsandaya Malik
Sriwijaya Journal of Informatics and Applications | VOL. 1
09 Jul 2020

Explainable stock prices prediction from financial news articles using sentiment analysis.
Shilpa Gite ... Shilpi Srivastava
PeerJ Computer Science | VOL. 7
28 Jan 2021

Stock Prices Prediction from Financial News Articles Using LSTM and XAI
Shilpa Gite ... Priyam Maheshwari
01 Jan 2020
