BERT Embeddings Research Articles

In the knowledge discovery field of the Big Data domain, the analysis of geographic positioning and mobility information plays a key role. At the same time, in the Natural Language Processing (NLP) domain, pre-trained models such as BERT and word embedding algorithms such as Word2Vec have enabled a rich encoding of words that maps textual data onto points of an arbitrary multi-dimensional space, in which proximity reflects an association among terms or topics. The main contribution of this paper is to show how analytical tools traditionally used on geographic data to measure the mobility of an agent over a time interval can also be applied effectively to extract knowledge in a semantic realm, such as a semantic space of words and topics, by looking for latent trajectories that can benefit from the properties of neural network latent representations. As a case study, the Scopus database was queried for the works of highly cited researchers in recent years. On this basis, we performed a dynamic analysis, measuring the Radius of Gyration as an index of researchers’ mobility across scientific topics. The semantic space is built from the automatic analysis of each author’s paper abstracts. In particular, we evaluated two methodologies for building the semantic space and found that Word2Vec embeddings perform better than BERT embeddings for this task. Finally, the scholars’ trajectories reveal some latent properties of this model, which also represent new scientific contributions of this work. These properties include (i) the correlation between scientific mobility and the achievement of scientific results, measured through the H-index; (ii) differences in the behavior of researchers working in different countries and subjects; and (iii) some interesting similarities between mobility patterns in this semantic realm and those typically observed in human mobility.
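In human-mobility studies, the Radius of Gyration is the root-mean-square distance of an agent's visited positions from their centre of mass. A minimal sketch of that textbook definition applied to points in an embedding space follows. It assumes, hypothetically, that an author's position in each time window is the centroid of the embeddings of that window's abstracts; the helper names `centroid`, `trajectory_from_windows`, and `radius_of_gyration` are illustrative and not taken from the paper, which may build positions differently.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def centroid(points: Sequence[Vector]) -> list[float]:
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    if not points:
        raise ValueError("centroid of an empty collection is undefined")
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def trajectory_from_windows(windows: Sequence[Sequence[Vector]]) -> list[list[float]]:
    """Hypothetical: one position per time window = centroid of that window's abstract embeddings."""
    return [centroid(window) for window in windows]


def radius_of_gyration(trajectory: Sequence[Vector]) -> float:
    """Root-mean-square Euclidean distance of the positions from their centre of mass."""
    centre = centroid(trajectory)
    squared = [sum((x - c) ** 2 for x, c in zip(p, centre)) for p in trajectory]
    return math.sqrt(sum(squared) / len(trajectory))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real Word2Vec or BERT abstract embeddings.
    windows = [
        [[0.0, 0.0], [0.2, 0.0]],  # abstracts from the first time window
        [[2.0, 0.0]],              # second window
        [[1.0, 3.0], [1.2, 3.0]],  # third window
    ]
    positions = trajectory_from_windows(windows)
    print(radius_of_gyration(positions))
```

A larger value means an author's topics are spread further apart in the semantic space over the period, and a value of 0 means every window lands on the same point. Euclidean distance is used here to mirror the geographic definition; the distance measure the paper actually uses is not stated in this abstract.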


Family and domestic violence (FDV) is a global problem with significant social, economic, and health consequences for victims, including increased health care costs, mental trauma, and social stigmatization. In Australia, the estimated annual cost of FDV is $22 billion, and one woman is murdered by a current or former partner every week. Despite this, tools that can predict future FDV based on the features of the person of interest (POI) and the victim are lacking. The New South Wales (NSW) Police Force attends thousands of FDV events each year and records details both as fixed fields (e.g., demographic information for the individuals involved in the event) and as text narratives that describe abuse types, victim injuries, threats, and the mental health status of POIs and victims. The information within these narratives is mostly untapped for research and reporting purposes. After applying a text mining methodology to extract information (abuse types, victim injuries, mental illness mentions) from 492,393 FDV event narratives, we linked these characteristics with the respective fixed fields and with actual mental health diagnoses obtained from the NSW Ministry of Health for the same cohort to form a comprehensive FDV dataset. These data were input into five deep learning models (MLP, LSTM, Bi-LSTM, Bi-GRU, BERT) to predict three FDV offense types (“hands-on,” “hands-off,” and “Apprehended Domestic Violence Order (ADVO) breach”). The transformer model with BERT embeddings returned the best performance (69.00% accuracy; 66.76% ROC) for “ADVO breach” in a multilabel classification setup, while the binary classification setup generated similar results. “Hands-off” offenses proved the hardest offense type to predict (60.72% accuracy; 57.86% ROC using BERT) but showed potential to improve with fine-tuning of binary classification setups.
“Hands-on” offenses benefited least from the contextual information gained through BERT embeddings: an MLP with categorical embeddings outperformed BERT in three out of four metrics (65.95% accuracy; 78.03% F1-score; 70.00% precision). The encouraging results indicate that future FDV offenses can be predicted using deep learning on a large corpus of police and health data. Incorporating additional data sources will likely increase performance, which can help those working on FDV and in law enforcement to improve outcomes and better manage FDV events.
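The abstract contrasts a multilabel setup, where one model predicts all three offense types for each event at once, with a binary setup, where each offense type gets its own yes/no target, and it reports per-offense accuracy, precision, and F1-score. The sketch below illustrates only that target encoding and those threshold-based metrics on hypothetical toy labels; the function names are illustrative rather than taken from the study, and ROC-based scores are left out because they need predicted probabilities, not 0/1 labels.

```python
OFFENSE_TYPES = ("hands-on", "hands-off", "ADVO breach")


def multi_hot(offenses: set[str]) -> list[int]:
    """Multilabel target: one 0/1 slot per offense type, all predicted jointly."""
    return [1 if name in offenses else 0 for name in OFFENSE_TYPES]


def binary_targets(events: list[set[str]], offense: str) -> list[int]:
    """Binary target for a single offense type, as a one-vs-rest model would use."""
    return [1 if offense in event else 0 for event in events]


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for one offense type from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical events; a real dataset would come from linked police and health records.
    events = [{"hands-on"}, {"hands-off", "ADVO breach"}, set(), {"ADVO breach"}]
    print([multi_hot(e) for e in events])
    y_true = binary_targets(events, "ADVO breach")  # [0, 1, 0, 1]
    y_pred = [0, 1, 1, 1]                           # made-up model output
    print(binary_metrics(y_true, y_pred))
```

In a multilabel setup, per-offense figures come from reading one column out of the joint predictions; in a binary setup, each column comes from its own model, but the metric computation is the same.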



Articles published on BERT Embeddings

BI-RADS BERT and Using Section Segmentation to Understand Radiology Reports

FiNCAT-2: An enhanced Financial Numeral Claim Analysis Tool

Integrated BERT embeddings, BiLSTM-BiGRU and 1-D CNN model for binary sentiment classification analysis of movie reviews

Event-related microblog retrieval in Turkish

Hate speech detection on Twitter using transfer learning

Topic2Labels: A framework to annotate and classify the social media data through LDA topics and deep learning models for crisis response

Graph-based joint pandemic concern and relation extraction on Twitter

Comparative analysis of contextual and context-free embeddings in disaster prediction from Twitter data

Mobility in Unsupervised Word Embeddings for Knowledge Extraction—The Scholars’ Trajectories across Research Topics

Named Entity Recognition for Addresses: An Empirical Study

A Novel Deep Learning Approach Using Contextual Embeddings for Toponym Resolution

Finding light in dark archives: using AI to connect context and content in email

Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses

Decoding EEG Brain Activity for Multi-Modal Natural Language Processing

Varying Abstractions: a conceptual vs. distributional view on prepositional polysemy

Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Word Embeddings and the Implications to Representation Learning

Integrating Extracted Information from Bert and Multiple Embedding Methods with the Deep Neural Network for Humour Detection

Stance detection with BERT embeddings for credibility analysis of information on social media

Analysis of Text Feature Extractors using Deep Learning on Fake News

Utilizing Text Mining, Data Linkage and Deep Learning in Police and Health Records to Predict Future Offenses in Family and Domestic Violence
