Deep Learning for Period Classification of Historical Hebrew Texts

Chaya Liebeskind, Shmuel Liebeskind

doi:10.46298/jdmdh.5864

Abstract

In this study, we address the task of classifying historical texts by their assumed period of writing. This task is useful in digital humanities studies, where many texts have unidentified publication dates. For years, the typical approach to temporal text classification was supervised learning with conventional machine-learning algorithms. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor that transforms the raw text into a feature vector from which the classifier can learn to classify any unseen valid input. Recently, deep learning has produced extremely promising results for various tasks in natural language processing (NLP). The primary advantage of deep learning is that human engineers do not design the feature layers; instead, the features are learned from data with a general-purpose learning procedure. We investigated deep learning models for the period classification of historical texts. We compared three common models, namely paragraph vectors, convolutional neural networks (CNN) and recurrent neural networks (RNN), with conventional machine-learning methods. We demonstrate that the CNN and RNN models outperformed both the paragraph vector model and the conventional supervised machine-learning algorithms. In addition, we constructed word embeddings for each time period and analyzed semantic changes of word meanings over time.
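The abstract mentions building word embeddings for each period and analyzing how word meanings shift, but this summary does not describe the comparison method. The sketch below assumes one common approach: compare a word's nearest neighbors in two period-specific embedding spaces using Jaccard overlap, which avoids having to align the two spaces. The function names and the toy vectors for the words `term`, `alpha`, `beta` and `gamma` are hypothetical and are not taken from the paper.

```python
import math

Vector = list[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(word: str, space: dict[str, Vector], k: int) -> set[str]:
    """Return the k words closest to `word` (by cosine) within one period's space."""
    target = space[word]
    scored = sorted(
        ((cosine(target, vec), other) for other, vec in space.items() if other != word),
        reverse=True,
    )
    return {other for _, other in scored[:k]}

def neighbor_overlap(word: str, space_a: dict[str, Vector],
                     space_b: dict[str, Vector], k: int = 2) -> float:
    """Jaccard overlap of `word`'s top-k neighbors in two period spaces.

    1.0 means the neighborhood is unchanged; values near 0.0 suggest that the
    word appears in different contexts in the two periods.
    """
    na = nearest_neighbors(word, space_a, k)
    nb = nearest_neighbors(word, space_b, k)
    union = na | nb
    return len(na & nb) / len(union) if union else 1.0

# Hypothetical toy vectors; real ones would come from an embedding model
# trained separately on each period's sub-corpus.
period_a = {"term": [1.0, 0.0], "alpha": [0.9, 0.1], "beta": [0.8, 0.2], "gamma": [0.0, 1.0]}
period_b = {"term": [0.0, 1.0], "alpha": [0.9, 0.1], "beta": [0.8, 0.2], "gamma": [0.1, 0.9]}

print(round(neighbor_overlap("term", period_a, period_b), 3))  # → 0.333
```

Here `term` keeps only `beta` as a shared neighbor across the two periods, so its overlap drops to one third. That low score is the kind of signal that would flag the word as a candidate for semantic change.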

Highlights

The aim of preserving cultural heritage and rendering it more accessible has motivated the digitization of historical texts over the last decade
We focus on neural language models for the period classification of historical texts
Our research focuses on the period classification of historical texts from the Responsa project

Summary

INTRODUCTION

The aim of preserving cultural heritage and rendering it more accessible has motivated the digitization of historical texts over the last decade. In recent years, considerable research has been devoted to diachronic lexical resources, which comprise terms from different language periods [Borin and Forsberg, 2011; Liebeskind et al., 2013; Riedl et al., 2014]. These resources are primarily used for studying language change and supporting searches in historical domains, bridging the lexical gap between modern and ancient language. Supervised machine-learning algorithms use training data, consisting of input examples paired with their desired outputs, to learn a function that maps inputs to outputs. Most conventional supervised machine-learning algorithms for the period classification of historical texts are either rule-based or corpus-based, and their effectiveness depends on prior feature engineering.
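To make the dependence on feature engineering concrete, here is a minimal corpus-based baseline. It assumes a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing. This is not necessarily one of the conventional methods the authors evaluated. The class name `NaiveBayesPeriodClassifier`, the period labels and the training texts are all hypothetical. The hand-designed part is `tokenize`: every decision made there, such as tokenization or n-grams, is a feature choice a human must make. That is exactly the step the deep-learning models in the study aim to learn from data instead.

```python
import math
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    """Hand-designed feature extraction; here, plain whitespace tokens."""
    return text.split()

class NaiveBayesPeriodClassifier:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts: list[str], periods: list[str]) -> "NaiveBayesPeriodClassifier":
        self.doc_counts = Counter(periods)        # documents per period (for priors)
        self.word_counts = defaultdict(Counter)   # token counts per period
        for text, period in zip(texts, periods):
            self.word_counts[period].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {p: sum(c.values()) for p, c in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_period, best_score = None, -math.inf
        for period, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior P(period)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # tokens never seen in training carry no evidence
                count = self.word_counts[period][token]
                score += math.log((count + 1) / (self.total_words[period] + v))
            if score > best_score:
                best_period, best_score = period, score
        return best_period

# Hypothetical training data: placeholder tokens and period labels.
train_texts = ["aa bb aa", "aa cc", "dd ee dd", "ee ff"]
train_periods = ["early", "early", "late", "late"]
clf = NaiveBayesPeriodClassifier().fit(train_texts, train_periods)
print(clf.predict("aa bb"), clf.predict("dd ff"))  # → early late
```

Replacing `tokenize` with a richer extractor changes the features without touching the classifier. This separation is why the quality of such baselines hinges on the extractor's design.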

Diachronic data and tasks

The Responsa corpus and diachronic tasks

SUPERVISED MACHINE LEARNING FRAMEWORK

Conventional Machine-Learning models

Deep-Learning models

Word Embeddings

Convolutional Neural Networks

Recurrent Neural Networks

EVALUATION

Period Classification

Evaluation measures

Neural Networks Architectures

Conventional Machine-learning methods

Deep-learning methods

SEMANTIC CHANGES OF WORD MEANINGS OVER TIME

Word Comparisons

Periods of Change

CONCLUSIONS AND FUTURE WORK


Journal: Journal of Data Mining & Digital Humanities	Publication Date: Jun 13, 2020
Citations: 4	License type: cc-by


Similar Papers

Text Sentiment Analysis Based on ResGCNN
Jie Qi ... Chengbo Liu
01 Nov 2019

Hybrid Inception Recurrent Residual Convolutional Neural Network (HIRResCNN) with Harmony Search Optimization (HSO) for Early Breast Cancer Detection System
K Sangeetha ... S Prakash
NeuroQuantology | VOL. 19
11 Aug 2021

Convolutional Recurrent Neural Networks for Text Classification
Lei Wang ... Tong Chen
01 Jul 2019

Natural Language Processing using Deep Learning in Social Media
María Teresa Giménez Fayos
02 Sep 2021
