Large-scale weighted sequence alignment for the study of intertextuality in Finnic oral folk poetry

Maciej Janicki

doi:10.46298/jdmdh.11390

Abstract

The digitization of large archival collections of oral folk poetry in Finland and Estonia has opened up possibilities for large-scale quantitative studies of intertextuality. As an initial methodological step in this direction, I present a method for pairwise line-by-line comparison of poems using the weighted sequence alignment algorithm (also known as ‘weighted edit distance’). The main contribution of the paper is a novel description of the algorithm in terms of matrix operations, which allows for much faster alignment of a poem against the entire corpus by utilizing modern numeric libraries and GPU capabilities. This makes it possible to compute alignment scores for all pairs of poems in a corpus of over 280,000 poems. The resulting table of over 40 million pairwise poem similarities can be used in various ways to study the oral tradition. Some starting points for such research are sketched in the latter part of the article.
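To make the idea of line-by-line weighted alignment concrete, the following is a minimal sketch of a textbook dynamic-programming alignment between two poems, each represented as a list of lines. It is not the matrix formulation that the paper contributes, and it makes several assumptions of its own: the function names `bigram_similarity` and `alignment_score` are hypothetical, the line similarity (Jaccard overlap of character bigrams) is a stand-in chosen for illustration, and unaligned lines are simply given a weight of zero.

```python
from typing import Callable, Sequence


def bigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams.

    Assumption: a stand-in line similarity for illustration only,
    not the measure used in the paper.
    """
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        # Lines shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def alignment_score(
    xs: Sequence[str],
    ys: Sequence[str],
    sim: Callable[[str, str], float] = bigram_similarity,
) -> float:
    """Best total similarity over all order-preserving line alignments.

    d[i][j] holds the best score for the first i lines of xs and the
    first j lines of ys. A line may be skipped (weight 0) or aligned
    with a line of the other poem (weight sim).
    """
    n, m = len(xs), len(ys)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = max(
                d[i - 1][j],                                  # skip a line of xs
                d[i][j - 1],                                  # skip a line of ys
                d[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]),  # align the two lines
            )
    return d[n][m]


if __name__ == "__main__":
    # Made-up toy "poems"; the second has one extra line in the middle.
    poem_a = ["ab", "cd"]
    poem_b = ["ab", "zz", "cd"]
    print(alignment_score(poem_a, poem_b))
```

A nested-loop version like this handles one pair of poems at a time. The abstract's point is that expressing the same recurrence as matrix operations lets one poem be aligned against the whole corpus at once with numeric libraries or a GPU.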

Journal: Journal of Data Mining & Digital Humanities
Publication Date: Aug 13, 2023
License type: CC BY 4.0
