Abstract

In this work, we apply authorship attribution to a large-scale corpus of song lyrics. As a sub-category of poetry, song lyrics embody cultural elements as well as stylistic attributes that are not present in prose. We draw attention to special characteristics of lyrics, such as repetitive sound patterns and rhyme-based structures, which can be key to establishing authorship and which present opportunities that are not available for authorship attribution of other types of text such as tweets, emails, and blog posts. We first create a new balanced, large-scale data set of 12,000 song lyrics from 120 different artists. We then propose convolutional neural network (CNN) models for authorship attribution on this data set in order to exploit the structural information contained in the lyrics, much as in image classification. We conduct experiments at the character and sub-word levels, which mostly reflect positional information. In addition, we use phoneme-level features, which intrinsically capture attributes such as repetition, rhyme, and meter, and represent elements unique to verse-based textual compositions. We attempt to discover idiosyncratic features, and consequently author and genre associations, by working with variants of CNN architectures that have been used successfully in other text classification domains. Our choice of architecture places a particular focus on lyric attributes residing in neighboring regions, since CNNs fail to capture long-term textual dependencies. Finally, we empirically evaluate our results in comparison with the findings of previous text classification research from different domains.
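
To illustrate why a convolutional filter emphasizes attributes in neighboring regions, the following is a minimal sketch of a single character-level filter with max-over-time pooling. It is not the architecture used in this work: the alphabet, the helper names (`one_hot_encode`, `make_filter`, `conv1d_max_pool`), and the hand-set filter weights are all assumptions made for illustration, whereas a trained CNN would learn many such filters from data.

```python
from typing import List

# Hypothetical alphabet for illustration; characters outside it encode as all zeros.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '"


def one_hot_encode(text: str, alphabet: str = ALPHABET) -> List[List[float]]:
    """Map each character of the lowercased text to a one-hot vector."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.lower():
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def make_filter(pattern: str, alphabet: str = ALPHABET) -> List[List[float]]:
    """Build a hand-set filter that responds to `pattern` (a stand-in for learned weights)."""
    return one_hot_encode(pattern, alphabet)


def conv1d_max_pool(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide one filter over the sequence, apply ReLU, and keep the strongest response."""
    width = len(kernel)
    best = 0.0  # starting at 0 makes the running max equivalent to ReLU followed by max pooling
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + k][j] * kernel[k][j]
            for k in range(width)
            for j in range(len(kernel[k]))
        )
        best = max(best, score)
    return best


if __name__ == "__main__":
    la_filter = make_filter("la")
    for line in ["la la la", "hello", "xyz"]:
        print(line, conv1d_max_pool(one_hot_encode(line), la_filter))
```

Each filter response depends only on a window as wide as the filter (two characters here), so the pooled feature records whether a local pattern occurs somewhere in the line but not how distant parts of the lyric relate to one another.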
