Occupation profiling is a subtask of authorship profiling, which is broadly defined as the analysis of individuals’ writing styles. Although the problem has been widely explored, no previous study has addressed it for Chinese classical poetry. Inspired by Trudgill’s seminal work on stylistic variation as a function of occupation, we present a novel Domain-Knowledge Transformer model that predicts a poet’s occupation from the writing style of their poems. Unlike Indo-European languages, Chinese has rarely used characters and two written forms: traditional Chinese and simplified Chinese. To address these problems, we use a language-related component to standardize the input. We also use alphabetization to satisfy the restrictions on rhyming rules and tonal styles. Because classical poetry is a special literary form, traditional domain knowledge, such as named entities, themes, ages, and the official career path, is valuable for poet occupation profiling. However, these features are difficult to recognize because appropriately annotated datasets are lacking. We therefore propose a domain knowledge component that employs the latent Dirichlet allocation (LDA) model to capture additional theme information and builds named entity dictionaries to recognize the named entities in our datasets. Finally, in the deep learning component, we combine a Transformer with a convolutional neural network (CNN) to perform occupation profiling. The experimental results suggest that our model is effective for this task. Moreover, the results indicate that other social attribute features of poetic style are predictive of occupation in this domain.
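As an illustration of the dictionary-based named entity step, the following is a minimal sketch and not the authors' implementation. It assumes a greedy longest-match lookup; the dictionary entries, the entity type labels, the function name `find_entities`, and the sample string are all hypothetical.

```python
# Hypothetical entity dictionary: surface form -> entity type.
# Entries are illustrative only and are not taken from the paper.
ENTITY_DICT = {
    "長安": "PLACE",
    "洛陽": "PLACE",
    "李白": "PERSON",
    "翰林": "OFFICE",
}


def find_entities(text, entity_dict):
    """Scan text left to right, preferring the longest dictionary match.

    Returns a list of (start_index, surface_form, entity_type) tuples.
    """
    max_len = max(map(len, entity_dict), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in entity_dict:
                match = candidate
                break
        if match:
            found.append((i, match, entity_dict[match]))
            i += len(match)  # skip past the matched entity
        else:
            i += 1
    return found


# Made-up sample string, used only to exercise the lookup.
entities = find_entities("李白乘舟入長安", ENTITY_DICT)
```

Entity counts or types obtained this way could then be supplied to a downstream classifier as additional features, alongside theme information.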