Abstract

Occupation profiling is a subtask of authorship profiling, which is broadly defined as the analysis of individuals’ writing styles. Although the problem has been widely explored, no previous study has applied it to classical Chinese poetry. Inspired by Trudgill’s seminal work on stylistic variation as a function of occupation, we present a novel Domain-Knowledge Transformer model that predicts a poet’s occupation from the writing style of their poems. Unlike Indo-European languages, Chinese has rarely used characters and two written forms: traditional Chinese and simplified Chinese. To tackle these problems, we use a language-related component to standardize our input. We also use alphabetization to satisfy the restrictions imposed by rhyming rules and tonal styles. Because classical poetry is a special literary form, traditional domain knowledge, such as named entities, themes, ages, and official career paths, is valuable for poet occupation profiling. However, the lack of appropriately annotated datasets makes these features difficult to recognize. We therefore propose a domain knowledge component that employs the latent Dirichlet allocation (LDA) model to capture additional theme information and builds named entity dictionaries to recognize the named entities in the datasets used in this study. Finally, in the deep learning component, we combine a Transformer with a convolutional neural network (CNN) to perform occupation profiling. The experimental results suggest that our model is effective in this task. Moreover, the results give an account of other social attribute features of poetry style that are predictive of occupation in this domain.
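As a rough illustration of the dictionary-based named entity recognition mentioned above, the following minimal sketch scans a line of verse for entries in a small lookup table using greedy longest-match. The dictionary contents, the labels, and the function name `find_entities` are hypothetical placeholders chosen for this example; they are not taken from the study, whose actual dictionaries and matching procedure are not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary (assumption): surface form -> entity label.
ENTITY_DICT: Dict[str, str] = {
    "長安": "PLACE",
    "洛陽": "PLACE",
    "李白": "PERSON",
}


def find_entities(text: str, entity_dict: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Greedy longest-match scan returning (surface, label, start_index) triples."""
    max_len = max((len(key) for key in entity_dict), default=0)
    found: List[Tuple[str, str, int]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer names win over their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in entity_dict:
                match = candidate
                break
        if match is not None:
            found.append((match, entity_dict[match], i))
            i += len(match)  # skip past the matched entity
        else:
            i += 1
    return found


if __name__ == "__main__":
    print(find_entities("李白辭長安", ENTITY_DICT))
```

A lookup of this kind only works once the input has been standardized to a single written form, since a dictionary keyed on traditional characters will not match simplified text.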
