Abstract

This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high accuracy, but they cannot segment broken expressions or utterance endings that are not listed in the dictionary, and such expressions often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual anime character and show that they are linguistic speech patterns specific to each attribute. Additionally, a classification experiment shows that the model using subword units outperformed the model using the conventional method.
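As a rough illustration of the TF-IDF weighting mentioned in the abstract, the following minimal sketch treats all subword units of one group (for example, one character) as a single document. It assumes the utterances have already been segmented by a subword tokenizer; the function name `tf_idf`, the toy tokens, and the plain `tf * log(N / df)` formula are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(groups):
    """Weight subword units per group with a plain TF-IDF.

    groups: dict mapping a group label (e.g. a character name) to the
    list of subword units from all of that group's utterances.
    Returns: dict label -> dict subword -> weight.
    """
    n_groups = len(groups)

    # Document frequency: the number of groups in which each subword occurs.
    df = Counter()
    for tokens in groups.values():
        df.update(set(tokens))

    weights = {}
    for label, tokens in groups.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[label] = {
            tok: (cnt / total) * math.log(n_groups / df[tok])
            for tok, cnt in counts.items()
        }
    return weights


# Hypothetical, pre-segmented toy data (not from the paper's corpus).
groups = {
    "char_a": ["▁ぼく", "は", "行く", "のだ", "のだ"],
    "char_b": ["▁わたし", "は", "行く", "わ"],
}

weights = tf_idf(groups)
# The highest-weighted subword for char_a under this toy setup.
top_a = max(weights["char_a"], key=weights["char_a"].get)
```

In this toy setup, units shared by every group receive a weight of zero, so the units that remain highly weighted are the ones that occur often in one group and rarely elsewhere.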

Highlights

  • Research in natural language processing has focused on linguistic styles and characterized the utterances of specific groups defined by attributes such as gender or age

  • Using data collected from publications on the internet, we show that expressions extracted with subword units are more interpretable than those extracted with the original words when extracting linguistic speech patterns of fictional characters, a setting in which many words are not listed in the dictionary

  • We propose using subword units to segment the dialog of fictional characters


Summary

Introduction

Research in natural language processing has focused on linguistic styles and characterized the utterances of specific groups defined by attributes such as gender or age. Human characters in novels, anime, and games have character-specific linguistic speech patterns. These patterns are known as role language [1] and are related to characterization: role language shows what role the speaker plays, and it sometimes differs from real conversation. For example, “僕, boku, I” is a first-person singular pronoun usually used by boys in novels, anime, and games, whereas in real life it is used by both men and boys. In this study, we extracted and analyzed the linguistic speech patterns that characterize these characters using utterances of anime or game characters. Word segmentation and morphological analysis are widely performed using morphological analyzers like MeCab and ChaSen and their

