Abstract

To analyse the characteristics of utterances in Japanese novels, several attributes (e.g., the speaker, listener, relationship between the speaker and listener, and gender of the speaker) were added to a randomly extracted Japanese novel corpus. A total of 887 data sets, with 5632 annotated utterances, were prepared. Based on this attribute-annotated utterance corpus, the characteristics of utterance styles were extracted quantitatively. A chi-square test was applied to particles and auxiliary verbs to extract utterance characteristics that reflect the genders of, and relationships between, the speakers and listeners. The results revealed that male characters used imperative words more often than their female counterparts, who used more particle verbs, and that auxiliaries of politeness were used more frequently for ‘coworkers’ and ‘superior authorities’. In addition, utterances varied between close and intimate relationships between the speaker and listener. Moreover, repeated factor analyses of 7576 data sets in the BCCWJ speaker information corpus revealed ten typical utterance styles (neutral, frank, dialect, polite, feminine, crude, aged, interrogative, approval, and dandy). The factor scores indicated relationships between these utterance styles and fundamental attributes of speakers. Thus, the results of this study could be used in speaker identification tasks, automatic speech generation tasks, and the scientific interpretation of stories and characters.
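As an illustration of the kind of chi-square test mentioned in the abstract, the following is a minimal sketch for a single 2x2 contingency table (speaker gender against the presence or absence of a word class). The counts are hypothetical and are not taken from the corpus. The function name `chi_square_2x2`, the table layout, and the choice of a Pearson test without continuity correction are assumptions; the abstract does not specify how the test was configured.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed layout (hypothetical, for illustration only):
        row 1: group 1 utterances with / without the target word class (a, b)
        row 2: group 2 utterances with / without the target word class (c, d)
    Returns the chi-square statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    observed = [a, b, c, d]
    # Expected counts under independence: (row total * column total) / n
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # With 1 degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: utterances containing an imperative form, by speaker gender.
chi2, p = chi_square_2x2(120, 880, 60, 940)
print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

In practice one such table would be built per particle or auxiliary verb and per attribute (gender or relationship type), so a correction for multiple comparisons may be worth considering when many word classes are tested.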

Highlights

  • To process story texts automatically using information technologies and artificial intelligence, it is necessary to identify the relationships between linguistic characteristics and attributes in the story

  • Writing styles are affected by various attributes such as genre, time and culture settings, social backgrounds, personalities of the characters, and the mood of a scene. Those characteristics of written styles have been utilised for text categorization and author identification tasks [1, 2]

  • To analyse relationships between attributes and conversational sentences, a tagged dialogue corpus of Japanese novels was employed [12]. This corpus is based on a random sampling of Japanese novel texts within the Balanced Corpus of Contemporary Written Japanese (BCCWJ) [13]

Summary

Introduction

To process story texts automatically using information technologies and artificial intelligence, it is necessary to identify the relationships between linguistic characteristics and attributes in the story. The distinct stylistic characteristics in each speaker’s manner of speech tend to be exaggerated in conversations between fictional characters (in entertainment content). Although these characteristics do not precisely reflect the conversational styles of real people [9], they seem to function effectively as common symbols between the writers and readers of fictional texts. Previous research on the characteristics of distinct conversational styles has clarified the types of implied attributes of story characters and investigated the historical and cultural origins of those styles [6]. These results are based on the individual interpretations of the researchers. Clarifying the relationships between attributes of fictional characters and conversational styles would reveal effective features for identifying characters’ personalities and would provide useful information for solving speaker identification problems in natural language processing. Because the set of included attributes is limited, factor scores have been analysed only for the gender and age attributes.

