Abstract

Stylometric methods have received increasing attention across a number of disciplines in recent years, particularly in forensic linguistics. This study assesses the value of correspondence analysis, a stylometric method, for Vietnamese text analysis. Using a dataset extracted from the VnExpress Viewpoint Corpus (VVC), a 1.3-million-token corpus of Vietnamese opinion articles, we examine seven part-of-speech features in search of relational features that characterize authorial style. The analysis focuses on feature effects, with the aim of shedding light on whether linguistic features of writing style are consistent across genders and professions. Taken together, the seven features produce encouraging results on what is acknowledged to be a difficult problem for the Vietnamese language. When correspondence analysis is applied to the seven features with authors grouped by gender, conjunctions and verbs perform best; with authors grouped by profession, conjunctions and pronouns offer a striking improvement in the stylometric investigation. The discriminating ability was particularly impressive, suggesting that, collectively, part-of-speech features provide a good set of markers.

Highlights

  • During the last decade, the link between stylometric analysis and linguistics has been at the center of much attention

  • Regarding authors’ profession, conjunctions and pronouns offer a striking improvement on stylometric investigation

Summary

Introduction

The link between stylometric analysis and linguistics has been at the center of much attention. The innovative work of Barlow (2013) pioneered a new approach to examining linguistic features by applying correspondence analysis to a specialized corpus, providing a reliable technique for identifying a language user. He argued that one consequence of relying on corpus data is that individual differences in usage tend to be obscured. To overcome this problem and investigate individual differences in spoken usage, he examined a corpus consisting of the spoken output of six White House press secretaries. The results provide strong evidence that, within this one particular discourse context, the patterns of speech of each individual are clearly recognizable.
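For readers unfamiliar with correspondence analysis, the following is a minimal sketch of the standard computation (singular value decomposition of the standardized residuals of a contingency table), assuming NumPy is available. It is not the authors' implementation: the group labels and all counts are invented purely for illustration, and only three of the part-of-speech categories named in the abstract are used.

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple correspondence analysis of a contingency table.

    Returns principal coordinates for rows and columns, plus the
    principal inertias (squared singular values) of each dimension.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # expected proportions under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Hypothetical author groups (rows) by part-of-speech counts (columns).
# These numbers are made up; they are NOT taken from the VVC corpus.
groups = ["group_A", "group_B", "group_C"]
pos_tags = ["conjunction", "verb", "pronoun"]
counts = np.array([
    [120, 340,  90],
    [150, 310,  60],
    [100, 360, 130],
])

row_coords, col_coords, inertia = correspondence_analysis(counts)
share = inertia / inertia.sum()          # proportion of inertia per dimension

for name, xy in zip(groups, row_coords[:, :2]):
    print(f"{name}: dim1={xy[0]:+.3f}, dim2={xy[1]:+.3f}")
for name, xy in zip(pos_tags, col_coords[:, :2]):
    print(f"{name}: dim1={xy[0]:+.3f}, dim2={xy[1]:+.3f}")
print("share of inertia per dimension:", np.round(share, 3))
```

In a plot of the first two dimensions, groups and features that lie in the same direction from the origin are positively associated, which is how a feature such as conjunctions can be read as characterizing one group of authors more than another.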

