Psycholinguistic features of text are valuable for many research topics because they capture psychological, social, and linguistic aspects of written texts by means of dictionary files. These files are organized into categories, each defined as a group of dictionary words that tap a particular domain (e.g., negative emotion words). Linguistic Inquiry and Word Count (LIWC) is a widely used and versatile computer-based language analysis tool designed for psycholinguistic text analysis. The most recent version of the default English dictionary is LIWC2015, released with the 2015 version of the LIWC software. The literature has recently introduced the latest Brazilian Portuguese LIWC dictionary (BP-LIWC2015), developed with the same categories as the LIWC2015 English dictionary. However, the literature has also reported the need to evaluate BP-LIWC2015. In this scenario, this work investigates three questions: (i) Since LIWC2015 shows consistent improvements over the English dictionary developed in 2007 (LIWC2007), does BP-LIWC2015 achieve better text classification results than the older Brazilian Portuguese dictionary (BP-LIWC2007)? (ii) How equivalent are BP-LIWC2015 and BP-LIWC2007 to LIWC2015? (iii) Are there significant differences between the two Brazilian Portuguese dictionaries? To answer these questions, we first conducted text classification experiments with four datasets and seven classification algorithms to compare the two Brazilian Portuguese LIWC dictionaries reported in the literature (i.e., 2007 and 2015). Second, we used a bilingual Portuguese-English scientific news collection to analyze the correlation between LIWC2015 and the Brazilian Portuguese LIWC dictionaries. The results indicate that BP-LIWC2015 outperforms the older version in Brazilian Portuguese text classification. Finally, we found that BP-LIWC2015 correlates more strongly with the original English dictionary than the older version does.