Abstract

This paper presents the methods used to develop a database of Spanish writing for forensic linguistic research, including our data collection procedures. The main data collection instrument was adapted from Chaski (2001) and translated into Spanish. It consists of ten tasks in which subjects are asked to write formal and informal texts on different topics. To date, 93 undergraduates from Spanish universities, as well as prisoners convicted of gender-based abuse, have participated in the study. The data collected have been analyzed from two perspectives: semantic and morphosyntactic. For the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). To obtain a more comprehensive picture of the linguistic data, additional ad hoc categories have been created from the corpus itself and validated with a double-check method to ensure inter-rater reliability. For the morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. Results show that it is possible to differentiate non-abusers from abusers with high accuracy based on linguistic features.
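The dictionary-based semantic analysis described in the abstract can be illustrated with a minimal sketch. It assumes a simple word-list lookup: each category is scored as the share of tokens in a text that appear in that category's word list. The lexicon, the category names, and the function `category_proportions` are hypothetical and invented for illustration; they are not taken from the LIWC dictionary, from ALIAS, or from the study's own ad hoc categories.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: these categories and word lists are invented for
# illustration and are NOT taken from the LIWC dictionary or from the study.
TOY_LEXICON = {
    "anger": {"odio", "rabia", "furioso"},
    "family": {"madre", "padre", "hijo", "hija"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, per category, the share of tokens in `text` found in its word list."""
    # Lowercase the text and split it into word tokens (Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    # Guard against empty texts so the function never divides by zero.
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


sample = "Mi madre y mi padre sienten rabia"
print(category_proportions(sample))
```

Scores of this kind, computed per text, could then serve as features when comparing the two groups. A real dictionary would also need to handle inflected forms and spelling variation, which this sketch ignores.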

Highlights

  • This study is part of a project that is the first to use a natural language processing tool to study the Spanish language of a specific type of criminal in Spain

  • The text analysis tools used in this study are LIWC: Linguistic Inquiry and Word Count (Pennebaker, Francis and Booth 2001) and ALIAS: Automated Linguistic Identification & Assessment System (Chaski 1997, 2001, 2005)

  • LIWC provided an automated text analysis, which we supplemented with manual analysis to provide a higher-level semantic analysis


Summary

Introduction

This study is part of a project that is the first to use a natural language processing tool to study the Spanish language of a specific type of criminal in Spain. The text analysis tools used in this study are LIWC: Linguistic Inquiry and Word Count (Pennebaker, Francis and Booth 2001) and ALIAS: Automated Linguistic Identification & Assessment System (Chaski 1997, 2001, 2005). This journal is published by the University Library System, University of Pittsburgh, as part of its D-Scribe Digital Publishing Program and is cosponsored by the University of Pittsburgh Press.

Corpus linguistics
Aspects of analysis and tagging in corpus linguistics
Corpus Linguistics for forensic purposes
Data collection: compilation of our corpus
Experimental Group of Gender-based abusers
Control group
Challenges
Compliance with Ethical Principles for Human Subjects Research
Data analysis method
Semantic focus
Spelling variation and software
Can we separate the groups and how are they distinguished?
Results and discussion
Conclusions and further research
