Abstract

Anthropological, archaeological, and forensic studies situate enforced disappearance as a strategy associated with the Brazilian military dictatorship (1964–1985), which left hundreds of persons without an established identity or cause of death. Their forensic reports are the only existing clue for identifying these people and detecting possible crimes associated with them. The exchange of information among institutions about the identities of disappeared people was not a common practice. Thus, analyzing these reports calls for unsupervised techniques, mainly because contextual annotation is extremely time-consuming, difficult to obtain, and highly dependent on the annotator. These techniques can assist researchers in identification and analysis in four areas: common causes of death, relevant body locations, personal belongings terminology, and correlations between actors such as doctors and police officers involved in the disappearances. This paper analyzes almost 3000 textual reports of missing persons in São Paulo city during the Brazilian dictatorship using unsupervised information extraction algorithms for Portuguese, identifying named entities and relevant terminology associated with these four criteria. The analysis allowed us to observe terminological patterns relevant for identifying people (e.g., the presence of rings or similar personal belongings) and to automate the study of correlations between actors. The proposed system acts as a first classification and indexing middleware for the reports and represents a feasible system that can assist researchers searching for patterns among autopsy reports.

Highlights

  • The development and improvement over the last few decades of natural language processing algorithms, both in performance and precision in some relevant tasks, has allowed a more systematic coverage and application of these approaches to domains where large textual sources are common and highly specific terminology and discursive structures exist

  • We present an analysis based on unsupervised information extraction algorithms in Portuguese of a collection of about 3000 forensic reports of persons buried as NN during 1971–1975, the most repressive period of the dictatorship

  • It is common for the results of natural language processing (NLP) algorithms to be delivered as free text or, in some cases, as a text-based structure stored in formats such as XML, JSON, or similar
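As a rough illustration of how JSON-structured NLP output could feed the study of correlations between actors, the sketch below counts how often pairs of person entities appear in the same report. It is not the authors' system: the JSON schema (`report_id`, `entities`, `text`, `label`), the `PERSON` label, and the function name `actor_cooccurrence` are assumptions made for this example, and the names in the sample are invented placeholders rather than data from the reports.

```python
import json
from collections import Counter
from itertools import combinations

# Hypothetical NER output, one object per report. The field names and the
# placeholder entity names are illustrative assumptions, not real data.
SAMPLE_OUTPUT = """
[
  {"report_id": "r1", "entities": [
      {"text": "Doctor A", "label": "PERSON"},
      {"text": "Officer B", "label": "PERSON"},
      {"text": "São Paulo", "label": "LOCATION"}]},
  {"report_id": "r2", "entities": [
      {"text": "Doctor A", "label": "PERSON"},
      {"text": "Officer B", "label": "PERSON"}]},
  {"report_id": "r3", "entities": [
      {"text": "Doctor A", "label": "PERSON"},
      {"text": "Officer C", "label": "PERSON"}]}
]
"""


def actor_cooccurrence(reports, label="PERSON"):
    """Count, per pair of entities with the given label, the reports naming both."""
    counts = Counter()
    for report in reports:
        # Deduplicate and sort names so each pair is counted once per report
        # and always appears in the same order.
        names = sorted({e["text"] for e in report["entities"] if e["label"] == label})
        counts.update(combinations(names, 2))
    return counts


reports = json.loads(SAMPLE_OUTPUT)
pairs = actor_cooccurrence(reports)
for (first, second), n in pairs.most_common():
    print(f"{first} + {second}: {n} report(s)")
```

Counting each pair once per report keeps a long report that repeats the same names from dominating the totals; a real pipeline would also need to normalize spelling variants of the same name before counting.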


Summary

Introduction

The development and improvement over the last few decades of natural language processing (hereafter NLP) algorithms, both in performance and precision in some relevant tasks (named entity recognition, open information extraction, part-of-speech tagging, among others), has allowed a more systematic coverage and application of these approaches to domains where large textual sources are common and highly specific terminology and discursive structures exist. [...] and a high volume of forensic reports. In these contexts, professionals often require software-assisted treatment that allows them to jointly analyze a large volume of reports and look for patterns across them. Beyond this need, which stems from the high volume of documents, most of these reports contain information about unsolved cases, which makes accurate and reliable processing a matter of high humanitarian importance. For these reasons, the textual analysis and processing of reports like these is crucial.

Information 2019, 10, 231

