Abstract
This study explores whether particle use data can differentiate between seven Estonian registers: everyday conversation, institutional interaction, printed media (newspapers), prose fiction, academic prose, instant messaging, and internet comments. The objective is to develop a simple yet effective model that lets researchers understand the internal logic behind register differentiation based on particle use. Particles are promising differentiators because they are independent of text content. The article outlines the chosen method, the creation of the model, and the testing process. A key finding is that hierarchical relationships between particles within registers are more reliable indicators than general use frequencies. The method establishes correspondences between particle pairs and register pairs, which makes it possible to measure distances between registers. In testing, the model achieves high accuracy across registers, although it has some difficulty categorizing fiction and institutional interaction. Overall, the study confirms that the proposed method can distinguish registers based on particle use, underscoring the significance of particles in linguistic analysis.
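The idea of comparing registers through particle pairs rather than raw frequencies can be illustrated with a minimal sketch. It is only one possible reading of the abstract, not the model described in the article: it assumes that the "hierarchical relationship" between two particles is simply which of them is more frequent in a register, and that the distance between two registers is the share of particle pairs whose order differs. The particle names (`p1`, `p2`, `p3`), the register names, and all counts are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def pairwise_order(freqs):
    """For each particle pair (a, b), record whether a is more frequent than b."""
    return {(a, b): freqs[a] > freqs[b] for a, b in combinations(sorted(freqs), 2)}


def register_distance(freqs_x, freqs_y):
    """Share of shared particle pairs whose relative order differs (0.0 to 1.0)."""
    order_x, order_y = pairwise_order(freqs_x), pairwise_order(freqs_y)
    shared = order_x.keys() & order_y.keys()
    if not shared:
        raise ValueError("the two registers share no particle pairs")
    return sum(order_x[pair] != order_y[pair] for pair in shared) / len(shared)


def classify(sample, profiles):
    """Assign a sample to the register profile at the smallest distance."""
    return min(profiles, key=lambda name: register_distance(sample, profiles[name]))


# Hypothetical placeholder counts, for illustration only.
profiles = {
    "register_A": {"p1": 30, "p2": 20, "p3": 10},
    "register_B": {"p1": 10, "p2": 20, "p3": 30},
}
sample = {"p1": 12, "p2": 9, "p3": 4}

print(register_distance(profiles["register_A"], profiles["register_B"]))
print(classify(sample, profiles))
```

Because only the order within each pair is compared, the sample's much lower absolute counts do not affect the result, which is the intuition behind preferring hierarchical relationships over general use frequencies.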