This paper compares auditory and machine-based identification of linguistic origins. Two studies were conducted to assess the ability of lay listeners and a state-of-the-art machine approach to identify a Slavic L1 from delexicalized speech samples. The first study involved 228 native speakers of four Slavic languages (Bulgarian, Czech, Polish, and Russian) who had received no prior training in Slavic philology, phonetics, linguistics, or forensic science. Their task was to identify the linguistic origins of speakers when exposed to limited phonetic cues. The stimuli consisted of meaningless logatomes to control for lexical information. The second study employed machine-based spoken language identification, based on two distinct approaches: (1) the formant structure of the phonetic signal and (2) a neural network with vector representations of speech samples. The data showed that Slavic native speakers, even when exposed to limited auditory cues, are able to identify speakers’ L1s. Interestingly, for Bulgarian, the machine-based identification method performed better than the lay listeners. The results of the experiments provide insight into the advantages of hybrid approaches in investigations related to LADO (Language Analysis for the Determination of Origin). Furthermore, the outcomes of this comparison may contribute to the debate on the involvement of native speakers in L1 identification procedures for closely related languages.