Abstract

Several recent studies have shown that word frequency estimates based on subtitle files explain more of the variance in word recognition performance than traditional word frequency estimates do. The present study provides such a frequency estimate for Albanian, derived from more than 2 million words taken from film subtitles. Our results show a high correlation between the reaction times (RTs) from a lexical decision (LD) study (120 stimuli) and SUBTLEX-AL, as well as a high correlation between SUBTLEX-AL and the only existing frequency list, which covers the hundred most frequent Albanian words. These findings suggest that SUBTLEX-AL is a good frequency estimate. Furthermore, it is the first frequency database for Albanian that contains more than 100 words.
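As a rough illustration of the kind of computation the abstract describes, the sketch below counts word forms in subtitle text, converts the counts to frequencies per million words, and correlates log frequency with reaction times. It is a minimal sketch under stated assumptions, not the authors' pipeline: the tokenizer, the function names (`count_words`, `per_million`, `pearson`), the three subtitle lines, and the reaction times are all invented for the example.

```python
import math
import re
from collections import Counter

# Assumption: a token is any run of Unicode letters, so Albanian "ë" and "ç" are kept.
TOKEN = re.compile(r"[^\W\d_]+")


def count_words(lines):
    """Count lowercase word forms across an iterable of subtitle lines."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN.findall(line.lower()))
    return counts


def per_million(counts):
    """Convert raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {word: n * 1_000_000 / total for word, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy input, invented for illustration (not taken from the study's corpus).
subtitle_lines = [
    "Unë jam këtu.",
    "Ti je këtu, dhe unë jam me ty.",
    "Çfarë është kjo?",
]
counts = count_words(subtitle_lines)
freq_pm = per_million(counts)

# Hypothetical mean lexical decision RTs in milliseconds (made-up values).
toy_rts = {"unë": 540.0, "këtu": 560.0, "dhe": 610.0, "kjo": 630.0}
words = sorted(toy_rts)
log_freq = [math.log10(freq_pm[w]) for w in words]
r = pearson(log_freq, [toy_rts[w] for w in words])
print(f"r(log10 frequency, RT) = {r:.2f}")
```

With real data, a negative correlation would be the usual expectation, since more frequent words tend to be recognized faster. The toy values above only demonstrate the mechanics.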

Highlights

  • Several studies have shown that word frequency estimates based on subtitle files explain more of the variance in word recognition performance than traditional word frequency estimates do

  • The main objective of this investigation was to create a first frequency measure for the Albanian language, which would serve as a basis for further investigations, as mentioned in the introduction

  • Brysbaert and New (2009) state that a reliable estimate for low-frequency words requires a corpus of at least 16 million words, whereas estimates for high-frequency words stabilize at a corpus size of one million. Although our corpus is small by this standard, we carried out the analysis because such a measure is needed for Albanian

Summary

Introduction

Several studies have shown that word frequency estimates based on subtitle files explain more of the variance in word recognition performance than traditional word frequency estimates do. Any study involving word processing, such as studies of memory, reading, writing, speaking, and other basic psychological processes, has to consider this variable, whether in normal samples (adults or children) or in patients (aphasia, Alzheimer's dementia, dyslexia, Parkinson's disease). For this reason, researchers require a good frequency estimate that allows them to select words for their experimental or clinical purposes. The first frequency corpus built from online Usenet groups was the Hyperspace Analog to Language (HAL) corpus of Burgess and Livesay (1998), which contains more than 130 million words. In these groups, Internet users participate in discussions on a variety of topics without much supervision or editing. Word use on the Internet is more varied than the formal language used in edited texts.