Croatian Language N-Gram System

Bruno Blašković, Šandor Dembitz

doi:10.3233/978-1-61499-105-2-696

Publication Date: Jan 1, 2012

Abstract

Large-scale n-gram models are available for only a small number of languages, and until now Croatian was not one of them. This paper describes the development of an n-gram database system suitable for large-scale language modeling in Croatian. N-gram collection relies on Hascheck, the Croatian academic online spellchecker, which has been publicly available since 1993 and is today a popular language service with average daily traffic exceeding one million tokens. The approach demonstrated in this paper eliminated the need for n-gram data cleaning in the post-processing phase, which is a serious issue in other languages. The spellchecker dynamics allowed Heaps' law modeling to be applied to Croatian n-grams, which enabled the prediction of n-gram count growth.
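
To illustrate how Heaps' law can be used to predict n-gram count growth, the following is a minimal sketch and not the authors' implementation. It assumes the standard form of Heaps' law, V(N) = K * N^beta, where N is the number of tokens processed and V is the number of distinct n-grams observed, and it estimates K and beta by ordinary least squares on the log-transformed data. The function names (`distinct_ngram_growth`, `fit_heaps`, `predict_distinct`) and the sampling parameter `step` are hypothetical, and the abstract does not state which fitting procedure the paper uses.

```python
import math
from typing import List, Tuple


def distinct_ngram_growth(tokens: List[str], n: int, step: int) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_ngrams) pairs, sampled every `step` tokens."""
    seen = set()
    points = []
    for i in range(len(tokens) - n + 1):
        seen.add(tuple(tokens[i:i + n]))
        tokens_seen = i + n  # tokens consumed once this n-gram is complete
        if tokens_seen % step == 0:
            points.append((tokens_seen, len(seen)))
    return points


def fit_heaps(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit V = K * N**beta by least squares on log V = log K + beta * log N.

    Assumption: a log-log linear fit; the paper's actual method is not given
    in the abstract. Needs at least two points with distinct N values.
    """
    xs = [math.log(n_tokens) for n_tokens, _ in points]
    ys = [math.log(n_distinct) for _, n_distinct in points]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


def predict_distinct(k: float, beta: float, n_tokens: float) -> float:
    """Extrapolate the expected number of distinct n-grams after n_tokens tokens."""
    return k * n_tokens ** beta
```

In use, the pairs returned by `distinct_ngram_growth` for a given n would be passed to `fit_heaps`, and the fitted parameters would then be passed to `predict_distinct` to extrapolate beyond the observed corpus size. A separate fit per n-gram order is the natural choice, since the growth exponent generally differs between unigrams and higher-order n-grams.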
