Abstract

Syntactic analysis of Estonian netspeak using Constraint Grammar

The paper describes an attempt to adapt the Estonian Constraint Grammar rule set to netspeak. The rule set was developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of written (literary) Estonian, and Kaili Müürisep and Heli Uibo had previously adapted it for shallow parsing of spoken Estonian. To adapt the rules, a chatroom corpus of 19,809 tokens was first parsed with the existing rule set. The resulting annotation was checked manually, and the rule set was changed on the basis of the errors found. The changes were made in four stages: detection of clause boundaries, detection of particle verbs, shallow syntactic analysis (assignment of syntactic function tags), and dependency analysis (assignment of dependency relations).

The most distinctive syntactic features of netspeak proved to be the extensive use of discourse particles and direct address, a small share of attributes among the syntactic functions, short sentences, and frequent elliptical sentences, in which the predicate may be omitted along with other syntactic functions.

Adapting the rule set improved the results of both shallow and dependency parsing. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors occurred in determining the governors of adverbials.
