Automatic syntactic analysis of netspeak using Constraint Grammar (Internetikeele automaatne süntaktiline analüüs kitsenduste grammatikaga)

Dage Särg

doi:10.5128/erya12.15

Abstract

The article describes the adaptation of the Estonian Constraint Grammar to netspeak. To this end, a chatroom corpus of 19,809 tokens was parsed with a rule set developed for written standard Estonian. Based on the errors found while manually checking the corpus annotation, the rule set was modified in four stages: clause boundary detection, particle verb detection, shallow syntactic analysis, and dependency syntactic analysis. The work showed that the most important distinctive features of netspeak syntax are the extensive use of particles and direct addresses, a smaller share of attributes, short sentences, and the frequent occurrence of elliptical sentences. As a result of adapting the rule set, the results of both shallow and dependency parsing improved. Most errors arose in tagging the functions of subjects, predicatives, and adverbials. In dependency parsing, most errors occurred in the dependency labels of adverbials.

Syntactic analysis of Estonian netspeak using Constraint Grammar

The paper provides an overview of an attempt to adapt the Estonian Constraint Grammar rule set for netspeak. The rule set was developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of written Estonian, and it has previously been adapted for shallow parsing of spoken Estonian by Kaili Müürisep and Heli Uibo. To adapt the rules, a chatroom corpus was first parsed with the existing rule set. The corpus was then manually revised, and changes were made to the rule set based on the errors that were found. The changes concerned the detection of clause boundaries and particle verbs, as well as the assignment of syntactic tags and dependency relations. The most distinctive features of netspeak proved to be the extensive use of discourse particles and direct addresses, short sentence length, a small percentage of attributes among the syntactic functions used, and a large number of elliptical sentences from which a predicate, among other syntactic functions, can be left out.

As a result of adapting the rule set, the results of both shallow and dependency parsing improved. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors was made in determining the governors of adverbials.
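The per-function error analysis mentioned above can be illustrated with a minimal sketch. It assumes that the manually corrected annotation and the parser output are available as two aligned lists of syntactic tags, one per token. The function name, the toy data, and the tag labels (@SUBJ, @PRD, @ADVL, and so on) are illustrative assumptions, not taken from the paper's materials.

```python
from collections import Counter


def errors_by_function(gold, predicted):
    """Count mismatched tokens, grouped by the gold (manually corrected) tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    errors = Counter()
    for gold_tag, predicted_tag in zip(gold, predicted):
        if gold_tag != predicted_tag:
            errors[gold_tag] += 1
    return errors


# Hypothetical toy data: one tag per token, aligned by position.
gold = ["@SUBJ", "@FMV", "@ADVL", "@PRD", "@ADVL"]
predicted = ["@SUBJ", "@FMV", "@OBJ", "@SUBJ", "@ADVL"]

errors = errors_by_function(gold, predicted)
for tag, count in errors.most_common():
    print(tag, count)
```

Grouping by the gold tag answers the question "which functions does the parser most often get wrong"; grouping by the predicted tag instead would show which labels the parser over-assigns.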

Journal: Eesti Rakenduslingvistika Ühingu aastaraamat = Estonian Papers in Applied Linguistics
Publication Date: May 4, 2016
License type: CC BY-NC
