Abstract

In this article we present statistical data on the distribution of parts of speech and dependency relations in a large, manually annotated Hungarian treebank, the Szeged Dependency Treebank. We hypothesize that the domain of a text influences the distribution of these elements, so we pay special attention to differences between domains. We present the characteristic rank-frequency distributions of parts of speech and dependency relations in Hungarian and analyse the similarities and differences among the domain sub-corpora with respect to these distributions. Our results reveal that the computer and newspaper texts are the most similar to each other, while the literature and compositions domains also exhibit some similarities. By contrast, the business news and law sub-corpora are unique, each with its own characteristics.
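The abstract does not say how the distributions are compared across domains. As an illustration only, the Python sketch below assumes each sub-corpus is available as a flat list of part-of-speech (or dependency-relation) labels. It builds a rank-frequency list per domain and compares domains by the cosine similarity of their relative tag frequencies. The function names, the toy tag sequences in `DOMAINS`, and the choice of cosine similarity are all hypothetical assumptions, not the authors' method or data.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def rank_frequency(tags):
    """Return (rank, tag, count) triples, most frequent tag first.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(Counter(tags).items(), key=lambda item: (-item[1], item[0]))
    return [(rank, tag, count) for rank, (tag, count) in enumerate(ordered, start=1)]


def relative_frequencies(tags):
    """Map each tag to its share of all tokens in the sample."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity of two frequency profiles over the union of their tags."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical toy tag sequences standing in for three sub-corpora
# (NOT data from the Szeged Dependency Treebank).
DOMAINS = {
    "newspaper": ["N", "V", "N", "ADJ", "N", "DET", "V", "N"],
    "computer": ["N", "N", "V", "DET", "N", "ADJ", "N", "N"],
    "law": ["N", "N", "N", "N", "ADJ", "ADJ", "CONJ", "V"],
}

if __name__ == "__main__":
    # Rank-frequency list for each domain.
    for name, tags in DOMAINS.items():
        print(name, rank_frequency(tags))
    # Pairwise similarity of the domains' relative-frequency profiles.
    profiles = {name: relative_frequencies(tags) for name, tags in DOMAINS.items()}
    for a, b in combinations(profiles, 2):
        print(f"{a} vs {b}: {cosine_similarity(profiles[a], profiles[b]):.3f}")
```

Comparing relative rather than raw frequencies keeps sub-corpora of different sizes comparable. A rank-based measure, such as a rank correlation over the tags shared by two domains, would be an equally plausible alternative if the shape of the rank-frequency curve matters more than the exact proportions.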
