Abstract
Event corpora are essential for training event extraction models. However, most existing event corpora are available only in English, and their construction is limited by high annotation costs. This paper constructs a Chinese corpus of social security causality events and proposes a faster and less expensive construction method. The contributions are as follows: (i) An event corpus, SSECau, is constructed for the social security field in Chinese. It is drawn from 2,235 web texts and microblogs, with event causality annotated at the document level. (ii) A corpus construction method that combines manual annotation with machine pre-tagging is proposed to improve accuracy and speed. (iii) A pre-tagging method based on BiLSTM-CRF (bidirectional long short-term memory with a conditional random field) is deployed to extract events automatically. The experimental results show that the consistency between automatic pre-tagging and manual annotation reaches up to 82%, and that the dynamic tagging process improves both labeling speed and accuracy. The SSECau corpus can aid the development and evaluation of event extraction models for the social security field; the cause-effect relationships annotated at the document level can potentially enhance the training of complex extraction models; and the proposed dynamic process with pre-tagging can serve as a reference for future corpus construction.
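The abstract does not state how the consistency between pre-tagging and manual annotation is computed. As a rough illustration only, the sketch below assumes a simple token-level agreement rate over aligned tag sequences. The function name `tag_consistency` and the BIO-style labels (`B-Cause`, `I-Effect`, and so on) are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def tag_consistency(pre_tags: Sequence[str], manual_tags: Sequence[str]) -> float:
    """Return the fraction of positions where the two tag sequences agree.

    Assumption: consistency is measured as token-level agreement between
    the machine pre-tags and the manual tags. The paper may use a
    different measure.
    """
    if len(pre_tags) != len(manual_tags):
        raise ValueError("tag sequences must have the same length")
    if not pre_tags:
        return 0.0
    matches = sum(p == m for p, m in zip(pre_tags, manual_tags))
    return matches / len(pre_tags)


# Hypothetical BIO-style labels for one five-token sentence.
pre = ["B-Cause", "I-Cause", "O", "B-Effect", "O"]
manual = ["B-Cause", "I-Cause", "O", "B-Effect", "I-Effect"]

score = tag_consistency(pre, manual)
print(f"token-level consistency: {score:.2f}")
```

A token-level rate is the simplest choice. A span-level or chance-corrected measure (for example, Cohen's kappa) would be stricter, and may be closer to what an annotation study would report.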