Abstract
Expression-Level Information Extraction is a challenging Natural Language Processing (NLP) task that aims to retrieve important information from text documents. However, up-to-date data sources that could accelerate research on Expression-Level Information Extraction are still lacking, especially in the field of Chinese financial high technology. To fill this gap, we present Fintech Key-Phrase: a human-annotated key-phrase dataset for the Chinese financial high technology field, which contains more than 12K paragraphs together with their annotated domain-specific key-phrases. We extract publicly released Chinese Management’s Discussion and Analysis (CMD&A) reports from the well-known Chinese Research Data Services Platform (CNRDS) and then filter for the reports related to Financial High-Tech. The Financial High-Tech key-phrases are annotated according to pre-defined annotation guidelines to control annotation quality. To demonstrate that Fintech Key-Phrase helps retrieve valuable information in the field of Chinese financial high technology, we adopt several strong Information Retrieval systems as representative baselines and report their performance on the dataset. We hope this dataset can facilitate scientific research and further exploration in the Chinese Financial High-Tech domain. The Fintech Key-Phrase dataset and the experimental code for the adopted baselines are available on GitHub ( https://github.com/albert-jin/Fintech-Key-Phrase/ ). To encourage newcomers to get involved in Information Retrieval for the Chinese financial high technology field, we have also built an open website ( https://albert-jin.github.io/FintechKP-frontend/ ) and a real-time information retrieval API tool ( https://31863ew564.zicp.fun/information_retrieval/ ).