Abstract

Modern neural approaches, which usually rely on large volumes of training data, have made remarkable progress in various areas of text processing. However, these approaches have not been studied adequately for low-resource languages. In this paper, we focus on title generation and keyphrase extraction in the Persian language. We build a large corpus of Persian scientific texts, which enables us to train end-to-end neural models for generating titles and extracting keyphrases. We investigate the effect of input length on modeling Persian text in both tasks. Additionally, we compare subword-level processing with word-level processing and show that even a straightforward subword encoding method greatly improves results on Persian, an agglutinative language. For keyphrase extraction, we formulate the task in two different ways: training the model to output all keyphrases at once, or training the model to output one keyphrase at a time and then extracting the n best keyphrases during decoding. The latter greatly improves performance.
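The two keyphrase-extraction formulations mentioned above differ only in how training targets are constructed for a sequence-to-sequence model. A minimal sketch of the two target-construction strategies follows; the separator token and function names are illustrative assumptions, not taken from the paper:

```python
def one2seq_targets(doc, keyphrases, sep="<sep>"):
    # Formulation 1: a single training pair per document.
    # The model learns to emit all keyphrases at once,
    # concatenated with a separator token.
    return [(doc, sep.join(keyphrases))]

def one2one_targets(doc, keyphrases):
    # Formulation 2: one training pair per keyphrase.
    # At inference time, beam search over the decoder yields
    # the n-best decoded sequences as the extracted keyphrases.
    return [(doc, kp) for kp in keyphrases]
```

For example, a document with keyphrases `["title generation", "Persian"]` yields one target (`"title generation<sep>Persian"`) under the first formulation and two separate training pairs under the second.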
