Preprocessing of Nepali News Corpus for Downstream Tasks

Sushil Awale, Birodh Rijal, Santa B Basnet, Suraj Prasai

doi:10.3126/nl.v35i01.46553

Open Access

https://doi.org/10.3126/nl.v35i01.46553

Journal: Nepalese Linguistics

Publication Date: Jul 11, 2022

Affiliation: International Centre for Integrated Mountain Development

Abstract

Text collected from online resources introduces many errors, which result in incorrect learning outcomes in automatic language learning tasks. In this paper, we discuss a Nepali text preprocessing pipeline for generating a clean corpus. The pipeline is tested using a language model to observe the impact of each step on the learning task. The relevance of this work lies in systematizing the procedure for developing a standard Nepali corpus.
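
The abstract does not list the individual preprocessing steps, so the following is only a minimal sketch of what a step-wise cleaning pipeline for scraped Nepali text might look like. Every step shown here (HTML tag stripping, Unicode NFC normalization, dropping characters outside the Devanagari block, whitespace collapsing) and every name (`clean`, `PIPELINE`, and the helper functions) is an assumption made for illustration, not the authors' method. Keeping the steps in a list makes it possible to run any prefix of the pipeline, which is one way the effect of each step could be measured separately.

```python
import re
import unicodedata

# NOTE: all steps below are illustrative assumptions; the paper's abstract
# does not say which cleaning steps its pipeline actually uses.

TAG_RE = re.compile(r"<[^>]+>")
# Keep the Devanagari block (U+0900-U+097F), ZWNJ/ZWJ (used inside Nepali
# words), ASCII digits, whitespace, and a little basic punctuation.
DISALLOWED_RE = re.compile(r"[^\u0900-\u097F\u200c\u200d0-9\s.,!?\-]")
SPACE_RE = re.compile(r"\s+")


def strip_html(text: str) -> str:
    """Replace HTML tags left over from scraping with a space."""
    return TAG_RE.sub(" ", text)


def normalize_unicode(text: str) -> str:
    """Apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", text)


def drop_foreign_chars(text: str) -> str:
    """Replace characters outside the allowed set with a space."""
    return DISALLOWED_RE.sub(" ", text)


def collapse_whitespace(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return SPACE_RE.sub(" ", text).strip()


# Hypothetical step order; pass a prefix of this list to test steps one by one.
PIPELINE = [strip_html, normalize_unicode, drop_foreign_chars, collapse_whitespace]


def clean(text: str, steps=PIPELINE) -> str:
    """Run the given steps over the text in order."""
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    print(clean("<p>नेपाल   राम्रो छ।</p>"))
```

Calling `clean(text, steps=PIPELINE[:2])` would apply only the first two steps, so a language model could in principle be trained on the output of each prefix and the results compared.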
