Abstract
The growth of digital information in many languages, including Shona, underscores the need for effective text summarization techniques that make information more accessible and usable. This work addresses a major gap in natural language processing (NLP) for Shona, a language widely spoken in Zimbabwe and its surrounding areas that has nonetheless received little research attention. The main obstacles to Shona text summarization are the lack of pre-trained language models designed for Shona, the complexity of Shona morphology, and the scarcity of annotated datasets.[1] To address these issues, this research develops and adapts contemporary machine learning methods for Shona text summarization. We built large annotated corpora by gathering and analyzing texts from a variety of sources, including news stories, scholarly papers, and social media. These datasets were used to fine-tune existing NLP models, including Transformer-based architectures, so that they account for Shona's specific linguistic traits. To handle the morphological and syntactic complexities of Shona, our approach combines statistical and rule-based techniques. Compared with baseline methods, the results show a significant improvement in the relevance and accuracy of Shona text summaries. Besides serving as a starting point for further NLP research in underrepresented languages, the resulting models help Shona speakers in business, education, and media. This research contributes to the broader field of NLP by encouraging inclusivity and linguistic diversity, demonstrating the potential for cross-lingual advances, and emphasizing the ethical development of technology.
More From: International Journal of Innovative Science and Research Technology (IJISRT)