Abstract

A text corpus is important for assessing language features and analysing variation. Machine learning techniques identify language terms, features, text structures and sentiment from a linguistic corpus. Sindhi is one of the oldest languages of the world, with a proper script and a complete grammar, yet it remains computationally under-resourced even in this digital era. To address this, a Sindhi NLP toolkit has been developed to solve problems in Sindhi NLP and computational linguistics, and this research work is intended as a contribution to NLP. This study has developed its own sentimentally structured and analysed Sindhi corpus on the basis of the accumulated results of a Sindhi sentiment analysis tool. The corpus is normalized and analysed for language features and variation using document-term matrix (DTM) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques, both computed with an n-gram model. A supervised machine learning model is formulated using SVM and k-NN techniques to analyse the Sindhi sentiment analysis corpus dataset. Precision, recall and F-score indicate that the machine learning technique performs better than the other techniques considered. Cross-validation with 10 folds is used to validate and evaluate the dataset on random splits for the supervised machine learning analysis. The study opens doors for linguists, data analysts and decision makers to do further work on sentiment summarization and visual tracking.

Highlights

  • Supervised classification is an important and noteworthy data mining technique [1], [2] for analysing text

  • The frequency of n-grams shows the significance of the Sindhi corpus dataset; frequency is presented as a document-term matrix (DTM) and as Term Frequency-Inverse Document Frequency (TF-IDF) weights

  • This study shows the comparative performance of supervised methods on the Sindhi sentiment analysis corpus dataset
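
The DTM and TF-IDF analysis over n-grams named in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the length-normalized term frequency and the unsmoothed `log(N / df)` weighting are all assumptions, and the English placeholder documents stand in for Sindhi text.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(docs, n=1):
    """Build a DTM: one row per document, one column per n-gram, cells are raw counts.

    Assumption: documents are split on whitespace; real Sindhi text would
    need the normalization and tokenization described in the paper.
    """
    counts = [Counter(ngrams(doc.split(), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def tf_idf(matrix):
    """Weight a DTM with tf * log(N / df).

    tf is the count divided by the document's total n-gram count, and df is
    the number of documents containing the term (one common variant of TF-IDF).
    """
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row) or 1  # avoid division by zero for empty documents
        weighted.append([(row[j] / total) * math.log(n_docs / df[j])
                         for j in range(n_terms)])
    return weighted

# Hypothetical placeholder documents, not taken from the Sindhi corpus.
docs = ["good film good story", "bad film", "good story"]
uni_vocab, uni_dtm = document_term_matrix(docs, n=1)
bi_vocab, bi_dtm = document_term_matrix(docs, n=2)
uni_weights = tf_idf(uni_dtm)
```

A term that occurs in every document receives a weight of zero under this variant, which is why TF-IDF highlights n-grams that distinguish one document from the others rather than n-grams that are merely frequent.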

Summary

INTRODUCTION

Supervised classification is an important and noteworthy data mining technique [1], [2] for analysing text. This research study has developed a supervised machine learning model using SVM, Random Forest and k-NN techniques to identify the correctly and incorrectly classified data in a structured, sentiment-annotated Sindhi text corpus. The corpus is constructed from the accumulated results of a Sindhi NLP tool for Sindhi text sentiment analysis. The study verifies the annotation accuracy of the Sindhi NLP tool and assesses the performance of the supervised classification model. Sindhi text is morphologically rich and grammatically complex [7], and users of the Sindhi language are settled all over the world [8]. Work on a Sindhi text corpus for sentiment analysis and structurization enables Sindhi users to express their reviews and opinions, and provides organizations with information for evaluating those sentiments and opinions. Sentiment structurization [9] clarifies the status and history of sentiments and helps in tracking sentiment summaries.
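
The evaluation described in this study (10-fold cross-validation scored with precision, recall and F-score) can be sketched as follows. This is an illustrative outline only, under several assumptions: it covers k-NN alone rather than the SVM and Random Forest models, the function names and the squared-Euclidean distance are hypothetical choices, and the numeric feature vectors are synthetic placeholders, not data from the Sindhi corpus.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, k=3):
    """Predict the majority label among the k training points nearest to x.

    Distance is squared Euclidean; ties between labels are broken arbitrarily.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def cross_validate(xs, ys, k_folds=10, k_neighbors=3, seed=0):
    """Pool the held-out predictions from every fold, then score them once."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(xs), k_folds, seed):
        held_out = set(fold)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(ys)) if i not in held_out]
        for i in fold:
            y_true.append(ys[i])
            y_pred.append(knn_predict(train_x, train_y, xs[i], k_neighbors))
    return precision_recall_f1(y_true, y_pred)

# Synthetic, well-separated placeholder features with binary labels.
xs = ([(0.0 + i * 0.01, 0.0) for i in range(10)]
      + [(5.0 + i * 0.01, 5.0) for i in range(10)])
ys = [0] * 10 + [1] * 10
scores = cross_validate(xs, ys)
```

Each sample is held out exactly once, so every prediction is made by a model that never saw that sample during training. In practice the feature vectors would be the DTM or TF-IDF rows of the corpus rather than two-dimensional points.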

SINDHI TEXT STRUCTURIZATION FOR SENTIMENT ANALYSIS
MATERIAL AND METHODS
Sindhi Corpus Dataset
RESULT
Findings
CONCLUSION