A Hybrid of Proposed Filtration and Feature Selections to Enhance the Model Performance

E Sujatha, R Radha

doi:10.17485/ijst/v14i24.2017

Abstract

Objectives: To extract and identify the subjective information of social media users from unstructured data; to overcome high dimensionality and sparsity, the two major challenges in sentiment analysis of text datasets; and to increase model performance in a text classification problem using the smallest feasible feature set.

Methods: We propose a new filtration method that removes correlated features and zero-importance features, and apply it in addition to various feature selection methods. The feature selection methods (Mutual Information, Lasso and Recursive Feature Elimination) and a dimensionality reduction method, Principal Component Analysis (PCA), were used along with the proposed filtration to find the compelling features. The approach was evaluated on tweets about three Indian Government schemes, which were classified using a Random Forest classifier. Performance was evaluated using accuracy, precision, recall, F1 score, log loss and ROC-AUC.

Findings: We propose a model for selecting relevant and non-correlated feature subsets from an unstructured dataset. The model achieved an accuracy of 92% with a minimum log loss of 0.22 using the minimum number of features.

Improvements: This study shows that model performance improves when the two problems (dimensionality and sparsity) are addressed. Various feature selection methods were applied together with the proposed filtration in order to minimize the number of features. Reducing the number of features improves both computing time and model performance, and the benefit is expected to be greater for large datasets. Even though Random Forest performs well on high-dimensional datasets, further optimization is still needed.

Keywords: Mutual Information (MI); Lasso (L1); Recursive Feature Elimination (RFE); Random Forest (RF); Principal Component Analysis (PCA)
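The filtration idea described in the abstract (dropping zero-importance features, then dropping features that are highly correlated with one already kept) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy term counts, the 0.9 correlation threshold and the rule of keeping the more important feature of a correlated pair are all assumptions made for illustration. The importance scores are taken as given; in practice they might come from a tree-based model such as a Random Forest.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / sqrt(var_a * var_b)


def filter_features(columns, importances, corr_threshold=0.9):
    """Return the names of features kept after the two filtration steps.

    columns: dict mapping feature name -> list of values (one per tweet)
    importances: dict mapping feature name -> importance score (assumed given)
    corr_threshold: assumed cut-off, not a value taken from the paper
    """
    # Step 1: drop features whose importance is zero.
    candidates = [f for f in columns if importances.get(f, 0.0) > 0.0]
    # Step 2: visit features from most to least important and keep one
    # only if it is not highly correlated with a feature already kept.
    candidates.sort(key=lambda f: importances[f], reverse=True)
    kept = []
    for f in candidates:
        if all(abs(pearson(columns[f], columns[k])) < corr_threshold for k in kept):
            kept.append(f)
    return kept


# Toy term-count matrix: 5 tweets, 4 hypothetical terms.
columns = {
    "good": [1, 0, 2, 0, 1],
    "great": [2, 0, 4, 0, 2],   # perfectly correlated with "good"
    "scheme": [1, 1, 0, 1, 1],
    "the": [1, 1, 1, 1, 1],     # zero importance in this toy example
}
importances = {"good": 0.5, "great": 0.3, "scheme": 0.2, "the": 0.0}
kept = filter_features(columns, importances)
print(kept)
```

In this toy run, "the" is removed for having zero importance and "great" is removed because it duplicates "good". A feature selection method such as Mutual Information, Lasso or RFE would then be applied to the surviving features.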

Highlights

According to the Digital 2020 Global Overview Report of January 2020, nearly 60% of the world's population is already online, and more than half of the world's population is expected to use social media by the middle of this year
In addition to the metrics evaluated in existing works, log loss was analysed in the proposed work
The improvement of the proposed model was analysed in terms of computational time for each feature selection method
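To show why log loss adds information beyond accuracy, and how computational time can be recorded per run, here is a small hedged sketch. It assumes binary labels (the number of sentiment classes is not stated here), and the helper names `log_loss`, `accuracy` and `timed`, as well as the probability values, are illustrative only.

```python
import time
from math import log


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds the predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * log(p) + (1 - y) * log(1 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]    # hypothetical model A probabilities
hesitant = [0.6, 0.4, 0.55, 0.45]   # hypothetical model B probabilities

# Both models classify every sample correctly, so accuracy cannot separate them.
print(accuracy(y_true, confident), accuracy(y_true, hesitant))

# Log loss rewards the model that is correct with higher confidence.
loss_a, seconds_a = timed(log_loss, y_true, confident)
loss_b, seconds_b = timed(log_loss, y_true, hesitant)
print(round(loss_a, 4), round(loss_b, 4))
```

Both hypothetical models reach the same accuracy, but the more confident one has the lower log loss. The same `timed` wrapper could be placed around each feature selection step to compare running times.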

Summary

Introduction

According to the Digital 2020 Global Overview Report of January 2020, nearly 60% of the world's population is already online, and more than half of the world's population is expected to use social media by the middle of this year. Between July and September 2020, more than 180 million people started using social media, an average of almost 2 million new users every day. The latest data indicates that more than two-thirds (68%) of the world's population are using social media. People use social media every day to share their opinions on issues such as events, persons, products, services and politics. Sentiment analysis of social media plays a vital role in monitoring public opinion on particular topics. Sentiment analysis faces various challenges, of which high dimensionality and sparsity are the two major ones.

Journal: Indian Journal of Science and Technology
Publication Date: Jun 25, 2021
License type: cc-by
