Comparative study of term-weighting schemes for environmental big data using machine learning

Jungjin Kim, Han-Ul Kim, Jan Adamowski, Shadi Hatami, Hanseok Jeong

doi:10.1016/j.envsoft.2022.105536

Abstract

Widely used term-weighting schemes and machine learning (ML) classifiers with default parameter settings were assessed for their performance when applied to environmental big data analysis. Five term-weighting schemes [term frequency (TF), TF–inverse document frequency (TF-IDF), Best Match 25 (BM25), TF–inverse gravity moment (TF-IGM), and TF–IDF–inverse class frequency (TF-IDF-ICF)] and five ML classifiers [support vector machine (SVM), Naive Bayes (NB), logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost)] were tested. The best-performing term-weighting scheme and classifier were TF-IDF-ICF and LR, respectively. Based on the evaluation criteria, their combination outperformed all other scheme and classifier combinations in the analysis of the full environmental dataset. Category classification performance differed according to the environmental section (climate, air, water, or waste/garbage), with the best performance achieved for climate and the poorest for water. These results demonstrate the importance of selecting appropriate term-weighting schemes and ML classifiers in the analysis of human-generated environmental big data.
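To make the idea of a class-aware term weight concrete, the following is a minimal sketch of a TF-IDF-ICF computation in plain Python. The abstract does not state the formula used in the study, so the formulation here, tf × (1 + ln(N/df)) × (1 + ln(C/cf)), is an assumption based on one common definition. The toy corpus, the class labels attached to it, and the names `DOCS`, `tf_idf_icf`, and `WEIGHTS` are hypothetical and serve illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (tokens, class label) pairs; not data from the study.
DOCS = [
    (["carbon", "emission", "warming"], "climate"),
    (["warming", "temperature", "policy"], "climate"),
    (["smog", "emission", "dust"], "air"),
    (["river", "pollution", "policy"], "water"),
]


def tf_idf_icf(docs):
    """Return one {term: weight} dict per document.

    Assumed formulation (not confirmed by the abstract):
        weight = tf * (1 + ln(N / df)) * (1 + ln(C / cf))
    where N is the number of documents, df the number of documents
    containing the term, C the number of classes, and cf the number
    of classes in which the term occurs.
    """
    n_docs = len(docs)
    n_classes = len({label for _, label in docs})

    df = Counter()      # document frequency per term
    class_terms = {}    # class label -> set of terms seen in that class
    for tokens, label in docs:
        df.update(set(tokens))
        class_terms.setdefault(label, set()).update(tokens)

    cf = Counter()      # class frequency per term
    for terms in class_terms.values():
        cf.update(terms)

    weights = []
    for tokens, _ in docs:
        tf = Counter(tokens)  # raw term frequency within the document
        weights.append({
            term: tf[term]
            * (1 + math.log(n_docs / df[term]))
            * (1 + math.log(n_classes / cf[term]))
            for term in tf
        })
    return weights


WEIGHTS = tf_idf_icf(DOCS)
```

In this toy corpus, "carbon" appears in a single document and a single class, so it receives a higher weight in the first document than "emission", which is spread across two classes. Vectors built this way could then be passed to any of the classifiers named above.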

Journal: Environmental Modelling & Software
Publication Date: Sep 23, 2022
Citations: 6

Similar Papers

Prediction and feature selection of low birth weight using machine learning algorithms
Tasneem Binte Reza ... Nahid Salma
Journal of Health, Population and Nutrition | VOL. 43
12 Oct 2024

A machine learning tool for collecting and analyzing subjective road safety data from Twitter
Mohammad Majid Abedi ... Emanuele Sacchi
Expert Systems With Applications | VOL. 240
17 Nov 2023

A comparative study of supervised Machine Learning classifiers for Intrusion Detection in Internet of Things
Naveen Saran ... Nishtha Kesswani
Procedia Computer Science | VOL. 218
01 Jan 2023

Comparison and Evaluation of Machine Learning-Based Classification of Hand Gestures Captured by Inertial Sensors
Ivo Stančić ... Josip Musić
Computation | VOL. 10
14 Sep 2022
