Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning

Patiphan Kaewwichian

doi:10.7307/ptt.v33i3.3728

Abstract

When a household car ownership model is used to predict travel demand in support of transportation policy, training a machine learning model on imbalanced data degrades the algorithm training process. The household car ownership data obtained from the study project for expressway preparation in Khon Kaen Province (2015) are imbalanced: the minority class has fewer members than the other answer classes, which biases classification. This research therefore proposes balancing the datasets with cost-sensitive learning applied to decision tree, k-nearest neighbors (kNN), and naive Bayes algorithms. Before the 3-class model was built, k-fold cross-validation was applied to the datasets to obtain the true positive rate (TPR) used to validate model performance. The results indicate that the kNN algorithm predicted the minority class better than the other algorithms. For the rural and suburban area types, which have very different imbalance ratios, its TPR before balancing was 46.9% and 46.4%, respectively; after balancing (MCN1), TPR rose to 84.4% and 81.4%, respectively.
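The idea behind cost-sensitive prediction can be illustrated with a minimal sketch. It is not the paper's MCN1 procedure, which this excerpt does not describe; it only shows the generic minimum-expected-cost rule. The cost matrix, the class probabilities, and the function name `min_expected_cost_class` are assumptions made up for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Pick the class with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[j][k] : cost of predicting class k when the true class is j
    """
    n_classes = len(probs)
    # Expected cost of predicting k, averaged over the possible true classes.
    expected = [
        sum(probs[j] * cost[j][k] for j in range(n_classes))
        for k in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected


# Hypothetical cost matrix: missing the minority class 0 is five times
# as expensive as any other error. These weights are NOT from the paper.
COST = [
    [0, 5, 5],
    [1, 0, 1],
    [1, 1, 0],
]

# Hypothetical class probabilities for one household, e.g. kNN vote shares.
probs = [0.2, 0.5, 0.3]

plain_choice = max(range(len(probs)), key=probs.__getitem__)
cost_choice, expected = min_expected_cost_class(probs, COST)

print("most probable class:", plain_choice)  # class 1
print("lowest-cost class:  ", cost_choice)   # class 0
```

With these assumed costs, the most probable class is 1, but the cost-weighted rule selects the minority class 0 because an error on class 0 is penalized more heavily. This is one way a classifier's bias toward majority classes can be offset without resampling the data.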

Highlights

Data classification is an analysis method used to define data patterns, classification models, and classification rules.
The findings indicated that the k-nearest neighbors (kNN) algorithm provided a high true positive rate (TPR), with higher accuracy in classifying the minority class (Class 0), at every imbalance ratio (Figure 3a).
The false negative rate (FNR) was close to 100% in some cases, for instance the decision tree (DT) model in the suburban area with imbalance ratio (IR) = 5.20, whereas the kNN algorithm gave the lowest FNR at every IR for each area type.
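The quantities named in the highlights (TPR, FNR, IR) can be computed as in the sketch below. It assumes a common definition of the imbalance ratio, largest class count divided by smallest class count, which this excerpt does not confirm as the paper's definition. The labels are toy data invented for illustration, not the Khon Kaen survey data.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class / size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def tpr_fnr(y_true, y_pred, cls):
    """True positive rate and false negative rate for one class."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    if not preds_for_cls:
        raise ValueError(f"class {cls!r} does not occur in y_true")
    tpr = sum(1 for p in preds_for_cls if p == cls) / len(preds_for_cls)
    return tpr, 1.0 - tpr


# Toy labels: 5 households in class 0, 26 in class 1, 10 in class 2.
y_true = [0] * 5 + [1] * 26 + [2] * 10

# A degenerate classifier that always predicts the majority class.
y_pred = [1] * len(y_true)

ir = imbalance_ratio(y_true)
tpr0, fnr0 = tpr_fnr(y_true, y_pred, cls=0)
print(f"IR = {ir:.2f}, class 0 TPR = {tpr0:.0%}, FNR = {fnr0:.0%}")
```

A classifier that ignores the minority class entirely has an FNR of 100% on that class, which is the failure mode the highlights describe for some models.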

Summary

Introduction

Data classification is an analysis method used to define data patterns, classification models, and classification rules. This method can predict different types of data, present or future, such as travel demand. The selection of a high-performing technique should rely on parameters that indicate classification performance, e.g. accuracy, precision, recall, and F1-score. Still, these techniques do not work well on every dataset. Imbalanced data have classes with different numbers of instances. Imbalanced data classification therefore becomes a thought-provoking issue, because some minority classes contain significant or outstanding data. For more effective data analysis, the model’s ability to classify the minority class needs to be improved before training the algorithm with parameters suitable for the imbalanced data [5, 6].
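Why overall accuracy can mislead on imbalanced data is easier to see from a confusion matrix. The sketch below derives accuracy and per-class precision, recall, and F1-score using their standard definitions. The 3-class confusion matrix is hypothetical and was chosen only to make the effect visible.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    report = {}
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                         # row sum: true members of c
        predicted = sum(cm[r][c] for r in range(n)) # column sum: predicted as c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
CM = [
    [2, 7, 1],    # class 0, the minority class (10 samples)
    [1, 80, 4],   # class 1 (85 samples)
    [0, 5, 25],   # class 2 (30 samples)
]

accuracy, report = per_class_metrics(CM)
print(f"accuracy = {accuracy:.3f}")
for cls, m in report.items():
    print(cls, {k: round(v, 3) for k, v in m.items()})
```

In this made-up example, overall accuracy is about 86% while recall on the minority class is only 20%, so a per-class measure such as recall (the TPR) is the more informative check for the minority class.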

Journal: Promet - Traffic&Transportation
Publication Date: May 31, 2021
Citations: 5
License type: CC BY 4.0
