Supervised Learning for Binary Classification on US Adult Income

Li‐Pang Chen

doi:10.32732/jmo.2021.13.2.80

Abstract

In this project, various binary classification methods are used to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and deal with missing values. We then examine some widely used classification methods, including logistic regression, discriminant analysis, support vector machines, random forests, and boosting, and provide suitable R functions to demonstrate their application. Various metrics such as ROC curves, accuracy, recall, and F-measure are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it attains the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify three variables that have the largest impact on US adult income level.
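
The comparison metrics named in the abstract can be illustrated with a minimal sketch. The paper itself works with R functions; the Python below is only an illustration, and the function names (`confusion_counts`, `classification_metrics`, `auc`), the 0.5 threshold, and the toy labels and scores are all assumptions, not taken from the paper. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F-measure (F1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = income above $50K, 0 = otherwise.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

On this toy data, accuracy, precision, recall, and F-measure all equal 2/3, and the AUC is 7/9. Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report both.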

Highlights

The inequality of wealth and income is a huge concern around the globe, and governments in different countries are using different interventions to address income inequality.
Extreme Gradient Boosting (XGBOOST) for prediction tasks; [5] implemented Principal Component Analysis (PCA) to generate and evaluate income prediction data based on the Current Population Survey provided by the U.S. Census Bureau. [6] tried to replicate Bayesian networks, decision tree induction, and lazy classifiers for the dataset and presented a comparative analysis of their predictive performance.
In addition to the existing approaches, many other machine learning strategies might be suitable for analyzing this dataset, such as discriminant analysis, support vector machines (SVM), random forests, and neural nets [7,8,9].

Summary

Objective

Our strategy is to train a binary classifier, denoted as Y, to predict whether a person earns more than $50K per year based on social factors, and to find out which factors influence the income level the most.
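
As a small illustration of the binary setup, the income level can be coded as 1 for "more than $50K per year" and 0 otherwise. The sketch below is an assumption-laden example rather than the paper's own preprocessing: the function name `encode_income` is hypothetical, and the label strings ">50K" and "<=50K" (optionally followed by a period or surrounded by spaces) are assumed here for illustration.

```python
def encode_income(label):
    """Map a raw income label to a binary target: 1 if above $50K, else 0.

    Assumes labels of the form ">50K" or "<=50K"; surrounding whitespace
    and a trailing period are tolerated.
    """
    cleaned = label.strip().rstrip(".")
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    # Fail loudly on anything unexpected rather than guessing a class.
    raise ValueError(f"unrecognized income label: {label!r}")


# Hypothetical raw labels, as they might appear in a data file.
raw_labels = [" >50K", "<=50K", "<=50K.", ">50K."]
y = [encode_income(label) for label in raw_labels]
```

Raising an error on unrecognized labels, rather than defaulting to one class, keeps data-entry problems from silently biasing the class balance.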

Description of Dataset and Challenges of the project

Project-related literature

Main Contributions

Data cleaning

Split training set and testing set

Exploratory data analysis

Categorical variables

Numerical variables

Linear discriminant analysis

Quadratic discriminant analysis

Random forest

Model introduction

Model comparison

Boosting prediction

Findings

Conclusion


Journal: Journal of Modeling and Optimization	Publication Date: Dec 15, 2021
License type: CC BY 4.0

