Abstract
In this project, we use several binary classification methods to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and deal with missing values. We then examine some widely used classification methods, including logistic regression, discriminant analysis, support vector machines, random forests, and boosting, and provide suitable R functions to demonstrate their application. Metrics such as ROC curves, accuracy, recall, and F-measure are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it achieves the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify the three variables that have the largest impact on US adult income level.
Highlights
The inequality of wealth and income is a major concern around the globe, and governments in different countries are using different interventions to address it.
Extreme Gradient Boosting (XGBOOST) for prediction tasks; [5] implemented Principal Component Analysis (PCA) to generate and evaluate income prediction data based on the Current Population Survey provided by the U.S. Census Bureau. [6] tried to replicate Bayesian networks, decision tree induction, and lazy classifiers for the dataset and presented a comparative analysis of their predictive performance.
In addition to the existing approaches, many other machine learning strategies might be suitable for analyzing this dataset, such as discriminant analysis, support vector machines (SVM), random forests, and neural networks [7,8,9].
Summary
Our strategy is to train a binary classifier, denoted as Y, to predict whether a person earns more than $50K per year based on social factors, and to find out which factors influence the income level the most.
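As an illustration of how a binary income classifier of this kind can be scored, the following minimal sketch computes accuracy, precision, recall, and F-measure from 0/1 labels, where 1 is assumed to mean "earns more than $50K". It is written in Python rather than the R used in the study, the function name `binary_metrics` is hypothetical, and the labels are made up for illustration; none of it comes from the authors' analysis.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for 0/1 labels.

    Assumption: 1 encodes "income > $50K" and 0 encodes "income <= $50K".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for eight hypothetical individuals (not data from the study)
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because high earners are usually the minority class in income data, recall and F-measure are worth reporting alongside accuracy: a classifier that always predicts the majority class can score a high accuracy while identifying no high earners at all.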