Abstract

In a data set with many categorical variables and several continuous variables, the relationship between the continuous random variables may differ from category to category for a given categorical variable. To study how categorical variables affect the dependence structure of continuous variables, we propose two splitting criteria based on copula entropy and use them to build decision trees that serve different purposes. One type of tree identifies the attributes, or combinations of attributes, under which the continuous variables have a strong relationship. The other type of tree classifies regions by the strength of that relationship. Applied to survey data on the status of poor families in Sichuan province, the method successfully evaluated the effectiveness of the poverty alleviation policies.

Highlights

  • In 2018, the Chinese government launched a survey on the status of poor households after they had been lifted out of poverty

  • The proposed method is similar to a decision tree, except that the target is a copula, that is, the relationship between random variables, rather than a single variable

  • Attribute C was chosen to split the data since it has the biggest information gain (InfGain)
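
To make the idea of choosing the attribute with the largest information gain concrete, here is a minimal Python sketch. It is not the paper's criterion, since the summary does not give the exact formula. It assumes a Gaussian copula, for which the copula entropy equals 0.5 * log det(R), with R the correlation matrix of the normal scores. It also assumes a classical decision-tree style gain: parent entropy minus the size-weighted entropy of the child groups. The function names and the simulated attributes "A" and "C" are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def normal_scores(x):
    """Rank-transform a 1-D sample to standard normal scores."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    u = ranks / (len(x) + 1.0)                 # pseudo-observations in (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(p) for p in u])


def gaussian_copula_entropy(data):
    """Copula entropy under an ASSUMED Gaussian copula: 0.5 * log det(R).

    The value is <= 0; more negative means stronger dependence.
    """
    data = np.asarray(data, dtype=float)
    z = np.column_stack([normal_scores(col) for col in data.T])
    corr = np.corrcoef(z, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    return 0.5 * logdet


def information_gain(data, labels):
    """ASSUMED gain: parent entropy minus size-weighted child entropies."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    weighted = 0.0
    for value in np.unique(labels):
        mask = labels == value
        weighted += mask.mean() * gaussian_copula_entropy(data[mask])
    return gaussian_copula_entropy(data) - weighted


# Simulated illustration (not the survey data): attribute C separates a
# strongly dependent group from an independent one; attribute A is noise.
rng = np.random.default_rng(0)
n = 400
attr_c = np.repeat([0, 1], n // 2)
attr_a = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = np.where(attr_c == 0, x + 0.1 * rng.normal(size=n), rng.normal(size=n))
data = np.column_stack([x, y])

gains = {
    "A": information_gain(data, attr_a),
    "C": information_gain(data, attr_c),
}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

In this simulated setting the split on C isolates the group where the two continuous variables are strongly related, so its gain exceeds that of the uninformative attribute A. A nonparametric copula entropy estimator could replace the Gaussian assumption without changing the structure of the sketch.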


Summary

INTRODUCTION

In 2018, the Chinese government launched a survey on the status of poor households after they had been lifted out of poverty. Both figures showed that the families who answered that industry poverty alleviation was of no help (red) have relatively low incomes. This positive relationship between income and answers does not imply that the policies are effective. We do the same calculation for the operational income of the families that answered whether industry poverty alleviation helps, as well as for their wage income, so that a total of four confidence intervals are obtained and listed in Table 2. When employment alleviation policies work better, family members may shift from business activities to employment, which can lead to a decrease in household industrial income, or there may be a significant difference in the effects of policies across different groups of people. The two split criteria constructed in this article are both used to study the relationship between variables.

DECISION TREE FOR RELATIONSHIP
SIMULATION RESULTS
REAL DATA EXAMPLE
CONCLUSION
