Abstract

Feature selection is a highly relevant task in any data-driven knowledge discovery project. This research analyses the advantages and disadvantages of using mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems by applying both to a bank telemarketing case. A logistic regression model is built on the tuned set of features that each technique identifies as most influential on the success of a telemarketing contact: 13 features for MI and 9 for DSA. The DSA-based model performs better at lower false-positive rates, while the MI-based model is slightly better at higher false-positive rates. MI is therefore the better choice when the aim is to reduce the cost of contacts slightly without risking the loss of a large number of successes. DSA, however, achieved good prediction results with fewer features.

Highlights

  • Customer targeting (CT) is a classical problem addressed by business intelligence (BI) methods and techniques

  • A comparison was conducted between two renowned feature selection methods: mutual information (MI) and data-based sensitivity analysis (DSA)

  • Applying information theory concepts to eliminate redundant features yielded a small subset of highly relevant features, which enabled faster and more accurate modelling of whether or not a client subscribes to a deposit

Summary

Introduction

Customer targeting (CT) is a classical problem addressed by business intelligence (BI) methods and techniques. It involves finding the right customers to target within a marketing campaign that sells a product or service [1]. Entropy and MI are well-known concepts in communications and information theory. They were originally introduced by Shannon [19] in a seminal paper, in order to find the optimal coding of a source on the one hand and of a noisy channel on the other. Entropy is related to the uncertainty, or information content, of a random variable.
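To make the two concepts concrete, the following minimal sketch estimates Shannon entropy and MI from discrete samples using simple frequency counts and the identity I(X;Y) = H(X) + H(Y) - H(X,Y). It is an illustration only, not the procedure used in the paper: the function names, the feature names and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in estimate of Shannon entropy H(X) in bits from discrete samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) in bits."""
    joint = list(zip(xs, ys))  # each (x, y) pair is one sample of the joint variable
    return entropy(xs) + entropy(ys) - entropy(joint)


# Hypothetical toy data (not taken from the bank telemarketing dataset).
outcome = ["yes", "yes", "no", "no"]          # target: contact succeeded or not
channel = ["cell", "cell", "phone", "phone"]  # feature that fully determines the target
weekday = ["mon", "tue", "mon", "tue"]        # feature independent of the target

mi_channel = mutual_information(channel, outcome)
mi_weekday = mutual_information(weekday, outcome)
print(f"H(outcome)          = {entropy(outcome):.3f} bits")
print(f"I(channel; outcome) = {mi_channel:.3f} bits")
print(f"I(weekday; outcome) = {mi_weekday:.3f} bits")
```

In this toy case the informative feature shares all of the target's entropy, while the independent feature shares none, which is the property a MI-based ranking of features relies on. Real data would need care with continuous features and small-sample bias, which this sketch ignores.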
