Clustering and Classification Based on Distributed Automatic Feature Engineering for Customer Segmentation

Zne-Jung Lee, Chou-Yuan Lee, Li-Yun Chang, Natsuki Sano

doi:10.3390/sym13091557

Abstract

To beat the competition and obtain valuable information, decision-makers must conduct in-depth machine learning or data mining for data analytics. Traditionally, clustering and classification are two common methods used in data mining. In clustering, data are divided into groups according to their similarity or common features. Classification, on the other hand, builds a model from given training data and uses it to predict the target class or label of the test data. In recent years, many researchers have focused on hybrids of clustering and classification. These techniques have achieved admirable results, but there is still room to improve performance, for example through distributed processing. Therefore, in this paper we propose clustering and classification based on distributed automatic feature engineering (AFE) for customer segmentation. In the proposed algorithm, AFE uses the artificial bee colony (ABC) algorithm to select valuable features of the input data, and RFM (recency, frequency, monetary) analysis then provides the basic data analytics. AFE first initializes the number of clusters k. The clustering methods k-means, Ward's method, and fuzzy c-means (FCM) are then applied to cluster the examples into different groups. Finally, an improved fuzzy decision tree classifies the target data and generates decision rules that explain the detailed situations. AFE also determines the split number of the improved fuzzy decision tree to increase classification accuracy. The proposed approach is distributed and runs on the Apache Spark platform. The topic of this paper is solving the problem of clustering and classification for machine learning. The results show that the classification accuracy of the proposed approach outperforms that of other approaches. Moreover, we provide useful strategies and decision rules derived from data analytics for decision-makers.
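The RFM-then-cluster stage described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the transaction log and customer IDs are hypothetical, the helper names (`rfm_table`, `min_max_scale`, `kmeans`) are invented for illustration, and ABC feature selection, the other clustering methods, the improved fuzzy decision tree, and Spark distribution are all left out. RFM is taken here as raw recency in days, purchase count, and total spend, min-max scaled to [0, 1]; classic RFM scoring often bins each dimension into quintile scores instead. Plain Lloyd's k-means, seeded with the first k customers, stands in for the clustering stage.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log (illustrative only): (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("c1", date(2021, 1, 5), 120.0),
    ("c1", date(2021, 3, 2), 80.0),
    ("c2", date(2020, 11, 20), 15.0),
    ("c3", date(2021, 3, 28), 300.0),
    ("c3", date(2021, 3, 30), 250.0),
    ("c3", date(2021, 2, 14), 90.0),
    ("c4", date(2020, 10, 1), 20.0),
]


def rfm_table(transactions, as_of):
    """Map each customer to (recency in days, purchase count, total spend)."""
    last_seen, freq, spend = {}, defaultdict(int), defaultdict(float)
    for cust, day, amount in transactions:
        last_seen[cust] = max(day, last_seen.get(cust, day))
        freq[cust] += 1
        spend[cust] += amount
    return {c: ((as_of - last_seen[c]).days, freq[c], spend[c]) for c in last_seen}


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no RFM dimension dominates distances."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


rfm = rfm_table(TRANSACTIONS, as_of=date(2021, 4, 1))
customers = sorted(rfm)
features = min_max_scale([rfm[c] for c in customers])
labels, centroids = kmeans(features, k=2)
for cust, lab in zip(customers, labels):
    print(cust, rfm[cust], "-> segment", lab)
```

With this toy data, c1 and c3 (recent, repeat, higher-spending buyers) land in segment 0, and the lapsed, low-spending c2 and c4 land in segment 1. Seeding with the first k points keeps the output deterministic; a practical run would more likely use k-means++ or several random restarts, and the number of clusters k is the value the abstract says AFE initializes.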

Highlights

We address the current problem of clustering and classification for customer segmentation
We propose clustering and classification based on distributed automatic feature engineering (AFE)
The proposed clustering and classification based on distributed AFE runs on Apache Spark

Summary

Introduction

With rapid changes in the marketing environment, competition is fierce and decision-making is becoming increasingly complicated [1,2]. Decision-makers always want to use data analytics to pursue maximum profit. The primary task is to discover useful information about customers. A marketing dataset typically includes large amounts of raw data, such as products and items. It is hard for data mining to find useful relationships among customers, transaction logs, and purchase behavior.

Journal: Symmetry	Publication Date: Aug 24, 2021
Citations: 9	License type: CC BY 4.0

Similar Papers

Automatic Machine Learning Method for Hyper-parameter Search
Minglan Su ... Chao Xiang
Journal of Physics: Conference Series | VOL. 1802
01 Mar 2021

A hybrid biogeography-based optimization and fuzzy C-means algorithm for image segmentation
Minxia Zhang ... Weixuan Jiang
Soft Computing | VOL. 23
04 Dec 2017

ILivSpot: Secure Biometric System based on Iris Liveliness Detection
Sunil Kumar ... Vijay Kumar Lamba
International Journal of Engineering and Advanced Technology | VOL. 9
30 Dec 2020

Beam Search-Based Automated Machine Learning Pipeline System for the Power Industry (빔서치 기반 전력산업용 머신러닝 자동화 파이프라인 시스템)
Gwangseon Jang ... Myeong-Ha Hwang
The transactions of The Korean Institute of Electrical Engineers | VOL. 70
31 Dec 2021