Abstract

Gender information is important for recommendation systems on online shopping websites. However, gender data often suffer from missing and incorrect labels because consumers are unwilling to actively disclose personal information, so gender estimation results fall short of what product recommendation systems need. To discover customers’ gender information, we explore customers’ online shopping behavior, especially the items viewed in a shopping session, using the dataset provided by Vietnam FPT Group. The dataset is highly imbalanced: the number of female samples is three times the number of male samples. To address the imbalance, we cluster the female samples into three subsets and then train a two-layer classifier model to estimate the customers’ gender. Experimental results demonstrate that our proposed method achieves a combined accuracy of 78% on average and takes less than 6 seconds on average. As a data mining model for gender prediction, our approach has a lightweight network structure and low time consumption.

Highlights

  • We have witnessed the rapid growth of online shopping in the past decade

  • We propose a data mining model, consisting of clustering, decision tree, and random forest models, to overcome these challenges and discover the customers’ gender information from the customers’ behavior, especially which products are viewed in a shopping session

  • We discover the correlation between personality diversity and gender in online shopping behavior, and explain the characteristics of customer shopping behavior in a specific web browsing log data set

Summary

INTRODUCTION

We have witnessed the rapid growth of online shopping in the past decade. As COVID-19 hit the world, more and more customers came to prefer shopping online over visiting stores in person. If a shopping website wants to show the performance of its privacy protection mechanism, it could run our design on its customers’ data. We discover the correlation between personality diversity and gender in online shopping behavior, and explain the characteristics of customer shopping behavior in a specific web browsing log data set. These features are combined into feature combinations that serve as candidates for gender classification. We use personality diversity and data visualization to solve the problem of sample imbalance in the FPT Group’s online shopping behavior dataset. The results demonstrate that the proposed gender classification model is lightweight and efficient.
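
The imbalance handling described above (splitting the female samples into three subsets so that each is comparable in size to the male samples) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not name the clustering algorithm or the features, so plain k-means on hypothetical two-dimensional feature vectors stands in for clustering by personality diversity, the data are synthetic, and the helper names `kmeans` and `balanced_subsets` are invented for this example. The second stage, training classifiers on each subset, is omitted.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on equal-length tuples; returns one cluster index per point.

    Assumption: k-means is a stand-in; the source does not name the algorithm.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def balanced_subsets(female, male, k=3):
    """Pair each female cluster with all male samples to form k training sets."""
    labels = kmeans(female, k)
    subsets = []
    for c in range(k):
        cluster = [p for p, lab in zip(female, labels) if lab == c]
        features = cluster + male
        targets = ["female"] * len(cluster) + ["male"] * len(male)
        subsets.append((features, targets))
    return subsets


# Synthetic, hypothetical data: 90 female and 30 male samples (a 3:1 ratio).
rng = random.Random(42)
female = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(30)]
male = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(30)]

subsets = balanced_subsets(female, male)
for i, (features, targets) in enumerate(subsets):
    print(f"subset {i}: {targets.count('female')} female, {targets.count('male')} male")
```

Each resulting subset contains every male sample plus one female cluster, so a classifier trained on it sees a far less skewed class ratio than the original 3:1 split.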

RELATED WORK
CLUSTERING BASED ON PERSONALITY DIVERSITY
A TWO-LAYER GENDER CLASSIFICATION MODEL
FEATURE COMBINATION SELECTION
CONCLUSIONS