
Online shopping has grown rapidly over the past decades. Online retailers are investigating precise recommendation systems based on customers' online viewing logs and purchase logs. Although online shopping recommendation has been studied for several years, neither industry nor academia has proposed a general and efficient model for predicting customers' shopping demand. Recently, customers' gender information has attracted attention because it reflects shopping behavior and preferences. However, the gender information collected by online shopping systems is often incomplete or false, since customers are reluctant to disclose private information. Hence, estimating customers' gender becomes critical for online shopping recommendation systems. This paper focuses on gender estimation based on customers' online viewing logs collected by the FTP group, a leading information and communication enterprise in Vietnam. The data is both imbalanced (female samples are 3 times as numerous as male samples) and ambiguous, and we propose an approach that estimates gender with 75% accuracy. Specifically, we observe that the female samples naturally form 3 clusters when session duration, number of items viewed, and average time spent on each item are selected as features. We therefore divide the female set into 3 subsets and merge each subset with the male set to generate 3 training sets that do not suffer from class imbalance. An individual model is trained on each of these 3 training sets, and a further classifier makes the final decision based on the outputs of the 3 models. Our experimental results show that the approach achieves 75% accuracy with a running time of less than 7 seconds.
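
The pipeline described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, since the abstract names neither the clustering algorithm nor the classifiers: plain k-means stands in for the clustering step, a nearest-centroid rule stands in for each of the 3 base models, a lookup table over the base models' outputs stands in for the final classifier, and the data is synthetic rather than the viewing logs used in the paper. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed here); returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean(members)
    return labels

def build_training_sets(female, male, k=3):
    """Split the female samples into k clusters and pair each cluster with all male samples."""
    labels = kmeans(female, k)
    sets = []
    for c in range(k):
        subset = [p for p, l in zip(female, labels) if l == c]
        sets.append((subset + male, ["F"] * len(subset) + ["M"] * len(male)))
    return sets

def train_centroid_model(X, y):
    """Stand-in base model: predict the label whose class centroid is nearest."""
    groups = defaultdict(list)
    for p, label in zip(X, y):
        groups[label].append(p)
    centroids = {label: mean(pts) for label, pts in groups.items()}
    return lambda p: min(centroids, key=lambda label: dist2(p, centroids[label]))

def train_meta(models, X, y):
    """Stand-in final classifier: map each pattern of base outputs to its most common true label."""
    votes = defaultdict(Counter)
    for p, label in zip(X, y):
        votes[tuple(m(p) for m in models)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    def predict(p):
        key = tuple(m(p) for m in models)
        # Unseen output pattern: fall back to a majority vote of the base models.
        return table.get(key, Counter(key).most_common(1)[0][0])
    return predict

# Synthetic data: (session duration, items viewed, average time per item).
rng = random.Random(1)

def blob(center, n):
    """Draw n points from a unit-variance Gaussian around the given center."""
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(n)]

female = blob((10, 3, 3), 30) + blob((40, 8, 5), 30) + blob((80, 20, 4), 30)
male = blob((25, 5, 5), 30)  # 3:1 female-to-male ratio, as in the abstract

sets = build_training_sets(female, male)
models = [train_centroid_model(X, y) for X, y in sets]
predict = train_meta(models, female + male, ["F"] * len(female) + ["M"] * len(male))
print(predict(female[0]), predict(male[0]))
```

Each of the 3 training sets contains every male sample but only one cluster of female samples, so the two classes are closer in size than in the full data. Training the final classifier on the full labelled set is a further assumption of this sketch.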
