Abstract

Product classification is a key issue in e-commerce. Because new products reach the market rapidly, selecting the correct taxonomy category for each one has become a challenging task, and classification models can help assign products to categories precisely. This study proposes applying clustering before classification. Using a large-scale, real-world data set, it evaluates how effectively clustering improves the classification model. Conventional text classification steps, namely preprocessing, feature extraction and feature selection, are applied before the clustering technique. Results show that clustering improves the accuracy of the classification model. Across all three approaches (classification only, classification with hierarchical clustering, and classification with K-means clustering), the best classification model is K-Nearest Neighbor (KNN). Although the KNN models achieve the same accuracy across the approaches, the KNN model with K-means clustering has the shortest execution time. Hence, applying K-means clustering before the KNN model helps reduce computation time.
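The abstract describes a pipeline of preprocessing, feature extraction, clustering and KNN classification, and credits K-means with reducing KNN's execution time. The sketch below shows one plausible way clustering could cut that cost; it is not the authors' implementation. Assumptions: the product titles and categories are made-up toy data (not the Tesco data set), features are L2-normalised term-frequency vectors, feature selection is omitted, K-means is seeded deterministically with farthest-point initialisation, and each query is compared only against the training items in its nearest cluster. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (NOT the study's data set): product titles and categories.
TITLES = [
    "fresh whole milk 2 litre", "semi skimmed milk 1 litre", "organic whole milk 1 litre",
    "white sliced bread loaf", "wholemeal bread loaf 800g", "seeded bread loaf",
    "washing up liquid lemon", "laundry liquid detergent", "lemon washing up liquid 500ml",
]
LABELS = ["Dairy"] * 3 + ["Bakery"] * 3 + ["Household"] * 3


def tokenize(text):
    """Preprocessing: lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(text, vocab):
    """Feature extraction: L2-normalised term-frequency vector over `vocab`."""
    counts = Counter(tokenize(text))
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain K-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def knn_predict(query, vectors, labels, k=3):
    """Majority vote among the k nearest training vectors."""
    nearest = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def clustered_knn_predict(query, centroids, assign, vectors, labels, k=3):
    """Route the query to its nearest cluster, then run KNN inside that cluster only."""
    c = min(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    # Fall back to the full training set if the chosen cluster is empty.
    idx = [i for i, a in enumerate(assign) if a == c] or list(range(len(vectors)))
    return knn_predict(query, [vectors[i] for i in idx], [labels[i] for i in idx], k)


vocab = build_vocab(TITLES)
vectors = [vectorize(t, vocab) for t in TITLES]
centroids, assign = kmeans(vectors, k=3)
query = vectorize("skimmed milk 2 litre", vocab)
print(clustered_knn_predict(query, centroids, assign, vectors, LABELS))  # Dairy
```

Routing each query to a single cluster means KNN computes distances to roughly n/k training items instead of all n, which is one way clustering could shorten execution time; whether accuracy is preserved depends on how cleanly the clusters separate the categories.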

Highlights

  • Online commerce has grown rapidly over the past decade

  • The data used in this study were collected from Tesco online stores using prototype web scrapers developed by the Department of Statistics Malaysia (DOSM) under the STATSBDA project, namely Price Intelligence (PI)

  • The study evaluated four classification models combined with two clustering algorithms: hierarchical and K-means clustering
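The study pairs its classifiers with hierarchical as well as K-means clustering. The source does not state the linkage criterion or distance used, so the sketch below assumes bottom-up (agglomerative) clustering with average linkage and Euclidean distance, applied to a handful of made-up 2-D points. It only illustrates how a merge hierarchy is cut into k flat clusters that a classifier could then work within; the function name is hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations

# Hypothetical 2-D feature vectors (toy data, not from the study).
POINTS = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]


def average_linkage(points, k):
    """Agglomerative clustering with average linkage, cut at k clusters.

    Every point starts in its own cluster; the two clusters with the smallest
    mean pairwise Euclidean distance are merged until k clusters remain.
    Returns the clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean distance over all cross-cluster point pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations() yields (x, y) with x < y, so deleting y keeps index x valid.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


print(average_linkage(POINTS, 3))  # [[0, 1], [2, 3], [4]]
```

This naive version recomputes every linkage at each merge, so it only suits small inputs; a large product catalogue would call for an optimised library routine.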



Introduction

Online commerce has grown rapidly over the past decade, and consumers now buy goods both from physical stores and through online shopping. E-commerce websites such as Amazon, eBay, 11street and Lazada list millions of products sold by thousands of sellers. A key factor in these websites' success is their ability to retrieve the products consumers want quickly and accurately [1]. Each product is typically described by metadata such as its title, description, category, image and price, most of which are assigned manually by sellers. Unlike the title and price, the product category can be assigned automatically by classifying the product from its metadata. Automatic product categorization can reduce time and monetary costs, and it improves the accuracy of category assignment when the same product is listed by different sellers [2]. Precise product categorization has therefore emerged as a key issue in e-commerce.

