With the proliferation of social platforms for online shopping, accurately predicting item categories from multilingual reviews has become crucial for informed decision-making. This paper addresses the challenge of categorizing reviews across diverse languages by enhancing Transformer models for multilingual review classification, with a focus on efficiency, scalability, and interpretability. To improve efficiency, we integrate sparse attention mechanisms into mBERT and XLM-RoBERTa and apply model distillation via DistilBERT, balancing performance against computational cost. For data augmentation, we employ back-translation to enrich the training data, improving model robustness and generalization across languages. To enhance interpretability, we use Local Interpretable Model-Agnostic Explanations (LIME) to provide clear, actionable insights into model predictions. The proposed methods are applied to multilingual reviews of products listed on Amazon, covering Spanish, English, German, Hindi, Chinese, Japanese, and French. The model achieves a classification accuracy of 88% across 32 product categories, demonstrating its effectiveness on the multilingual multiclass categorization problem in the retail sector. This work illustrates the potential of combining advanced natural language processing techniques to improve the efficiency, accuracy, and interpretability of classification models, thereby facilitating better decision-making on online shopping platforms. With continued research, such models will offer increasingly robust solutions for processing and understanding multilingual data.
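The abstract names back-translation as the augmentation strategy. A minimal sketch of round-trip translation is shown below, using MarianMT checkpoints from Hugging Face; the checkpoint names and the pivot language (German) are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Back-translation sketch: translate src -> pivot -> src to produce
# paraphrased copies of training reviews. Checkpoint names are assumed
# Helsinki-NLP MarianMT models, not the paper's stated setup.
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts, src="en", pivot="de"):
    """Return paraphrases of `texts` via a round trip through `pivot`."""
    fwd_name = f"Helsinki-NLP/opus-mt-{src}-{pivot}"
    bwd_name = f"Helsinki-NLP/opus-mt-{pivot}-{src}"
    fwd_tok = MarianTokenizer.from_pretrained(fwd_name)
    fwd = MarianMTModel.from_pretrained(fwd_name)
    bwd_tok = MarianTokenizer.from_pretrained(bwd_name)
    bwd = MarianMTModel.from_pretrained(bwd_name)

    def translate(batch, tok, model):
        inputs = tok(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=128)
        return tok.batch_decode(outputs, skip_special_tokens=True)

    return translate(translate(texts, fwd_tok, fwd), bwd_tok, bwd)

augmented = back_translate(["The headphones broke after two days."])
```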
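For interpretability, the paper uses LIME. The sketch below shows how a LIME text explanation is typically wired to a Transformer classifier; the base checkpoint stands in for the paper's fine-tuned model (its untrained 32-label head here is a placeholder assumption), and the category names are hypothetical.

```python
# LIME sketch for a 32-class multilingual review classifier.
# The checkpoint and label names are assumptions, not the paper's artifacts.
import numpy as np
import torch
from lime.lime_text import LimeTextExplainer
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-multilingual-cased"  # placeholder for the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=32)

def predict_proba(texts):
    """Return an (n_samples, 32) softmax probability matrix, as LIME expects."""
    inputs = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=[f"category_{i}" for i in range(32)])
exp = explainer.explain_instance(
    "Great battery life for the price.", predict_proba, num_features=6
)
print(exp.as_list())  # per-token weights for the predicted category
```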