Nowadays, predicting customer churn is essential for the success of any company. Loyal customers generate continuous revenue streams, resulting in long-term success and growth. Moreover, companies are increasingly prioritizing the retention of existing customers due to the higher costs associated with attracting new ones. Consequently, there has been a growing demand for advanced methods aimed at enhancing customer loyalty and satisfaction, as well as predicting churners. In our work, we focused on building a robust churn prediction model for the telecommunications industry based on large embeddings from large language models and logistic regression to accurately identify churners. We conducted extensive experiments using a range of embedding techniques, including OpenAI Text-embedding, Google Gemini Text Embedding, bidirectional encoder representations from transformers (BERT), Sentence-Transformers, Sent2vec, and Doc2vec, to extract meaningful features. Additionally, we tested various classifiers, including logistic regression, support vector machine, random forest, K-nearest neighbors, multilayer perceptron, naive Bayes, decision tree, and zero-shot classification, to build a robust model capable of making accurate predictions. The best-performing model in our experiments is the logistic regression classifier, which we trained using the extracted feature from the OpenAI Text-embedding-ada-002 model, achieving an accuracy of 89%. The proposed model demonstrates a high discriminative ability between churning and loyal customers.
Read full abstract