Abstract Research purpose. During the last decades, e-commerce sales have been rocketing, and this tendency is expected to increase over the following years. Due to the digital nature of e-commerce, one actual item can be sold on various e-commerce platforms, which leads to the exponential growth of the number of propositions. At the same time, the title and description of this item might differ. All these facts make more complicated for customers the process of searching on online platforms and change business approaches to the development of competitive strategy by e-commerce companies. The research question is how we can apply a machine learning algorithm to detect, based on the product information such as title and description, whether the items are actually relevant to the same product. Methodology. We suggest an approach that is based on a flexible textual data pipeline and the usage of a machine-learning model ensemble. Each step of the data processing is adjustable in dependence on domain issues and data features because we can achieve better results in solving the item-matching task. Findings. The item-matching model is developed. The proposed model is based on the semantic closeness of text descriptions of items and the usage of the core of keywords to present the reference item. Practical implications. We suggest an approach to improving the item searching process on different e-commerce platforms by dividing the process into two steps. The first step is searching for the related items among the set of reference items according to user preferences. The reference item description is created based on our item-matching model. The second step is surfing proposals of similar items on chosen e-commerce platforms. This approach can benefit buyers and sellers in various aspects, such as a low-price guarantee, a flexible strategy of similar products shown, and appropriate category-choosing recommendations.
Read full abstract