ABSTRACT To measure the similarity of unequal-length, non-numerical sequences effectively, this paper proposes an access sequence similarity calculation method based on the characteristics of e-commerce user access sequences. The sliding window method is improved by adding a node similarity calculation and optimizing the sliding similarity calculation. The key factor of Edit Distance on Real Sequences (EDR), the calculation of subcost, is also optimized. The optimized EDR is then embedded into the improved sliding window method to replace the original distance calculation. Based on the resulting access sequence similarities, a clustering algorithm is used to identify e-commerce user types. The experimental results show that the improved access sequence similarity algorithm measures the similarity of non-numerical, unequal-length sequences more accurately, and that access sequence similarity allows e-commerce users to be divided into types more effectively. In addition, the e-commerce users are mainly young men; their online time is clearly fragmented; their browsing behavior follows a long-tail distribution; they still primarily buy popular items; and they can be divided into six categories.
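
For readers unfamiliar with Edit Distance on Real Sequences, the following is a minimal sketch of the standard dynamic-programming recurrence with a pluggable subcost, applied to non-numerical access sequences. It is not the optimized subcost or the improved sliding window method proposed in the paper, whose details the abstract does not give. The function names (`edr_distance`, `exact_match_subcost`, `edr_similarity`), the page labels, and the normalization by the longer sequence length are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Sequence

Node = Hashable
Subcost = Callable[[Node, Node], float]


def exact_match_subcost(x: Node, y: Node) -> float:
    # Baseline subcost: 0 when the two nodes match, 1 otherwise.
    # A graded node-similarity cost in [0, 1] could be plugged in instead
    # (assumption: the paper's optimized subcost is not reproduced here).
    return 0.0 if x == y else 1.0


def edr_distance(a: Sequence[Node], b: Sequence[Node], subcost: Subcost) -> float:
    # Standard EDR recurrence, computed row by row:
    #   D[i][j] = min(D[i-1][j-1] + subcost(a_i, b_j),  # match / substitute
    #                 D[i-1][j] + 1,                    # skip a node of a
    #                 D[i][j-1] + 1)                    # skip a node of b
    n, m = len(a), len(b)
    prev = [float(j) for j in range(m + 1)]  # distance from empty a to b[:j]
    for i in range(1, n + 1):
        curr = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j - 1] + subcost(a[i - 1], b[j - 1]),
                prev[j] + 1.0,
                curr[j - 1] + 1.0,
            )
        prev = curr
    return prev[m]


def edr_similarity(
    a: Sequence[Node], b: Sequence[Node], subcost: Subcost = exact_match_subcost
) -> float:
    # Hypothetical normalization to [0, 1]: divide by the longer length,
    # so sequences of unequal length remain comparable.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edr_distance(a, b, subcost) / longest


if __name__ == "__main__":
    # Illustrative page labels only; not data from the paper.
    s1 = ["home", "search", "item", "cart"]
    s2 = ["home", "item", "cart", "pay"]
    print(edr_distance(s1, s2, exact_match_subcost))
    print(edr_similarity(s1, s2))
```

Passing the subcost as a function keeps the recurrence unchanged while allowing a finer-grained node similarity to replace the 0/1 match, which is the kind of substitution the abstract describes at a high level.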