Abstract

Sequential pattern mining is a core data mining task used in many fields. Clickstream pattern mining, a variant of sequential pattern mining, is widely used in e-commerce to analyze, evaluate, and predict online customer behavior, and has attracted considerable research interest. However, most previous approaches are unsuitable when new clickstreams are inserted into a database, because mining the updated database again is too time consuming. The challenge is to minimize runtime and reduce the number of times the original database is scanned, and thereby lower the computational cost of mining incremental databases. In this paper, we propose two effective methods for mining clickstream patterns from incremental databases, named inCMUB and Eff-inCMUB, both based on the pre-large concept. inCMUB inserts the new clickstreams from an inserted database into the existing tree and mines all frequent clickstream patterns. Eff-inCMUB instead builds a new tree from the inserted database to find pre-large 1-patterns, and then updates the pre-large clickstream patterns mined from the original database to extract the frequent clickstream patterns. Experiments show that the proposed methods outperform the SMUB algorithm in runtime, memory usage, and scalability on real-world clickstream databases.
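To make the pre-large idea concrete, the following is a minimal Python sketch, not the inCMUB or Eff-inCMUB implementation. It assumes the usual formulation of the pre-large concept with an upper and a lower support threshold, and it uses the safety bound commonly given for insertions in the pre-large itemset literature, floor((S_u - S_l) * d / (1 - S_u)), which this page does not itself state. The function names `classify_1_patterns` and `safety_bound` and the toy clickstreams are hypothetical.

```python
from collections import Counter
from math import floor


def classify_1_patterns(clickstreams, upper, lower):
    """Label each single item as 'frequent' or 'pre-large' by its support.

    Support is the fraction of clickstreams containing the item.
    Items whose support is below `lower` are dropped.
    """
    n = len(clickstreams)
    # Count each item at most once per clickstream.
    counts = Counter(item for stream in clickstreams for item in set(stream))
    labels = {}
    for item, count in counts.items():
        support = count / n
        if support >= upper:
            labels[item] = "frequent"
        elif support >= lower:
            labels[item] = "pre-large"
    return labels


def safety_bound(upper, lower, original_size):
    """Assumed safety bound: number of inserted sequences that can be
    absorbed before the original database needs to be rescanned."""
    return floor((upper - lower) * original_size / (1 - upper))


# Toy data: five clickstreams over pages a..e (illustrative only).
streams = [
    ["a", "b", "c"],
    ["a", "c", "e"],
    ["a", "b"],
    ["b", "d"],
    ["a", "d"],
]
labels = classify_1_patterns(streams, upper=0.5, lower=0.25)
bound = safety_bound(upper=0.5, lower=0.25, original_size=1000)
```

With these toy values, pages `a` and `b` are labelled frequent, `c` and `d` pre-large, and `e` is dropped. Under the assumed bound, a 1000-sequence database tolerates 500 inserted sequences before a rescan. Keeping the pre-large patterns is what lets an incremental method update its results from the inserted database alone while the number of insertions stays within that bound.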

Highlights

  • Sequential pattern mining is an essential technique of data mining, and has been widely applied in various fields such as market analysis [1], stocks [2], weather prediction [3], medical treatment [4]–[6], mobile robots [7], and so on

  • Many methods based on sequential pattern mining have been developed, such as those used to analyze intrusion detection systems (IDS) [8], generate sitemaps [9], extract all recency-based sequential patterns (RF-Miner) [10], and carry out Vehicle Trajectory Prediction (VTP) [11]

  • The server will track how many pages are served to the visitor, how long it takes each page to load, how much data is transmitted before the user moves on, etc


Summary

INTRODUCTION

Sequential pattern mining is an essential data mining technique and has been widely applied in fields such as market analysis [1], stocks [2], weather prediction [3], medical treatment [4]–[6], and mobile robots [7]. Most existing clickstream pattern mining methods [12]–[16] focus only on static sequence databases and ignore incremental databases, even though many databases are updated incrementally. For example, customer online transaction databases in e-commerce grow daily as new transactions from new or existing customers are appended, and stock price sequences likewise grow over time. Previous approaches are not suitable for this situation: the result mined from the old database is no longer valid for the updated database, and mining the updated database again is extremely inefficient. The challenge is to minimize the runtime and reduce the number of times the original databases are scanned.

RELATED WORKS
BACKGROUND
CLICKSTREAM PATTERN MINING ON INCREMENTAL DATABASES
PRE-LARGE CONCEPT
SPPC-TREE
PROPOSED METHODS FOR CLICKSTREAM PATTERN MINING IN INCREMENTAL DATABASES
EXPERIMENTAL EVALUATION
THE RUNTIME
THE MEMORY USAGE
SCALABILITY ON LARGE DATABASE
CONCLUSIONS AND FUTURE