Abstract

In the Internet of Things (IoT) era, information is collected by sensor devices, which can lead to data loss, uncertain data, and other problems. To extract useful information for production and application from a huge warehouse of uncertain data, the collected data must be represented with probabilities. Because the data in the database is ordered in time or space, High-Utility Probability Sequential Pattern Mining (HUPSPM) has become a new research topic in data processing. Over time, many efficient algorithms for sequential pattern mining have been developed. However, these algorithms share a limitation: they run only in a stand-alone environment and are suitable only for small datasets. This paper therefore introduces an advanced graph framework for processing large datasets to address the shortcomings of the existing methods. The proposed algorithm avoids repeated searching and splitting of the database and improves parallel computing capability. The initial database is pruned according to the existing pruning strategy to effectively reduce the number of candidate sets. Experiments show that the proposed algorithm has clear advantages in mining high-utility probability sequences in large datasets.
