Abstract

Sequential pattern mining is widely used to solve business problems such as finding frequent user click patterns, analyzing customer purchasing behavior, and analyzing gene microarray data. Many studies have investigated pattern mining as a way to extract insightful information. Most of them concentrate on high utility sequential pattern (HUSP) mining with positive item values and do not take a distributed approach. The existing solutions are centralized, which incurs high computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs, including those with negative item values, using a distributed approach. We use Hadoop MapReduce to process the data in parallel. We propose several pruning techniques to minimize the search space in a distributed environment, thereby reducing the processing cost. To the best of our knowledge, no algorithm has previously been proposed to mine high utility sequential patterns with negative item values in a distributed environment. We therefore design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns from big data while taking negative item utilities into account.
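As a rough illustration of the map-and-reduce style of parallel processing mentioned above, the following Python sketch simulates the two phases in a single process. It is not the DHUSP-N algorithm and does not use Hadoop: the function names (`map_partition`, `reduce_pairs`, `high_utility_items`), the toy data, the unit profits, and the choice to score only single items are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical unit profits; "b" is sold at a loss, so its value is negative.
PROFIT = {"a": 2, "b": -1, "c": 5}

# Hypothetical partitions of a sequence database. Each sequence is a list of
# itemsets, and each itemset maps an item to its purchased quantity.
PARTITIONS = [
    [
        [{"a": 1, "b": 2}, {"c": 1}],
        [{"a": 3}, {"b": 1}, {"c": 2}],
    ],
    [
        [{"b": 4}, {"a": 1}],
    ],
]


def map_partition(partition, profit):
    """Map phase: emit one (item, utility) pair per item per sequence.

    The utility of an item in a sequence is taken here as the maximum of
    quantity * profit over its occurrences (an assumed convention).
    """
    pairs = []
    for seq in partition:
        best = {}
        for itemset in seq:
            for item, qty in itemset.items():
                u = qty * profit[item]
                if item not in best or u > best[item]:
                    best[item] = u
        pairs.extend(best.items())
    return pairs


def reduce_pairs(pairs):
    """Reduce phase: sum the utilities emitted for each item."""
    totals = defaultdict(int)
    for item, u in pairs:
        totals[item] += u
    return dict(totals)


def high_utility_items(partitions, profit, min_util):
    """Run map on every partition, then reduce and apply the threshold."""
    emitted = []
    for partition in partitions:  # each call could run on a separate worker
        emitted.extend(map_partition(partition, profit))
    totals = reduce_pairs(emitted)
    return {item: u for item, u in totals.items() if u >= min_util}


result = high_utility_items(PARTITIONS, PROFIT, min_util=10)
print(result)
```

In this toy run, item `b` ends up with a negative total utility and falls below the threshold, which shows why negative item values have to be tracked explicitly instead of being assumed away.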

Highlights

  • The volume of data produced every day in the form of sequences is enormous [14] [15]

  • Utility was introduced into frequent pattern mining to resolve this issue by considering the profit and quantity of products. This introduced a new field of study, namely high utility itemset mining and high utility sequential pattern (HUSP) mining, which can extract insightful knowledge given a user-defined minimum utility instead of a minimum support

  • As DHUSP-N is the first algorithm of its kind, there is no suitable algorithm to compare it with. A generic algorithm such as USPAN [23] is not an appropriate baseline because it does not handle negative values and is a centralized approach

Summary

Introduction

The volume of data produced every day in the form of sequences is enormous [14] [15]. Utility was introduced into frequent pattern mining to resolve this issue by considering the profit (quality) and quantity of products. This introduced a new field of study, namely high utility itemset mining and high utility sequential pattern (HUSP) mining, which can extract insightful knowledge given a user-defined minimum utility instead of a minimum support. HUSP mining [2] [23] is used to extract profitable and more beneficial sequential patterns from databases. It considers a business objective such as profit, user interest, or value. We propose a new method for mining high utility sequential patterns that include negative item values, using a distributed approach. We also suggest a few pruning strategies that eliminate unpromising items and thereby minimize the search space in a distributed setting.
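To make the idea of a minimum utility threshold concrete, here is a minimal Python sketch that computes the utility of a sequential pattern when some items have a negative unit profit. It assumes a common convention in the HUSP literature: the utility of a pattern in one sequence is the maximum over all of its occurrences, and its utility in the database is the sum over the sequences that contain it. The source does not spell out its own definition, and the data, profits, and function names below are hypothetical.

```python
# Hypothetical unit profits; item "b" has a negative utility.
profit = {"a": 2, "b": -1, "c": 5}

# Hypothetical sequence database. Each sequence is an ordered list of
# itemsets, and each itemset maps an item to its quantity.
db = [
    [{"a": 1, "b": 2}, {"c": 1}],
    [{"a": 3}, {"b": 1}, {"c": 2}],
    [{"b": 4}, {"a": 1}],
]


def pattern_utility_in_sequence(pattern, seq, profit):
    """Return the maximum utility of `pattern` in `seq`, or None if absent.

    `pattern` is a list of tuples of items; each tuple must be contained in
    one itemset of `seq`, and the matched itemsets must appear in order.
    """
    neg_inf = float("-inf")
    # best[k] = best utility found so far for the first k pattern elements
    best = [0] + [neg_inf] * len(pattern)
    for itemset in seq:
        # Go through k in descending order so that one itemset is never
        # matched to two consecutive pattern elements.
        for k in range(len(pattern), 0, -1):
            elem = pattern[k - 1]
            if best[k - 1] != neg_inf and all(i in itemset for i in elem):
                u = sum(itemset[i] * profit[i] for i in elem)
                best[k] = max(best[k], best[k - 1] + u)
    return None if best[-1] == neg_inf else best[-1]


def pattern_utility(pattern, db, profit):
    """Sum the per-sequence utilities over sequences containing the pattern."""
    total = 0
    for seq in db:
        u = pattern_utility_in_sequence(pattern, seq, profit)
        if u is not None:
            total += u
    return total


def is_high_utility(pattern, db, profit, min_util):
    """A pattern is high utility if its utility reaches the user threshold."""
    return pattern_utility(pattern, db, profit) >= min_util


print(pattern_utility([("a",), ("c",)], db, profit))   # 23
print(pattern_utility([("b",), ("a",)], db, profit))   # -2
print(is_high_utility([("a",), ("c",)], db, profit, min_util=20))  # True
```

The second pattern has a negative total utility because the loss on `b` outweighs the gain on `a`. With negative values, extending a pattern can lower its utility, which is one reason pruning bounds designed for positive-only data cannot be reused without care.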
