Abstract

Sampling from large datasets is commonly used in frequent pattern (FP) mining. To provide tight theoretical guarantees on the quality of the FPs obtained from samples, current methods stabilize the supports of all patterns in a random sample, even though only the FPs matter, so they consistently overestimate the required sample size. We propose an algorithm called multiple-sampling-based FP mining (MSFP). MSFP first generates the set of approximate frequent items (AFI) and then uses the AFI to form the set of approximate FPs without supports (AFP*). In these steps it does not stabilize the value of any item's or pattern's support; it only stabilizes the relationship (≥ or <) between the support and the minimum support. MSFP can therefore use small samples to obtain the AFI and the AFP* in succession, successively pruning patterns that are not contained in the AFI or not in the AFP*. Next, MSFP applies Bayesian statistics to stabilize only the support values of the patterns in the AFP*: since a pattern's support in the original dataset is unknown, MSFP treats it as a random variable and keeps updating its distribution with the estimates obtained from the samples drawn during progressive sampling, which yields a tighter bound on the error probability. Furthermore, to reduce I/O during progressive sampling, MSFP stores a sufficiently large random sample in memory in advance. Experiments show that MSFP is reliable and efficient.
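The first stage described above, estimating item supports from a random sample and keeping only items whose estimate clears the minimum-support threshold, can be illustrated with a minimal sketch. This is not the authors' MSFP algorithm: the Hoeffding-bound sample size, the union bound over items, and the `eps`/`delta` parameters are assumptions chosen for illustration only.

```python
import random
from collections import Counter
from math import log

def approx_frequent_items(dataset, minsup, eps=0.05, delta=0.05, seed=0):
    """Illustrative sampling sketch (NOT the paper's MSFP).

    Estimates each item's support from one random sample whose size is
    set by a Hoeffding bound with a union bound over all distinct items,
    so that with probability >= 1 - delta every estimated support is
    within eps of the true support. Items whose estimated support is at
    least minsup - eps are kept as approximate frequent items.
    """
    rng = random.Random(seed)
    n_items = len({item for txn in dataset for item in txn})
    # Hoeffding bound: n >= (1 / (2 eps^2)) * ln(2 * n_items / delta).
    n = int((1 / (2 * eps ** 2)) * log(2 * n_items / delta)) + 1
    n = min(n, len(dataset))  # never sample more than the dataset holds
    sample = rng.sample(dataset, n)
    counts = Counter(item for txn in sample for item in txn)
    return {item for item, c in counts.items() if c / n >= minsup - eps}

# Usage: "a" occurs in every transaction, "b" in 10% of them.
transactions = [["a"] + (["b"] if i < 100 else []) for i in range(1000)]
print(approx_frequent_items(transactions, minsup=0.5))
```

MSFP's key refinement over a sketch like this is that it does not pin down the support values at all in the early stages; it only needs the ≥/< relationship to the minimum support to be stable, which is why much smaller samples suffice.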

