Efficient Parallel Boolean Expression Matching

  • Abstract
  • References
  • Similar Papers
Abstract

Boolean expression matching plays an important role in many applications. However, existing solutions still show efficiency and scalability limitations: they often exhibit degraded performance on high-dimensional and diverse workloads, and existing algorithms rarely support concurrent matching and index updating in multicore environments. To overcome these limitations, in this article we first design the PS-Tree data structure to efficiently index Boolean expressions in one dimension. By dividing predicates into disjoint predicate spaces, the PS-Tree achieves high matching performance and good expressiveness. Based on the PS-Tree, we propose a Boolean expression matching algorithm called PSTDynamic. By dynamically adjusting the index and efficiently filtering out a large proportion of non-matching expressions, PSTDynamic achieves high matching performance under high-dimensional and diverse workloads. For multicore environments, we further extend PSTDynamic to PSTParallel, which scales with lower matching latency and higher matching throughput. Experiments on both synthetic and real-world datasets verify that the proposed algorithms are efficient and parallelize well, while also achieving fast index construction and a small memory footprint. Comprehensive experiments show that our solutions drastically outperform state-of-the-art methods.
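The abstract describes the matching problem at a high level; the sketch below illustrates the general counting-style approach to interval-predicate matching that this line of work builds on. It is an illustrative sketch, not the paper's PS-Tree: the class and helper names are hypothetical, and a real PS-Tree would partition predicates into disjoint spaces rather than scanning a flat per-attribute list.

```python
from collections import defaultdict

# Hypothetical sketch: index interval predicates per attribute, then match
# an event by counting how many of each expression's predicates are
# satisfied. Expressions are conjunctions of predicates (attr, lo, hi).

class IntervalIndex:
    def __init__(self):
        # attr -> list of (lo, hi, expr_id); a real PS-Tree would keep
        # disjoint predicate spaces instead of this flat scan.
        self.by_attr = defaultdict(list)
        self.pred_count = {}  # expr_id -> number of predicates

    def insert(self, expr_id, predicates):
        self.pred_count[expr_id] = len(predicates)
        for attr, lo, hi in predicates:
            self.by_attr[attr].append((lo, hi, expr_id))

    def match(self, event):
        satisfied = defaultdict(int)
        for attr, value in event.items():
            for lo, hi, expr_id in self.by_attr[attr]:
                if lo <= value <= hi:
                    satisfied[expr_id] += 1
        # an expression matches when all of its predicates are satisfied
        return {e for e, n in satisfied.items() if n == self.pred_count[e]}

idx = IntervalIndex()
idx.insert("e1", [("price", 0, 100), ("qty", 10, 50)])
idx.insert("e2", [("price", 200, 300)])
print(idx.match({"price": 50, "qty": 20}))  # {'e1'}
```

The counting step is where non-matching expressions are filtered: an expression is reported only if every one of its predicates fires, so most expressions are discarded after touching only a few lists.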

References
Showing 10 of 35 papers
  • Efficiently evaluating complex boolean expressions. Marcus Fontoura + 6 more. Jun 6, 2010. DOI: 10.1145/1807167.1807171. Cited by 34.
  • The TV-tree: An index structure for high-dimensional data. King-Ip Lin + 2 more. The VLDB Journal, Oct 1, 1994. DOI: 10.1007/bf01231606. Cited by 506.
  • Indexing Boolean expressions. Steven Euijong Whang + 6 more. Proceedings of the VLDB Endowment, Aug 1, 2009. DOI: 10.14778/1687627.1687633. Cited by 83.
  • Safe Distribution and Parallel Execution of Data-Centric Workflows over the Publish/Subscribe Abstraction. Mohammad Sadoghi + 4 more. IEEE Transactions on Knowledge and Data Engineering, Oct 1, 2015. DOI: 10.1109/tkde.2015.2421331. Cited by 20.
  • When Is “Nearest Neighbor” Meaningful? Kevin Beyer + 3 more. Jan 1, 1999. DOI: 10.1007/3-540-49257-7_15. Cited by 1845.
  • R-trees. Antonin Guttman. Jan 1, 1984. DOI: 10.1145/602259.602266. Cited by 5575.
  • An efficient publish/subscribe index for e-commerce databases. Dongxiang Zhang + 2 more. Proceedings of the VLDB Endowment, Apr 1, 2014. DOI: 10.14778/2732296.2732298. Cited by 55.
  • Client behavior and feed characteristics of RSS, a publish-subscribe system for web micronews. Hongzhou Liu + 2 more. Oct 19, 2005. DOI: 10.5555/1251086.1251089. Cited by 75.
  • Scalable ranked publish/subscribe. Ashwin Machanavajjhala + 3 more. Proceedings of the VLDB Endowment, Aug 1, 2008. DOI: 10.14778/1453856.1453906. Cited by 51.
  • PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric. Donald R. Morrison. Journal of the ACM, Oct 1, 1968. DOI: 10.1145/321479.321481. Cited by 886.

Similar Papers
  • Research Article
  • Cited by 1
  • 10.7465/jkdi.2014.25.3.611
Nonparametric Bayesian estimation on the exponentiated inverse Weibull distribution with record values
  • May 31, 2014
  • Journal of the Korean Data and Information Science Society
  • Jung In Seo + 1 more

The inverse Weibull distribution (IWD) is the complementary Weibull distribution and plays an important role in many application areas. In Bayesian analysis, Soland's method can be considered to avoid computational complexities. One limitation of this approach is that the parameters of interest are restricted to a finite number of values. This paper introduces a nonparametric Bayesian estimator in the context of record statistics values from the exponentiated inverse Weibull distribution (EIWD). Instead of Soland's conjugate prior, a stick-breaking prior is considered, and the corresponding Bayesian estimators under the squared error loss function (quadratic loss) and the LINEX loss function are obtained and compared with other estimators. The results may be of interest especially when only record values are stored. Keywords: exponentiated inverse Weibull distribution, nonparametric Bayesian estimation, record statistics, stick-breaking prior. 1. Introduction. The inverse Weibull distribution (IWD) is the complementary Weibull distribution and plays an important role in many applications, including the dynamic components of diesel engines, the times to breakdown of an insulating fluid subject to the action of constant tension, and flood data (Nelson, 1982; Maswadah, 2003). It has also been used quite extensively when the data indicate a monotone hazard function, because of the flexibility of the pdf and its corresponding hazard function. Studies of the inverse Weibull distribution have been conducted by many authors. Calabria and Pulcini (1994) studied Bayes two-sample prediction for the inverse Weibull distribution. Mahmoud et al. (2003) considered the order statistics arising from the inverse Weibull distribution and derived the exact expression for the single moments of order statistics; they also obtained the variances and covariances based on the moments of order statistics.

  • Book Chapter
  • Cited by 3
  • 10.1007/978-3-030-29551-6_56
Building Chinese Legal Hybrid Knowledge Network
  • Jan 1, 2019
  • Sheng Bi + 4 more

Knowledge graphs play an important role in many applications, such as data integration, natural language understanding, and semantic search. Recently, there has been some work on constructing legal knowledge graphs from legal judgments. However, it suffers from some problems. First, existing work follows the Western legal system and thus cannot be applied to other legal systems, such as Asian legal systems. Second, existing work intends to build a precise legal knowledge graph, which is often not effective, especially when constructing precise relationships between legal terms. To solve these problems, in this paper we propose a framework for constructing a legal hybrid knowledge network from Chinese encyclopedia data and legal judgments. First, we construct a network of legal terms from encyclopedia data. Then, we build a legal knowledge graph from Chinese legal judgments that captures the strict logical connections in the judgments. Finally, we build a Chinese legal hybrid knowledge network by combining the network of legal terms and the legal knowledge graph. We also evaluate the algorithms used to build the legal hybrid knowledge network on a real-world dataset. Experimental results demonstrate the effectiveness of these algorithms.

  • Research Article
  • Cited by 5
  • 10.1609/aaai.v36i4.20356
Anisotropic Additive Quantization for Fast Inner Product Search
  • Jun 28, 2022
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Jin Zhang + 5 more

Maximum Inner Product Search (MIPS) plays an important role in many applications ranging from information retrieval and recommender systems to natural language processing and machine learning. However, exhaustive MIPS is often expensive and impractical when there are a large number of candidate items. The state-of-the-art approximate MIPS method is product quantization with a score-aware loss, which weights items with larger inner product scores more heavily. However, it is challenging to extend the score-aware loss to additive quantization due to the parallel-orthogonal decomposition of the residual error. Learning additive quantization with respect to this loss is important because additive quantization can achieve a lower approximation error than product quantization. To this end, we propose a quantization method called Anisotropic Additive Quantization that combines the score-aware anisotropic loss with additive quantization. To efficiently update the codebooks in this algorithm, we develop a new alternating optimization algorithm. The proposed algorithm is extensively evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines with respect to approximate search accuracy while guaranteeing similar retrieval efficiency.

  • Conference Article
  • Cited by 6
  • 10.1145/3447548.3467441
Online Additive Quantization
  • Aug 14, 2021
  • Qi Liu + 5 more

Approximate nearest neighbor search (ANNS) plays an important role in many applications ranging from information retrieval and recommender systems to machine translation. Several ANN indexes, such as hashing and quantization, have been designed to support updates for an evolving database, but a remarkable performance gap remains between them and indexes retrained on the entire database. To close the gap, we propose an online additive quantization algorithm (online AQ) that dynamically updates quantization codebooks with incoming streaming data. We then derive a regret bound to theoretically guarantee the performance of the online AQ algorithm. Moreover, to improve learning efficiency, we develop a randomized block beam search algorithm for assigning each data point to the codewords of the codebook. Finally, we extensively evaluate the proposed online AQ algorithm on four real-world datasets, showing that it remarkably outperforms the state-of-the-art baselines.
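To make the idea of dynamically updating quantization codebooks concrete, here is a minimal sketch of an online additive-quantization step. It is not the paper's online AQ algorithm: it uses a plain greedy assignment instead of randomized block beam search, an SGD-style codeword update, and the sizes and learning rate are all assumptions.

```python
import numpy as np

# Hypothetical sketch of an online additive-quantization update: a vector
# is approximated by the sum of one codeword per codebook. For each
# incoming sample, greedily assign codewords, then nudge the chosen
# codewords toward the residual with an SGD step.

rng = np.random.default_rng(0)
M, K, D = 2, 4, 8          # codebooks, codewords per book, dimension
books = rng.normal(size=(M, K, D))

def assign(x):
    """Greedy assignment: book by book, pick the codeword that best
    reduces the current residual."""
    residual, codes = x.copy(), []
    for m in range(M):
        k = int(np.argmin(np.linalg.norm(residual - books[m], axis=1)))
        codes.append(k)
        residual -= books[m, k]
    return codes, residual

def online_update(x, lr=0.1):
    codes, residual = assign(x)
    for m, k in enumerate(codes):
        books[m, k] += lr * residual  # move chosen codewords toward x
    return codes

x = rng.normal(size=D)
before = np.linalg.norm(assign(x)[1])
for _ in range(50):
    online_update(x)
after = np.linalg.norm(assign(x)[1])
print(before, after)
```

Under a stable assignment each step scales the residual by roughly (1 − M·lr), so repeated updates on the same sample shrink its reconstruction error.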

  • Research Article
  • 10.1145/3723008
Disentangling User Interest and Geographical Context for POI Recommendations
  • May 19, 2025
  • ACM Transactions on Intelligent Systems and Technology
  • Wenhui Meng + 4 more

POI recommendation plays an important role in many applications, such as mobility prediction and location-based advertisements. Existing POI recommendation methods mainly capture the observed patterns in user visits, without a comprehensive consideration of the underlying reasons behind the visits. Therefore, the different causes of a visit, i.e., user interest and geographical context, are entangled. When the underlying causes change (e.g., when a user moves to a new place), the robustness of the recommendations cannot be guaranteed. To address these challenges, we propose DUIG, a novel framework for disentangling user interest and geographical influence for POI recommendations. We first design a personalized disentanglement strategy to divide check-ins by geographical influence. Specifically, the colliding effect of causality is leveraged to divide cause-specific check-ins, such that user interest and geographical influence can be properly disentangled in user and POI embeddings. Through this mechanism, even if the underlying reasons that affect a user's preference change, an intervention can be conducted on the causes so that the recommendations generalize to the new scenario. In addition, a geography-aware negative sampling strategy is proposed that uses hard negatives to regularize the embedding and disentanglement in the latent space, with a larger sampling probability for negative samples containing more geographic information. Extensive experiments on two real-world POI recommendation datasets demonstrate the superior performance of DUIG.

  • Single Report
  • 10.2172/820273
High-performance combinatorial algorithms
  • Oct 31, 2003
  • Ali Pinar

Combinatorial algorithms have long played an important role in many applications of scientific computing such as sparse matrix computations and parallel computing. The growing importance of combinatorial algorithms in emerging applications like computational biology and scientific data mining calls for development of a high performance library for combinatorial algorithms. Building such a library requires a new structure for combinatorial algorithms research that enables fast implementation of new algorithms. We propose a structure for combinatorial algorithms research that mimics the research structure of numerical algorithms. Numerical algorithms research is nicely complemented with high performance libraries, and this can be attributed to the fact that there are only a small number of fundamental problems that underlie numerical solvers. Furthermore there are only a handful of kernels that enable implementation of algorithms for these fundamental problems. Building a similar structure for combinatorial algorithms will enable efficient implementations for existing algorithms and fast implementation of new algorithms. Our results will promote utilization of combinatorial techniques and will impact research in many scientific computing applications, some of which are listed.

  • Research Article
  • Cited by 39
  • 10.1007/s11042-012-1202-1
Real-time eye-gaze estimation using a low-resolution webcam
  • Aug 14, 2012
  • Multimedia Tools and Applications
  • Yu-Tzu Lin + 3 more

Eye detection and gaze estimation play an important role in many applications, e.g., the eye-controlled mouse in assistive systems for disabled or elderly persons, eye fixation and saccade analysis in psychology, or iris recognition in security systems. Traditional research usually achieves eye tracking with intrusive infrared-based techniques or expensive eye trackers. Nowadays, there is a growing need to analyze user behavior by tracking eye attention in general applications, in which users typically use a consumer-grade computer or laptop with an inexpensive webcam. To keep pace with the rapid development of such applications and to reduce cost, it is no longer practical to apply intrusive techniques or expensive, specialized equipment. In this paper, we propose a real-time eye-gaze estimation system that uses a general low-resolution webcam and estimates eye gaze accurately without expensive or specialized equipment and without an intrusive detection process. An illuminance filtering approach is designed to remove the influence of lighting changes so that the eyes can be detected correctly in low-resolution webcam video frames. A hybrid model combining a position criterion with an angle-based eye detection strategy is also derived to locate the eyes accurately and efficiently. In the eye-gaze estimation stage, we employ the Fourier Descriptor to compactly describe the appearance-based features of the eyes. The eye-gaze position is then determined by a Support Vector Machine. The proposed algorithms achieve high performance with low computational complexity, and the experimental results show the feasibility of the proposed methodology.

  • Research Article
  • Cited by 84
  • 10.1109/tnnls.2015.2475750
RBoost: Label Noise-Robust Boosting Algorithm Based on a Nonconvex Loss Function and the Numerically Stable Base Learners.
  • Sep 22, 2015
  • IEEE Transactions on Neural Networks and Learning Systems
  • Qiguang Miao + 5 more

AdaBoost has attracted much attention in the machine learning community because of its excellent performance in combining weak classifiers into strong classifiers. However, AdaBoost tends to overfit to the noisy data in many applications. Accordingly, improving the antinoise ability of AdaBoost plays an important role in many applications. The sensitiveness to the noisy data of AdaBoost stems from the exponential loss function, which puts unrestricted penalties to the misclassified samples with very large margins. In this paper, we propose two boosting algorithms, referred to as RBoost1 and RBoost2, which are more robust to the noisy data compared with AdaBoost. RBoost1 and RBoost2 optimize a nonconvex loss function of the classification margin. Because the penalties to the misclassified samples are restricted to an amount less than one, RBoost1 and RBoost2 do not overfocus on the samples that are always misclassified by the previous base learners. Besides the loss function, at each boosting iteration, RBoost1 and RBoost2 use numerically stable ways to compute the base learners. These two improvements contribute to the robustness of the proposed algorithms to the noisy training and testing samples. Experimental results on the synthetic Gaussian data set, the UCI data sets, and a real malware behavior data set illustrate that the proposed RBoost1 and RBoost2 algorithms perform better when the training data sets contain noisy data.

  • Book Chapter
  • 10.1007/978-3-030-16181-1_74
On the Practicality of Subspace Tracking in Information Systems
  • Jan 1, 2019
  • Noor Ahmed + 1 more

Modeling and characterizing information systems' observation data (i.e., logs) is fundamental for proper system configuration, security analysis, and monitoring system status. Due to the underlying dynamics of such systems, observations can be viewed as high-dimensional, time-varying, multivariate data. One broad class of methods for concisely modeling systems with such data is low-rank modeling, where the observations manifest themselves in a lower-dimensional subspace. Subspace tracking plays an important role in many applications, such as signal processing, image tracking and recognition, and machine learning. However, it is not well understood which tracker is suitable for a given information system in a practical setting. In this paper, we present a comprehensive comparative analysis of three state-of-the-art low-rank modeling approaches: GROUSE, PETRELS, and RankMin. We compare these algorithms in terms of their convergence and stability, parameter sensitivity, and robustness to missing data on synthetic and real information-system data sets, and then summarize our findings.

  • Conference Article
  • Cited by 233
  • 10.1145/2339530.2339636
USpan
  • Aug 12, 2012
  • Junfu Yin + 2 more

Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behavior analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. In frequent pattern mining, a recent effort has been to incorporate utility into the pattern selection framework, so that high utility (frequent or infrequent) patterns are mined which address typical business concerns such as dollar value associated with each pattern. In this paper, we incorporate utility into sequential pattern mining, and a generic framework for high utility sequence mining is defined. An efficient algorithm, USpan, is presented to mine for high utility sequential patterns. In USpan, we introduce the lexicographic quantitative sequence tree to extract the complete set of high utility sequences and design concatenation mechanisms for calculating the utility of a node and its children with two effective pruning strategies. Substantial experiments on both synthetic and real datasets show that USpan efficiently identifies high utility sequences from large scale data with very low minimum utility.
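The notion of a pattern's utility in a quantitative sequence can be illustrated with a small example. This is a hedged sketch of the general high-utility idea, not the USpan algorithm: the profit table and the brute-force recursion over subsequence embeddings are illustrative assumptions, whereas USpan uses a lexicographic quantitative sequence tree with pruning.

```python
# Hypothetical sketch of the "high utility" idea behind utility-based
# sequence mining: the utility of a pattern in a quantitative sequence is
# the best total profit over all ways of embedding the pattern as a
# subsequence, where each matched item contributes profit * quantity.

PROFIT = {"a": 2, "b": 5, "c": 1}  # assumed unit profits per item

def max_utility(pattern, sequence):
    """sequence: list of (item, quantity); returns the maximum summed
    utility over all subsequence embeddings of pattern, or 0 if none."""
    def rec(i, j):
        if i == len(pattern):
            return 0          # whole pattern matched
        if j == len(sequence):
            return None       # ran out of sequence: no embedding
        item, qty = sequence[j]
        best = rec(i, j + 1)  # skip this sequence element
        if item == pattern[i]:
            rest = rec(i + 1, j + 1)
            if rest is not None:
                take = PROFIT[item] * qty + rest
                best = take if best is None else max(best, take)
        return best

    u = rec(0, 0)
    return 0 if u is None else u

seq = [("a", 1), ("b", 2), ("a", 3), ("b", 1)]
print(max_utility(["a", "b"], seq))  # 12: a(qty 1)*2 + b(qty 2)*5
```

Note that, unlike frequency, utility is not anti-monotone over embeddings, which is why algorithms like USpan need dedicated pruning strategies rather than the classic support-based ones.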

  • Research Article
  • Cited by 1
  • 10.1007/s10994-023-06488-6
LogicDT: a procedure for identifying response-associated interactions between binary predictors
  • Dec 22, 2023
  • Machine Learning
  • Michael Lau + 2 more

Interactions between predictors play an important role in many applications. Popular and successful tree-based supervised learning methods such as random forests or logic regression can incorporate interactions associated with the considered outcome without specifying which variables might interact. Nonetheless, these algorithms suffer from certain drawbacks such as limited interpretability of model predictions and difficulties with negligible marginal effects in the case of random forests or not being able to incorporate interactions with continuous variables, being restricted to additive structures between Boolean terms, and not directly considering conjunctions that reveal the interactions in the case of logic regression. We, therefore, propose a novel method called logic decision trees (logicDT) that is specifically tailored to binary input data and helps to overcome the drawbacks of existing methods. The main idea consists of considering sets of Boolean conjunctions, using these terms as input variables for decision trees, and searching for the best performing model. logicDT is also accompanied by a framework for estimating the importance of identified terms, i.e., input variables and interactions between input variables. This new method is compared to other popular statistical learning algorithms in simulations and real data applications. As these evaluations show, logicDT is able to yield high prediction performances while maintaining interpretability.
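The core idea of using Boolean conjunctions as derived inputs can be sketched in a few lines. This is an illustrative toy, not the published logicDT procedure: real logicDT searches over sets of conjunctions and fits decision trees on them, while the sketch below merely enumerates low-order conjunctions of binary predictors and scores each by agreement with a binary response.

```python
from itertools import combinations

# Hypothetical sketch: treat Boolean conjunctions of binary predictors as
# derived 0/1 input variables, then score each conjunction by how well it
# separates the binary response.

def conjunction_features(X, max_order=2):
    """X: list of dicts {var: 0/1}. Returns a dict mapping each
    conjunction (a tuple of variable names) to its 0/1 feature column."""
    names = sorted(X[0])
    feats = {}
    for r in range(1, max_order + 1):
        for combo in combinations(names, r):
            feats[combo] = [int(all(row[v] for v in combo)) for row in X]
    return feats

def best_conjunction(X, y):
    """Pick the conjunction whose value agrees with y most often."""
    feats = conjunction_features(X)
    return max(feats, key=lambda c: sum(f == t for f, t in zip(feats[c], y)))

X = [{"x1": 1, "x2": 1}, {"x1": 1, "x2": 0},
     {"x1": 0, "x2": 1}, {"x1": 0, "x2": 0}]
y = [1, 0, 0, 0]  # response is x1 AND x2
print(best_conjunction(X, y))  # ('x1', 'x2')
```

In a full method these conjunction columns would feed a decision tree learner; the agreement score here stands in for that model-fitting step.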

  • Research Article
  • Cited by 17
  • 10.14778/3424573.3424580
DeltaPQ
  • Sep 1, 2020
  • Proceedings of the VLDB Endowment
  • Runhui Wang + 1 more

High-dimensional data is ubiquitous and plays an important role in many applications. However, the size of high-dimensional data is usually excessively large. To alleviate this problem, in this paper we develop novel techniques to compress and search high-dimensional data. Specifically, we first apply vector quantization, a classical lossy data compression method, which quantizes a high-dimensional vector to a sequence of small integers, namely the quantization code. Then, we propose a novel lossless compression algorithm, DeltaPQ, to further compress the quantization codes. DeltaPQ organizes the quantization codes in a tree structure and stores the differences between two quantization codes rather than the original codes. Among the exponential number of possible tree structures, we develop an efficient algorithm, whose time and space complexity are linear in the number of codes, to find the one with the optimal compression ratio. Approximate nearest neighbor search queries can be processed directly on the compressed data with a small space overhead of a few bytes. Many similarity measures are supported, such as inner product, cosine similarity, Euclidean distance, and Lp-norm. Experimental results on five large-scale real-world datasets show that DeltaPQ achieves a compression ratio of up to 5 (and often greater than 2) on the quantization codes, whereas state-of-the-art general-purpose lossless compression algorithms barely work.
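The delta idea, storing differences between quantization codes instead of the codes themselves, can be sketched briefly. This is an illustrative sketch, not the DeltaPQ algorithm: DeltaPQ chooses an optimal tree of reference codes, whereas the toy below encodes against a single fixed reference.

```python
# Hypothetical sketch of delta-encoding quantization codes: instead of
# storing every code in full, store each code as the list of positions
# where it differs from a reference code. Similar codes then compress to
# very short delta lists.

def delta_encode(ref, code):
    """Return [(position, new_value), ...] where code differs from ref."""
    return [(i, c) for i, (r, c) in enumerate(zip(ref, code)) if r != c]

def delta_decode(ref, delta):
    out = list(ref)
    for i, v in delta:
        out[i] = v
    return out

ref  = [3, 1, 4, 1, 5, 9]
code = [3, 1, 4, 2, 5, 9]
d = delta_encode(ref, code)
print(d)  # [(3, 2)]
assert delta_decode(ref, d) == code
```

The compression win depends entirely on picking reference codes close to each code, which is the tree-construction problem the paper optimizes.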

  • Research Article
  • Cited by 31
  • 10.1016/j.neucom.2016.03.017
Unsupervised spectral feature selection with l1-norm graph
  • Mar 26, 2016
  • Neurocomputing
  • Xiaodong Wang + 4 more

  • Conference Article
  • Cited by 15
  • 10.1109/dslw51110.2021.9523399
Online Non-linear Topology Identification from Graph-connected Time Series
  • Jun 5, 2021
  • Rohan Money + 2 more

Estimating the unknown causal dependencies among graph-connected time series plays an important role in many applications, such as sensor network analysis, signal processing over cyber-physical systems, and financial engineering. Inference of such causal dependencies, often known as topology identification, is not well studied for non-linear, non-stationary systems, and most existing methods are batch-based and incapable of handling streaming sensor signals. In this paper, we propose an online kernel-based algorithm for topology estimation of non-linear vector autoregressive time series by solving a sparse online optimization framework using the composite objective mirror descent method. Experiments conducted on real and synthetic data sets show that the proposed algorithm outperforms the state-of-the-art methods for topology estimation.

  • Research Article
  • Cited by 9
  • 10.1109/tmm.2020.3029941
Parametric Shape Estimation of Human Body Under Wide Clothing
  • Oct 9, 2020
  • IEEE Transactions on Multimedia
  • Yucheng Lu + 3 more

The shape of the human body plays an important role in many applications, such as those involving personal healthcare and virtual clothing try-ons. However, accurate body shape measurements typically require the user to be wearing a minimal amount of clothing, which is not practical in many situations. To resolve this issue using deep learning techniques, we need a paired dataset of ground-truth naked human body shapes and their corresponding color images with clothes. As it is practically impossible to collect enough of this kind of data from real-world environments to train a deep neural network, in this paper, we present the Synthetic dataset of Human Avatars under wiDE gaRment (SHADER). The SHADER dataset consists of 300,000 paired ground-truth naked and dressed images of 1,500 synthetic humans with different body shapes, poses, garments, skin tones, and backgrounds. To take full advantage of SHADER, we propose a novel silhouette confidence measure and show that our silhouette confidence prediction network can help improve the performance of state-of-the-art shape estimation networks for human bodies under clothing. The experimental results demonstrate the effectiveness of the proposed approach. The code and dataset are available at https://github.com/YCL92/SHADER .

More from: ACM Transactions on Database Systems
  • Research Article
  • 10.1145/3771733
Tuple-Independent Representations of Infinite Probabilistic Databases
  • Nov 6, 2025
  • ACM Transactions on Database Systems
  • Nofar Carmeli + 3 more

  • Research Article
  • 10.1145/3774753
Update NDP: On Offloading Modifications to Smart Storage with Transactional Guarantees in Near-Data Processing DBMS
  • Nov 4, 2025
  • ACM Transactions on Database Systems
  • Arthur Bernhardt + 4 more

  • Research Article
  • 10.1145/3774316
Uniform Operational Consistent Query Answering
  • Nov 1, 2025
  • ACM Transactions on Database Systems
  • Marco Calautti + 3 more

  • Research Article
  • 10.1145/3716378
Degree Sequence Bounds
  • Oct 25, 2025
  • ACM Transactions on Database Systems
  • Kyle Deeds + 3 more

  • Research Article
  • 10.1145/3771766
Saga++: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications
  • Oct 14, 2025
  • ACM Transactions on Database Systems
  • Shafaq Siddiqi + 3 more

  • Research Article
  • 10.1145/3770577
Efficient Path Oracles for Proximity Queries on Point Clouds
  • Oct 2, 2025
  • ACM Transactions on Database Systems
  • Yinzhao Yan + 1 more

  • Research Article
  • 10.1145/3734517
Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries
  • Sep 30, 2025
  • ACM Transactions on Database Systems
  • Nikolaos Tziavelis + 2 more

  • Research Article
  • 10.1145/3760773
BISLearner: Block-Aware Index Selection using Attention-Based Reinforcement Learning for Data Analytics
  • Sep 29, 2025
  • ACM Transactions on Database Systems
  • Yulai Tong + 7 more

  • Research Article
  • 10.1145/3764583
Unveiling Logic Bugs in SPJG Query Optimizations within DBMS
  • Sep 29, 2025
  • ACM Transactions on Database Systems
  • Xiu Tang + 6 more

  • Research Article
  • 10.1145/3743130
Space-Time Tradeoffs for Conjunctive Queries with Access Patterns
  • Jul 26, 2025
  • ACM Transactions on Database Systems
  • Hangdong Zhao + 2 more
