Efficient Parallel Boolean Expression Matching
Boolean expression matching plays an important role in many applications, yet existing solutions still have efficiency and scalability limitations. In particular, their performance often degrades on high-dimensional and diverse workloads, and existing algorithms rarely support concurrent matching and index updating in multicore environments. To overcome these limitations, in this article we first design the PS-Tree, a data structure that efficiently indexes Boolean expressions in one dimension. By dividing predicates into disjoint predicate spaces, the PS-Tree achieves high matching performance and good expressiveness. Based on the PS-Tree, we propose a Boolean expression matching algorithm called PSTDynamic. By dynamically adjusting the index and efficiently filtering out a large proportion of non-matching expressions, PSTDynamic achieves high matching performance on high-dimensional and diverse workloads. For multicore environments, we further extend PSTDynamic to PSTParallel, which scales with lower matching latency and higher matching throughput. Experiments on both synthetic and real-world datasets verify that the proposed algorithms are efficient and parallelize well, while also offering fast index construction and a small memory footprint. Comprehensive experiments show that our solutions substantially outperform state-of-the-art methods.
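To make the disjoint predicate-space idea concrete, below is a minimal sketch of a one-dimensional predicate index, assuming predicates are half-open intervals [lo, hi) over a single attribute. The class and method names (`PredicateSpaceIndex`, `match`) are illustrative, not the paper's actual PS-Tree API: sorting the distinct predicate endpoints partitions the dimension into disjoint elementary spaces, so matching an event value reduces to one binary search plus a set lookup.

```python
import bisect
from collections import defaultdict


class PredicateSpaceIndex:
    """Illustrative one-dimensional predicate index (not the paper's PS-Tree API).

    Each predicate is a half-open interval [lo, hi) over one attribute.
    Sorting the distinct endpoints partitions the dimension into disjoint
    elementary "predicate spaces"; each space records exactly the predicates
    that cover it, so matching a value is one binary search plus a lookup.
    """

    def __init__(self, predicates):
        # predicates: iterable of (pred_id, lo, hi) with lo < hi
        self.predicates = list(predicates)
        self.bounds = sorted({p for _, lo, hi in self.predicates for p in (lo, hi)})
        self.space_preds = defaultdict(set)  # space index -> ids of covering predicates
        for pid, lo, hi in self.predicates:
            first = bisect.bisect_left(self.bounds, lo)  # space starting at lo
            last = bisect.bisect_left(self.bounds, hi)   # space starting at hi (exclusive)
            for s in range(first, last):                 # register every covered space
                self.space_preds[s].add(pid)

    def match(self, value):
        """Return ids of all predicates satisfied by `value`: O(log n) search + lookup."""
        s = bisect.bisect_right(self.bounds, value) - 1  # space containing `value`
        return self.space_preds.get(s, set())


# Usage: three interval predicates over one attribute, e.g. "price".
index = PredicateSpaceIndex([("p1", 0, 10), ("p2", 5, 20), ("p3", 10, 30)])
print(sorted(index.match(7)))   # ['p1', 'p2']
print(sorted(index.match(10)))  # ['p2', 'p3']
print(sorted(index.match(99)))  # []
```

Note that this naive table can replicate one predicate across many spaces (quadratic size in the worst case) and supports neither updates nor multiple dimensions; the paper's PS-Tree organization and the PSTDynamic/PSTParallel extensions are presumably what address exactly these index-size, dynamic-update, and concurrency concerns.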