Optimal Cache Partition-Sharing
When a cache is shared by multiple cores, its space may be allocated by sharing, by partitioning, or by both. We call the last case partition-sharing. This paper studies partition-sharing as a general solution and presents a theory and a technique for optimizing it. The theory shows that the problem of partition-sharing is reducible to the problem of partitioning. The technique uses dynamic programming to optimize partitioning for overall miss ratio and for two different kinds of fairness. Finally, the paper evaluates the effect of optimal cache sharing and compares it with conventional solutions for thousands of 4-program co-run groups, with nearly 180 million different ways to share the cache for each co-run group. Optimal partition-sharing is on average 26% better than free-for-all sharing, and 98% better than equal partitioning. We also demonstrate the trade-off between optimal partitioning and fair partitioning.
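As a rough illustration of the dynamic-programming idea mentioned in the abstract, the following Python sketch picks a partition that minimizes total misses. It is not the authors' implementation: the function name `optimal_partition`, the input format (one list of miss counts per program, indexed by the number of cache units granted), and the toy numbers are all assumptions made for this example.

```python
def optimal_partition(miss_curves, cache_size):
    """Return (total_misses, allocation) minimizing the summed misses.

    miss_curves[p][c] is the (hypothetical) miss count of program p when it
    is given c cache units, for c in 0..cache_size.
    """
    n = len(miss_curves)
    inf = float("inf")
    # best[i][c]: fewest total misses for the first i programs using exactly c units
    best = [[inf] * (cache_size + 1) for _ in range(n + 1)]
    # choice[i][c]: units given to program i-1 in that best solution
    choice = [[0] * (cache_size + 1) for _ in range(n + 1)]
    best[0][0] = 0

    for i in range(1, n + 1):
        curve = miss_curves[i - 1]
        for c in range(cache_size + 1):
            for k in range(c + 1):  # k units go to program i-1
                cand = best[i - 1][c - k] + curve[k]
                if cand < best[i][c]:
                    best[i][c] = cand
                    choice[i][c] = k

    # Pick the best total over all amounts of cache actually used.
    c = min(range(cache_size + 1), key=lambda used: best[n][used])
    total = best[n][c]

    # Walk the choices backwards to recover each program's share.
    allocation = []
    for i in range(n, 0, -1):
        k = choice[i][c]
        allocation.append(k)
        c -= k
    allocation.reverse()
    return total, allocation


# Toy miss curves (made-up numbers) for two programs and four cache units.
curves = [
    [10, 4, 3, 3, 3],   # program 0 benefits mostly from its first unit
    [10, 9, 8, 2, 1],   # program 1 needs three units before misses drop
]
total, allocation = optimal_partition(curves, 4)
equal_split = curves[0][2] + curves[1][2]
print(total, allocation, equal_split)
```

In this toy input, giving one unit to the first program and three to the second beats an equal 2/2 split, which mirrors the gap between optimal and equal partitioning that the abstract reports, though the numbers here carry no empirical meaning.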
- Conference Article
22
- 10.5591/978-1-57735-516-8/ijcai11-019
- Jul 16, 2011
We conduct a computational analysis of fair and optimal partitions in additively separable hedonic games. We show that, for strict preferences, a Pareto optimal partition can be found in polynomial time while verifying whether a given partition is Pareto optimal is coNP-complete, even when preferences are symmetric and strict. Moreover, computing a partition with maximum egalitarian or utilitarian social welfare or one which is both Pareto optimal and individually rational is NP-hard. We also prove that checking whether there exists a partition which is both Pareto optimal and envy-free is $\Sigma_2^p$-complete. Even though an envy-free partition and a Nash stable partition are both guaranteed to exist for symmetric preferences, checking whether there exists a partition which is both envy-free and Nash stable is NP-complete.
- Conference Article
6
- 10.1109/fpl.2006.311237
- Jan 1, 2006
Multi-processor architectures have gained interest recently because of their ability to exploit programmable silicon parallelism at acceptable power-efficiency figures. Despite the potential benefit they offer over single-processor architectures, it is unresolved how one can write compact and efficient programs for multiple parallel cores. In this paper, we propose the use of a synchronous hardware description language to program a network of small PicoBlaze processors. The partitioning of a multiprocessor program over multiple cores is straightforward because the input specification is fully parallel. A systematic transformation process converts the parallel input specification into concurrent PicoBlaze programs. We demonstrate the mapping of a cryptographic design (AES) onto four PicoBlaze processors, showing almost linear speedup over an equivalent single-core design.
- Research Article
- 10.15421/322507
- Dec 22, 2025
- Problems of applied mathematics and mathematic modeling
The paper presents the development of an algorithm and the implementation of software for solving a dynamic optimal set partitioning problem with fixed centers, considering the dynamic density under integral constraints. This formulation arises in a wide range of applied problems – from logistics and demand allocation to monitoring, control, and risk assessment systems – where the intensity or distribution of a measured quantity (such as demand or population density) is a function of planar coordinates and time, and where changes in this density have a significant impact on the structure of the optimal partition. The objective of the study is to construct an adaptive partitioning structure that minimizes a prescribed objective functional for fixed centers while satisfying constraints on production capacities associated with the corresponding subsets. The proposed software solution is based on a mathematical model previously developed by the authors, which integrates the temporal dynamics of the density into the optimization algorithm and enables partition updates in near real time. The software follows a modular architecture and includes a data preprocessing module, a computational core implementing numerical methods for determining the optimal partition under dynamic conditions, and a user interface for visualization and interactive analysis of the results. The employed numerical approaches and algorithmic optimizations ensure scalability and computational efficiency when solving large-scale problems under limited computational resources. The paper presents the results of computational experiments that demonstrate the adaptability and efficiency of the developed software in the presence of dynamically evolving density. The software can be used both as a tool for operational analysis and decision support and as a research platform for modeling the behavior of dynamic systems with spatial structure.
- Research Article
39
- 10.1002/rsa.20001
- Mar 11, 2004
- Random Structures & Algorithms
We consider the problem of partitioning $n$ integers into two subsets of given cardinalities such that the discrepancy, the absolute value of the difference of their sums, is minimized. The integers are i.i.d. random variables chosen uniformly from the set $\{1,\dots,M\}$. We study how the typical behavior of the optimal partition depends on $n$, $M$, and the bias $s$, the difference between the cardinalities of the two subsets in the partition. In particular, we rigorously establish this typical behavior as a function of the two parameters $\kappa := n^{-1}\log_2 M$ and $b := |s|/n$ by proving the existence of three distinct “phases” in the $\kappa b$-plane, characterized by the value of the discrepancy and the number of optimal solutions: a “perfect phase” with exponentially many optimal solutions with discrepancy 0 or 1; a “hard phase” with minimal discrepancy of order $M e^{-\Theta(n)}$; and a “sorted phase” with a unique optimal partition with discrepancy of order $Mn$, obtained by putting the $(s + n)/2$ smallest integers in one subset. Our phase diagram covers all but a relatively small region in the $\kappa b$-plane. We also show that the three phases can be alternatively characterized by the number of basis solutions of the associated linear programming problem, and by the fraction of these basis solutions whose $\pm 1$-valued components form optimal integer partitions of the subproblem with the corresponding weights. We show in particular that this fraction is one in the sorted phase, and exponentially small in both the perfect and hard phases, and strictly exponentially smaller in the hard phase than in the perfect phase. Open problems are discussed, and numerical experiments are presented. © 2004 Wiley Periodicals, Inc. Random Struct. Alg., 2004
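To make the objective in this abstract concrete, here is a small brute-force Python sketch that computes the minimal discrepancy for a given bias. It is only an illustration for tiny inputs, not the analysis used in the paper; the function name `min_discrepancy` and the convention that the first subset has $(n + s)/2$ elements are assumptions of this example.

```python
from itertools import combinations


def min_discrepancy(values, bias):
    """Exhaustively find the smallest discrepancy for a given bias.

    The first subset gets (n + bias) / 2 of the n integers and the second
    gets the rest, so the two cardinalities differ by `bias`. Returns the
    minimal |sum(first) - sum(second)| and one first subset achieving it.
    Exponential time: intended for small n only.
    """
    n = len(values)
    if abs(bias) > n or (n + bias) % 2 != 0:
        raise ValueError("bias must have the same parity as n and |bias| <= n")
    k = (n + bias) // 2
    total = sum(values)

    best_gap, best_subset = None, None
    for subset in combinations(values, k):
        # sum(first) - sum(second) = 2 * sum(first) - total
        gap = abs(2 * sum(subset) - total)
        if best_gap is None or gap < best_gap:
            best_gap, best_subset = gap, subset
    return best_gap, best_subset


# Unbiased case: a perfect split exists for this toy input.
print(min_discrepancy([1, 2, 3, 4], 0))
# Strongly biased case: the optimum takes the three smallest integers.
print(min_discrepancy([1, 2, 3, 4], 2))
```

With bias 2 the optimum in this toy input is the set of smallest integers, which is the kind of solution the abstract associates with the sorted phase; a four-element example says nothing, of course, about the asymptotic phase boundaries.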
- Research Article
34
- 10.1088/0951-7715/22/1/005
- Dec 2, 2008
- Nonlinearity
We present and analyse numerical approximations of a norm-preserving gradient flow and consider applications to an optimal eigenvalue partition problem. We consider various discretizations and demonstrate that many of the properties shared by the continuous counterpart can be preserved at the discrete level. The numerical algorithms are then used to study the nonlinear and non-local interfacial dynamics associated with the optimal partition.
- Research Article
23
- 10.4172/2325-9647.1000119
- Jan 1, 2015
- Journal of Hydrogeology and Hydrologic Engineering
Exploring the Role of Domain Partitioning on Efficiency of Parallel Distributed Hydrologic Model Simulations
Spatially distributed hydrologic models of watersheds and river basins are data and computation intensive because of the combined nature of hydrodynamics, complex forcings and heterogeneous parameter fields. Application of these models at fine temporal and spatial resolutions, and on large problem domains, is facilitated by parallel computation on multi-processor clusters. Notably, the computation efficiency of parallel simulations is crucially determined by the efficiency with which data are divided and distributed in a multiprocessor environment and how the information is shared between processors. While numerous data partitioning algorithms exist and have been extensively studied in computer science literature, detailed elucidation of the role of hydrologic model structure on data partitioning has not been presented yet. In addition, the relative role of computational load balance and interprocessor communication on parallel computation efficiency of a hydrologic model is not known. Considering the unstructured domain discretization scheme used in PIHM hydrologic model as an example, the paper first presents a generic methodology for incorporating hydrologic factors in optimal domain partitioning algorithms. The partitions are then used to explore the isolated role of computation load balance and interprocessor communication on parallel efficiency. Results confirm that parallel simulations on partitions that minimize interprocessor communication and divide the computational load equally are the most efficient. More importantly, load balance between processors is observed to be a more sensitive control on parallel efficiency than minimization of interprocessor communication.
Further analyses of the efficiency and scalability of the parallel code for different partitioning configurations reveal a direct correspondence between parallel efficiency and theoretical metrics such as load balance ratio and communication to computation ratio. Results indicate that theoretical metrics can be used for the selection of best partitions before computationally intensive parallel simulations are performed. The study serves as a proof-of-concept evaluation of the impact of computation and communication on the efficiency of parallelized distributed hydrologic models at multiple resolutions.
- Research Article
1
- 10.1080/10618600.2022.2077351
- May 19, 2022
- Journal of Computational and Graphical Statistics
We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case – the population-based and expectation-based scan statistics – are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions in a separable exponential family, the risk partitioning scan statistic can be expressed as a scaled f-divergence of the normalized count and baseline vectors, and the multiple cluster detection scan statistic as a sum of scaled Bregman divergences. In either case, however, maximization of the scan statistic by exhaustive search over all partitionings of the data requires exponential time. To make this optimization computationally feasible, we prove sufficient conditions under which the optimal partitioning is guaranteed to be consecutive. This Consecutive Partitions Property generalizes the linear-time subset scanning property from two partitions (the detected subset and the remaining data elements) to the multiple partition case. While the number of consecutive partitionings of n elements into t partitions scales as , making it computationally expensive for large t, we present a dynamic programming approach which identifies the optimal consecutive partitioning in time, thus allowing for the exact and efficient solution of large-scale risk partitioning and multiple cluster detection problems. Finally, we demonstrate the detection performance and practical utility of partition scan statistics using simulated and real-world data.
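The idea of searching only over consecutive partitionings with dynamic programming can be sketched in Python as follows. This is a generic interval DP under stated assumptions, not the authors' algorithm or their scan statistic: the function name `best_consecutive_partition`, the user-supplied block score `score(start, end)`, and the squared-error score used in the usage example are all hypothetical choices for illustration. As written, the sketch makes on the order of $n^2 t$ calls to the score function.

```python
def best_consecutive_partition(n, t, score):
    """Split positions 0..n-1 into t consecutive non-empty blocks.

    score(start, end) rates the block of positions start..end-1; the
    function maximizes the sum of block scores (requires 1 <= t <= n).
    Returns (best_total, [(start, end), ...]).
    """
    neg_inf = float("-inf")
    # best[j][i]: best total score for splitting the first i positions into j blocks
    best = [[neg_inf] * (n + 1) for _ in range(t + 1)]
    # cut[j][i]: start index of the last block in that best split
    cut = [[0] * (n + 1) for _ in range(t + 1)]
    best[0][0] = 0.0

    for j in range(1, t + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):  # last block covers k..i-1
                if best[j - 1][k] == neg_inf:
                    continue
                cand = best[j - 1][k] + score(k, i)
                if cand > best[j][i]:
                    best[j][i] = cand
                    cut[j][i] = k

    # Recover the block boundaries from the stored cut points.
    blocks = []
    i = n
    for j in range(t, 0, -1):
        k = cut[j][i]
        blocks.append((k, i))
        i = k
    blocks.reverse()
    return best[t][n], blocks


# Usage with a stand-in score: negative within-block squared error.
data = [1, 2, 3, 10, 11, 12]


def neg_sse(start, end):
    block = data[start:end]
    mean = sum(block) / len(block)
    return -sum((x - mean) ** 2 for x in block)


value, blocks = best_consecutive_partition(len(data), 2, neg_sse)
print(value, blocks)
```

The stand-in score could be swapped for any block-additive objective; whether a given statistic satisfies the conditions under which the optimum is consecutive is a separate question that this sketch does not check.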
- Research Article
1
- 10.1109/lwc.2024.3466728
- Dec 1, 2024
- IEEE Wireless Communications Letters
Platooning-based vehicle-to-vehicle (V2V) integrated sensing and communication (ISAC) frameworks have emerged as an attractive strategy in recent years. In this letter, we present an optimal time partitioning (OTP) framework in V2V ISAC systems. We propose a novel sensing measure for quantifying radar sensing performance as a function of the maximum detectable range and velocity of the radar. With the communication operation following the sensing operation, an OTP problem is formulated and solved as a convex problem, constrained by sensing and communication performance guarantees. Optimal bounds on the time duration for sensing and communication are derived, along with the maximum achievable communication throughput. Furthermore, analytical insights on the inherent trade-offs associated with the design parameters are presented. The simulation results demonstrate that the proposed OTP framework achieves a communication throughput gain of up to 12.6% over the equal time partitioning framework, in addition to meeting the sensing performance requirements.
- Research Article
28
- 10.1017/s0263574713001148
- Dec 18, 2013
- Robotica
This paper presents decentralized algorithms for coverage with mobile robots on a graph. Coverage is an important capability of multi-robot systems engaged in a number of different applications, including placement for environmental modeling, deployment for maximal quality surveillance, and even coordinated construction. We use distributed vertex substitution for locational optimization and equal mass partitioning, and the controllers minimize the corresponding cost functions. We prove that the proposed controller with two-hop communication guarantees convergence to the locally optimal configuration. We evaluate the algorithms in simulations and also using four mobile robots.
- Conference Article
14
- 10.1109/ipdpsw.2012.12
- May 1, 2012
The problem of matrix partitioning for parallel matrix-matrix multiplication on heterogeneous processors has been extensively studied since the mid 1990s. During this time, previous research focused mainly on the design of efficient partitioning algorithms, optimally or sub-optimally partitioning matrices into rectangles. The optimality of the rectangular partitioning shape itself has never been studied or even seriously questioned. The accepted approach is that consideration of non-rectangular shapes will not significantly improve the optimality of the solution, but can significantly complicate the partitioning problem, which is already NP-complete even for the restricted case of rectangular shapes. There is no published research, however, supporting this approach. The shape of the globally optimal partitioning, and how the best rectangular partitioning compares with this global optimum, are still wide open problems. Solution of these problems will decide if new partitioning algorithms searching for truly optimal, and not necessarily rectangular, solutions are needed. This paper presents the first results of our research on the problem of optimal partitioning shapes for parallel matrix-matrix multiplication on heterogeneous processors. Namely, the case of two interconnected processors is comprehensively studied. We prove that, depending on performance characteristics of the processors and the communication link, the globally optimal partitioning will have one of just two well-specified shapes, one of which is rectangular and the other is non-rectangular. The theoretical analysis is conducted using an original mathematical technique proposed in the paper. It is shown that the technique can also be applied in the case of arbitrary numbers of processors. 
While comprehensive analysis of the cases of three and more processors is more complicated and the subject for future work, the paper does prove the optimality of some particular non-rectangular partitioning shapes for some combinations of performance characteristics of heterogeneous processors and communication links. The paper also presents experimental results demonstrating that the optimal non-rectangular partitioning can significantly outperform the optimal rectangular one on real-life heterogeneous HPC platforms.
- Research Article
14
- 10.1109/12.580427
- Mar 1, 1997
- IEEE Transactions on Computers
Given n heterogeneous traffic sources which generate multiple types of traffic among themselves, we consider the problem of finding a set of disjoint clusters to cover n traffic sources. The objective is to minimize the total communication cost for the entire system in the context that the intracluster communication is less expensive than the intercluster communication. Different from the general graph partitioning problem, our work takes into account the physical topology constraints of the linear arrangement of physical cells in highway cellular systems and the hexagonal mesh arrangement of physical cells in cellular systems. In our partitioning schemes, the optimal partitioning problem is transformed into an equivalent problem with a relative cost function, which generates the communication cost differences between the intracluster communications and the intercluster communications. For highway cellular systems, we have designed an efficient optimal partitioning algorithm of $O(mn^2)$ by dynamic programming, where m is the number of clusters of n base stations. The algorithm also finds all the valid partitions in the same polynomial time, given the size constraint on a cluster and the total allowable communication cost for the entire system. For hexagonal cellular systems, we have developed four heuristics for the optimal partitioning based on the techniques of moving or interchanging boundary nodes between adjacent clusters. The heuristics have been evaluated and compared through experimental testing and analysis.
- Research Article
1
- 10.1007/s00357-008-9014-8
- Oct 24, 2008
- Journal of Classification
Data holders, such as statistical institutions and financial organizations, have a very serious and demanding task when producing data for official and public use: controlling the risk of identity disclosure and protecting sensitive information when they communicate data-sets among themselves, to governmental agencies and to the public. One of the techniques applied is that of micro-aggregation. In a Bayesian setting, micro-aggregation can be viewed as the optimal partitioning of the original data-set based on the minimization of an appropriate measure of discrepancy, or distance, between two posterior distributions, one of which is conditional on the original data-set and the other conditional on the aggregated data-set. Assuming d-variate normal data-sets and using several measures of discrepancy, it is shown that the asymptotically optimal equal probability m-partition of $\mathbb{R}^{d}$, with $m^{1/d} \in \mathbb{N}$, is the convex one which is provided by hypercubes whose sides are formed by hyperplanes perpendicular to the canonical axes, no matter which discrepancy measure has been used. On the basis of the above result, a method that produces a sub-optimal partition with a very small computational cost is presented.
- Research Article
19
- 10.1007/s10766-015-0384-3
- Oct 7, 2015
- International Journal of Parallel Programming
When renting computing power, fairness and overall performance are important for customers and service providers. However, strict fairness usually results in poor performance. In this paper, we study this trade-off. In our experiments, equal cache partitioning results in 131% higher miss ratios than optimal partitioning. In order to balance fairness and performance, we propose two elastic, or movable, cache allocation baselines: elastic miss ratio baseline (EMB) and elastic cache space baseline (ECB). Furthermore, we study optimal partitions for each baseline with different levels of elasticity, and show that EMB is more effective than ECB. We also classify programs from the SPEC 2006 benchmark suite based on how they benefit or suffer from the elastic baselines, and suggest essential information for customers and service providers to choose a baseline.
- Research Article
19
- 10.1080/03610920902763890
- Feb 10, 2010
- Communications in Statistics - Theory and Methods
Selective assembly is an effective approach for improving the quality of a product assembled from two types of components when the quality characteristic is the clearance between the mating components. In this article, optimal binning strategies under squared error loss in selective assembly when the clearance is constrained by a tolerance parameter are discussed. Conditions for a set of constrained optimal partition limits are given, and uniqueness of this set is shown for the case when the dimensional distributions of the two components are identical and strongly unimodal. Some numerical results are reported that compare constrained optimal partitioning, unconstrained optimal partitioning, and equal width partitioning.
- Conference Article
8
- 10.1145/3442381.3450041
- Apr 19, 2021
Public schools in the United States offer tuition-free primary and secondary education to their students, and are divided into school districts funded by the local and state governments. Although the primary source of school district revenue is public money, several studies have pointed to the inequality in funding across different school districts. In this paper, we focus on the spatial geometry/distribution of such inequality, i.e., how the highly funded and lesser funded school districts are located relative to each other. Due to the major reliance on local property taxes for school funding, we find existing school district boundaries promoting financial segregation, with highly-funded school districts surrounded by lesser-funded districts and vice-versa. To counter such issues, we formally propose the Fair Partitioning problem to divide a given set of schools into k districts such that the spatial inequality in the district-level funding is minimized. However, the Fair Partitioning problem turns out to be computationally challenging, and we formally show that it is strongly NP-complete. We further provide a greedy algorithm to offer a practical solution to Fair Partitioning, and show its effectiveness in lowering spatial inequality in school district funding across different states in the United States.