Two-level Partitioning Research Articles

ABSTRACT Cross-matching, which finds the corresponding data for the same celestial object or region across multiple catalogues, is indispensable to astronomical data analysis and research. Because ongoing and next-generation large-scale sky surveys generate a large volume of astronomical catalogues, the computational cost of cross-matching is increasing dramatically. Heterogeneous computing environments offer a theoretical possibility of accelerating cross-matching, but the performance advantages of heterogeneous computing resources have not been fully utilized. To meet the challenge of cross-matching a substantially increasing amount of astronomical observation data, this paper proposes the Heterogeneous-computing-enabled Large Catalogue Cross-matcher (HLC2), a high-performance cross-matching framework based on spherical position deviation on CPU-GPU heterogeneous computing platforms. It supports scalable and flexible cross-matching and can be directly applied to the fusion of large astronomical catalogues from survey missions and astronomical data centres. A performance estimation model is proposed to locate performance bottlenecks and guide optimizations. A two-level partitioning strategy is designed to generate an optimized data placement according to the positions of celestial objects, increasing throughput. To make HLC2 a more adaptive solution, architecture-aware task splitting, thread parallelization, and concurrent scheduling strategies are designed and integrated. Moreover, a novel quad-direction strategy is proposed for the boundary problem to effectively balance performance and completeness. We have experimentally evaluated HLC2 using publicly released catalogue data. Experiments demonstrate that HLC2 scales well across catalogue sizes and that its cross-matching speed is significantly improved compared with state-of-the-art cross-matchers.
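To make the notion of cross-matching by spherical position deviation concrete, here is a minimal Python sketch. It is not the HLC2 implementation: it is a brute-force pairwise comparison with no partitioning, GPU offload, or boundary handling. The function names, the `(ra, dec)` tuple layout in degrees, and the 5-arcsecond radius are all assumptions made for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_deg):
    """For each object in cat_a, return the index of the nearest cat_b
    object within radius_deg, or None if there is no such object.

    Brute force: every pair is compared, so the cost grows as
    len(cat_a) * len(cat_b).
    """
    matches = []
    for ra_a, dec_a in cat_a:
        best_idx, best_sep = None, radius_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_idx, best_sep = j, sep
        matches.append(best_idx)
    return matches


# Hypothetical toy catalogues of (ra, dec) positions in degrees.
cat_a = [(10.0, 20.0), (150.0, -30.0)]
cat_b = [(10.0002, 20.0001), (200.0, 45.0)]
matches = cross_match(cat_a, cat_b, radius_deg=5.0 / 3600.0)  # 5 arcsec
```

The quadratic cost of this naive loop is the kind of overhead that spatial partitioning is meant to reduce, since objects then only need to be compared against candidates in nearby partitions.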

Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a backstage analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
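The GKNN definition above can be illustrated with a small brute-force Python sketch. This is not the MapReduce or SpatialHadoop algorithm the abstract describes: it uses no pruning or partitioning and assumes both datasets fit in memory. The function name, the use of Euclidean distance, and the sample points are assumptions for illustration.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k training points with the smallest sum of Euclidean
    distances to all query points, ordered from best to worst."""
    def dist_sum(p):
        # Sum of distances from candidate p to every point of the query set.
        return sum(math.dist(p, q) for q in query)

    return heapq.nsmallest(k, training, key=dist_sum)


# Hypothetical toy data: two query points and four training points.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]
result = gknn_brute_force(query, training, 2)
# result == [(1.0, 0.0), (1.0, 1.0)]
```

Here (1.0, 0.0) has a distance sum of 2 and (1.0, 1.0) a sum of about 2.83, while the other two training points sum to 8 or more, so they are excluded for K = 2. Every training point is scored against every query point, which is the cost that distance-sum pruning aims to avoid.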

Articles published on Two-level Partitioning

HLC2: a highly efficient cross-matching framework for large astronomical catalogues on heterogeneous computing environments

Answering why-not questions on top-[formula omitted] augmented spatial keyword queries

Service-Aware Two-Level Partitioning for Machine Learning-Based Network Intrusion Detection With High Performance and High Scalability

Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Multilevel parallelization for simulating compressible turbulent flows on most kinds of hybrid supercomputers

Augmented keyword search on spatial entity databases

Processing Long Queries Against Short Text

Power System Dynamic Simulations Using a Parallel Two-Level Schur-Complement Decomposition

On scalable RDFS reasoning using a hybrid approach

Improving the parallel efficiency of large-scale structural dynamic analysis using a hierarchical approach

Optimal loop scheduling for hiding memory latency based on two-level partitioning and prefetching

Improving classification performance using fuzzy MLP and two-level selective partitioning of the feature space

Optimization by iterative improvement: an experimental evaluation on two-way partitioning

Two-level partitioning algorithm with stable performance

