Data Partitioning Scheme Research Articles

Over the last several years, many sequence alignment tools have appeared and become popular, driven by the rapid evolution of next-generation sequencing technologies. Researchers who use such tools naturally want maximum performance when running them on modern infrastructure. Today’s NUMA (non-uniform memory access) architectures make it hard for such applications to scale well as more processors and cores are used. The NUMA memory system is highly complex and may be the main cause of an application’s performance loss. Because a NUMA system has several memory banks, a processor’s accesses to a remote bank incur higher latency than its accesses to its local bank. This effect is usually mitigated by strategies that increase the locality of memory accesses. However, NUMA systems may also suffer from contention when concurrent accesses are concentrated on a small number of banks. Sequence alignment tools use large data structures to hold the reference genome against which all reads are aligned, so they are very sensitive to performance problems in the memory system. The main goal of this study is to explore the trade-offs between data locality and data dispersion in NUMA systems. We have performed experiments with several popular sequence alignment tools on two widely available NUMA systems to assess the performance of different memory allocation policies and data partitioning strategies. We find that no single method is best in all cases. However, we conclude that memory interleaving is the memory allocation strategy that provides the best performance when a large number of processors and memory banks are used. For data partitioning, the best results are usually obtained with a larger number of partitions, sometimes combined with an interleave policy.
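
To make the locality-versus-dispersion trade-off concrete, here is a toy Python model (hypothetical, not the authors' benchmark) of a shared reference index that is read by threads on every NUMA node. The two placement policies mimic first-touch allocation, where every page ends up on the node of the single thread that loaded the index, and interleaving, where pages are spread round-robin over all nodes; on Linux these roughly correspond to running under `numactl --localalloc` and `numactl --interleave=all`. The function names `place_pages` and `simulate`, the uniformly random access pattern, and all sizes are illustrative assumptions.

```python
# Toy model (hypothetical): page placement vs. remote accesses and bank contention.
import random
from collections import Counter


def place_pages(num_pages: int, num_nodes: int, policy: str, loader_node: int = 0) -> list[int]:
    """Return the NUMA node (memory bank) that holds each page of a shared index."""
    if policy == "first_touch":
        # One thread loads the whole index, so every page lands on that thread's node.
        return [loader_node] * num_pages
    if policy == "interleave":
        # Pages are spread round-robin over all nodes.
        return [page % num_nodes for page in range(num_pages)]
    raise ValueError(f"unknown policy: {policy}")


def simulate(placement: list[int], num_nodes: int, threads_per_node: int,
             accesses_per_thread: int, seed: int = 0) -> tuple[float, float]:
    """Return (fraction of remote accesses, share of all accesses hitting the busiest bank)."""
    rng = random.Random(seed)
    bank_load = Counter()
    remote = 0
    total = 0
    for node in range(num_nodes):
        for _ in range(threads_per_node):
            for _ in range(accesses_per_thread):
                # Alignment lookups are modeled (assumption) as uniformly random page reads.
                bank = placement[rng.randrange(len(placement))]
                bank_load[bank] += 1
                remote += bank != node
                total += 1
    return remote / total, max(bank_load.values()) / total


if __name__ == "__main__":
    for policy in ("first_touch", "interleave"):
        placement = place_pages(num_pages=1024, num_nodes=4, policy=policy)
        remote, busiest = simulate(placement, num_nodes=4, threads_per_node=2,
                                   accesses_per_thread=5000)
        print(f"{policy:12s} remote={remote:.2f} busiest_bank_share={busiest:.2f}")
```

With four nodes, first-touch placement yields a remote fraction of exactly 0.75 and sends every access to a single bank, while interleaving leaves the remote fraction at roughly 0.75 but spreads the load to roughly a quarter per bank. In this model interleaving does not improve locality at all; it helps only by removing contention, which is consistent with the finding that interleaving pays off when many processors and memory banks are in use.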

Running parallel database systems in environments with heterogeneous resources has become increasingly common, owing to cluster evolution and growing interest in moving applications into public clouds. Performance differences among machines in the same cluster pose new challenges for parallel database systems. First, in a heterogeneous cluster, the default uniform data partitioning strategy may overload the slower machines while underutilizing the more powerful ones. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy can significantly degrade query performance. Second, since machines may differ in resources or performance, different choices of machines may lead to different costs or performance for the same workload. By carefully selecting the most suitable machines for a workload, we may achieve better performance with the same budget, or meet the same performance requirements at a lower cost. We address these challenges with two techniques, which we call resource bricolage and resource selection, that improve database performance in heterogeneous environments. Our approaches quantify the performance differences among machines with various resources as they process workloads with diverse resource requirements. To improve resource utilization, we formalize the minimization of workload execution time as an optimization problem and use linear programming to obtain a recommended data partitioning scheme. To improve resource selection, we formalize two problems: one minimizes the total workload execution time under a given budget, and the other minimizes the total cost needed to meet a given performance target. We then use different mixed-integer programs to search for the optimal resource selection decisions. We verify the effectiveness of both resource bricolage and resource selection with an extensive experimental study.
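
The two formulations can be illustrated with a deliberately simplified Python sketch. It assumes each machine is summarized by a single measured throughput (`rate`) and a price per second of use (`price`), whereas the authors' approach quantifies machines with various resources processing workloads with diverse resource requirements. With only one resource, the partitioning LP has a closed-form optimum (fractions proportional to throughput, so all machines finish together), so no solver is needed. The two resource-selection problems are solved here by exhaustive enumeration of machine subsets, a stand-in for the mixed-integer programs that is only practical for small clusters. All names (`Machine`, `proportional_partition`, `fastest_within_budget`, `cheapest_within_time`) and numbers are hypothetical.

```python
# Hypothetical single-resource model; not the authors' LP/MIP formulation.
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Machine:
    name: str
    rate: float   # work units processed per second (hypothetical measurement)
    price: float  # cost per second while the machine is in use (hypothetical)


def uniform_time(machines, work: float) -> float:
    """Finish time when every machine gets an equal share; the slowest one dominates."""
    share = work / len(machines)
    return max(share / m.rate for m in machines)


def proportional_partition(machines, work: float):
    """Optimum of the single-resource LP
        minimize T  s.t.  f_i * work / rate_i <= T,  sum(f_i) = 1,  f_i >= 0,
    i.e. fractions proportional to each machine's rate, so all machines finish together."""
    total_rate = sum(m.rate for m in machines)
    fractions = {m.name: m.rate / total_rate for m in machines}
    return fractions, work / total_rate


def _candidates(machines, work: float):
    """Yield (subset, time, cost) for every non-empty subset (exponential in cluster size)."""
    for k in range(1, len(machines) + 1):
        for subset in combinations(machines, k):
            _, t = proportional_partition(subset, work)
            yield subset, t, t * sum(m.price for m in subset)


def fastest_within_budget(machines, work: float, budget: float):
    """Minimize execution time subject to cost <= budget; None if nothing fits."""
    feasible = [c for c in _candidates(machines, work) if c[2] <= budget]
    return min(feasible, key=lambda c: c[1], default=None)


def cheapest_within_time(machines, work: float, max_time: float):
    """Minimize cost subject to execution time <= max_time; None if nothing fits."""
    feasible = [c for c in _candidates(machines, work) if c[1] <= max_time]
    return min(feasible, key=lambda c: c[2], default=None)


if __name__ == "__main__":
    cluster = [Machine("fast", 4.0, 3.0), Machine("slow1", 1.0, 0.5), Machine("slow2", 1.0, 0.5)]
    print("uniform:", uniform_time(cluster, 60.0))
    print("proportional:", proportional_partition(cluster, 60.0))
    subset, t, cost = fastest_within_budget(cluster, 60.0, budget=35.0)
    print("within budget 35:", [m.name for m in subset], t, cost)
```

On this three-machine example, uniform partitioning finishes in 20 time units versus 10 for proportional partitioning, because the equal share overloads the slow machines; a budget of 35 selects only the two slow machines, which run more slowly but more cheaply.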

Articles published on Data Partitioning Scheme

Package queries: efficient and scalable computation of high-order constraints

A Survey of Traditional and MapReduce-Based Spatial Query Processing Approaches

A performance comparison of data and memory allocation strategies for sequence aligners on NUMA architectures

Synchrophasor Sensor Networks for Grid Communication and Protection

A Scalable Execution Engine for Package Queries

Translating on pairwise entity space for knowledge graph embedding

Two-dimensional Phase Unwrapping Method Using Cost Function of L0 Norm

The Data Aggregation Privacy Protection Algorithm of Body Area Network Based on Data Partitioning

The phylogenetic position of eriophyoid mites (superfamily Eriophyoidea) in Acariformes inferred from the sequences of mitochondrial genomes and nuclear small subunit (18S) rRNA gene

FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters

DRSO-EGSM: data replication strategy oriented to automatic energy gear-shifting mechanism

Semiautomated Alignment of High-Throughput Metabolite Profiles with Chemometric Tools

ParaView visualization of Abaqus output on the mechanical deformation of complex microstructures

Improved pan-specific prediction of MHC class I peptide binding using a novel receptor clustering data partitioning strategy

Data Partitioning Strategy of GPU Heterogeneous Clusters Based on Learning

Multiresolution and fast decompression for optimal web-based rendering

Scalable 3D hybrid parallel Delaunay image-to-mesh conversion algorithm for distributed shared memory architectures

Resource bricolage and resource selection for parallel database systems

Resource Bricolage for Parallel DBMSs on Heterogeneous Clusters

Know your customer: computing k-most promising products for targeted marketing
