Data Placement Algorithm Research Articles

Cloud computing is increasingly being seen as a way to reduce infrastructure costs and add elasticity, and is being used by a wide range of organizations. Cloud data management systems today need to serve a range of different workloads, from analytical read-heavy workloads to transactional (OLTP) workloads. For both the service providers and the users, it is critical to minimize the consumption of resources like CPU, memory, communication bandwidth, and energy, without compromising any service-level agreements. In this article, we develop a workload-aware data placement and replication approach, called SWORD, for minimizing resource consumption in such an environment. Specifically, we monitor and model the expected workload as a hypergraph and develop partitioning techniques that minimize the average query span, i.e., the average number of machines involved in the execution of a query or a transaction. We empirically justify the use of query span as the metric to optimize, for both analytical and transactional workloads, and develop a series of replication and data placement algorithms by drawing connections to several well-studied graph theoretic concepts. We introduce a suite of novel techniques to achieve high scalability by reducing the overhead of partitioning and query routing. To deal with workload changes, we propose an incremental repartitioning technique that modifies data placement in small steps without resorting to complete repartitioning. We propose the use of fine-grained quorums defined at the level of groups of data items to control the cost of distributed updates, improve throughput, and adapt to different workloads. We empirically illustrate the benefits of our approach through a comprehensive experimental evaluation for two classes of workloads. For analytical read-only workloads, we show that our techniques result in a significant reduction in total resource consumption. For OLTP workloads, we show that our approach improves transaction latencies and overall throughput by minimizing the number of distributed transactions.
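
The query-span metric named in the abstract above can be illustrated with a small sketch. This is not SWORD's implementation: the workload, the item names, and the two placements are hypothetical, replication is ignored, and each query is simply treated as a hyperedge over the data items it touches.

```python
from statistics import mean

def query_span(query_items, placement):
    """Number of distinct machines touched by one query (one hyperedge)."""
    return len({placement[item] for item in query_items})

def average_query_span(workload, placement):
    """Average span over all queries in the workload."""
    return mean(query_span(q, placement) for q in workload)

# Hypothetical workload: each set lists the data items one query accesses.
workload = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"d", "e"}]

# Hypothetical placements (item -> machine id), assumed for illustration only.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}
workload_aware = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

print(average_query_span(workload, scattered))       # every query spans 2 machines
print(average_query_span(workload, workload_aware))  # only {"c", "d"} spans 2 machines
```

Under these assumptions, keeping co-accessed items on the same machine lowers the average span from 2 to 1.25, which is the quantity a hypergraph partitioner would try to minimize.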


Data availability is one of the most important properties of peer-to-peer (P2P) storage systems. The availability analysis model and the data placement scheme are two key design choices. Users in a P2P storage system are both providers and customers. This characteristic means that the availability analysis must be user-centric in order to enhance the quality of service and decrease the system cost. The popular approach in recent studies is simple random placement with a steady-state model, which has the following drawbacks: 1) It ignores the up/down patterns of nodes, so their availability is over-estimated or under-estimated at different periods of time. 2) It ignores the access patterns of users, so the availability perceived by users is hard to evaluate precisely. 3) It ignores the large differences in availability among nodes, and thus provides no incentive. This paper proposes a novel user-experience-based availability model, which evaluates the availability of a P2P storage system in terms of user experience and which reduces to the traditional availability analysis model as a special case. Based on the new model, this paper proposes decentralized data placement algorithms for two typical P2P storage applications: “data sharing” and “personal backup”. Through trace-driven simulation, we show that our methods can greatly enhance the availability perceived by users, dramatically reduce the variance of the availability, and eliminate the nodes with low availability in data-sharing applications; meanwhile, they can provide different levels of service to encourage users according to their contributions.
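
The contrast between a steady-state estimate and an estimate weighted by user access patterns can be sketched as follows. This is an illustrative simplification, not the model from the paper: it assumes independent replicas, a fixed set of time slots, per-slot up probabilities for each node, and per-slot access weights for the user. All function names and numbers are hypothetical.

```python
def steady_state_availability(up_probs_by_node):
    """Treat each node as up with its time-averaged probability."""
    unavailable = 1.0
    for probs in up_probs_by_node:
        unavailable *= 1.0 - sum(probs) / len(probs)
    return 1.0 - unavailable

def perceived_availability(up_probs_by_node, access_weights):
    """Weight per-slot availability by how often the user accesses in that slot."""
    total = sum(access_weights)
    result = 0.0
    for slot, weight in enumerate(access_weights):
        unavailable = 1.0
        for probs in up_probs_by_node:
            unavailable *= 1.0 - probs[slot]
        result += (weight / total) * (1.0 - unavailable)
    return result

# Two slots (day, night); the hypothetical user reads only at night.
night_user = [0.0, 1.0]
day_nodes = [[0.9, 0.1], [0.9, 0.1]]    # both replicas mostly up by day
mixed_nodes = [[0.9, 0.1], [0.1, 0.9]]  # one day node, one night node

for nodes in (day_nodes, mixed_nodes):
    print(steady_state_availability(nodes), perceived_availability(nodes, night_user))
```

In this toy setting both placements get the same steady-state estimate, while the access-weighted estimate separates them sharply. When every node has a constant up probability, the two functions agree, which mirrors the abstract's remark that the new model can reduce to the traditional one.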



Articles published on Data Placement Algorithm

A Dynamic Data Placement Strategy for Hadoop in Heterogeneous Environments

SWORD: workload-aware data placement and replica selection for cloud data management systems

Improving flash write performance by using update frequency

Optimizing Data Placement of Loops for Energy Minimization with Multiple Types of Memories

Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory

Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework

User-experience-based availability analysis model and its application in P2P storage systems

Clustering-Based and Consistent Hashing-Aware Data Placement Algorithm

RSEDP: an effective hybrid data placement algorithm for large-scale storage systems

Efficient Data Maintenance Scheme for Peer-to-Peer Storage Systems

SEA: A Striping-Based Energy-Aware Strategy for Data Placement in RAID-Structured Storage Systems

D/H Placement and On-Line Data Reorganization Based on Control Theory in Dynamic Disk Array

A placement strategy of multimedia objects in multimedia information systems

Resource scheduling in a high-performance multimedia server

Data placement in shared-nothing parallel database systems

Memory Hierarchy Configuration Analysis
