Data Distribution and Distributed Transaction Management

Wilfried Lemahieu, Seppe Vanden Broucke, Bart Baesens

doi:10.1017/9781316888773.018

Abstract

Chapter Objectives

In this chapter, you will learn to:

• grasp the basics of distributed systems and distributed databases;
• discern key architectural implications of distributed databases;
• understand the impact of fragmentation, allocation, and replication;
• identify different types of transparency;
• understand the steps in distributed query processing;
• understand distributed transaction management and concurrency control;
• grasp the impact of eventual consistency and BASE transactions.

Opening Scenario

As Sober envisions growth as part of its long-term strategy, it wants a careful understanding of the data implications involved. More specifically, the company wants to know whether it would make sense to distribute its data across a network of offices and work with a distributed database. Sober also wants to know the impact of data distribution on query processing and optimization, transaction management, and concurrency control.

In this chapter, we focus on the specifics of distributed databases (i.e., systems in which the data and DBMS functionality are distributed over different nodes or locations on a network). First, we discuss the general properties of distributed systems and offer an overview of some architectural variants of distributed database systems. Then, we tackle the different ways of distributing data over nodes in a network, including the possibility of data replication. We also focus on the degree to which the data distribution can be made transparent to applications and users. Then, we discuss the complexity of query processing and query optimization in a distributed setting. The next section is dedicated to distributed transaction management and concurrency control, focusing on both tightly coupled and loosely coupled settings. The last section gives an overview of the particularities of transaction management in Big Data and NoSQL databases, which are often distributed in a cluster set-up, presenting BASE transactions as an alternative to the traditional ACID transaction paradigm.

Distributed Systems and Distributed Databases

Ever since the early days of computing, which were dominated by monolithic mainframes, distributed systems have had their place in the ICT landscape. A distributed computing system consists of several processing units or nodes with a certain level of autonomy, which are interconnected by a network and which cooperatively perform complex tasks. These complex tasks can be divided into subtasks that are performed by the individual nodes.
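The idea of dividing a complex task into subtasks handled by individual nodes can be illustrated with a minimal sketch. It is not taken from the chapter: the "nodes" are simulated by worker threads on one machine rather than networked machines, and the function names (`split_into_subtasks`, `node_work`, `run_distributed_sum`) are hypothetical names chosen for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(data, node_count):
    """Divide the data into at most node_count contiguous chunks (the subtasks)."""
    if not data:
        return []
    chunk_size = -(-len(data) // node_count)  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def node_work(chunk):
    """Subtask performed by one simulated node: sum its own chunk."""
    return sum(chunk)


def run_distributed_sum(data, node_count=3):
    """Coordinator: hand one subtask to each simulated node, then combine results."""
    subtasks = split_into_subtasks(data, node_count)
    # Worker threads stand in for autonomous nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_results = list(pool.map(node_work, subtasks))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_distributed_sum(list(range(1, 101)), node_count=4))  # prints 5050
```

Each simulated node only sees its own chunk, and only the small partial results travel back to the coordinator. In a real distributed system the same pattern has to cope with network communication and node failures, which this sketch leaves out.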

Full Text