Modelling complex stochastic systems: approaches to management and stability

Brendan John Patch

doi:10.14264/uql.2019.299

Abstract

This thesis is about coping with variability in outcomes for complex stochastic systems. We focus on systems where jobs arrive randomly over time and use resources for a random amount of time before departing. The systems we investigate are primarily concerned with the communication and storage of data. The thesis is divided into two parts. The first part studies systems where congestion leads to jobs waiting for service (queueing systems), and the second part considers systems where congestion leads to losses because jobs depart before receiving service (loss systems).

For queueing systems, we are mainly interested in the management objective of ensuring that the expected time a job must wait before entering service is finite, a property known as stability. In loss systems, waiting times are naturally finite because jobs balk in response to congestion, so our attention turns to the more ambitious goal of managing these systems so that the number of lost jobs is minimised.

Each part consists of an introductory chapter providing background knowledge, followed by three chapters containing original research. In both parts, we first apply traditional analytical approaches to novel models and then develop novel simulation-based approaches for models that are out of reach of traditional approaches.

We begin our research on queueing networks in Part 1 by considering a network of infinite-server queues with the special feature that, triggered by specific events, the network population vector may undergo a linear transformation. We use moment generating functions to obtain expressions for the transient and stationary moments of the queue size vector, and we characterise the set of parameters for which the system is stable.
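The first-moment side of this kind of model can be illustrated with a minimal Python sketch. It does not reproduce the thesis's moment-generating-function derivation. Instead, it uses a simpler direct approach for the means only, under the following illustrative assumptions:

- Poisson arrivals at rate lam[i].
- Exponential service at rate mu[i] per job.
- Routing probabilities P[i][j].
- At rate nu, an event replaces the population vector X by a random vector with conditional mean A X, where A >= 0 (for example, binomial thinning).

All of these names and the exact event mechanism are hypothetical. Under these assumptions the mean vector m(t) satisfies dm_j/dt = lam_j - mu_j m_j + sum_i P[i][j] mu_i m_i + nu((A m)_j - m_j), which can be written as dm/dt = lam + B m. If B is Hurwitz, the means converge to m* = (-B)^{-1} lam from any initial condition. Because B has nonnegative off-diagonal entries, being Hurwitz is equivalent to (-B)^{-1} existing and being entrywise nonnegative.

```python
# Minimal sketch (hypothetical model, not the thesis's exact formulation):
# mean dynamics dm/dt = lam + B m for an infinite-server network with routing P
# and Markovian jumps X -> (random vector with mean A X) occurring at rate nu.

def drift_matrix(mu, P, nu, A):
    """Return B with dm_j/dt = lam_j + sum_i B[j][i] m_i (assumes P, A >= 0)."""
    n = len(mu)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            # inflow to j from jobs completing service at i, plus the "A m" jump term
            B[j][i] += P[i][j] * mu[i] + nu * A[j][i]
        # outflow from j: service completions and the "- m_j" part of the jump term
        B[j][j] -= mu[j] + nu
    return B


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[k]] for k, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                for k in range(c, n + 1):
                    aug[r][k] -= f * aug[c][k]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def is_mean_stable(B):
    """B has nonnegative off-diagonals, so B is Hurwitz iff (-B)^{-1} exists
    and is entrywise nonnegative (a nonsingular M-matrix test)."""
    n = len(B)
    neg = [[-x for x in row] for row in B]
    try:
        cols = [solve(neg, [float(r == k) for r in range(n)]) for k in range(n)]
    except ValueError:
        return False
    return all(x >= -1e-12 for col in cols for x in col)


def stationary_mean(lam, B):
    """Fixed point of dm/dt = lam + B m, i.e. m* = (-B)^{-1} lam."""
    return solve([[-x for x in row] for row in B], lam)


# Single node with lam = 2 and mu = 1; at rate nu = 0.5 the population is
# halved in expectation (e.g. binomial thinning, A = [[0.5]]).
B = drift_matrix(mu=[1.0], P=[[0.0]], nu=0.5, A=[[0.5]])
print(is_mean_stable(B), stationary_mean([2.0], B))
```

In the single-node example, B = -1 - 0.5 * (1 - 0.5) = -1.25, so the stationary mean is 2 / 1.25 = 1.6. Replacing A by [[3.0]] with nu = 1 makes B positive, and the means then grow without bound. This check concerns the means only. It is a sketch and should not be read as the stability characterisation derived in the thesis.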
A variety of systems fit into this framework, such as networks of retrial queues, networks in which jobs can be rerouted when links fail, and storage systems.

In the next chapter of Part 1, we study the recently introduced Queue-Proportional Rate Allocation scheduling algorithm for multihop radio networks. The main contribution is a proof, using fluid limit techniques, that a natural generalisation of this policy, which allows packets at each link to be weighted to reflect nonhomogeneous priorities, retains the maximal stability property. We also conjecture that, in heavy traffic, the diffusion-scaled workload process of the network converges weakly to a reflected Brownian motion and that, in this weak limit, the vector of queue lengths is always proportional to the traffic arrival rate vector.

We conclude Part 1 by devising a simulation-based method for detecting whether a non-negative Markov chain is unstable for a given set of parameter values. More precisely, for a given subset of the parameter space, we develop an algorithm that can decide whether that set contains a subset of positive Lebesgue measure on which the Markov chain is unstable. The approach is based on a variant of simulated annealing; consequently, only mild assumptions are needed to obtain rigorous performance guarantees. The resulting procedure performs statistically rigorous tests for instability and has been extensively tested on several examples of standard and non-standard queueing networks.

We begin our investigation of loss systems in Part 2 by considering a finite-capacity Erlang B model that alternates between active and inactive states according to a two-state modulating Markov process. Jobs arrive according to a Poisson process but are blocked from entry when the system is at capacity or inactive. We use Laplace transforms to derive expressions for the revenue lost during short-term planning horizons.
These expressions can be used to assess alternative system designs.

In the next chapter of Part 2, we develop a sophisticated loss-system model for cloud computing systems. User demand on the computational resources of cloud computing platforms varies over time. These variations in the arrival process can be predictable or unpredictable, resulting in time-varying and 'bursty' demand fluctuations. Furthermore, jobs can arrive in batches, and users whose demands are not met can be impatient. We demonstrate how to use matrix analytic methods to compute the expected revenue loss over a finite time horizon in the presence of all these model characteristics. We find that taking these characteristics of fluctuating user demand into account can substantially reduce losses.

We conclude Part 2 by developing an optimisation framework for a model applicable to mobile cloud edge computing systems. Our model is a stochastic network with blocking: jobs attempt to be processed sequentially at nodes in a network but are lost when they attempt to access a node that is at capacity. The resulting optimisation problem is mathematically intractable in general and time-consuming to solve using standard simulation methods. Our novel method combines simulation with analytical approximations to quickly obtain high-quality solutions. We test our approach extensively on several complex models.
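The finite-horizon lost-revenue quantities studied in Part 2 can be illustrated with a small Python sketch. The thesis uses Laplace transforms, and matrix analytic methods for the cloud model. This sketch swaps in a different standard technique, uniformization of the continuous-time Markov chain, which computes expected accumulated rewards over [0, T] as follows. With P = I + Q / Lam and N ~ Poisson(Lam T), the integral of p(t) over [0, T] equals the sum over k of p0 P^k P(N >= k+1) / Lam.

The model is an on/off Erlang loss system with the following hypothetical parameters, which are ours and not the thesis's notation:

- capacity `c`;
- arrival rate `lam`;
- per-job service rate `mu`;
- switching rates `alpha` (active to inactive) and `beta` (inactive to active);
- a fixed `revenue` per lost job.

The sketch also assumes that jobs already in service keep being served while the system is inactive.

```python
import math

# Sketch: expected lost revenue over [0, T] for an on/off Erlang loss system,
# computed by uniformization (a swapped-in technique, not the thesis's
# Laplace-transform method). Assumption: jobs in service continue to be
# served while the system is inactive.

def onoff_erlang_chain(c, lam, mu, alpha, beta):
    """Return (transitions, loss_rate); state 2*n + s has n jobs, s = 0 active, 1 inactive."""
    trans, loss = [], []
    for n in range(c + 1):
        for s in (0, 1):
            i = 2 * n + s
            if s == 0 and n < c:
                trans.append((i, i + 2, lam))     # admitted arrival
            if n > 0:
                trans.append((i, i - 2, n * mu))  # service completion
            if s == 0:
                trans.append((i, i + 1, alpha))   # active -> inactive
            else:
                trans.append((i, i - 1, beta))    # inactive -> active
            # arrivals are blocked (lost) when inactive or at capacity
            loss.append(lam if (s == 1 or n == c) else 0.0)
    return trans, loss


def accumulated_reward(trans, reward, p0, T, tol=1e-12):
    """E[integral_0^T reward(X_t) dt] for a finite CTMC via uniformization."""
    exit_rate = [0.0] * len(p0)
    for i, _, r in trans:
        exit_rate[i] += r
    Lam = max(exit_rate) or 1.0   # uniformization rate >= every exit rate
    pmf = math.exp(-Lam * T)      # P(N = 0) for N ~ Poisson(Lam * T)
    if pmf == 0.0:
        raise ValueError("Lam * T too large for this simple sketch")
    surv = 1.0 - pmf              # P(N >= k + 1), starting at k = 0
    p, total, k = list(p0), 0.0, 0
    while surv > tol or k <= Lam * T:
        total += surv / Lam * sum(pi * ri for pi, ri in zip(p, reward))
        # one step of the uniformized chain: p <- p (I + Q / Lam)
        nxt = [pi * (1.0 - exit_rate[i] / Lam) for i, pi in enumerate(p)]
        for i, j, r in trans:
            nxt[j] += p[i] * r / Lam
        p, k = nxt, k + 1
        pmf *= Lam * T / k
        surv -= pmf
    return total


def expected_lost_revenue(c, lam, mu, alpha, beta, T, start=(0, 0), revenue=1.0):
    """start = (jobs in system, 0 if active / 1 if inactive) at time 0."""
    trans, loss = onoff_erlang_chain(c, lam, mu, alpha, beta)
    p0 = [0.0] * (2 * (c + 1))
    p0[2 * start[0] + start[1]] = 1.0
    return revenue * accumulated_reward(trans, loss, p0, T)


print(expected_lost_revenue(c=1, lam=1.0, mu=1.0, alpha=0.0, beta=1.0, T=5.0))
```

The printed case gives a sanity check. With `alpha = 0`, `c = 1` and lam = mu = 1, the system never switches off and reduces to a two-state chain whose blocking probability is 0.5(1 - e^{-2t}). The expected number of lost jobs over [0, 5] is therefore 0.5(5 - (1 - e^{-10})/2), which is approximately 2.25.

Features of the cloud model, such as batch arrivals or abandonment by impatient users, could be added as extra transitions in the transition list. The uniformization routine itself would stay the same.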

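The weighted Queue-Proportional Rate Allocation policy studied in Part 1 can be illustrated with a small Python sketch. Purely as an illustration, assume the policy serves each link at a rate proportional to its weighted queue length, scaled up until the rate vector reaches the boundary of the capacity region. The following are our assumptions and are hypothetical:

- the function name;
- the per-link `weights`;
- the description of the capacity region by clique constraints of a conflict graph, with `capacity` as the clique budget.

The precise QPRA definition and the thesis's weighted generalisation may differ.

```python
# Hypothetical queue-proportional allocation (illustrative only; the exact
# QPRA definition and its weighted generalisation in the thesis may differ).
# Assumed capacity region: {rates >= 0 : sum of rates over each clique of the
# conflict graph <= capacity}.

def proportional_rates(queues, weights, cliques, capacity=1.0):
    """Serve each link at a rate proportional to weights[l] * queues[l],
    scaled up until some clique constraint becomes tight."""
    n = len(queues)
    covered = {l for clique in cliques for l in clique}
    if covered != set(range(n)):
        raise ValueError("every link must belong to at least one clique")
    target = [w * q for w, q in zip(weights, queues)]
    loads = [sum(target[l] for l in clique) for clique in cliques]
    if max(loads) == 0:
        return [0.0] * n  # all (weighted) queues are empty
    # largest alpha such that alpha * load <= capacity for every clique
    alpha = min(capacity / load for load in loads if load > 0)
    return [alpha * t for t in target]


# Line network of three links in which adjacent links interfere:
# conflict cliques {0, 1} and {1, 2}.
cliques = [[0, 1], [1, 2]]
print(proportional_rates([4, 2, 2], [1, 1, 1], cliques))  # unweighted
print(proportional_rates([4, 2, 2], [1, 2, 1], cliques))  # [0.5, 0.5, 0.25]
```

In the weighted call, doubling the weight of link 1 makes the clique {0, 1} tight. Rates then move from link 2 towards the middle link while still pointing along the weighted queue vector. For perfect conflict graphs, such as the path used here, the clique constraints describe the schedulable rate region exactly. For other conflict graphs they only give an outer bound, and the scaling step would need a linear program over the convex hull of feasible schedules instead.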