Robust and efficient memory management in Apache AsterixDB

Taewoo Kim, Chen Li, Shiva Jahangiri, Yingyi Bu, Ian Maxon, Alexander Behm, Pouria Pirzadeh, Vinayak Borkar, Jianfeng Jia, Murtadha Hubail, Michael Blow, Michael J Carey, Chen Luo

doi:10.1002/spe.2799

Open Access

https://doi.org/10.1002/spe.2799

Journal: Software: Practice and Experience
Publication Date: Feb 17, 2020
Citations: 15
License type: publisher-specific, author manuscript

Affiliation: University of California, Irvine

Abstract

Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.

Full Text