PhilDB: the time series database with built-in change logging

Andrew Macdonald

doi:10.7717/peerj-cs.52

Abstract

PhilDB is an open-source time series database that supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. It implements fast reads to make it practical to select data for analysis. Recent open-source systems have been developed to indefinitely store long-period high-resolution time series data without change logging. Unfortunately, such systems generally require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that avoid the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that change. Unlike ‘big data’ solutions, PhilDB has been designed for single-machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances when a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. While some existing systems come close to meeting the needs PhilDB addresses, none cover all the needs at once. PhilDB was written to fill this gap in existing solutions.
This paper explores existing time series database solutions, discusses the motivation for PhilDB, describes the architecture and philosophy of the PhilDB software, and evaluates PhilDB against InfluxDB and SciDB.

Highlights

PhilDB was created to store changing time series data, which is of great importance to the scientific community
Open-source ‘big data’ time series database offerings don’t support the ability to track any changed values out of the box
In contrast to InfluxDB, SciDB met the requirement of time series storage with update logging but didn’t meet the requirement for simplicity of deployment and use

Summary

INTRODUCTION

PhilDB was created to store changing time series data, which is of great importance to the scientific community. Existing proprietary and open-source database solutions for storing time series fail to provide for effortless scientific analysis. Most fail to provide the ability to store any changes to a time series over time. Most current open-source database systems are designed for handling ‘big data,’ which in turn requires extreme computing power on a cluster of servers. This paper will explore existing time series database solutions. It will examine the need for a liberally licensed, open-source, easily deployed time series database that is capable of tracking data changes, and look at why the existing systems that were surveyed failed to meet these requirements. An evaluation will be performed to compare PhilDB to the most promising alternatives among the existing open-source systems.

BACKGROUND

MOTIVATION

EVALUATION

Evaluation dataset

Evaluation method

CONCLUSION