ParPaRaw

Elias Stehle, Hans-Arno Jacobsen

doi:10.14778/3377369.3377372

Abstract

Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet this compute-intensive step often constitutes a major bottleneck in the data ingestion pipeline, since parsing inputs that require more involved parsing rules is challenging to parallelise. This work proposes a massively parallel algorithm for parsing delimiter-separated data formats on GPUs. Unlike the state of the art, the proposed approach does not require an initial sequential pass over the input to determine a thread's parsing context, that is, how a thread beginning somewhere in the middle of the input should interpret a certain symbol (e.g., whether to interpret a comma as a delimiter or as part of a larger string enclosed in double quotes). Instead of tailoring the approach to a single format, we perform a massively parallel finite state machine (FSM) simulation, which is more flexible and powerful, supporting more expressive parsing rules with general applicability. Our experimental evaluation on a GPU with 3,584 cores achieves a parsing rate of as much as 14.2 GB/s and shows that the presented approach is able to scale to thousands of cores and beyond. With an end-to-end streaming approach, we are able to exploit the full-duplex capabilities of the PCIe bus and hide latency from data transfers. Considering the end-to-end performance, the algorithm parses 4.8 GB in as little as 0.44 seconds, including data transfers.
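
The abstract does not spell out how a thread can learn its parsing context without a sequential pass. One common way to do this, sketched below in plain Python as an illustration and not as the authors' GPU implementation, is to simulate the FSM over each chunk from every possible start state and then combine the per-chunk results with an associative scan. The two-state quote-tracking FSM and all function names (`step`, `chunk_transition`, `compose`, `chunk_start_states`, `delimiter_positions`) are assumptions made for this example.

```python
from itertools import accumulate

# Hypothetical 2-state FSM for a CSV-like format (illustrative assumption):
# OUTSIDE = not within double quotes, INSIDE = within a double-quoted field.
OUTSIDE, INSIDE = 0, 1
NUM_STATES = 2


def step(state, ch):
    """Transition function: a double quote toggles the quoted state."""
    if ch == '"':
        return INSIDE if state == OUTSIDE else OUTSIDE
    return state


def chunk_transition(chunk):
    """Simulate the FSM over one chunk from every possible start state.

    Returns a tuple t where t[s] is the end state when starting in state s.
    Each chunk can be processed independently of all others.
    """
    ends = []
    for start in range(NUM_STATES):
        state = start
        for ch in chunk:
            state = step(state, ch)
        ends.append(state)
    return tuple(ends)


def compose(first, second):
    """Apply `first`, then `second`. Associative, so usable in a parallel scan."""
    return tuple(second[first[s]] for s in range(NUM_STATES))


def chunk_start_states(data, chunk_size, initial=OUTSIDE):
    """Split the input into chunks and derive each chunk's start state."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    vectors = [chunk_transition(c) for c in chunks]
    # Inclusive scan; sequential here, but a GPU could run it as a parallel scan.
    prefixes = list(accumulate(vectors, compose))
    # Chunk k starts in the state reached after chunks 0..k-1.
    starts = [initial] + [p[initial] for p in prefixes[:-1]]
    return chunks, starts


def delimiter_positions(data, chunk_size):
    """Return indices of commas that act as field delimiters."""
    chunks, starts = chunk_start_states(data, chunk_size)
    positions = []
    offset = 0
    for chunk, state in zip(chunks, starts):
        for i, ch in enumerate(chunk):
            if ch == ',' and state == OUTSIDE:
                positions.append(offset + i)
            state = step(state, ch)
        offset += len(chunk)
    return positions
```

For the input `a,"b,c",d`, calling `delimiter_positions` with any chunk size reports the commas at indices 1 and 7 and skips the quoted comma at index 4. Because `compose` is associative, the scan over per-chunk transition vectors does not depend on how the input is split, which is what makes the approach amenable to many threads.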

Journal: Proceedings of the VLDB Endowment
Publication Date: Jan 1, 2020
Citations: 6

Similar Papers

A Finite State Machine Compiler and Simulator
Raj Singh ... K V S H Rao
IETE Technical Review | VOL. 11
01 Sep 1994

Evaluating Length of a Shortest Adaptive Homing Sequence for Weakly Initialized FSMs
Evgenii Vinarskii ... Nina Yevtushenko
01 Sep 2020

GPU acceleration of finite state machine input execution: Improving scale and performance
Vanya Yaneva ... Ajitha Rajan
Software Testing, Verification and Reliability | VOL. 32
08 Oct 2021

Deriving adaptive homing sequences for weakly initialized nondeterministic FSMs
Evgenii Vinarskii ... Aleksandr Tvardovskii
01 Sep 2019
