Asynchronous Automata Processing on GPUs

Hongyuan Liu, Sreepathi Pai, Adwait Jog

doi:10.1145/3578338.3593524

Abstract

Finite-state automata serve as compute kernels for application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input-stream-level, 2) automaton-level, and 3) state-level. Among these, only state-level parallelism is intrinsic to automata; the other two depend on the number of automata and input streams to be processed. As GPU resources grow, a parallelism-limited automata processing task can therefore underutilize GPU compute resources. To overcome this, we propose AsyncAP, a low-overhead approach that optimizes scalability and throughput. Our insight is that most automata processing tasks have an additional, previously unexploited source of parallelism that originates from the input symbols. By making the matching process asynchronous, so that parallel GPU threads process an input stream starting from different input locations instead of processing it serially, AsyncAP significantly improves throughput and scales with input length. A detailed evaluation across 12 applications shows that AsyncAP achieves an average speedup of 58x over the state-of-the-art GPU automata processing engine when the task does not have enough parallelism to utilize all GPU cores. When tasks do have enough parallelism to utilize all GPU cores, AsyncAP still achieves a 2.4x speedup.
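
The core idea of letting workers begin at different input locations instead of consuming the stream one symbol at a time can be illustrated with a small Python sketch. It assumes a homogeneous NFA in which each state matches a set of symbols and "start" states are enabled at every input position, as in many pattern-matching automata. The names `NFA`, `run_serial`, and `run_async` are hypothetical, and the randomized worklist only mimics unordered GPU threads; this is not AsyncAP's implementation, which the abstract does not describe in detail.

```python
"""Sketch: serial vs. asynchronous simulation of a homogeneous NFA.

Hypothetical model (not AsyncAP's code): each state matches a set of
input symbols; "start" states are enabled at every input position;
a state that matches at position i enables its successors at i + 1.
"""
import random
from dataclasses import dataclass, field


@dataclass
class NFA:
    symbols: dict[str, set[str]]   # state -> symbols it matches
    succ: dict[str, list[str]]     # state -> successor states
    starts: set[str]               # enabled at every input position
    reports: set[str] = field(default_factory=set)  # reporting states


def run_serial(nfa: NFA, text: str) -> set[tuple[str, int]]:
    """Frontier-by-frontier simulation: symbol i waits for symbol i - 1."""
    found = set()
    enabled = set(nfa.starts)
    for i, ch in enumerate(text):
        matched = {s for s in enabled if ch in nfa.symbols[s]}
        found |= {(s, i) for s in matched if s in nfa.reports}
        # Next position: start states plus successors of matched states.
        enabled = set(nfa.starts)
        for s in matched:
            enabled.update(nfa.succ.get(s, []))
    return found


def run_async(nfa: NFA, text: str, seed: int = 0) -> set[tuple[str, int]]:
    """Worklist of (state, position) items processed in arbitrary order.

    Seeding one item per start state per position lets independent
    workers begin at different input locations.
    """
    work = [(s, i) for i in range(len(text)) for s in nfa.starts]
    seen = set(work)  # each (state, position) item is processed once
    rng = random.Random(seed)
    found = set()
    while work:
        # Pick an arbitrary pending item to mimic unordered execution.
        k = rng.randrange(len(work))
        work[k], work[-1] = work[-1], work[k]
        s, i = work.pop()
        if text[i] not in nfa.symbols[s]:
            continue
        if s in nfa.reports:
            found.add((s, i))
        if i + 1 < len(text):
            for t in nfa.succ.get(s, []):
                if (t, i + 1) not in seen:
                    seen.add((t, i + 1))
                    work.append((t, i + 1))
    return found


# Usage: a three-state chain that reports every occurrence of "abc".
nfa = NFA(
    symbols={"A": {"a"}, "B": {"b"}, "C": {"c"}},
    succ={"A": ["B"], "B": ["C"]},
    starts={"A"},
    reports={"C"},
)
text = "xabcabcab"
print(sorted(run_serial(nfa, text)))           # [('C', 3), ('C', 6)]
print(run_async(nfa, text) == run_serial(nfa, text))  # True
```

In this sketch, whether a (state, position) item matches depends only on that state and that input symbol, so the processing order does not change the set of reports. That independence is what allows work to be spread across many threads even when there is only one automaton and one input stream.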

Similar Papers

Asynchronous Automata Processing on GPUs
Hongyuan Liu ... Sreepathi Pai
Proceedings of the ACM on Measurement and Analysis of Computing Systems | VOL. 7
27 Feb 2023

Trilinear Space-Time-Frequency Codes for broadband MIMO-OFDM systems
Andre L F De Almeida ... Joao C M Mota
01 Sep 2006

A Parallel Scanner for the Concurrent Execution of Lexical Analyzer Tasks on Multi-Core Machines using Dynamic Task Allocation Algorithm
Vaikunta Pai T ... Nethravathi P S
International Journal of Management, Technology, and Social Sciences | VOL. -
06 Apr 2022

An Optimal Tag Generation Data Compression Technique for WSN’s
Usha Tiwari ... Shabana Mehfuz
01 Oct 2018