GPU-Based PostgreSQL Extensions for Scalable High-Throughput Pattern Matching

Grant Scott, Kevin Melkowski, Zachary Fields, Matthew England, Derek T Anderson

doi:10.1109/icpr.2014.329

Abstract

Numerous fields require large-scale pattern matching to achieve a variety of computational goals. Herein, we present novel graphics processing unit (GPU) extensions that facilitate high-throughput pattern matching in a PostgreSQL database. We have developed an extension framework that processes large pattern data sets in data blocks, using a stream processing design that yields global k-nearest neighbor matches. This framework was specifically designed to support pattern matching on the GPU from within the database environment. The approach avoids the need to store an entire data set on GPU hardware, which facilitates significant scale-up of pattern databases. It also provides enormous potential to incorporate or exploit auxiliary (meta)data as part of the pattern matching process, and to pipeline the results into traditional relational algebra expressions. By pipelining pattern matching results into a relational expression, the power of the database can be leveraged to build result sets based on various parameterized correlations between the query pattern(s) and the results. In this preliminary work, we have integrated GPU-based high-throughput p-norm metric functions into the database server. This allows one to design heterogeneous data processing techniques that combine large-scale content-based image retrieval (CBIR) with traditional data processing capabilities of the database, such as relational, spatial, or text search. We present timing characteristics for various pattern sizes and metric combinations, and we address the balancing of database and GPU parameterization. Our feature vector data sets range from 18 to 85 GB in database table storage size, reaching 100 million 128-dimensional vectors. We are able to efficiently execute global top-k searches from within the database.
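
To make the block-streaming idea concrete, the following is a minimal CPU-only sketch in Python, not the extension described here. It assumes the table is read as a stream of (id, vector) rows, scores each fixed-size block against the query with a p-norm (Minkowski) distance, and keeps a running heap of the k best matches so that the full data set never has to be resident at once. The function names, the block size, and the pure-Python distance loop (which stands in for a GPU kernel) are all illustrative assumptions.

```python
import heapq
from itertools import islice


def p_norm_distance(a, b, p=2.0):
    """Minkowski (p-norm) distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def iter_blocks(rows, block_size):
    """Yield successive lists of at most block_size (id, vector) rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def global_top_k(query, rows, k, block_size=4, p=2.0):
    """Return the k nearest (distance, id) pairs over a stream of rows.

    Hypothetical helper: only one block plus k candidates are held at a time.
    """
    # Heap of (-distance, id): the root is the worst of the current best k.
    best = []
    for block in iter_blocks(rows, block_size):
        # In a GPU setting, this per-block scoring loop would be the kernel.
        for row_id, vec in block:
            d = p_norm_distance(query, vec, p)
            if len(best) < k:
                heapq.heappush(best, (-d, row_id))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, row_id))
    # Undo the negation and sort nearest-first.
    return sorted((-neg_d, row_id) for neg_d, row_id in best)


if __name__ == "__main__":
    # Ten toy 2-dimensional vectors lying along the x axis.
    table = ((i, [float(i), 0.0]) for i in range(10))
    for dist, row_id in global_top_k([3.2, 0.0], table, k=3):
        print(row_id, round(dist, 3))
```

Because the heap is merged across blocks, the result does not depend on the block size; in this sketch the block size only controls how much data is held at one time.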

Full Text