Abstract

The vast majority of data warehouses hold less than a few terabytes of data, and their performance for complex queries on traditional database systems is often unsatisfactory. Vendors have announced data warehouse appliances (HP Oracle Exadata Storage Server, HP Neoview, Netezza, etc.) to address this burgeoning need. Most of these build a large parallel database system by scaling out commodity machines and/or push filters into the disk retrieval system to reduce the data brought into memory, along the lines pioneered by research projects such as Gamma and Bubba and by other prior database machine research. These approaches deliver performance by deploying many CPUs, large amounts of memory, and large numbers of disk heads and disk space; in effect, they extract performance by underutilizing the resources, albeit very inexpensive commodity resources. In contrast, we propose a database system in a box (i.e., a single system) that can deliver high performance for complex queries while using far fewer resources (memory, disks, etc.), i.e., better resource utilization and therefore lower cost. This approach uses a column store (pioneered in the Bubba project), which has the effect of 1) reducing the need for a large number of disk heads (i.e., I/O bandwidth); and 2) reducing the amount of memory needed to achieve memory-resident query execution. Once the disk I/O problem has been mitigated using a column store and memory, the Von Neumann bottleneck becomes the dominant constraint. Database researchers have pursued this problem in the context of cache-conscious query execution. Unfortunately, traditional CPUs provide limited control over paging data into the cache and retaining it there, which makes it hard to leverage the cache effectively. Our approach is to leverage a custom dataflow machine that can be coupled with a large memory, thereby practically eliminating the Von Neumann bottleneck.
Besides mitigating this bottleneck, exploiting fine-grained pipelined and operator parallelism in hardware provides a significant performance improvement. The result is a low-cost, high-performance database appliance for the vast majority of the data warehouse market. Kickfire has shown that such an appliance can deliver both better price/performance and better raw performance than competing approaches. Note that this high-performance appliance does not preclude scale-out; i.e., it can itself be used to scale out to a much larger database in the future.
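
The I/O argument for a column store can be illustrated with a rough back-of-the-envelope sketch. The table shape, column names, and widths below are hypothetical and are not taken from the paper; the sketch also ignores compression, indexes, and page-level overheads. It only compares how many bytes a scan would touch when a query references two of five columns.

```python
# Hypothetical table shape; none of these numbers come from the paper.
ROWS = 1_000_000
COLUMN_WIDTHS = {  # bytes per value, assumed fixed-width for simplicity
    "order_id": 8,
    "customer_id": 8,
    "price": 8,
    "quantity": 4,
    "comment": 100,
}


def row_store_bytes(rows, widths):
    """A row store reads whole rows, so every column is scanned."""
    return rows * sum(widths.values())


def column_store_bytes(rows, widths, needed):
    """A column store reads only the columns the query references."""
    return rows * sum(widths[name] for name in needed)


# Columns referenced by an aggregate such as SUM(price * quantity).
query_columns = ["price", "quantity"]

row_bytes = row_store_bytes(ROWS, COLUMN_WIDTHS)
col_bytes = column_store_bytes(ROWS, COLUMN_WIDTHS, query_columns)

print(f"row store scans    {row_bytes / 1e6:.0f} MB")
print(f"column store scans {col_bytes / 1e6:.0f} MB")
print(f"reduction factor   {row_bytes / col_bytes:.1f}x")
```

Under these assumptions, the smaller scan volume is what reduces both the I/O bandwidth and the memory needed to keep the working set resident; real ratios depend on the schema and the query mix.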

