Prefix Sum Research Articles

This paper explores efficient ways to use flash memory to store online analytical processing (OLAP) data. The queries considered are range queries using the aggregate functions SUM, COUNT and AVG. The asymmetric cost of reads and writes on flash memory makes the handling of updates especially important in a flash memory environment. A popular data structure for answering OLAP range-sum queries is the prefix sum cube, which allows a range-sum query to be answered in constant time. However, updating the prefix sum cube is very expensive. To overcome this, the Δ-tree was proposed by Chun et al. (Dynamic update cube for range-sum queries. Proc. Int. Conf. Very Large Data Bases, San Francisco, CA, USA, 2001, pp. 521–530. Morgan Kaufmann Publisher). The Δ-tree stores all updates to the prefix sum cube in a separate R-tree. This approach worked well on hard disks, where in-place updates are relatively cheap. On flash memory, however, in-place updates are very expensive and the Δ-tree performs very poorly. We take a four-pronged approach to overcome the problem of expensive in-place updates. The first is efficient caching of updates in RAM. The second is writing out whole trees from RAM to flash memory instead of incrementally updating a disk-resident tree. The third is allowing users to trade bounded amounts of accuracy for fewer updates via lossy compression. The fourth is using a quadtree index structure instead of the R-tree. We prove that the quadtree compression problem is NP-complete. A greedy heuristic is proposed to find near-optimal solutions in polynomial time. Various experiments were conducted to compare the proposed algorithms against the existing Δ-tree. The results show that our algorithms consistently outperformed the Δ-tree by factors of between 10 and 100. This demonstrates the importance of designing algorithms customized for flash memory when answering OLAP range queries.
In addition, among our algorithms, the error-bound solutions with a small error-bound setting significantly outperform the accurate solution for a variety of parameter settings. This indicates that the error-bound algorithms offer users an effective trade-off between execution time and accuracy.
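
To make the trade-off in the abstract above concrete, here is a minimal two-dimensional sketch of a prefix sum cube in Python. It is not the authors' implementation: the function names (`build_prefix_sum`, `range_sum`, `point_update`) and the zero-padded layout are assumptions chosen for illustration. It shows why a range-sum query needs only four lookups, and why a single cell update can touch a large part of the cube.

```python
def build_prefix_sum(cube):
    """Return P with one row/column of zero padding, where
    P[i][j] is the sum of cube[0..i-1][0..j-1]."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion over the three already-computed neighbours.
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r1, c1, r2, c2):
    """Sum of cube[r1..r2][c1..c2] (inclusive) using four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


def point_update(P, r, c, delta):
    """Add delta to cube[r][c] by patching every prefix cell that covers it.
    Returns the number of prefix cells written (hypothetical cost metric)."""
    touched = 0
    for i in range(r + 1, len(P)):
        for j in range(c + 1, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched


cube = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
P = build_prefix_sum(cube)
total = range_sum(P, 0, 0, 2, 2)          # whole cube
corner = range_sum(P, 1, 1, 2, 2)         # 5 + 6 + 8 + 9
writes = point_update(P, 0, 0, 10)        # worst case: every cell is rewritten
```

In this toy case, updating the cell at the origin rewrites all nine prefix cells, which is the kind of in-place write amplification the abstract describes as costly on flash memory.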

Purpose: To investigate the feasibility and the potential speed gain of GPU-accelerated dose calculation for proton pencil beam algorithms with heterogeneity correction, and to offer solutions to obstacles in implementing our algorithm. Method and Materials: We implemented our in‐house proton dose calculation system on an NVIDIA GTX 280 graphics card, using the Compute Unified Device Architecture environment, and on an Intel Xeon 2.83 GHz processor. Several key techniques and strategies were employed to optimize the performance of the GPU code. (1) We modified the scaling algorithm used to calculate lateral spreading of the proton beam in the presence of tissue inhomogeneities. (2) We used an incremental ray tracing algorithm to reduce the memory requirement for each ray, allowing for better parallelization. (3) The penalty of massive non‐coalesced memory reads required for convolution/superposition was alleviated by texture fetching. (4) Parallel algorithms such as reduction and prefix sum were implemented on the GPU to help avoid data transfer between CPU and GPU memory. The performance of both implementations was evaluated on a clinical prostate case. Results: With no loss in accuracy, the dose calculation time per beamlet with our GPU implementation ranges from 200 to 500 ms, compared to 15 to 40 s on the CPU. A speed gain of approximately 80 times is achieved for various numbers of convolution/superposition steps. Conclusions: GPU‐based proton dose calculation is feasible with adapted algorithms and proper implementation techniques. Close to two orders of magnitude of performance gain can be achieved with typical hardware. We believe GPU‐based fast dose calculation can reduce the routine treatment planning workload as well as provide a feasible solution to the realization of online adaptive intensity modulated proton therapy.
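
The abstract above does not say which prefix sum algorithm was used on the GPU. As an illustration only, the sketch below simulates the well-known work-efficient (Blelloch) exclusive scan sequentially in Python; the function name `exclusive_scan` is an assumption. On a GPU, each inner `for` loop would correspond to one step executed by many threads in parallel, with the data staying in device memory throughout.

```python
def exclusive_scan(values):
    """Exclusive prefix sum via an up-sweep (reduce) and a down-sweep.
    Sequential simulation: each inner loop is one parallel step."""
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two so the implicit tree is balanced.
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)

    # Up-sweep: build partial sums at the internal nodes.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] += tree[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down.
    tree[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = tree[i - stride]
            tree[i - stride] = tree[i]   # left child inherits the parent's prefix
            tree[i] += left              # right child adds the left subtree's sum
        stride //= 2

    return tree[:n]


scanned = exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
```

Each sweep takes a logarithmic number of steps and performs a linear amount of total work, which is why this form is a common choice when the scan must run alongside other kernels without copying data back to the CPU.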

Articles published on Prefix Sum

Scalable GPU graph traversal

Flexible Management on BSP Process Rescheduling: Offering Migration at Middleware and Application Levels

Energy cost evaluation of parallel algorithms for multiprocessor systems

Fast and Simultaneous Data Aggregation Over Multiple Regions in Wireless Sensor Networks

Movie-based representation of reduction operations in numerical computing

Efficient Updates for OLAP Range Queries on Flash Memory

TU‐C‐BRA‐09: High‐Performance Dose Calculation for Proton Radiotherapy Using GPU

Simultaneous aggregate sum retrieval from multiple regions in sensor networks by distributed data cubes

Two-tree algorithms for full bandwidth broadcast, reduction and scan

A Formalization of Powerlist Algebra in ACL2

Permutation algorithms on optical multi-trees

Succinct indexable dictionaries with applications to encoding k -ary trees, prefix sums and multisets

Automatic inversion generates divide-and-conquer parallel programs

The cell probe complexity of succinct data structures

O(log*n) algorithms on a Sum-CRCW PRAM

On a class of cell circuits

Hypercube computations on partitioned optical passive stars networks

On a class of cell circuits

CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines

Constant time fault tolerant algorithms for a linear array with a reconfigurable pipelined bus system
