Computed tomography (CT) image reconstruction algorithms such as convolution back-projection (CBP) and the algebraic reconstruction technique (ART) are highly compute-intensive for today's single-processor systems. In this work, we investigate the suitability of the Philips Trimedia TM-1000 media processor and the Analog Devices ADSP 21160 as compute engines for executing image reconstruction algorithms. The TM-1000, a very long instruction word (VLIW) processor, is a high-performance media processor optimized for real-time processing of audio, video, graphics, and communication data streams. It has a high-performance digital signal processor (DSP) core supported by multiple functional units. The DSP core and the functional units operate in parallel, driven by a mix of RISC, multimedia, SIMD-type DSP, and floating-point instructions. The ADSP 21160, a typical DSP, is based on the Super Harvard Architecture (SHARC) and is optimized for digital signal processing applications. It has two sets of computation units, each comprising three functional blocks: an arithmetic and logic unit (ALU), a multiplier, and a shifter. The ADSP 21160 supports a single instruction, multiple data (SIMD) computation model in which both sets of computation units operate concurrently. We compare the two processors by measuring the execution times of the CBP and ART algorithms on each. Image reconstruction algorithms normally reduce to a repeated multiply-accumulate (MAC) operation. All DSP processors support single-cycle MAC and zero-overhead loop instructions. Media processors normally support neither a single-cycle MAC instruction nor zero-overhead looping; however, they are equipped with multiple functional units that perform several operations in a single instruction time.
A DSP processor is expected to execute image reconstruction algorithms much faster than a media processor. However, the experimental results show that the execution times on the DSP and the media processor are roughly the same when a 16-bit representation is used for the data. When floating-point data is used, the DSP has an edge. The ADSP 21160 gives the same execution time for floating-point and 16-bit fixed-point data, whereas on the media processor the execution time almost doubles for floating-point data compared with the 16-bit implementation. This can be attributed to the fact that the TM-1000 processes two sets of operands in a single instruction time when the data is in 16-bit format. The executable code for the ADSP 21160 was generated from an optimized assembly language program, whereas the executable code for the TM-1000 was generated from optimized C with a few custom operations.
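As an illustration of the multiply-accumulate kernel discussed above, the following is a minimal sketch in portable C, not the code used in this work. The function names (`mac_f32`, `mac_i16`, `mac_i16_pairs`) and the 64-bit accumulator are assumptions made for the example. The pairwise variant only mimics, in scalar C, the idea of handling two 16-bit operand sets per loop step; it does not use any TM-1000 custom operation or ADSP 21160 instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Floating-point MAC: one multiply and one add per sample. */
float mac_f32(const float *x, const float *h, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * h[i];
    return acc;
}

/* 16-bit fixed-point MAC. Each product fits in 32 bits; the
 * 64-bit accumulator (an assumption of this sketch) avoids
 * overflow over long sums. */
int64_t mac_i16(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)h[i];
    return acc;
}

/* Same result as mac_i16, but two operand pairs are consumed per
 * loop step, loosely modelling a dual 16-bit operation. */
int64_t mac_i16_pairs(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc += (int32_t)x[i] * (int32_t)h[i];
        acc += (int32_t)x[i + 1] * (int32_t)h[i + 1];
    }
    if (i < n) /* leftover sample when n is odd */
        acc += (int32_t)x[i] * (int32_t)h[i];
    return acc;
}
```

In a back-projection or ART inner loop, `x` would hold projection samples and `h` the filter or weight coefficients. Whether a compiler maps the pairwise loop to a packed 16-bit operation depends on the target and toolchain.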