Abstract

This work illustrates the use of platform-based design to achieve an efficiently configured hardware-software system solution that meets the conflicting demands of high performance, low power and quick turnaround time in embedded system development. It presents embedded system design techniques using field-programmable gate arrays (FPGAs) for image and video processing applications. By identifying, building and integrating all necessary hardware and software components, an embedded implementation of a kernel-based mean shift (KBMS) object tracking algorithm is proposed [1]. To meet the specific needs of the hardware/software implementation, the Virtex-5 FXT FPGA device (which has an embedded PowerPC processor) available on the Xilinx ML-507 platform is used [2]. The work begins by developing the configurations of the Xilinx ML-507 FPGA platform required for developing architectures and algorithms in an integrated hardware-software environment. This involves configuring the FPGA platform peripherals through the application programmer interface (API), along with other required hardware building blocks. This configuration is necessary for the FPGA to access image pixels and for testing the architectural blocks designed subsequently. It has been used to capture real-time VGA-resolution (640×480) video frames at 60 frames per second into the double data rate (DDR2) synchronous dynamic random access memory (SDRAM). Fig. 1 shows a top-level view of the complete system. On the configured platform, we have proposed and developed various architectural building blocks that are mostly generic in nature and can be widely used in many practical image and video processing systems. These image processing hardware units invariably need many complex arithmetic operations. Complex operations such as division and square root are realized through fixed-point binary logarithmic and antilogarithmic units.
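The idea of routing division and square root through logarithmic and antilogarithmic units can be illustrated with a minimal software sketch. It is not the hardware design described in this work: the function names, the 16-bit fractional precision and the use of floating-point `math.log2` as a stand-in for the logarithmic unit are all assumptions made for illustration. In the log domain, division becomes a subtraction and square root becomes a one-bit right shift.

```python
import math

# Assumed precision of the fixed-point logarithm (illustrative, not from the source).
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS


def to_log(x: int) -> int:
    """Stand-in for a logarithmic unit: log2(x) as a fixed-point integer."""
    if x < 1:
        raise ValueError("this sketch only handles integers >= 1")
    return round(math.log2(x) * SCALE)


def from_log(lx: int) -> float:
    """Stand-in for an antilogarithmic unit: 2 ** (lx / SCALE)."""
    return 2.0 ** (lx / SCALE)


def log_divide(a: int, b: int) -> float:
    """Division realized as a subtraction of fixed-point logarithms."""
    return from_log(to_log(a) - to_log(b))


def log_sqrt(a: int) -> float:
    """Square root realized as a one-bit right shift of the logarithm."""
    return from_log(to_log(a) >> 1)


if __name__ == "__main__":
    print(log_divide(640, 480))  # close to 640 / 480
    print(log_sqrt(144))         # close to 12
```

The results are approximate because the logarithm is quantized to `FRAC_BITS` fractional bits; a hardware unit would trade that precision against logic and memory resources.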
In the proposed architectures, most operations are performed in a 32-bit fixed-point format. Because the architectures are based on the logarithmic number system (LNS), they consume minimal logic resources and can process large datasets by realizing time-critical processes in the available block RAMs (BRAMs) and DSP slices. An architecture for global image thresholding has been proposed that yields a resource-efficient FPGA implementation of the between-class variance (BCV) computation used in Otsu's image thresholding algorithm [3]. The compute-intensive BCV requires computing the normalized cumulative histogram and the normalized cumulative intensity area. We have next proposed an improved label-equivalence-based connected component labeling algorithm [3] that works on the binary images obtained from the image thresholding unit and identifies an object in a video frame. The proposed algorithm improves upon the Stefano-Bulgarelli (SB) algorithm by modifying its equivalence handling procedure, which removes the partial merging problem associated with the SB algorithm. The improved algorithm is implemented on the embedded PowerPC processor. Finally, all the hardware building blocks and algorithms described so far are used for an embedded implementation of the KBMS algorithm. The required application-specific architectural building blocks have been proposed for its embedded realization on the Xilinx ML-507 platform. To understand the issues related to the embedded realization of the KBMS algorithm, a MATLAB/C implementation was created. Subsequently, hardware architectures have been proposed for the time-critical parts, namely the computations of the weighted local histogram, kernel-smoothed local histogram (KSLH), Bhattacharyya coefficient-based local similarity measure, center of gravity (COG) and new mean shift location.
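The between-class variance step of Otsu's thresholding mentioned earlier in this paragraph can be sketched in software as follows. This is the textbook formulation, not the FPGA architecture proposed in this work; the function name is hypothetical, and reading the "normalized cumulative intensity area" as the cumulative first moment of the histogram is an assumption.

```python
def otsu_threshold(hist):
    """Return (k, bcv): the threshold k that maximizes the between-class variance.

    hist[i] is the number of pixels with intensity i.
    Pixels with intensity <= k form one class, the rest the other.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")

    # Global mean intensity of the image.
    mu_total = sum(i * h for i, h in enumerate(hist)) / total

    cum_count = 0      # running pixel count up to intensity k
    cum_intensity = 0  # running sum of intensity * count up to k
    best_k, best_bcv = 0, -1.0

    for k, h in enumerate(hist):
        cum_count += h
        cum_intensity += k * h

        omega = cum_count / total   # normalized cumulative histogram
        mu = cum_intensity / total  # normalized cumulative first moment (assumed meaning)

        denom = omega * (1.0 - omega)
        if denom <= 0.0:
            continue  # one class is empty, so the BCV is undefined here

        bcv = (mu_total * omega - mu) ** 2 / denom
        if bcv > best_bcv:
            best_bcv, best_k = bcv, k

    return best_k, best_bcv


if __name__ == "__main__":
    # Synthetic bimodal histogram: 100 pixels at intensity 50, 100 at 200.
    hist = [0] * 256
    hist[50] = 100
    hist[200] = 100
    print(otsu_threshold(hist))
```

The division and squaring inside the loop are the kind of operations that the abstract describes as being realized through logarithmic and antilogarithmic units in hardware.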
The embedded design utilizes soft IPs that include the joint test action group (JTAG) controller, BRAM controller, multi-port memory controller (MPMC), processor local bus (PLB), inter-integrated circuit (I2C) controller and universal asynchronous receiver-transmitter (UART) controller. The hard IPs include the PowerPC 440 processor, BRAMs, digital clock manager (DCM) and DSP48E slices. The device utilization summary for implementing the KBMS algorithm shows that it uses 7.8% of the FPGA slices, 8.1% of the BRAMs and 35.9% of the DSP48E slices.
