Abstract

Embedded processors for video image recognition require to address both the cost (die size and power) versus real-time performance issue, and also to achieve high flexibility due to the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these trading-off requirements. By using parallel and systolic algorithmic techniques, despite of its simple architecture IMAP achieves to exploit not only the straightforward per image row data level parallelism (DLP), but also the inherent DLP of other memory access patterns frequently found in various image recognition tasks, under the use of an explicit parallel C language (1DC). We describe and evaluate IMAP-CE, a latest IMAP processor, which integrates 128 of 100MHz 8 bit4-way VLIW PEs, 128 of 2KByte RAMs, and one 16 bit RISC control processor, into a single chip. The PE instruction set is enhanced for supporting 1DC codes. IMAP-CE is evaluated mainly by comparing its performance running 1DC codes with that of a 2.4GHz Intel P4 running optimized C codes. Based on the use of parallelizing techniques, benchmark results show a speedup of up to 20 for image filter kernels, and of 4 for a full image recognition application.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.