Various levels of parallelism have recently been introduced in advanced microprocessors to meet the demanding computing needs of digital video processing and other multimedia applications. Because many imaging algorithms are easily parallelizable, these architectural features and their wide availability at low cost have become a powerful tool for tackling both existing and new imaging applications. At the lowest level, subword parallelism is used in new instructions that process multiple multimedia data elements simultaneously. Instruction-level parallelism, including subword parallelism, is realized in either very long instruction word (VLIW) or superscalar architectures, while on-chip and/or off-chip multiprocessing capability is available for easier multiprocessor system design. One difficulty in maximizing computing throughput via parallelism has been the level of programming: obtaining optimal performance has typically required assembly-level programming. We review the architectural features of several modern microprocessors, such as the TMS320C60, TM-1000, PowerPC 604, Pentium II, R10000, Alpha 21264, PA-RISC 8200, UltraSPARC-II, and TMS320C80. Various obstacles to obtaining the best performance from these microprocessors with high-level and assembly languages are discussed, and several approaches to overcoming these difficulties in diverse imaging applications are presented. © 1998 John Wiley & Sons, Inc. Int J Imaging Syst Technol 9: 407–415, 1998
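
As an illustration of the subword parallelism mentioned above, the following is a minimal sketch in portable C, not code from any of the reviewed processors. It treats one 32-bit word as four independent 8-bit pixels and adds all four lanes with ordinary integer operations, which is roughly what a packed-add multimedia instruction does in a single step. The function name `add_packed_u8` and the choice of wraparound (modulo 256) arithmetic are assumptions made for this example; real multimedia instruction sets often also provide saturating variants.

```c
#include <stdint.h>

/* Hypothetical helper: add four unsigned 8-bit lanes packed into a
 * 32-bit word. Each lane wraps modulo 256, and no carry crosses from
 * one lane into the next. */
static uint32_t add_packed_u8(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7F7F7F7Fu; /* low 7 bits of every lane */
    const uint32_t msb  = 0x80808080u; /* top bit of every lane    */

    /* Add the low 7 bits of each lane; the largest per-lane result is
     * 0x7F + 0x7F = 0xFE, so a carry can reach bit 7 of its own lane
     * but never spill into the neighboring lane. */
    uint32_t sum_low = (a & low7) + (b & low7);

    /* Fold the top bits of both operands back in with XOR, which is
     * addition without carry-out, so the lane wraps instead of
     * overflowing into the next one. */
    return sum_low ^ ((a ^ b) & msb);
}
```

For instance, `add_packed_u8(0x10FF2030u, 0x01020304u)` returns `0x11012334`: the lane holding `0xFF + 0x02` wraps to `0x01` without disturbing the lane above it. A compiler will not necessarily map this to a single packed instruction, which is one reason assembly-level programming has typically been needed for the best performance.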