Cellular nonlinear networks (CNNs) constitute a very powerful paradigm for single-instruction, multiple-data (SIMD) computers with fine granularity. Analog and mixed-signal implementations have proven suitable for applications in high-speed image processing, robot control, medical signal processing, and many more. Digital emulations on field-programmable gate arrays (FPGAs), in particular, allow the development of general-purpose computers based on the CNN universal machine with an inherently parallel structure, a high degree of flexibility, and superior computational precision. However, these emulations turn out to be inefficient for the execution of binary operations, which account for more than two-thirds of all processing steps in a typical CNN algorithm. In this contribution, we present an architecture for the emulation of CNNs that supports both fast, efficient processing of binary images and high computational accuracy when needed. With the FPGA implementation of this architecture, a speed-up factor of up to 5 is achieved for binary-data operations.
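To illustrate why binary operations are a natural target for a dedicated fast path, the following minimal Python sketch compares two ways of computing the same binary erosion. It is not the architecture presented here. It assumes the standard Chua-Yang cell dynamics with a zero feedback template, a 3x3 all-ones control template, a bias of -8, a white (-1) boundary, and forward-Euler integration. The function names, step size, and step count are hypothetical choices made only for this illustration.

```python
def saturate(x):
    """Standard CNN output nonlinearity: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighborhood(img, i, j, pad):
    """Return the 3x3 neighborhood of cell (i, j); cells outside the image read as `pad`."""
    rows, cols = len(img), len(img[0])
    return [img[r][c] if 0 <= r < rows and 0 <= c < cols else pad
            for r in range(i - 1, i + 2)
            for c in range(j - 1, j + 2)]


def erode_full_precision(u, steps=200, h=0.1):
    """Forward-Euler emulation of dx/dt = -x + sum(B * u) + z.

    Assumed template (illustrative only): A = 0, B = 3x3 all ones, z = -8.
    Input pixels are +1 (black) or -1 (white); the result is returned as 0/1.
    """
    rows, cols = len(u), len(u[0])
    x = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        x = [[x[i][j] + h * (-x[i][j] + sum(neighborhood(u, i, j, -1.0)) - 8.0)
              for j in range(cols)]
             for i in range(rows)]
    return [[1 if saturate(v) > 0 else 0 for v in row] for row in x]


def erode_binary(bits):
    """Same operation on 0/1 pixels: a pixel stays set only if all nine neighbors are set."""
    rows, cols = len(bits), len(bits[0])
    return [[int(all(neighborhood(bits, i, j, 0))) for j in range(cols)]
            for i in range(rows)]
```

Under these assumptions, both functions return the same image, but the second one needs a single logical AND per pixel instead of many multiply-accumulate iterations. This is the kind of gap that a mixed binary and high-precision emulator can exploit.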