An efficient application-specific architecture is presented for a real-time edge detection system. The architecture is based on the cooperating-data-path model, which allows both the throughput and the area to be optimized for this recursive algorithm. Careful scheduling of the operations on the partly parallel, partly shared hardware has allowed the load to be balanced across the four data paths. In this way, the inherently high degree of concurrency in the algorithm has been effectively exploited in the parallel pipelined hardware. The layout of the data paths has been generated by means of powerful CAD tools and a parameterizable functional-building-block library. The corresponding global controller has been partitioned in order to optimize the critical path, which has increased the achievable clock rate even further, up to 10 MHz. The stringent I/O requirements have been taken into account. The resulting ASIC has been verified by register-transfer simulation. It is more than twice as fast as existing designs. The effectiveness of the cooperating-data-path model is thus clearly substantiated by this large, practical test vehicle.