In this paper, we present a Programmable Packet Processing Engine suitable for deep header processing in high-speed networking systems. The engine, which has been – fabricated as part of a complete network processor, consists of a typical RISC-CPU, whose register file has been modified in order to support efficient context switching, and two simple special-purpose processing units. The engine can be used in a number of network processing units (NPUs), as an alternative to the typical design practice of employing a large number of simple general purpose processors, or in any other embedded system designed to process mainly network protocols. To assess the performance of the engine, we have profiled typical networking applications and a series of experiments were carried out. Further, we have compared the performance of our processing engine to that of two widely used NPUs and show that our proposed packet-processing engine can run specific applications up to three times faster. Moreover, the engine is simpler to be fabricated, less complex in terms of hardware complexity, while it can still be very easily programmed.