The exponential growth of data traffic, which is not expected to stop anytime soon, has driven numerous advances in the networking field. The latest network interfaces support data rates of 40 Gb/s and higher. This, however, does not guarantee higher packet processing speeds, which are limited by the overheads imposed by the architecture of the network stack: an operating system’s networking stack is designed for general-purpose communications rather than for high-speed networking applications. At the same time, there is a great need to speed up the forwarding engine, which is the most important part of a high-speed router. For this reason, many software-based and hardware-based solutions have recently emerged with the goal of increasing packet processing speeds. In this paper, we investigate multiple approaches that attempt to improve packet processing performance on server-class network hosts by using software, hardware, or a combination of the two. We survey various solutions, some of which are based on the Click modular router and offload its functions to different types of hardware, such as graphics processing units, field-programmable gate arrays, or multiple cores across different servers executing in parallel. Furthermore, we explore other software solutions that are not based on the Click modular router. We compare them in terms of the domain in which they operate (user space or kernel space) and their use of zero-copy techniques, batch packet processing, and parallelism. We also discuss different hardware solutions and additionally compare them in terms of the type of hardware they use, their usage of the CPU, and how they connect to it. We then discuss the possibilities for integrating the described solutions in virtualized environments, along with their constraints and requirements.
Finally, we discuss the latest approaches and future directions in the field of fast packet processing.