Implantable microelectrode arrays are used to record electrical signals from surrounding neurons and have led to major advances in modern neuroscience research. The digital signals produced by conditioning and analog-to-digital conversion of the neural spikes captured by these arrays must be processed in a dedicated DSP core that performs real-time spike sorting, that is, classifies each spike according to the source neuron that emitted it. On-chip spike sorting is also essential to achieve enough data reduction to allow wireless transmission within the power constraints imposed on implantable devices. The design of such integrated circuits must meet stringent constraints on ultra-low power density and minimum silicon area, as well as several application requirements. The aim of this work is to present a real-time hardware architecture able to perform all the spike-sorting tasks on chip while satisfying these requirements. The proposed solution was coded in VHDL and simulated in the Cadence Xcelium tool to verify the functional behavior of the digital processing chain. A synthesis and place-and-route flow was then carried out to implement the proposed architecture in both a 130 nm and a 28 nm FD-SOI CMOS process, with a 200 MHz clock frequency target. Post-layout simulations in the Cadence Xcelium tool confirmed proper operation up to a 200 MHz clock frequency. The area occupation and power consumption of the proposed detection and clustering module are 0.2659 mm²/ch and 7.16 μW/ch for the 130 nm implementation, and 0.0168 mm²/ch and 0.47 μW/ch for the 28 nm implementation.
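
To put the reported per-channel figures in perspective, the short Python sketch below computes how much area and power shrink when moving from the 130 nm to the 28 nm implementation. It uses only the four numbers quoted above. The helper names are hypothetical, and the "power density" computed here is simply power divided by module area, which is an assumption about how that quantity might be defined and not necessarily the definition used in this work.

```python
# Per-channel figures quoted in the abstract for the detection and
# clustering module. Keys are the two technology nodes.
AREA_MM2_PER_CH = {"130nm": 0.2659, "28nm": 0.0168}
POWER_UW_PER_CH = {"130nm": 7.16, "28nm": 0.47}


def scaling_ratio(values):
    """Return how many times larger the 130 nm figure is than the 28 nm one."""
    return values["130nm"] / values["28nm"]


def power_density_uw_per_mm2(node):
    """Power per unit area of the module (assumed definition: power / area)."""
    return POWER_UW_PER_CH[node] / AREA_MM2_PER_CH[node]


area_ratio = scaling_ratio(AREA_MM2_PER_CH)
power_ratio = scaling_ratio(POWER_UW_PER_CH)

if __name__ == "__main__":
    print(f"Area reduction, 130 nm -> 28 nm:  {area_ratio:.2f}x")
    print(f"Power reduction, 130 nm -> 28 nm: {power_ratio:.2f}x")
    for node in ("130nm", "28nm"):
        density = power_density_uw_per_mm2(node)
        print(f"Power density at {node}: {density:.2f} uW/mm^2")
```

Under these assumptions, both area and power per channel fall by roughly a factor of 15 to 16 between the two nodes, while the power-to-area ratio stays nearly unchanged at about 27 to 28 μW/mm².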