In recent years, fast computation, low power consumption, and a small footprint have been the key motivations for building SNN hardware. However, the unique features of SNNs have not been fully exploited: the computation speed and energy efficiency of SNN hardware can be further improved by leveraging the sparse spiking activity and non-uniform traffic of SNNs. In this paper, we propose a 5.28-mm$^2$, 4096-neuron, 1M-synapse energy-efficient digital SNN hardware that achieves an ultra-low energy per synaptic operation of 4.5 pJ. The proposed neuron computing unit is implemented with a pipelined architecture to achieve a high synaptic processing speed. The proposed spike processing unit increases the processing speed of the neuron core by $1.9\times$ and $9.4\times$ when the spike injection rate is 50% and 10%, respectively. Moreover, the increase in the processing speed of the neuron core reduces energy consumption by up to 81.5%. We also propose an event-driven clock gating circuit that reduces the power consumption of the proposed neuron block by more than 70%. Finally, this paper proposes a supervised STDP+ algorithm for SNN training, which achieves 89.6% classification accuracy on the MNIST digits with 73.6% weight sparsity in the output layer.
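To illustrate why sparse spiking activity reduces the synaptic workload (the effect exploited by the spike processing unit above), the following minimal Python sketch performs an event-driven leaky integrate-and-fire (LIF) update that touches only the synapses of inputs that actually fired, instead of iterating over the full weight matrix. This is an illustrative sketch of the general event-driven principle, not the fabricated design; the LIF parameters (`v_th`, `leak`) and layer sizes are assumptions chosen for the example.

```python
import numpy as np

def lif_step(v, w, spikes_in, v_th=1.0, leak=0.9):
    """Event-driven LIF update: accumulate weights only for active inputs.

    v         : (n_post,) membrane potentials
    w         : (n_pre, n_post) synaptic weight matrix
    spikes_in : (n_pre,) boolean spike vector for this time step
    """
    active = np.flatnonzero(spikes_in)   # indices of inputs that fired
    # Synaptic operations scale with the number of active inputs,
    # not with n_pre -- this is the event-driven saving.
    v = leak * v + w[active].sum(axis=0)
    spikes_out = v >= v_th
    v[spikes_out] = 0.0                  # reset neurons that fired
    return v, spikes_out

# Example: at a 10% spike injection rate, only ~10% of the rows of w
# are read in a given time step.
rng = np.random.default_rng(0)
n_pre, n_post = 256, 256
v = np.zeros(n_post)
w = rng.normal(0.0, 0.1, size=(n_pre, n_post))
spikes = rng.random(n_pre) < 0.10        # sparse input spikes
v, out = lif_step(v, w, spikes)
print(f"active inputs: {spikes.sum()}, output spikes: {out.sum()}")
```

Under this model, the number of synaptic operations per time step is proportional to the spike injection rate, which is consistent with the larger speedup reported above at the 10% rate than at the 50% rate.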