The realization of brain-scale spiking neural networks (SNNs) is impeded by power constraints and low integration density. To address these challenges, multi-core SNNs are utilized to emulate numerous neurons with high energy efficiency, where spike packets are routed through a network-on-chip (NoC). However, the information can be lost in the NoC under high spike traffic conditions, leading to performance degradation. This work presents NEXUS, a 16-core SNN with a diamond-shaped NoC topology fabricated in 28-nm CMOS technology. It integrates 4096 leaky integrate-and-fire (LIF) neurons with 1M 4-bit synaptic weights, occupying an area of 2.16 mm2. The proposed NoC architecture is scalable to any network size, ensuring no data loss due to contending packets with a maximum routing latency of 5.1μs for 16 cores. The proposed congestion management method eliminates the need for FIFO in routers, resulting in a compact router footprint of 0.001 mm2. The proposed neurosynaptic core allows for increasing the processing speed by up to 8.5× depending on input sparsity. The SNN achieves a peak throughput of 4.7 GSOP/s at 0.9 V, consuming a minimum energy per synaptic operation (SOP) of 3.3 pJ at 0.55 V. A 4-layer feed-forward network is mapped onto the chip, classifying MNIST digits with 92.3% accuracy at 8.4Kclassification/ s and consuming 2.7-μJ/classification. Additionally, an audio recognition task mapped onto the chip achieves 87.4% accuracy at 215-μJ/classification.