Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery owing to their ability to apply machine learning techniques on graphs. GNNs use a two-step approach to classify the nodes in a graph into pre-defined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data with irregular access patterns. These mixed data patterns render CPU/GPU-based compute energy-inefficient. Von Neumann-based accelerators like AWB-GCN [7] suffer from increased data movement, as the data-intensive combination requires large data transfers to/from memory to perform computations. ReFLIP [8] performs resistive random-access memory (ReRAM)-based processing-in-memory (PIM) compute to overcome data-movement costs. However, ReFLIP suffers from an increased area requirement due to its dedicated accelerator arrangement, reduced performance due to limited parallelism, and degraded energy efficiency due to fundamental issues in ReRAM-based compute. This article presents NEM-GNN, which performs scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination with (i) early compute termination and (ii) pre-compute by reconfiguring SoC components. Graph- and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR) broadcast approach further improves performance and energy. NEM-GNN achieves ∼80–230x, ∼80–300x, ∼850–1,134x, and ∼7–8x improvements over ReFLIP in terms of performance, throughput, energy efficiency, and compute density, respectively.
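
For readers unfamiliar with the two kernels mentioned above, the sketch below is a minimal, generic GCN-style layer in Python (not NEM-GNN's implementation; shapes and the ReLU nonlinearity are illustrative assumptions). It shows the dense, compute-intensive combination step followed by the sparse, irregular-access aggregation step.

```python
import numpy as np
import scipy.sparse as sp

def gnn_layer(adj: sp.csr_matrix, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN-style layer split into the two kernels described above."""
    # Combination: dense, data-intensive matmul with regular memory accesses.
    combined = x @ w                    # (N, F_in) @ (F_in, F_out) -> (N, F_out)
    # Aggregation: sparse adjacency times dense features, dominated by
    # irregular memory accesses that follow the graph structure.
    aggregated = adj @ combined         # (N, N) sparse @ (N, F_out)
    return np.maximum(aggregated, 0.0)  # ReLU nonlinearity (assumed)

# Tiny usage example on a random sparse graph.
if __name__ == "__main__":
    n, f_in, f_out = 8, 4, 3
    adj = sp.random(n, n, density=0.3, format="csr")
    x = np.random.rand(n, f_in)
    w = np.random.rand(f_in, f_out)
    print(gnn_layer(adj, x, w).shape)   # (8, 3)
```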