The energy consumed in transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. This article presents a near-DRAM acceleration (NDA) architecture in which lightweight processors (LWPs) with the same ISA as their host processor are 3D-stacked atop commodity DRAM devices in a standard memory module to process data efficiently. In contrast to previous architectures, the authors' NDA architecture requires negligible changes to the architectures of commodity DRAM devices and standard memory modules, which makes it easier to adopt in both existing and emerging systems. Experiments demonstrate that, on average, the authors' NDA-based system consumes almost 65 percent less energy than the baseline system while delivering nearly two times higher performance.