Networks of sensors must process large amounts of intermittently available data in situ. This motivates the investigation of means for achieving high performance when required, but ultra-low power dissipation when idle. One approach to this challenge is the use of embedded multiprocessor systems, leading to trade-offs between parallelism, performance, energy efficiency, and cost. To evaluate these trade-offs and to gain insight for future system designs, this article presents the design, implementation, and evaluation of a miniature, energy-scalable, 24-processor module, L24, for use in embedded sensor systems. Analytic results and empirical evidence motivating such embedded multiprocessors are provided, and a parallel fixed-point fast Fourier transform implementation is presented. This application is used as a challenging but realistic evaluator of the presented hardware platform. Through a combination of hardware measurements, instruction-level microarchitectural simulation, and analytic modeling, it is demonstrated that the platform provides idle power dissipation more than an order of magnitude lower than systems employing a monolithic processor of equivalent performance, while dynamic power dissipation remains competitive. Taking into account both application computation and interprocessor communication demands, it is shown that there may exist an optimum operating voltage that minimizes time-to-solution, energy usage, or the energy-delay product. This optimum operating point is formulated analytically, calibrated with system measurements, and evaluated for the hardware platform and application presented.