In this article, we target approximate computing for arithmetic circuits, focusing on the most complex and power-hungry units: hardware multipliers. Driven by the lack of a clear answer regarding the energy-error efficiency of existing approximate multiplication techniques, we present a new, efficient, and easily applied approximation design and explore the current state-of-the-art design space. We show that the proposed approximation scheme applies equally at design time, to enable synthesis of customized approximate multiplier circuits, and at runtime, to support dynamic approximation tuning. We achieve significant gains (up to 69 percent energy and 64 percent area savings with respect to accurate designs) through a hybrid approximation that combines two independent techniques, reducing both the depth (through perforation) and the width (through rounding) of the partial-product accumulation tree. The corresponding runtime approximation solution delivers energy gains of up to 47 percent while introducing negligible area overhead. More importantly, direct quantitative comparisons with the existing state of the art show that design solutions configured through the proposed approach form the Pareto frontier of the energy-error space.
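To make the hybrid idea concrete, the following is a minimal behavioral sketch in Python, not the circuit design itself. It assumes an unsigned shift-and-add multiplier, models perforation as skipping a contiguous run of partial products, and models rounding as dropping low-order bits of one operand with round-to-nearest. The function name `approx_multiply` and the parameters `skip_start`, `skip_count`, and `round_bits` are hypothetical and chosen for illustration only; the abstract does not specify these details.

```python
def approx_multiply(a: int, b: int, n_bits: int = 8,
                    skip_start: int = 0, skip_count: int = 0,
                    round_bits: int = 0) -> int:
    """Behavioral model of an unsigned approximate multiplier (illustrative).

    Assumptions, not taken from the source text:
    - perforation omits partial products skip_start .. skip_start+skip_count-1,
      which shortens the accumulation (fewer rows to add);
    - rounding drops the round_bits least significant bits of operand a with
      round-to-nearest, which narrows every remaining partial product.
    """
    limit = 1 << n_bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in n_bits unsigned bits")

    # Rounding: keep only the upper bits of a, rounded to nearest.
    if round_bits > 0:
        a_rounded = (a + (1 << (round_bits - 1))) >> round_bits
    else:
        a_rounded = a

    result = 0
    for i in range(n_bits):
        # Perforation: skip the selected partial products entirely.
        if skip_start <= i < skip_start + skip_count:
            continue
        # Partial product i is the (rounded) operand a if bit i of b is set.
        if (b >> i) & 1:
            result += a_rounded << i

    # Restore the weight of the bits removed by rounding.
    return result << round_bits


if __name__ == "__main__":
    exact = 13 * 11
    for cfg in [dict(), dict(skip_count=2), dict(round_bits=2),
                dict(skip_count=2, round_bits=2)]:
        approx = approx_multiply(13, 11, **cfg)
        print(cfg, approx, "error:", exact - approx)
```

With both knobs set to zero the model reduces to exact multiplication, so the same function can stand in for a design-time configuration (fixed parameters) or a runtime-tunable one (parameters changed per call), under the assumptions stated above.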