Abstract

In general, an optimum quantizer, one that gives the minimum distortion at a given bit‐rate, is uniquely determined once the statistical characteristics of the source signal are known. Such a quantizer, however, is not optimum with respect to human visual characteristics; a quantizer designed to minimize an estimated distortion (estimated noise) that reflects human visual characteristics is more likely to be optimum. For the optimum quantization of orthogonal transform schemes, the theoretical quantizer design schemes reported so far consider only the human visual characteristics along the spatial‐frequency direction. However, there is another human visual characteristic: the noise‐masking effect, i.e., the difference in the detectability of noise depending on the local variation (activity) of a video signal. Quantizers that exploit this effect have been reported only experimentally; they have not been studied theoretically.

In this paper, it is first proposed that the visual sensitivity to noise be expressed on two axes, spatial frequency and “activity.” A three‐dimensional (3‐D) quantization cube is then proposed, obtained by minimizing, at a given bit‐rate, the noise estimated through this visual sensitivity, and a theoretical scheme for designing this optimum quantization cube is described. Finally, the effectiveness of the quantization cube designed by this scheme is compared with that of conventional quantization schemes by computer simulation using real picture data.
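The core idea (minimize visually weighted noise at a fixed average bit‐rate, with weights indexed by spatial frequency and activity) can be illustrated with a minimal sketch. This is not the paper's design scheme: it substitutes the classical high‐resolution weighted bit‐allocation result, in which each cell's noise is modeled as w·σ²·2^(−2b), and cells that would receive negative bits are fixed at zero and the rest re‐solved. The sensitivity model `sensitivity`, the variance model `hypothetical_variance`, and the (u, v, activity) indexing of the cube are all hypothetical assumptions chosen for illustration.

```python
import math

def sensitivity(f, a):
    """HYPOTHETICAL noise-visibility weight: falls with spatial-frequency
    index f and with activity class a (noise masking). Illustrative only."""
    return 1.0 / ((1.0 + f) * (1.0 + a))

def hypothetical_variance(u, v, a):
    """HYPOTHETICAL coefficient variance: grows with activity, falls with frequency."""
    return 100.0 * (1.0 + a) ** 2 / (1.0 + u + v) ** 2

def allocate_bits(variances, weights, avg_bits):
    """Minimize sum_i w_i*var_i*2**(-2*b_i) s.t. b_i >= 0 and mean(b_i) == avg_bits.

    High-resolution closed form on the active set A:
      b_i = B/|A| + 0.5*log2(p_i / G_A), with p_i = w_i*var_i and G_A the
    geometric mean of p over A. Cells that come out negative are fixed at
    0 bits and the remaining cells are re-solved with the same total B.
    """
    if avg_bits < 0:
        raise ValueError("avg_bits must be non-negative")
    products = [w * v for w, v in zip(weights, variances)]
    total = avg_bits * len(products)
    active = set(range(len(products)))
    bits = [0.0] * len(products)
    while True:
        log_g = sum(math.log2(products[i]) for i in active) / len(active)
        share = total / len(active)
        for i in active:
            bits[i] = share + 0.5 * (math.log2(products[i]) - log_g)
        negative = {i for i in active if bits[i] < 0.0}
        if not negative:
            return bits
        for i in negative:
            bits[i] = 0.0
        active -= negative

def design_cube(n_freq, n_act, avg_bits):
    """Bits per cell of a (u, v, activity) cube; u, v are 2-D frequency indices."""
    cells = [(u, v, a) for u in range(n_freq)
             for v in range(n_freq) for a in range(n_act)]
    weights = [sensitivity(u + v, a) for u, v, a in cells]
    variances = [hypothetical_variance(u, v, a) for u, v, a in cells]
    bits = allocate_bits(variances, weights, avg_bits)
    return dict(zip(cells, bits)), weights, variances

if __name__ == "__main__":
    cube, _, _ = design_cube(4, 3, 2.0)
    for a in range(3):
        row = [round(cube[(u, 0, a)], 2) for u in range(4)]
        print(f"activity class {a}, v=0: {row}")
```

At the optimum of this model, every cell that receives bits carries the same weighted noise, so cells where noise is less visible (lower weight) are quantized more coarsely relative to their variance; this is one simple way an activity axis can enter the allocation alongside the frequency axis.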