Mask optimization is essential to the resolution scaling of optical lithography because of its strong ability to overcome the optical proximity effect. However, it often demands extensive computation, since it requires solving a nonlinear optimization problem with a large number of variables. In this paper, we represent the mask patterns with a set of basis functions and incorporate this representation into mask optimization at both the nominal plane and various defocus conditions. The representation coefficients are updated according to the gradient with respect to the coefficients, which is easily obtained from the gradient with respect to the pixel variables. To ease the computation of the gradient, we use an adaptive method that divides the optimization into two steps: a small number of kernels is used in the first step, and more kernels are used in the second step for fine optimization. Simulations performed on two test patterns demonstrate that this method improves the optimization efficiency several-fold, and that the optimized patterns have better manufacturability than those obtained with the regular pixel-based representation.
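As a minimal sketch of how the coefficient gradient follows from the pixel gradient, assume the mask is a linear combination of basis functions, m = Bc, where the columns of B are the basis functions sampled on the pixel grid. The chain rule then gives the gradient with respect to c as the transpose of B applied to the gradient with respect to m. The names below (`basis`, `coefficient_gradient`) and the toy quadratic cost are illustrative assumptions only; the cost stands in for a lithography imaging cost and is not the model used in this paper.

```python
import numpy as np

def coefficient_gradient(basis, pixel_gradient):
    """Chain rule for the assumed linear representation m = basis @ c:
    dF/dc = basis.T @ dF/dm."""
    return basis.T @ pixel_gradient

# Hypothetical problem sizes: far fewer coefficients than pixels.
rng = np.random.default_rng(0)
n_pixels, n_basis = 64, 8
basis = rng.standard_normal((n_pixels, n_basis))
target = rng.standard_normal(n_pixels)

def cost(coeffs):
    """Toy stand-in cost: half the squared error between mask and target."""
    residual = basis @ coeffs - target
    return 0.5 * residual @ residual

coeffs = np.zeros(n_basis)
# Step size 1/L, with L the largest eigenvalue of basis.T @ basis,
# which guarantees descent for this quadratic cost.
step = 1.0 / np.linalg.norm(basis, 2) ** 2
initial_cost = cost(coeffs)

for _ in range(200):
    # Gradient with respect to the pixel variables (analytic for the toy cost).
    pixel_gradient = basis @ coeffs - target
    # Map it to the coefficients and take a gradient-descent step.
    coeffs -= step * coefficient_gradient(basis, pixel_gradient)

final_cost = cost(coeffs)
```

Only the `n_basis` coefficients are updated in each iteration, while the pixel gradient is computed exactly as in a pixel-based method, so an existing pixel-gradient routine could in principle be reused unchanged under this assumption.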