Two-stage precoding is a promising transmission strategy for frequency-division duplex (FDD) massive multiple-input multiple-output (MIMO) systems because it offers a large multiplexing gain while significantly reducing the overhead of both downlink training and feedback. In this paper, we propose a new agglomerative clustering method that significantly simplifies the user clustering process. To suppress the residual inter-cluster interference that arises in realistic two-stage precoding transmission, we propose an iterative cluster scheduling and outer precoder design scheme based on the average signal-to-leakage-plus-noise ratio (SLNR), which balances high multiplexing gain against improved per-user rate. For uniform linear arrays (ULAs), we also propose a fast implementation of the scheme based on a discrete Fourier transform (DFT) approximation. Numerical results demonstrate that the proposed methods outperform existing methods.
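As a rough illustration of why a DFT approximation can be attractive for ULAs, the sketch below checks how much channel energy a small set of DFT beams captures when the angles of arrival are confined to a narrow sector. It is not the scheme proposed in this paper. It assumes half-wavelength antenna spacing and angles spread uniformly over the sector, and the function names `dft_beam_energies` and `top_beams`, as well as all parameter values, are hypothetical choices made for this example.

```python
import cmath
import math


def dft_beam_energies(num_antennas, center_deg, spread_deg, num_paths=100):
    """Average energy each DFT beam captures from an assumed ULA channel."""
    M = num_antennas
    energies = [0.0] * M
    for n in range(num_paths):
        # Assumption: angles of arrival uniform over [center - spread, center + spread].
        theta = math.radians(
            center_deg - spread_deg + 2.0 * spread_deg * (n + 0.5) / num_paths
        )
        # Per-antenna phase step, assuming half-wavelength element spacing.
        phase = math.pi * math.sin(theta)
        for k in range(M):
            # Inner product of unit-norm DFT column k with the unit-norm steering vector.
            proj = sum(
                cmath.exp(1j * m * (phase - 2.0 * math.pi * k / M)) for m in range(M)
            ) / M
            energies[k] += abs(proj) ** 2 / num_paths
    return energies


def top_beams(energies, r):
    """Indices of the r DFT beams with the largest average energy."""
    order = sorted(range(len(energies)), key=lambda k: energies[k], reverse=True)
    return sorted(order[:r])


# Hypothetical setup: 64 antennas, sector centered at 20 degrees, +/- 10 degrees wide.
energies = dft_beam_energies(64, 20.0, 10.0)
beams = top_beams(energies, 16)
captured = sum(energies[k] for k in beams)
print("selected beams:", beams)
print("fraction of channel energy captured:", round(captured, 3))
```

Because the DFT columns form a unitary basis, the beam energies sum to one, so `captured` can be read directly as the fraction of average channel energy kept by the selected beams. In this toy setting a fixed, precomputed beam set stands in for an eigendecomposition of the channel covariance, which is the kind of saving a DFT-based fast implementation would aim for.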