Quick and accurate building damage assessment following a disaster is critical to making a preliminary estimate of losses. Remote sensing image analysis based on convolutional neural networks (CNNs) and related architectures has shown growing potential for this task, but faces the challenge of collecting dense pixel-level annotations. In this letter, we propose a novel weakly supervised semantic segmentation (WSSS) method based on image-level labels for pixel-wise extraction of damaged buildings from post-earthquake high-resolution remote sensing (HRRS) images. The proposed method aims to improve the quality of the class activation map (CAM) to boost model performance. Specifically, a multi-scale dependence (MSD) module and a spatial correlation refinement (SCR) module are designed to account for the particular characteristics of damaged buildings, and are integrated into an encoder-decoder network. The former provides complete and dense localization of damaged buildings in the CAM, while the latter suppresses noise. Extensive experimental evaluations on three datasets confirm the effectiveness of the proposed approach: both the generated CAMs and the damaged-building extraction results of our method surpass those of current state-of-the-art methods.
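For readers unfamiliar with CAMs, the underlying computation can be sketched as follows. This is a generic, minimal illustration of the standard CAM (a class-weighted sum over the final convolutional feature maps), not the MSD/SCR-enhanced variant proposed here; the feature maps and classifier weights below are hypothetical toy data.

```python
import numpy as np

def class_activation_map(features, weights):
    """Generic CAM: weight each channel of the last conv feature map
    by the classifier weight of the target class, sum over channels,
    keep positive evidence, and rescale to [0, 1]."""
    # features: (C, H, W) feature maps; weights: (C,) class weights
    cam = np.tensordot(weights, features, axes=([0], [0]))  # -> (H, W)
    cam = np.maximum(cam, 0)          # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

# Toy example with random "feature maps" and "weights" (hypothetical data)
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))  # C=8 channels, 16x16 map
weights = rng.standard_normal(8)             # one weight per channel
cam = class_activation_map(features, weights)
print(cam.shape)  # (16, 16)
```

In a WSSS pipeline such as the one described, a CAM like this would be thresholded to produce pseudo pixel-level labels from image-level supervision; the proposed modules improve the completeness (MSD) and cleanliness (SCR) of that map.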