Construction waste is an inevitable byproduct of urban renewal and places severe pressure on the environment, public health, and ecology. Accurately estimating the production of construction waste is crucial for assessing the consumption of urban renewal. However, traditional manual estimation methods rely heavily on statistical data and historical experience, which makes them inflexible in practice, time-consuming, and labor-intensive, and their accuracy and timeliness urgently need improvement. High-resolution remote sensing images (HRSIs) offer strong timeliness, rich information, and macroscopic coverage, which makes them well suited to large-scale dynamic change detection of construction waste. Existing deep learning models, however, extract and fuse features poorly for small and multi-scale targets and struggle with irregularly shaped and fragmented detection areas. This study therefore proposes a Multi-scale Target Attention-Enhanced Network (MT-AENet) that dynamically tracks and detects changes in buildings and construction waste disposal sites from HRSIs and estimates the annual production of urban construction waste. The MT-AENet adopts a novel encoder–decoder architecture. In the encoder, ResNet-101 extracts high-level semantic features. A depthwise separable atrous spatial pyramid pooling (DS-ASPP) module with multiple dilation rates enlarges the receptive field, which resolves the discontinuous holes that otherwise appear when extracting large targets. A dual-attention mechanism module (DAMM) better preserves positional and channel details. In the decoder, multi-scale feature fusion (MS-FF) captures contextual information by integrating shallow and intermediate features of the backbone network, thereby enhancing extraction capability in complex scenes.
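To illustrate why the DS-ASPP design is attractive, the following minimal sketch computes the effective kernel size of an atrous (dilated) convolution and compares the weight count of a standard convolution with that of a depthwise separable one. The dilation rates (6, 12, 18) and the channel width (256) are assumptions chosen for illustration; the actual MT-AENet settings are not stated here, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = kernel_size * kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed dilation rates, not taken from the paper.
    for rate in (6, 12, 18):
        print(f"rate={rate}: effective kernel {effective_kernel_size(3, rate)}")
    # Assumed channel width of 256 for both input and output.
    print("standard:", standard_conv_params(3, 256, 256))
    print("separable:", depthwise_separable_params(3, 256, 256))
```

Under these assumed settings, a larger dilation rate widens the area each 3 x 3 kernel covers without adding weights, while the depthwise separable factorization cuts the weight count of each branch by roughly a factor of nine.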
The MT-AENet is used to extract buildings and construction waste in the study area at different points in time, and the actual production and landfill volume of construction waste are calculated from the resulting area changes, which indirectly measures the rate of urban construction waste resource conversion. Experimental results in Changping District, Beijing, demonstrate that the MT-AENet outperforms existing baseline networks in extracting buildings and construction waste. The results are validated against government statistical standards, providing a promising direction for efficiently analyzing the consumption of urban renewal.
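The area-change calculation can be sketched roughly as follows. This is a minimal illustration, not the study's actual procedure: it assumes two co-registered binary building masks (1 = building) from different dates, a known ground pixel size, and a placeholder waste generation rate per square metre. The function names, the pixel size, and the rate of 1.3 t/m² are all hypothetical and are not values from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary segmentation mask, 1 = target class


def changed_area_m2(before: Mask, after: Mask, pixel_size_m: float,
                    from_value: int, to_value: int) -> float:
    """Area of pixels that switch from `from_value` to `to_value`."""
    if len(before) != len(after) or any(
        len(rb) != len(ra) for rb, ra in zip(before, after)
    ):
        raise ValueError("masks must have identical shapes")
    count = sum(
        1
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
        if b == from_value and a == to_value
    )
    return count * pixel_size_m ** 2


def demolition_waste_tonnes(area_m2: float, rate_t_per_m2: float) -> float:
    """Waste mass = demolished area x assumed generation rate (placeholder)."""
    return area_m2 * rate_t_per_m2


if __name__ == "__main__":
    before = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
    after = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
    demolished = changed_area_m2(before, after, 0.5, from_value=1, to_value=0)
    built = changed_area_m2(before, after, 0.5, from_value=0, to_value=1)
    print("demolished m2:", demolished)
    print("newly built m2:", built)
    # 1.3 t/m2 is an illustrative placeholder, not a value from the study.
    print("waste t:", demolition_waste_tonnes(demolished, 1.3))
```

A footprint area taken from imagery does not capture the number of floors, so a practical estimate would likely need floor-area information or a rate calibrated per footprint area; the same mask-differencing step could be applied to disposal-site masks to approximate changes in landfill extent.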