• This work reduces both the labeling burden and the training time while maintaining the performance of the model.
• The indicator fusion algorithm improves the effectiveness of core image extraction.
• We propose an iterative optimization selection module to make subsequent batch extraction more accurate.
• Compared with existing work, this method can be applied to datasets that carry no lesion location labels.

With the advancement of technology in the big data era, the amount of data in the medical field has increased considerably, which has promoted the rapid development of intelligent medical diagnosis. Lesion localization plays an indispensable role in the medical field. However, it has not yet been widely applied in intelligent diagnosis. The effect of lesion localization depends on the training of a convolutional neural network, which requires a large amount of medical image data labeled with lesion locations. Although a considerable amount of medical image data is available, its quality varies and most of it is not labeled with lesion locations. Labeling is not only time-consuming and laborious but also requires professional knowledge. To solve this problem and facilitate the development of lesion localization, we propose a novel core dataset extraction architecture, a general architecture for extracting the core dataset from unlabeled medical big data for lesion localization. In this architecture, the comprehensive core degree of each image is computed from three evaluation indicators combined by an indicator fusion algorithm. In addition, we propose an iterative optimization selection module to improve the accuracy of subsequent batch extraction. The experimental results show that the proposed method needs to extract only 30% of the training data to achieve the training effect of the entire training set, thereby considerably reducing the human resources required for labeling.
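
The extraction idea described above can be illustrated with a minimal sketch. The abstract does not name the three evaluation indicators, the fusion rule, or the update performed by the iterative optimization selection module, so everything below is an assumption made for illustration: the fusion is a weighted sum of min-max normalized indicator scores, and the iterative step is a hypothetical optional `rescore` callback that recomputes the indicators after each batch. The names `min_max`, `fuse_indicators`, and `extract_core_indices` are likewise hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale one indicator to [0, 1]; a constant indicator maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_indicators(indicators: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Assumed fusion rule: weighted sum of min-max normalized indicators.

    `indicators` holds one list of per-image scores per indicator.
    """
    normalized = [min_max(col) for col in indicators]
    n_images = len(indicators[0])
    return [sum(w * col[i] for w, col in zip(weights, normalized))
            for i in range(n_images)]


def extract_core_indices(
    indicators: Sequence[Sequence[float]],
    weights: Sequence[float],
    budget: int,
    batch_size: int,
    rescore: Optional[Callable[[List[int]], Sequence[Sequence[float]]]] = None,
) -> List[int]:
    """Select `budget` image indices in batches, highest fused score first."""
    remaining = set(range(len(indicators[0])))
    selected: List[int] = []
    while remaining and len(selected) < budget:
        scores = fuse_indicators(indicators, weights)
        take = min(batch_size, budget - len(selected))
        # Ties are broken by the smaller index to keep the result deterministic.
        batch = sorted(remaining, key=lambda i: (-scores[i], i))[:take]
        selected.extend(batch)
        remaining.difference_update(batch)
        if rescore is not None:
            # Hypothetical stand-in for the iterative optimization step:
            # recompute the indicators given what has been selected so far.
            indicators = rescore(selected)
    return selected
```

For example, with a pool of 10 images, `budget=3` would correspond to the 30% extraction ratio reported in the experiments. In a real pipeline the indicator scores would come from the images or from a model trained on the batches labeled so far, which is the role the `rescore` callback only stands in for here.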