Multiple-instance learning (MIL) has become the mainstream approach for processing ultra-high-resolution, pyramid-structured whole slide images (WSIs) in digital pathology. Current MIL-based methods usually learn features from a WSI at a single magnification, ignoring both the multi-scale information contained in the WSI and the contrastive learning of global features. In addition, the lack of instance-level labels leads to weak supervision, which may compromise the model's ability to discriminate fine-grained features and ultimately degrade bag-level feature learning. We therefore propose a novel multi-scale multi-instance contrastive learning framework that learns more discriminative feature representations across scales for pathological WSI classification. The proposed method begins with a two-stream feature aggregator module, which simultaneously extracts bag embeddings and selects representative instances. Following the bag embedding branch, a multi-scale contrastive learning module contrasts the global features of WSIs across multiple scales by leveraging their inherent pyramid structure. Additionally, based on the instance selection branch, a patch-level classifier is combined with the bag-level classifier to jointly optimize training, strengthening the supervision of the model. The proposed framework is evaluated on three publicly available WSI datasets, achieving an area under the curve (AUC) of 95.8%, 95.5%, and 88.2%, respectively, and consistently outperforming all compared methods, both single-scale and multi-scale.
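To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes attention-weighted pooling for the bag embedding, top-k attention scores for selecting representative instances, and an InfoNCE-style loss for contrasting bag embeddings of the same slide at two magnifications; none of these specific choices is stated above. The names `two_stream_aggregate`, `cross_scale_info_nce`, `w_attn`, `k`, and `temperature` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def two_stream_aggregate(instances, w_attn, k=2):
    """Return a bag embedding and the indices of k representative instances.

    instances: (n, d) patch features of one slide at one magnification.
    w_attn:    (d,) attention vector (assumed here; learned in practice).
    """
    scores = instances @ w_attn            # (n,) one score per patch
    weights = softmax(scores)              # attention weights over patches
    bag = weights @ instances              # (d,) stream 1: bag embedding
    top_idx = np.argsort(-scores)[:k]      # stream 2: top-k instances
    return bag, top_idx


def cross_scale_info_nce(bags_low, bags_high, temperature=0.1):
    """InfoNCE-style loss (an assumed choice) between two scales.

    bags_low, bags_high: (B, d); row i of each comes from the same slide,
    so the diagonal pairs are positives and all other pairs are negatives.
    """
    a = bags_low / np.linalg.norm(bags_low, axis=1, keepdims=True)
    b = bags_high / np.linalg.norm(bags_high, axis=1, keepdims=True)
    logits = a @ b.T / temperature         # (B, B) scaled cosine similarity
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In a full model, the contrastive term would be added to the bag-level and patch-level classification losses, with the selected instances feeding the patch-level classifier; how those terms are weighted is not specified here.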