Fine-grained image classification can be regarded as a discriminative learning process in which images of different subclasses are separated from each other while images of the same subclass are clustered together. Most existing methods perform these two processes synchronously. Although these methods achieve promising results in fine-grained visual classification (FGVC) of natural images, they may fail on the fine-grained ship classification (FGSC) problem in remote sensing (RS) images because of the highly “imbalanced fineness” and “imbalanced appearances” of ships across subclasses. To tackle this issue, we propose an asynchronous contrastive learning-based method for effective FGSC. The proposed method, which we refer to as “Push-and-Pull Network (P2Net)”, consists of a “push-out stage” and a “pull-in stage”: the first stage forces all instances to be de-correlated, and the second stage then groups them into their respective subclasses. A dual-branch network is designed to separate (de-correlate) the images from one another, while an Integration Module, together with a Proxy-based Module designed for acceleration, aggregates the de-correlated images into their corresponding subclasses. In this way, the correlation between subclasses can be decoupled, which in turn makes the final classification much easier. Our method can be trained end-to-end and requires no annotations beyond category labels. Extensive experiments on two large-scale FGSC datasets, FGSC-23 and FGSCR-42, show that our method outperforms other state-of-the-art approaches, and ablation experiments further suggest the effectiveness of our design. Our code is available at https://github.com/WindVChen/Push-and-Pull-Network.
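To make the push-then-pull idea concrete, the following is a minimal, dependency-free sketch under stated assumptions; it is not the P2Net implementation (see the repository above for that). We assume that the push-out stage behaves like an instance-discrimination contrastive loss over two branch embeddings of the same images, which ignores subclass labels. We also assume that the pull-in stage behaves like a proxy-based softmax loss with one proxy per subclass, and that "asynchronous" can be modeled as a hard switch after a fixed number of epochs. The names `push_out_loss`, `pull_in_loss`, `stage_for_epoch`, `temperature`, and `push_epochs` are hypothetical illustrations.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Numerically stable log-softmax value of logits[index]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_z


def push_out_loss(view_a: Sequence[Vector], view_b: Sequence[Vector],
                  temperature: float = 0.1) -> float:
    """Hypothetical push-out stage: instance-level contrastive loss.

    Each image is its own class: its branch-A embedding must match its own
    branch-B embedding and differ from every other image, regardless of
    subclass labels, which de-correlates all instances.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        total -= log_softmax_at(logits, i)
    return total / n


def pull_in_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                 proxies: Sequence[Vector], temperature: float = 0.1) -> float:
    """Hypothetical pull-in stage: proxy-based softmax loss.

    Each embedding is compared with one proxy per subclass, which costs
    O(batch * classes) rather than O(batch^2) pairwise comparisons.
    """
    total = 0.0
    for emb, label in zip(embeddings, labels):
        logits = [cosine(emb, p) / temperature for p in proxies]
        total -= log_softmax_at(logits, label)
    return total / len(embeddings)


def stage_for_epoch(epoch: int, push_epochs: int) -> str:
    """Assumed asynchronous schedule: push out first, then pull in."""
    return "push" if epoch < push_epochs else "pull"


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    print(push_out_loss(a, a))                          # matched views: small loss
    print(push_out_loss(a, [[0.0, 1.0], [1.0, 0.0]]))   # mismatched views: large loss
    print(pull_in_loss(a, [0, 1], proxies=a))           # each instance at its proxy
```

Under these assumptions, the design choice is visible in the two losses: the push-out loss never reads `labels`, so same-subclass ships are also pushed apart at first. The pull-in loss only compares each sample with a small set of subclass proxies, which is one common way a proxy-based formulation reduces training cost.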