Visible–Infrared Person Re-Identification (VI-ReID) aims to match person identities across the visible and infrared modalities. The significant distribution discrepancy between the two modalities makes this task challenging. Recent methods typically generate an intermediate modality to bridge the modality gap. However, such methods can introduce spectral confusion and overlook the negative impact of redundant visible-channel information on establishing cross-modal correspondence. In this paper, we propose a Progressive Discrepancy Elimination Model (PDEM) for VI-ReID. Specifically, we design a Single-Channel Stripe Aggregation (SCSA) module that synthesizes high-quality transitional-modality images using only raw visible and infrared spectral stripes. In addition, we propose a two-stage Spectral Information Filtering (SIF) strategy consisting of our designed Semantic Consistency Loss (SCL) and Cascade Aggregation Loss (CAL). SCL and CAL are applied in different training stages to filter visible features for task-relevant information and to align visible and infrared features at multiple scales, respectively. In this way, the visible-feature semantics most relevant to the infrared modality are retained, and spectral correspondence is then learned across a reduced cross-modal gap. Extensive experiments on two VI-ReID datasets show that our method outperforms most state-of-the-art methods. The source code is available at https://github.com/wxz0530/PDEM.
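The idea of composing a transitional-modality image from raw spectral stripes could be sketched roughly as follows. This is a minimal illustration only, not the SCSA module itself: the stripe height, the per-stripe random choice between a single visible channel and the infrared image, and the `stripe_aggregate` name are all assumptions for exposition.

```python
import random

def stripe_aggregate(visible, infrared, stripe_h=2, seed=0):
    """Illustrative sketch (not the paper's SCSA specification):
    build a single-channel transitional image by interleaving horizontal
    stripes taken either from one raw visible channel (R, G, or B) or
    from the infrared image.

    visible: H x W x 3 nested lists; infrared: H x W nested lists.
    """
    rng = random.Random(seed)
    h = len(infrared)
    out = []
    for top in range(0, h, stripe_h):
        rows = range(top, min(top + stripe_h, h))
        if rng.random() < 0.5:
            # take this stripe from a single raw visible channel
            c = rng.randrange(3)
            for r in rows:
                out.append([visible[r][x][c] for x in range(len(visible[r]))])
        else:
            # take this stripe from the raw infrared image
            for r in rows:
                out.append(list(infrared[r]))
    return out
```

Because every output pixel is copied from a raw spectral channel rather than synthesized by a generator, an aggregation of this kind avoids introducing colors that exist in neither source modality, which is the intuition behind using only raw stripes.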