The remarkable progress in cross-modal retrieval relies on accurately annotated multimedia datasets. In practice, most datasets used to train cross-modal retrieval models are collected automatically from the Internet to reduce annotation costs. Such data inevitably contain mismatched pairs, i.e., noisy correspondences, which degrade model performance. Recent advances rely on the predicted similarity distribution of individual samples for noise validation and correction, and therefore face two challenging dilemmas: 1) confirmation bias and 2) unstable performance as the noise rate increases. In light of the above, we propose a generalized Bias Mitigation and Representation Optimization framework (BMRO). Specifically, we propose a Bias Estimator (BE) that estimates an unbiased confidence factor for each sample by contrasting it against its nearest neighbors. The unbiased confidence factor precisely adjusts each sample's contribution and enables accurate sample division, which in turn allows the Adaptive Representation Optimizer (ARO) to provide tailored optimization strategies for clean and noisy samples. ARO performs contrastive learning between clean samples and generated hard samples, promoting the generalizability and robustness of the learned representations, and it employs complementary learning to reduce incorrect guidance from noisy samples. Extensive experiments on five visual-text benchmarks verify that BMRO significantly improves matching accuracy and performance stability under noisy correspondences.
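To make the neighbor-referenced confidence idea concrete, the following is a minimal sketch, assuming cosine similarity between image and text embeddings and a sigmoid rescaling of the gap between a pair's similarity and that of its neighbors' pairs; the function name `estimate_confidence`, the neighborhood size `k`, and the temperature are illustrative assumptions rather than the paper's actual BE formulation.

```python
# Minimal sketch (not the exact BE formulation): estimating a
# neighbor-referenced confidence factor for each image-text pair.
# Names and constants here are illustrative assumptions.
import numpy as np

def estimate_confidence(img_emb: np.ndarray, txt_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each (image_i, text_i) pair relative to how well the image's
    k nearest neighbor images match their own paired texts."""
    # Cosine similarity between all image-text pairs.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                       # (N, N) cross-modal similarity
    pair_sim = np.diag(sim)                 # similarity of each annotated pair

    # Nearest neighbors in the image modality (excluding the sample itself).
    img_sim = img @ img.T
    np.fill_diagonal(img_sim, -np.inf)
    nn_idx = np.argsort(-img_sim, axis=1)[:, :k]    # (N, k)

    # Reference level: how similar the neighbors are to their own paired texts.
    neighbor_pair_sim = pair_sim[nn_idx].mean(axis=1)

    # Confidence in (0, 1): pairs matching at least as well as their
    # neighbors' pairs are treated as more likely to be clean.
    return 1.0 / (1.0 + np.exp(-(pair_sim - neighbor_pair_sim) * 10.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_emb = rng.normal(size=(8, 16))
    txt_emb = img_emb + 0.1 * rng.normal(size=(8, 16))  # mostly clean pairs
    txt_emb[0] = rng.normal(size=16)                     # one mismatched pair
    print(np.round(estimate_confidence(img_emb, txt_emb), 3))
```

Scoring a pair against its neighborhood, rather than against a global threshold on its own predicted similarity, is what gives the confidence factor its bias-mitigating character: per-sample idiosyncrasies in similarity scale cancel out against the local reference.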