A stock index summarizes the overall movement of a group of stocks and is typically calculated as a weighted average of constituent stock prices. Stock index prediction (SIP) helps investors assess economic performance and make informed decisions. Traditional SIP methods rely heavily on historical index data and neglect valuable information in macro-financial indicators and stock prices. Incorporating these data, however, raises three challenges: (1) including numerous unfiltered macro-financial indicators introduces noise; (2) using stocks whose prices are uncorrelated with the index can be counterproductive; and (3) supervised training on stochastic time series can lead to overfitting. To overcome these issues, we propose MCSIP, a multi-scale contrast approach for stock index prediction with adaptive stock fusion. MCSIP identifies the macro-financial indicators and stocks that are highly correlated with the index and optimizes the model through self-supervised multi-scale contrastive learning. Contrastive learning in the time series domain extracts robust contextual information from the stock index series, which limits the influence of stochastic noise during backpropagation and yields reliable representations. Experiments on US and Chinese stock datasets demonstrate superior performance compared to existing methods.
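To make the idea of keeping only highly correlated stocks concrete, the following is a minimal sketch and not the MCSIP procedure itself. It assumes that correlation is measured as the Pearson correlation between log returns of each stock and of the index, and that the top `k` stocks by absolute correlation are kept. The function name `select_correlated_stocks`, the use of log returns, and the fixed `k` are all illustrative assumptions; the adaptive fusion described in the abstract may work quite differently.

```python
import numpy as np


def select_correlated_stocks(index_series, stock_prices, k):
    """Pick the k stocks whose log returns correlate most with the index.

    Hypothetical helper for illustration only.
    index_series: shape (T,), positive index levels.
    stock_prices: shape (T, N), positive prices for N stocks.
    Returns (column indices of the chosen stocks, their correlations).
    """
    index_series = np.asarray(index_series, dtype=float)
    stock_prices = np.asarray(stock_prices, dtype=float)

    # Log returns remove the price level, so trends alone do not
    # produce spuriously high correlations.
    index_ret = np.diff(np.log(index_series))
    stock_ret = np.diff(np.log(stock_prices), axis=0)

    # Pearson correlation of every stock column with the index.
    index_c = index_ret - index_ret.mean()
    stock_c = stock_ret - stock_ret.mean(axis=0)
    denom = np.linalg.norm(index_c) * np.linalg.norm(stock_c, axis=0)
    corr = (stock_c.T @ index_c) / denom

    # Rank by absolute correlation, strongest first, and keep k.
    order = np.argsort(-np.abs(corr))[:k]
    return order, corr[order]
```

Given a price matrix in which only some columns actually drive the index, the function would be expected to return those columns first; a filter of this kind is one simple way to avoid feeding uncorrelated stocks to a predictor.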