Time series prediction is an active research topic in deep learning. However, owing to the complex nature of time series data, the underlying optimization problem is often highly non-convex, which can make the final convergence unstable. To address this challenge, recent works have proposed deep mutual learning frameworks that allow models to learn from both the ground truth and the knowledge of other models in order to locate a better convergence point. A key disadvantage of deep mutual learning, however, is that models that have converged to poor local optima may still share their knowledge, limiting overall performance. To overcome this limitation, we propose a new learning framework called mutual adaptation, which selects the model with the lowest error among all models in the framework as the prototype model serving as the common teacher. In addition, we incorporate a strategy of learning from each individual model's best local optimum encountered during training. Our experimental results show that, averaged across multiple datasets, our method improves performance over deep mutual learning by 4.73% in MAE and 6.99% in MSE for Informer, and by 11.54% in MAE and 18.15% in MSE for LSTM. We also demonstrate the importance of the memory of individual best local optima, and provide a sensitivity analysis and visualizations of the error and the loss descent process. Our method represents a new state of the art in group learning for time series prediction.
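To make the two ingredients of mutual adaptation concrete, the following is a minimal PyTorch sketch of one training step, not the reference implementation: the function name `mutual_adaptation_step`, the loss weights `alpha` and `beta`, and the use of MSE as the distillation loss toward the prototype and toward each model's best historical checkpoint are illustrative assumptions.

```python
# Hypothetical sketch of mutual adaptation: prototype (lowest-error) model as
# common teacher, plus distillation from each model's own best checkpoint.
import copy
import torch
import torch.nn as nn

def mutual_adaptation_step(models, optimizers, best_states, best_errors,
                           x, y, alpha=0.5, beta=0.5):
    """One training step over a batch (x, y) for a group of models."""
    mse = nn.MSELoss()

    # Evaluate all models and pick the prototype (lowest error on this batch).
    with torch.no_grad():
        preds = [m(x) for m in models]
        errors = [mse(p, y).item() for p in preds]
    proto_idx = min(range(len(models)), key=lambda i: errors[i])
    proto_pred = preds[proto_idx]

    for i, (model, opt) in enumerate(zip(models, optimizers)):
        # Memory of the best local optimum: keep the checkpoint with lowest error.
        if errors[i] < best_errors[i]:
            best_errors[i] = errors[i]
            best_states[i] = copy.deepcopy(model.state_dict())

        # Prediction of this model's own best historical checkpoint.
        with torch.no_grad():
            snapshot = copy.deepcopy(model)
            snapshot.load_state_dict(best_states[i])
            best_pred = snapshot(x)

        pred = model(x)
        loss = (mse(pred, y)                      # supervised loss (ground truth)
                + alpha * mse(pred, proto_pred)   # adapt toward the prototype teacher
                + beta * mse(pred, best_pred))    # adapt toward own best local optimum
        opt.zero_grad()
        loss.backward()
        opt.step()

    return errors

# Example setup (assumed): best_states holds each model's initial weights and
# best_errors starts at infinity, then both are updated inside the step above.
# best_states = [copy.deepcopy(m.state_dict()) for m in models]
# best_errors = [float("inf")] * len(models)
```

In this sketch the prototype's predictions are treated as a fixed target (computed without gradients), so the lowest-error model still trains only on the ground truth and its own best-checkpoint term in that step.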