Machine learning-based control-oriented modelling is a promising approach to capturing the complex system dynamics of buildings. It can benefit learning-based controllers such as Model Predictive Control and Reinforcement Learning. However, obtaining reliable models is non-trivial, especially when only limited historical data generated by the baseline controller are available. This paper analyses the modelling task and identifies the sources of prediction error in order to obtain suitable machine learning models, and then proposes a training framework with a well-designed training procedure to mitigate the reducible errors. Six typical machine learning methods are evaluated on simulated and real-world datasets containing both in-distribution and out-of-distribution data, and comprehensive evaluation metrics are proposed. The absolute error and prediction interval coverage percentage results indicate that point estimation alone is not sufficient for reliable predictions and that uncertainties must be quantified. Further analysis of the out-of-distribution data shows that, although some models predict in-distribution data satisfactorily, their interpolation and extrapolation capability is poor: some predictions deviate substantially from the targets, and point estimates may be over-confident. It is therefore essential to quantify the uncertainty caused by the lack of data, so that the true value falls within the prediction interval with high probability. Decomposing the total uncertainty provides additional information and helps detect out-of-distribution data. These investigations demonstrate the importance of both accurate point estimation and uncertainty quantification for model reliability.
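For reference, the prediction interval coverage percentage (PICP) and the uncertainty decomposition mentioned above are conventionally defined as follows. This is a standard formulation, assuming a probabilistic model with predictive mean \(\mu(x;\theta)\), predictive variance \(\sigma^{2}(x;\theta)\), and interval bounds \(\hat{y}_i^{L}, \hat{y}_i^{U}\) for each of the \(N\) test targets \(y_i\); the exact definitions used in the paper may differ:
\[
\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\hat{y}_i^{L} \le y_i \le \hat{y}_i^{U}\right],
\qquad
\operatorname{Var}[y \mid x] = \underbrace{\mathbb{E}_{\theta}\!\left[\sigma^{2}(x;\theta)\right]}_{\text{aleatoric}} \;+\; \underbrace{\operatorname{Var}_{\theta}\!\left[\mu(x;\theta)\right]}_{\text{epistemic}}.
\]
Under this standard law-of-total-variance decomposition, a large epistemic term signals regions with insufficient training data, which is one way out-of-distribution inputs can be flagged.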