Machine Learning (ML) algorithms are emerging in various industries as a powerful complement or alternative to traditional data regression methods. A major reason is that, unlike deterministic models, they can be used even in the absence of detailed phenomenological knowledge. Not surprisingly, the use of ML algorithms is also being explored in heat transfer applications. It is of particular interest in systems involving complex geometries and underlying phenomena (e.g. fluid phase change, multi-phase flow, heavy fouling build-up). However, heat transfer systems present specific challenges that need addressing, such as the scarcity of high-quality data, the inconsistencies across published data sources, the complex (and often correlated) influence of the inputs, the splitting of data between training and testing sets, and the limited capability to extrapolate to unseen conditions. In an attempt to help overcome some of these challenges and, more importantly, to provide a systematic approach, this article reviews and analyses past efforts in the application of ML algorithms to heat transfer problems, and proposes a regression framework for their deployment to estimate key quantities (e.g. the heat transfer coefficient) for improved design and operation of heat exchangers. The framework consists of six steps: i) data pre-treatment, ii) feature selection, iii) data splitting philosophy, iv) training and testing, v) tuning of hyperparameters, and vi) performance assessment with specific indicators, to support the choice of accurate and robust models. A relevant case study involving the estimation of the condensation heat transfer coefficient in microfin tubes is used to illustrate the proposed framework. Two data-driven algorithms, Deep Neural Networks and Random Forest, are tested and compared in terms of their estimation and extrapolation capabilities. The results show that ML algorithms are generally more accurate in predicting the heat transfer coefficient than a well-known semi-empirical correlation proposed in past studies: the mean absolute error of the most suitable ML model is 535 [W m⁻² K⁻¹], compared with 1061 [W m⁻² K⁻¹] for the correlation. In terms of extrapolation, the selected ML model has a mean absolute error of 1819 [W m⁻² K⁻¹], against 1111 [W m⁻² K⁻¹] for the correlation, indicating a relative disadvantage of the data-driven model when extrapolating beyond its training domain, although the comparison is not entirely fair, given that the correlation was used as is and no retraining was performed. In addition, feature selection enables simpler models that depend only on the features most related to the target variable. Special attention is needed, however, as overfitting and limited extrapolation capabilities are common difficulties encountered when deploying these models.
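
To make the framework more concrete, the sketch below illustrates, under stated assumptions, how steps iii) to vi) might be implemented for one of the two algorithms considered (Random Forest) on a generic tabular dataset. The file name, column names and hyperparameter grid are hypothetical placeholders and are not taken from the article's case study.

```python
# Minimal sketch of the regression framework (steps iii-vi), assuming the
# pre-treated condensation data (step i) are available as a CSV file and that
# feature selection (step ii) has already identified the relevant inputs.
# "condensation_data.csv", the column names and the hyperparameter grid are
# hypothetical placeholders, not values from the article's case study.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("condensation_data.csv")
features = ["mass_flux", "vapour_quality", "saturation_temp", "fin_height"]
X, y = df[features], df["htc"]  # target: condensation heat transfer coefficient

# Step iii: data splitting philosophy (a random split is shown; an extrapolation
# test would instead hold out the upper or lower range of one operating variable).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps iv-v: training and hyperparameter tuning by cross-validation.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
grid.fit(X_train, y_train)

# Step vi: performance assessment on unseen data with a specific indicator (MAE).
mae = mean_absolute_error(y_test, grid.best_estimator_.predict(X_test))
print(f"MAE on the test set: {mae:.0f} W m^-2 K^-1")
```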