In recent years, low-power wide area networks (LPWANs), and Long-Range Wide Area Network (LoRaWAN) technology in particular, have been increasingly adopted in large-scale Internet of Things (IoT) applications because they offer cost-effective, long-range wireless communication at low power. The need to provide location-stamped communications to IoT applications, so that physical measurements from IoT devices can be meaningfully interpreted, has increased demand for location estimation capabilities in LoRaWAN networks. Fingerprint-based localization methods are becoming increasingly popular in LoRaWAN networks because of their relatively high accuracy compared to range-based methods. This work proposes a hybrid convolutional neural network (CNN)-Transformer fingerprinting method for localizing a node in a LoRaWAN network. The CNN complements the strengths of the Transformer by capturing local features from the input data, which in turn allows the Transformer, through its attention mechanism, to effectively learn global dependencies in the data. Specifically, the proposed method first learns local location features from the input data using the CNN and passes the resulting representation to the Transformer encoder, which learns global features. The output of the Transformer encoder is then concatenated with the locally learned features and passed through a regressor for the final location estimate. With a mean localization error of 290.71 m, the proposed method outperforms similar state-of-the-art approaches evaluated on the same publicly available LoRaWAN dataset.
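The CNN-to-Transformer pipeline described above can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions, not the authors' implementation: the fingerprint length, channel sizes, and the mean-pooling before the regressor are all hypothetical choices made for brevity, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    # Valid 1-D convolution with ReLU: x is (T, C_in), w is (K, C_in, C_out).
    K, _, _ = w.shape
    T = x.shape[0] - K + 1
    out = np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
                    for t in range(T)])
    return np.maximum(out, 0.0)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention (the core of a Transformer encoder).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ v

# Hypothetical sizes: a fingerprint of 68 gateway RSSI values as a length-68 sequence.
T_in, C_in, C_loc, K, D = 68, 1, 16, 3, 16

x = rng.standard_normal((T_in, C_in))            # one RSSI fingerprint
w_conv = rng.standard_normal((K, C_in, C_loc)) * 0.1
b_conv = np.zeros(C_loc)

local = conv1d(x, w_conv, b_conv)                # local features from the CNN stage

wq, wk, wv = (rng.standard_normal((C_loc, D)) * 0.1 for _ in range(3))
glob = self_attention(local, wq, wk, wv)         # global dependencies via attention

# Concatenate pooled local and global features, then regress to an (x, y) location.
feat = np.concatenate([local.mean(axis=0), glob.mean(axis=0)])
w_reg = rng.standard_normal((feat.shape[0], 2)) * 0.1
location = feat @ w_reg                          # 2-D location estimate
```

In the paper's full model the attention block would sit inside a complete Transformer encoder (multi-head attention, feed-forward sublayers, normalization), but the data flow — local CNN features feeding attention, then concatenation into a regressor — follows the description above.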