Internet of Things (IoT) devices are widely used in smart applications and are increasingly equipped with cognitive radio (CR) capabilities for dynamic spectrum allocation. Our objectives in this work are to achieve higher data rates and to minimize end-to-end routing delay in CR-enabled IoT communication, and thereby to maximize throughput. We propose a reinforcement learning (RL)-based routing approach, RL-IoT, for cognitive radio network (CRN)-based IoT environments. The idea is to add channel selection capability to the network layer so that routing decisions reduce both packet collisions and end-to-end delay (EED). We perform a comprehensive evaluation of the proposed RL-IoT routing mechanism by simulating a cognitive radio-enabled IoT (CR-IoT) communication environment in the cognitive radio cognitive network (CRCN) simulator and comparing its network performance with that of three existing routing approaches: a recent AODV-based routing mechanism for IoT (AODV-IoT), ELD-CRN, and SpEED-IoT. Our evaluation results show that RL-IoT outperforms these approaches in terms of average data rate, throughput, packet collisions, and EED.
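
To make the idea of network-layer channel selection concrete, the following is a minimal sketch of how a node might learn a joint (next hop, channel) choice with tabular Q-learning. It is an illustration under stated assumptions and not the RL-IoT algorithm itself: the class name `JointRouteChannelAgent`, the reward shape (negative delay plus a collision penalty), and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


class JointRouteChannelAgent:
    """Tabular Q-learning over (next_hop, channel) actions at one node.

    Hypothetical illustration: the abstract does not specify the state,
    action, or reward design used by RL-IoT.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)   # candidate (next_hop, channel) pairs
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount factor
        self.epsilon = epsilon         # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(float)    # (state, action) -> estimated value

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise take the best action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def reward(delay_ms, collided, collision_penalty=50.0):
    # Assumed reward: lower delay is better, collisions are penalized.
    return -(delay_ms + (collision_penalty if collided else 0.0))


def demo():
    # Toy setting: node "A" can forward via "B" on channel 1 or 2, or via "C".
    actions = [("B", 1), ("B", 2), ("C", 1)]
    # Hypothetical outcomes per action: (delay in ms, collision occurred).
    outcomes = {
        ("B", 1): (5.0, True),    # fast hop, but the channel is occupied
        ("B", 2): (5.0, False),   # fast hop on a free channel
        ("C", 1): (20.0, False),  # slower hop on a free channel
    }
    agent = JointRouteChannelAgent(actions, epsilon=0.0)
    for _ in range(20):
        for action in actions:
            delay_ms, collided = outcomes[action]
            agent.update("A", action, reward(delay_ms, collided), "done")
    return agent.choose("A")
```

In this toy setting, `demo()` returns `("B", 2)`: the agent learns to avoid both the occupied channel and the slower hop. This is the kind of trade-off between collisions and delay that motivates making channel selection part of the routing decision.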