Highlights
An ensemble method using color segmentation, deep learning, and image transformation was developed.
Experiments were conducted to compare the method with other state-of-the-art tracking algorithms.
The optimized ensemble method to track bolls achieved 94.4% accuracy using weakly trained tiny YOLOv2 models.
The method achieved 7.6 frames per second and outperformed five other tracking methods.

Abstract. In robotic applications, good perception can be computationally costly and can create undesirable latency before a control decision is initiated. Most available deep learning methods for object detection are either fast with low accuracy or slow with high accuracy. Fast and accurate methods are necessary to track and localize objects such as cotton bolls, which may be visible, occluded by each other, or poorly illuminated. In this study, an ensemble of a deep learning method and other image processing techniques was used to detect cotton bolls in-field on defoliated plants. In each image, a trained deep learning model, YOLOv2, was used to detect open cotton bolls, and color segmentation was applied to confirm whether the bolls detected by the YOLOv2 model were actually white, to avoid false positives. Boll tracking was performed by following the spatial movement of good features on the edges of the bolls using the Lucas-Kanade algorithm. If a previously detected boll was lost, an image transformation algorithm was applied to the next image to retrieve the information of the missing boll. Each tracked and localized boll was stored and counted to give the total number of bolls detected. In this study, detection accuracy was sacrificed for image processing speed by using the YOLOv2 model. Detection accuracy was then improved by using an ensemble method that combined image color segmentation, optical flow, and image transformation. This method was compared to eight other open-source methods implemented in OpenCV.
The ensemble method detected and counted bolls at a speed of 7.6 fps with an accuracy of 94.4% using the Jetson TX2 embedded system to process 1K resolution images, outperforming the other OpenCV methods in various measurements.

Keywords: Boll counting, Cotton, Cotton harvesting, DarkFlow, Darknet, Deep learning, Machine vision, YOLOv2.
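
The color-confirmation step described in the abstract (keeping a detection only if the region is actually white) can be illustrated with a minimal sketch. This is not the implementation used in the study: the function names, the RGB thresholds (`min_value`, `max_spread`), and the acceptance ratio (`min_fraction`) are hypothetical values chosen for illustration, and the image is represented as a plain nested list of RGB tuples rather than a camera frame.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each 0-255
Box = Tuple[int, int, int, int]     # (x, y, width, height) in pixels
Image = List[List[Pixel]]           # rows of pixels


def white_fraction(image: Image, box: Box,
                   min_value: int = 200, max_spread: int = 40) -> float:
    """Return the fraction of pixels inside `box` that look white.

    A pixel is treated as white when all channels are bright
    (>= min_value) and close to each other (spread <= max_spread).
    Both thresholds are illustrative assumptions.
    """
    x, y, w, h = box
    total = 0
    white = 0
    for row in image[y:y + h]:
        for r, g, b in row[x:x + w]:
            total += 1
            lo, hi = min(r, g, b), max(r, g, b)
            if lo >= min_value and hi - lo <= max_spread:
                white += 1
    return white / total if total else 0.0


def confirm_detections(image: Image, boxes: List[Box],
                       min_fraction: float = 0.3) -> List[Box]:
    """Keep only the detector boxes that contain enough white pixels."""
    return [box for box in boxes
            if white_fraction(image, box) >= min_fraction]


# Toy 4x4 image: left half white (boll-like), right half brown (soil-like).
WHITE = (255, 255, 255)
BROWN = (120, 80, 40)
image = [[WHITE, WHITE, BROWN, BROWN] for _ in range(4)]

# Two hypothetical detector outputs: one on the white area, one on the brown.
boxes = [(0, 0, 2, 4), (2, 0, 2, 4)]
confirmed = confirm_detections(image, boxes)
```

In this toy case only the box over the white region survives the filter, which mirrors how a color check can reject false positives from a fast but weakly trained detector.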