Decision Tree (DT) models are a well-known class of interpretable machine learning tools for diverse pattern recognition problems. However, applying DTs to learn floating features in images and categorical data directly from their raw representation has been challenging. Convolutional Neural Networks (CNNs) are the current state-of-the-art method for classifying raw images, but they have their own disadvantages: they are often difficult to interpret, have a large number of parameters and hyperparameters, require a fixed image size, and have only partial translational invariance built directly into their architecture. We propose a novel application of Convolutional Decision Trees (CDTs) and show that our approach is more interpretable and learns higher-quality convolutional filters than CNNs. CDTs have full translational invariance built into their architecture and can be trained on, and make predictions for, variable-sized images. Using two independent test cases (protein-DNA binding prediction and handwritten digit classification), we demonstrate that our GPU-enabled implementation of the Cross Entropy (CE) optimization method for training CDTs learns informative convolutional filters that both support accurate classification in a tree-like decision pattern and can be used for transfer learning to improve CNNs themselves. These results motivate further studies on developing accurate and efficient tree-based models for pattern recognition and computer vision.
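
The following is a minimal sketch, not the authors' implementation, illustrating the two ideas named above under simplifying assumptions: (i) a single CDT split node that applies a learned convolutional filter to an image of any size and routes the sample by thresholding the maximum filter response, which is what gives full translational invariance; and (ii) a bare-bones Cross Entropy optimization loop that searches for a filter maximizing a split-quality score. The helper names (`cdt_split_left`, `ce_learn_filter`), the diagonal-Gaussian parameterization, and the information-gain scoring are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def max_response(image, filt):
    # Max-pool the filter response over all valid positions: only the best match
    # anywhere in the image matters, so the split is translation-invariant and
    # works for variable-sized inputs.
    return correlate2d(image, filt, mode="valid").max()

def cdt_split_left(image, filt, threshold):
    # Route the sample to the left child if the best filter response exceeds the threshold.
    return max_response(image, filt) > threshold

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(images, labels, filt, threshold):
    # Score a candidate split by the reduction in label entropy it produces.
    labels = np.asarray(labels)
    go_left = np.array([cdt_split_left(im, filt, threshold) for im in images])
    if go_left.all() or (~go_left).all():
        return 0.0  # degenerate split: all samples routed to one side
    n = len(labels)
    return entropy(labels) \
        - (go_left.sum() / n) * entropy(labels[go_left]) \
        - ((~go_left).sum() / n) * entropy(labels[~go_left])

def ce_learn_filter(images, labels, size=3, threshold=1.0,
                    pop=200, elite=20, iters=30, seed=0):
    """Cross-entropy search over filter weights with a diagonal Gaussian."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(size * size), np.ones(size * size)
    for _ in range(iters):
        # Sample a population of candidate filters, score each split, keep the
        # elite fraction, and refit the sampling distribution to the elites.
        candidates = rng.normal(mu, sigma, size=(pop, size * size))
        scores = [information_gain(images, labels, c.reshape(size, size), threshold)
                  for c in candidates]
        elites = candidates[np.argsort(scores)[-elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu.reshape(size, size)
```

In this simplified view, a full CDT would be grown by applying such a filter search recursively at each internal node on the samples routed to it; the learned filters can then be inspected directly, which is one way the interpretability claim above can be understood.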