Accurate detection and classification of teeth is the first step in diagnosing dental disease. However, teeth of the same class vary considerably in surface appearance, and their complex geometry makes it difficult to learn features that discriminate between tooth classes. For these reasons, tooth classification remains a challenging problem for deep learning. To address these issues, this study proposes extracting discriminative local features at different levels of granularity using YOLO models. This approach requires an intra-oral image dataset annotated at multiple granularity levels, so a dataset labeled at three levels (two, four, and seven tooth classes) was developed. YOLOv5, YOLOv6, and YOLOv7 models were trained on 2,790 images. YOLOv6 performed best on the two-class problem, achieving a mean average precision (mAP) of 94%. However, the performance of all YOLO models decreased as the labels became finer. For the four-class and seven-class problems, YOLOv5 achieved the highest mAP values, 87% and 79%, respectively. These results show that the level of label granularity strongly affects tooth detection and classification: YOLO performance declined steadily as granularity became finer, with the lowest mAP at the finest (seven-class) level.
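Because the comparison across granularity levels rests on mAP, the following is a minimal sketch of how that metric is commonly computed: average precision (AP) per class from ranked detections, then the mean over classes. It assumes all-point interpolation (PASCAL VOC style) and that each detection has already been marked as a true or false positive by IoU matching upstream. The abstract states neither the IoU threshold nor the interpolation scheme, and the YOLO repositories' own evaluators may interpolate differently, so the numbers need not match exactly. The function names `average_precision`, `class_ap`, and `mean_average_precision` are hypothetical.

```python
from itertools import accumulate


def average_precision(recall, precision):
    """All-point interpolated AP (assumed VOC-style scheme).

    recall and precision must be ordered by descending detection confidence.
    """
    # Pad so the curve starts at recall 0 and ends at recall 1.
    r = [0.0] + list(recall) + [1.0]
    p = [0.0] + list(precision) + [0.0]
    # Replace each precision with the maximum precision at any higher recall
    # (the monotone precision envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Area under the envelope: sum rectangles wherever recall changes.
    return sum(
        (r[i + 1] - r[i]) * p[i + 1]
        for i in range(len(r) - 1)
        if r[i + 1] != r[i]
    )


def class_ap(scores, is_tp, num_gt):
    """AP for one tooth class (hypothetical helper).

    scores: detection confidences; is_tp: whether each detection matched a
    ground-truth box (IoU matching assumed done upstream); num_gt: number of
    ground-truth boxes of this class (assumed > 0).
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp_cum = list(accumulate(1.0 if is_tp[i] else 0.0 for i in order))
    recall = [t / num_gt for t in tp_cum]
    # Precision after k + 1 detections = true positives so far / (k + 1).
    precision = [t / (k + 1) for k, t in enumerate(tp_cum)]
    return average_precision(recall, precision)


def mean_average_precision(per_class_ap):
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For example, a class whose two detections are both correct gets an AP of 1.0, while one whose second detection is a false positive (with two ground-truth boxes) gets 0.5, giving an mAP of 0.75 over the two classes. Since mAP weights every class equally, a few hard-to-separate classes at the seven-class level can pull the mean down even when coarser groupings are detected well.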