Object recognition plays an important role in applications such as surveillance, perimeter monitoring, and traffic monitoring. Imaging in the thermal infrared (IR) band remains operable in darkness and harsh weather, unlike visible-band imaging, which depends on illumination. However, IR imagery poses challenges such as low contrast, a lack of sharp boundaries and features, and high variability in object signatures, which make object recognition in the IR band difficult and demand a robust recognition framework. We evaluate state-of-the-art feature extractors in a bag-of-words (BoW) framework for their suitability for IR images and present an optimized bag-of-features framework for object recognition in IR images. Another bottleneck for target classification research in IR is the limited scope of available databases, which do not capture the variations within target categories. To address this issue, we present a diverse IR dataset of images captured under conditions that affect the infrared signature of objects: environmental and background factors such as season, time of day, and experimental site; target factors such as thermal emission characteristics, pose, and distance from the sensor; and technological factors such as camera resolution and sensitivity. The dataset consists of 3841 images representing six image categories, primarily of urban scenes. We present a hierarchical three-stage classification in the infrared domain: primary classification as {target, background}, secondary classification of the target subcategory as {vehicles, pedestrians}, and tertiary classification of the vehicle subcategory as {two-wheelers, three-wheelers, light motor vehicles, heavy motor vehicles}. We report the performance of state-of-the-art feature extractors in a BoW framework, establishing their limits of performance using the evaluation metrics accuracy, precision, sensitivity, specificity, and F-score.
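For reference, the five evaluation metrics named above can be computed from the confusion counts of any single binary stage, for example {target, background}. The following is a minimal sketch and not the evaluation code used in this work. It assumes the standard textbook definitions and takes F-score to mean the balanced F1 score; the names `BinaryCounts` and `binary_metrics` and the counts in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion counts for one binary stage (hypothetical container)."""
    tp: int  # positives correctly labelled positive
    fp: int  # negatives wrongly labelled positive
    tn: int  # negatives correctly labelled negative
    fn: int  # positives wrongly labelled negative


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 for an empty denominator instead of raising.
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: BinaryCounts) -> dict:
    """Standard definitions; F-score is assumed to be the balanced F1."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": _ratio(2 * precision * sensitivity,
                          precision + sensitivity),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the dataset.
    example = BinaryCounts(tp=8, fp=2, tn=9, fn=1)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

For the secondary and tertiary stages, the same function can be applied per class in a one-vs-rest fashion; how the per-class values are then averaged is a separate choice that this sketch does not make.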