The exponential growth of digital image data poses numerous open problems to computer vision researchers. In this regard, designing an efficient and more accurate mechanism that finds and retrieve desired images from large repositories is of greater importance. To this end, various types of content based image retrieval (CBIR) systems have been developed. A typical CBIR system enables the search and retrieval of desired images from large databases that are similar to a given query image by means of automatically extracted visual features from image pixels. In CBIR domain, the bag of visual words (BoVW) model is one of the most widely used feature representation scheme and there exist a number of image retrieval frameworks based on BoVW model. It has been observed that most of them demonstrated promising results for the task of medium and large scale image retrieval. However, image retrieval literature lacks a comparative evaluation of these extended BoVW formulations. To this end, this paper aims to categorize and evaluate the existing BoVW model based formulations for the task of content based image retrieval. The commonly used datasets and the evaluation metrics to assess the retrieval effectiveness of these existing models are discussed. Moreover, quantitative evaluation of state of the art image retrieval systems based on BoVW model is also provided. Finally, certain promising directions for future research are proposed on the basis of the existing models and the demand from real-world.