There are two challenges associated with the interpretability of deep learning models in medical image analysis that need to be addressed: confidence calibration and classification uncertainty. Confidence calibration aligns the predicted classification probability with the likelihood that the classification is actually correct: a sample classified with confidence X% should have an X% chance of being correctly classified. Classification uncertainty estimates the noise present in the classification process; this estimate can be used to assess the reliability of a particular classification result. Both confidence calibration and classification uncertainty are considered helpful for interpreting the classification results produced by a deep learning model, but it is unclear how much they affect classification accuracy and calibration, and how they interact. In this paper, we study the roles of confidence calibration (via post-processing temperature scaling) and classification uncertainty (computed either from the classification entropy or from the predictive variance produced by Bayesian methods) in deep learning models. Results suggest that calibration and uncertainty improve both classification interpretation and accuracy. This motivates us to propose a new Bayesian deep learning method that relies on both calibration and uncertainty to improve classification accuracy and model interpretability. Experiments are conducted on a recently proposed five-class polyp classification problem, using a data set containing 940 high-quality images of colorectal polyps, and the results indicate that our proposed method achieves state-of-the-art confidence calibration and classification accuracy.
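For concreteness, the two quantities discussed above can be written in their standard forms (these are the usual textbook definitions, not expressions taken from the method described in this paper): given logits $z \in \mathbb{R}^{K}$ for $K$ classes and a temperature $T > 0$ fitted on a held-out validation set, temperature scaling produces re-calibrated class probabilities, and the predictive entropy of the resulting distribution serves as one possible uncertainty estimate:
\[
\hat{p}_k = \frac{\exp(z_k / T)}{\sum_{j=1}^{K} \exp(z_j / T)}, \qquad
H(\hat{p}) = -\sum_{k=1}^{K} \hat{p}_k \log \hat{p}_k .
\]
In the Bayesian case, an analogous uncertainty estimate is commonly obtained from the variance of the class probabilities computed over multiple stochastic forward passes (e.g., Monte Carlo sampling of the network weights); the specific formulation used by the proposed method is given in the body of the paper.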