Background and ObjectiveHistopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a light-weight, easy-to-use package for both algorithm developers and biomedical researchers. MethodsHere we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop-shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, Champkit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model. ResultsThe main result of this paper is the ChampKit software. Using ChampKit, we were able to systemically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random intialization, with no clear benefit except in the low data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which is counter to other areas of computer vision. ConclusionsChoosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.