Introduction/Background Quantification of tissue biomarkers is increasingly demanded for diagnosis and is commonly performed by expert pathologists using microscopy of stained tissue at high magnification. This manual scoring is a reasonably fast, supervised procedure, but it suffers from inter- and intra-observer differences due to a) differences in the selection of regions of interest, b) differences in quantity estimation, and c) intra-tissue variability of biomarker expression. Computers and whole slide microscopy scanners have made high-capacity analysis of high-resolution tissue images feasible. Image analysis (IA) enables better reproducibility, but conversely, the unsupervised analysis introduces challenges regarding accuracy. Furthermore, borderline cases will always have to be rigorously inspected by pathologists. Many IA evaluation methods exist, but for pathology, a supervised comparison of the experimental segmentation to an appropriately obtained standard criterion is the optimal strategy. Producing a standard criterion requires evaluation of whole slide images to eliminate any possible region sampling bias, while inter- and intra-observer bias can only be minimized by replacing manual estimates with objective measurements. A logical step is thus to change the task of the pathologist from quantity estimation to verifying the output reported by an automated procedure. Still, verification of entire tissue slides is too time-consuming for daily pathology practice. To minimize the workload, pathology is turning to stereological methods, which aim to quantify matter efficiently and without bias and have proved useful for supervised validation of automated analysis for Ki67 scoring of breast cancer. However, the workload still needs to be reduced to a level comparable to that of the manual scoring procedure.
Aims We aim to enable high-accuracy, objective evaluation of automated image analysis with a workload and workflow feasible for daily pathology practice. This concerns both the production of reference data for image analysis tool calibration and continuous quality control inspection of borderline cases.
Methods This study investigates proportionate sampling, a very efficient stereological sampling scheme that uses weighted sampling of regions for manual evaluation of automated IA. The sampling of regions to be inspected by a pathologist draws upon the IA output to assign a probability weight to every region. This yields highly efficient, unbiased sampling and an unbiased quality assurance estimate for the automated image analysis.
Results Presented here is a proof of concept of an efficient, unbiased image analysis evaluation methodology. The task of the pathologist is changed from quantity estimation to annotating discrepancies between the output of the IA and the tissue in a few sampled regions. From the annotations, an unbiased quality assurance estimate of the IA can be derived, including the levels of accuracy obtainable and the expected workloads. This confirms that stereological proportionate sampling enables manual verification of automated whole slide image analysis for unbiased reference dataset creation and quality control inspection in borderline cases. Furthermore, the methodology is easily integrated into both image analysis platforms for production of reference data sets and laboratory information systems for daily pathology practice.
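The weighted sampling described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each region carries a strictly positive weight taken from the IA output (for example, the IA cell count in that region), that regions are drawn by systematic sampling along the cumulative weights, and that the pathologist supplies a verified quantity for each sampled region. The function names `sample_regions` and `estimate_total` and all numbers are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import List, Mapping, Sequence


def sample_regions(weights: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Pick n region indices with probability proportional to their IA weight.

    Systematic sampling: one random start, then n equally spaced points on
    the cumulative weight axis. A region heavier than the spacing can be
    picked more than once.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(weights) == 0 or any(w <= 0 for w in weights):
        # Assumption: every region has a strictly positive weight, so no
        # region is excluded from sampling.
        raise ValueError("weights must be non-empty and strictly positive")
    cumulative = list(accumulate(weights))
    step = cumulative[-1] / n
    start = rng.random() * step
    last = len(weights) - 1
    # bisect_right returns the first region whose cumulative weight exceeds
    # the sampling point; min() guards against floating-point overshoot.
    return [min(bisect_right(cumulative, start + k * step), last) for k in range(n)]


def estimate_total(
    weights: Sequence[float], picks: Sequence[int], verified: Mapping[int, float]
) -> float:
    """Estimate the slide-wide verified quantity from the sampled regions.

    Each sampled value is divided by its expected number of selections,
    n * w_i / W, which makes the estimate unbiased under this sampling design.
    """
    if len(picks) == 0:
        raise ValueError("at least one sampled region is required")
    return (sum(weights) / len(picks)) * sum(verified[i] / weights[i] for i in picks)


if __name__ == "__main__":
    ia_counts = [5, 1, 12, 3, 9, 20]  # hypothetical IA counts per region
    picks = sample_regions(ia_counts, 3, random.Random(0))
    # Hypothetical pathologist-verified counts; only sampled regions are needed.
    corrected = {0: 4, 1: 2, 2: 10, 3: 3, 4: 11, 5: 18}
    verified = {i: corrected[i] for i in picks}
    print(picks, estimate_total(ia_counts, picks, verified))
```

In this sketch the estimate becomes more precise the more closely the IA weights track the verified quantities. If they were exactly proportional, every sample would return the same total, which is why only a few regions may need inspection when the IA is already reasonably accurate.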