This article presents recommendations by model developers and the authors about calibration and validation procedures for the Hydrological Simulation Program - Fortran (HSPF) as applied through BASINS. HSPF is a continuous simulation watershed model that simulates nonpoint-source runoff and pollutant loadings for a watershed and performs flow and water quality routing in stream reaches and well-mixed lakes and impoundments. HSPF can be used to estimate nonpoint-source loads from various land uses as well as fate and transport processes in streams and lakes. This article describes the ideal calibration and validation process for the full range of constituents modeled by HSPF, as well as the minimum acceptable calibration and validation process for this model. The model information and guidance provided in this article may help determine the scope of the proposed ASABE Standard/Engineering Practice for model calibration and validation. Model calibration and validation are necessary and critical steps in any model application. For HSPF and most other watershed models, calibration is an iterative procedure of parameter evaluation and refinement based on comparing simulated and observed values of interest. Model validation is, in reality, an extension of the calibration process. Its purpose is to ensure that the calibrated model properly assesses all the variables and conditions that can affect model results, and to demonstrate the ability to predict field observations for periods separate from the calibration effort. For HSPF calibration and validation, a “weight of evidence” approach is most widely used in practice when models are examined and judged for acceptance for assessment and regulatory purposes. This article explores the “weight of evidence” approach and the current practice of watershed model calibration and validation based on more than 30 years of experience with HSPF.
Example applications are described and model results are shown to demonstrate the graphical and statistical procedures used to assess model performance. In addition, quantitative criteria for various statistical measures are discussed as a basis for evaluating model results and documenting the model application efforts.