The effects of methods used to improve the interrater reliability of reviewers' ratings of manuscripts submitted to the Journal of the American Academy of Child and Adolescent Psychiatry were studied. Reviewers' ratings of consecutive manuscripts submitted over approximately 1 year were analyzed first; 296 pairs of ratings were studied. Intraclass correlations and their confidence intervals were computed for the two main ratings with which reviewers quantified manuscript quality: a 1-10 overall quality rating and a four-point recommendation ranging from acceptance to rejection. Modifications were then introduced, including a multi-item rating scale and two accompanying training manuals. Over the next year, 272 more articles were rated, and reliabilities were computed for the new scale and for the scales previously used. The intraclass correlation of the most reliable rating before the intervention was 0.27; the reliability of the new rating procedure was 0.43. The difference was statistically significant. The reliability of the new rating scale was in the fair to good range, and it improved further when the ratings of the two reviewers were averaged and the reliability was stepped up by the Spearman-Brown formula. The new rating scale had excellent internal consistency and correlated highly with the other quality ratings. The data confirm that the reliability of ratings of scientific articles can be improved by increasing the number of rating scale points, eliciting ratings of separate, concrete items rather than a single global judgment, using training manuals, and averaging the scores of multiple reviewers.
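
As an illustrative calculation (a sketch, not part of the reported analysis, assuming the post-intervention single-reviewer intraclass correlation of 0.43 cited above), the Spearman-Brown formula predicts the reliability of the mean of $k$ reviewers' ratings as

$$ r_k = \frac{k\,r}{1 + (k - 1)\,r}, \qquad r_2 = \frac{2 \times 0.43}{1 + 0.43} \approx 0.60, $$

which is consistent with the improvement reported above when the two reviewers' ratings were averaged.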