Research on voice quality perception has proceeded essentially atheoretically over the years. A thorough literature review suggests that most studies implicitly assume a model that partitions listeners’ ratings into two components: the true magnitude of the quality being rated (e.g., roughness) and an undifferentiated error component. However, recent studies suggest that several sources, including listener biases and context effects, contribute consistently to the variance in voice quality ratings. A model is proposed in which listeners judge vocal qualities against internal standards that are built up from their experience with different populations of speakers. These standards are inherently unstable, particularly for less-experienced clinicians, and may be systematically influenced by factors other than the quality being judged. Research protocols, clinical protocols, and statistical treatments that control for these regular sources of variance in voice quality ratings are described. Such protocols may improve the reliability and validity of perceptual evaluations, both of which are questionable under current methods of assessment.