Health service psychology (HSP) graduate programs are shifting from knowledge-based to competency-based assessments of trainees' psychotherapy skills. This study used Generalizability Theory to test the dependability of psychotherapy competence assessments based on video observation of trainees. A 10-item rating form was developed from a collection of rating forms used by counseling and clinical psychology graduate programs (n = 102) and from a review of the common-factors research literature. Eleven licensed psychologists then used this form to rate eight graduate trainees while viewing 129 video clips (each approximately 5 min long) from the trainees' psychotherapy sessions with clients (n = 22) at a graduate program's training clinic. Generalizability analyses were used to forecast how the number of raters, the number of clients, and the length of observation time affect the dependability of ratings under various rating designs. Raters were the primary source of error variance, with rater main effects (leniency bias) and dyadic effects (rater-target interactions) contributing 24% and 7% of the variance, respectively. Variance due to segments (video clips) was also substantial, suggesting that therapist performance varies within a single counseling session. Generalizability coefficients (G) were highest for crossed rating designs and reached their maximum levels (G > .50) once four raters had each observed a therapist working with three clients, with 15 min of observation per therapist-client dyad. These findings suggest that expert raters show consensus in their ratings even without rater training and with only limited direct observation. Future research should investigate the validity of competence ratings as predictors of outcome.
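
To make the decision-study logic concrete, the Python sketch below computes a relative generalizability coefficient from a set of variance components and forecasts dependability as the numbers of raters, clients, and 5-min segments are varied, under an assumed fully crossed therapist x rater x client x segment design. Only the rater main-effect (24%) and rater x therapist (7%) variance proportions come from the abstract; the remaining components, the design, and all function names are illustrative assumptions, not the study's estimates.

```python
from itertools import product


def relative_error(error_components, facet_ns):
    """Relative error variance: each component's variance divided by the
    product of the sample sizes of the facets that component involves."""
    total = 0.0
    for _label, facets, variance in error_components:
        divisor = 1
        for facet in facets:
            divisor *= facet_ns[facet]
        total += variance / divisor
    return total


def g_coefficient(var_therapist, error_components, facet_ns):
    """Relative G coefficient: sigma^2_therapist / (sigma^2_therapist + relative error)."""
    err = relative_error(error_components, facet_ns)
    return var_therapist / (var_therapist + err)


# Variance components as proportions of total variance.
VAR_THERAPIST = 0.20  # assumed: therapist (object of measurement)
ERROR_COMPONENTS = [
    ("rater main effect (leniency)", ("rater",), 0.24),                      # from abstract
    ("rater x therapist (dyadic)",   ("rater",), 0.07),                      # from abstract
    ("client x therapist",           ("client",), 0.10),                     # assumed
    ("segment + residual",           ("rater", "client", "segment"), 0.39),  # assumed
]

# Forecast dependability for candidate designs
# (n raters, n clients per therapist, n 5-min segments per dyad).
for n_raters, n_clients, n_segments in product([1, 2, 4], [1, 3], [1, 3]):
    g = g_coefficient(
        VAR_THERAPIST,
        ERROR_COMPONENTS,
        {"rater": n_raters, "client": n_clients, "segment": n_segments},
    )
    print(f"raters={n_raters} clients={n_clients} segments={n_segments}: G = {g:.2f}")
```

Under these illustrative values, the design highlighted in the abstract (four raters, three clients, three 5-min segments per dyad) yields G of roughly .6, consistent with the reported pattern that crossed designs exceed G = .50 at that level of observation.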