A multitask speaking measure consisting of both integrated and independent tasks is expected to be an important component of a new version of the TOEFL test. This study considered two critical issues concerning the score dependability of the new speaking measure: How much would score dependability be affected by (1) combining scores on different task types into a composite score and (2) rating each task only once? To answer these questions, generalizability theory (G-theory) procedures were used to examine the impact of the number of tasks, the number of raters per speech sample, and subsection lengths on the dependability of speaking scores. Univariate and multivariate G-theory analyses were conducted on rating data collected for 261 examinees. The univariate analyses showed that increasing the number of tasks would be more efficient than increasing the number of ratings per speech sample for maximizing score dependability. The multivariate G-theory analyses revealed that (1) the universe (or true) scores among the task-type subsections were very highly correlated and that (2) slightly larger gains in composite score reliability would result from increasing the number of listening-speaking tasks for the fixed section lengths.
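The trade-off between adding tasks and adding ratings can be illustrated with a minimal sketch of a univariate decision (D) study. It assumes a fully crossed persons x tasks x ratings random-effects design, which may differ from the design actually used in the study, and it uses the standard index of dependability (Phi) for that design. The function name `dependability` and all variance component values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def dependability(var, n_tasks, n_ratings):
    """Index of dependability (Phi) for a fully crossed p x t x r random design.

    var maps each effect to its variance component:
    p (persons), t (tasks), r (ratings), pt, pr, tr, and ptr,e (residual).
    """
    # Absolute error variance: every component except persons,
    # each divided by the number of conditions it is averaged over.
    abs_error = (
        var["t"] / n_tasks
        + var["r"] / n_ratings
        + var["pt"] / n_tasks
        + var["pr"] / n_ratings
        + var["tr"] / (n_tasks * n_ratings)
        + var["ptr,e"] / (n_tasks * n_ratings)
    )
    return var["p"] / (var["p"] + abs_error)


# Hypothetical variance components (NOT taken from the study).
# The person-by-task component is set larger than the person-by-rating one.
components = {
    "p": 0.50, "t": 0.02, "r": 0.01,
    "pt": 0.20, "pr": 0.02, "tr": 0.01, "ptr,e": 0.25,
}

baseline = dependability(components, n_tasks=5, n_ratings=1)
more_tasks = dependability(components, n_tasks=10, n_ratings=1)
more_ratings = dependability(components, n_tasks=5, n_ratings=2)

print(f"5 tasks, 1 rating:   {baseline:.3f}")
print(f"10 tasks, 1 rating:  {more_tasks:.3f}")
print(f"5 tasks, 2 ratings:  {more_ratings:.3f}")
```

Under these assumed components, both alternatives double the total number of ratings per examinee, but doubling the tasks raises Phi more than doubling the ratings because the person-by-task variance is divided only by the number of tasks.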