This paper examines the implications for mutual fund performance measurement of two likely sources of specification error. We compare three well-known models, those of Jensen (1968), Treynor and Mazuy (1966), and Henriksson and Merton (1981), and two commonly used timing benchmarks, the SP (2) benchmark misspecification results in qualitatively similar inferences, although statistical significance is not as strong; and (3) the power to detect ability in an individual fund, or to distinguish a good fund from a bad fund, is typically quite low, and this power is not appreciably altered by model and benchmark misspecification. These results are robust to alternative asset pricing specifications (CAPM versus Carhart 4-factor) and to the periodicity of the simulation (calibrated to daily versus monthly data).
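For reference, the three models named above are commonly written as the excess-return regressions sketched below. The notation is assumed here for illustration and is not taken from the paper: $r_{p,t}$ is the fund return, $r_{m,t}$ the benchmark return, $r_{f,t}$ the risk-free rate, $\alpha_p$ the selectivity measure, $\beta_p$ the benchmark exposure, and $\gamma_p$ the timing coefficient. The Henriksson and Merton term is shown in one common form; other presentations use an up-market indicator instead.

```latex
% Assumed notation (not from the paper): x_t is the benchmark excess return.
\begin{align*}
  x_t &= r_{m,t} - r_{f,t} \\[4pt]
  % Jensen (1968): selectivity only, no timing term
  r_{p,t} - r_{f,t} &= \alpha_p + \beta_p\, x_t + \varepsilon_{p,t} \\[4pt]
  % Treynor and Mazuy (1966): quadratic timing term
  r_{p,t} - r_{f,t} &= \alpha_p + \beta_p\, x_t + \gamma_p\, x_t^{2} + \varepsilon_{p,t} \\[4pt]
  % Henriksson and Merton (1981): option-like timing term
  r_{p,t} - r_{f,t} &= \alpha_p + \beta_p\, x_t + \gamma_p \max(0,\, -x_t) + \varepsilon_{p,t}
\end{align*}
```

In the two timing specifications as written here, $\gamma_p > 0$ is read as evidence of timing ability. Model misspecification in the abstract's sense would correspond to estimating one of these regressions when fund returns are generated by another, and benchmark misspecification to using a different index for $r_{m,t}$ than the one that generated the returns.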