Definitions of musical aptitude, as well as musical aptitude tests, are frequently criticized for low ecological validity. In many cases, however, the demand for ecological validity can actually make a test worse as a measure of musical aptitude, because a maximally ecologically valid test is necessarily multidimensional. A test that is devised to measure a psychological construct such as musical aptitude should be reasonably homogeneous. Defining musical aptitude as auditory structuring ability is suggested as a compromise between homogeneity and explanatory power. Central aspects of construct validation of an auditory structuring test are presented as examples of theory-driven validation where ecological validity is not the first criterion. These aspects show that the test (a) measures a music-related property, (b) does not depend much on training in music, and (c) measures sound structuring instead of hearing absolute qualities of sound.

Construct Validity as an Ideal

Definitions of musical aptitude, and consequently musical aptitude tests, are frequently and strongly criticized. According to the criticism, tests often have very low validity or do not measure musical aptitude at all (Choksy, 2003). A usual complaint is that tests of musical aptitude do not predict real-world musical skills or behaviors well; that is, they have poor predictive or ecological validity (Demorest, 1995; Hallam & Shaw, 2002; Kinarskaya & Winner, 1997; Mota, 1997). In many cases, this criticism is warranted. However, the purpose and possibilities of musical aptitude measures are often misunderstood, and consequently unrealistic demands are made on them. Tests may be better or worse, of course, but difficulties in predicting real-world musical performances do not automatically mean flaws in a test or the theory behind it.
Psychological tests are devised to measure psychological constructs such as personality, intelligence, or musical aptitude. A test is valid to the extent that it measures the target construct. In other words, validity should be understood as construct validity (Cronbach, 1984). This principle has important consequences.

Gembris (1997) sees three distinct phases in the definition of musicality. The first is the phenomenological approach, which was the main trend in the 19th century, although traces of it are also present in the 20th century. This approach had a close link to the music and aesthetics of its time; understanding of musical beauty was an important ingredient in the concept of musicality, for instance. The second phase, the psychometric approach, was dominant for most of the 20th century. Its main interests were an objective definition of musicality and standardized tests to measure it. According to Gembris, the third phase, the musical meaning approach, is the most important one today. It views the psychometric approach as narrow and mechanistic and stresses the importance of the ability to generate meaning in music.

The purpose of the present article is to (a) show that problems observed in the psychometric approach often are not real but rather consequences of misunderstanding central aspects of construct definition and test validity, and (b) describe the validation of a musical aptitude test as a concrete example of the preferred theory-driven validation process.

The sources of problems in validation fall roughly into two groups: first, using composite validity criteria as if they were unidimensional, and second, using subject groups that do not represent the whole distribution of the construct in question. Maximizing ecological validity usually demands the use of composite measures as validity criteria. When a composite of several constructs is predicted, the best predictor is also multidimensional.
In such a case, the predictor and the criterion contain the same properties in the same proportions. Using success in music studies as a validity criterion for a musical aptitude test can be taken as an example (Karma, 1982). …
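
The point about composite criteria can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the article: it assumes a criterion built from three hypothetical components (aptitude, motivation, and practice) that are mutually uncorrelated and standardized to unit variance. Under those assumptions, the correlation between two weighted sums of the components equals the cosine of the angle between their weight vectors. The function name `composite_correlation`, the component list, and the weights are all assumptions chosen for the example.

```python
import math


def composite_correlation(predictor_weights, criterion_weights):
    """Correlation between two weighted sums of the same components.

    Assumes the components are mutually uncorrelated and have unit
    variance. The covariance of the two sums is then the dot product of
    the weights, and each variance is the sum of squared weights.
    """
    if len(predictor_weights) != len(criterion_weights):
        raise ValueError("weight vectors must have the same length")
    dot = sum(p * c for p, c in zip(predictor_weights, criterion_weights))
    norm_p = math.sqrt(sum(p * p for p in predictor_weights))
    norm_c = math.sqrt(sum(c * c for c in criterion_weights))
    if norm_p == 0 or norm_c == 0:
        raise ValueError("weight vectors must not be all zeros")
    return dot / (norm_p * norm_c)


# Hypothetical criterion: equal parts aptitude, motivation, and practice.
criterion = [1.0, 1.0, 1.0]

# A perfectly homogeneous test that measures aptitude and nothing else.
pure_aptitude_test = [1.0, 0.0, 0.0]

# A composite predictor with the same proportions as the criterion.
matched_composite = [2.0, 2.0, 2.0]

r_pure = composite_correlation(pure_aptitude_test, criterion)
r_matched = composite_correlation(matched_composite, criterion)

print(f"pure aptitude test vs. criterion: {r_pure:.3f}")     # 0.577
print(f"matched composite vs. criterion:  {r_matched:.3f}")  # 1.000
```

In this toy setting, even an error-free measure of aptitude cannot correlate with the composite criterion above about 0.58, while a predictor that mixes the same components in the same proportions reaches 1.0. The numbers depend entirely on the assumed weights and on the components being uncorrelated; they are not estimates for any real test.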