A fundamental concept in psychological and intelligence testing is the assumption of comparability, in which performance on a test is compared to a normative standard derived from prior testing of individuals who are comparable to the examinee. When evaluating cognitive abilities, the primary variable used for establishing comparability and, in turn, validity is age, given that intellectual abilities develop largely as a function of general physical growth and neuromaturation. When an individual has been raised only in the language of the test, language development is effectively controlled by age. For example, when measuring vocabulary, a 12-year-old will be compared only to other 12-year-olds, all of whom have been learning the language of the test for approximately 12 years; hence, they remain comparable. The same cannot be said when measuring the same or other abilities in a 12-year-old who has been raised only in a different language, or raised partly in a different language and partly in the language of the test. In such cases, a 12-year-old may have begun learning the language of the test shortly after birth, or may have begun learning it only a week ago. Development in the language of the test thus varies considerably among such individuals, and it can no longer be assumed that they are comparable in this respect to others simply because they are of the same age. Psychologists noted early on that language differences could affect test performance, but the problem was viewed mostly as one of basic comprehension. Early efforts to address it typically involved simplifying the instructions or relying on mostly nonverbal methods of administration and measurement. Later procedures included working around language via test modifications or alterations (e.g., use of an interpreter), testing in the dominant language, or using tests translated into other languages.
None of these approaches, however, has succeeded in establishing validity and fairness in the testing of multilinguals, primarily because they fail to recognize that language difference is not the same as language development, much as cultural difference is not the same as acquisition of acculturative knowledge. Current research demonstrates that the test performance of multilinguals is moderated primarily by the amount of exposure to and development in the language of the test. Moreover, language development, specifically receptive vocabulary, accounts for more variance in test performance than age or any other variable. There is further evidence that when the influence of differential language development is examined and controlled, performance differences historically attributed to race disappear. Advances in fairness in the testing of multilinguals rest on true peer comparisons that control for differences in language development within and among multilinguals. The BESA and the Ortiz PVAT are the only two examples of tests whose norms control for both age and degree of development in the language(s) of the test. Together, they provide a blueprint for future test construction in which the creation of true peer norms is possible and, when done correctly, has a significant effect in equalizing test performance across diverse groups, irrespective of racial/ethnic background or language development. Current research demonstrates convincingly that, with deliberate and careful attention to the differences that exist not only between monolinguals and multilinguals of the same age but also among multilinguals themselves, tests can be developed that support claims of validity and fairness for use with individuals who were not raised exclusively in the language or the culture of the test.