Testing for discrimination in mortgage lending requires classifying consumers into treatment and control groups. Classification is complicated because Home Mortgage Disclosure Act (HMDA) data, the primary source of data for these analyses, contain information on the ethnicity, race, and gender of both primary applicants and coapplicants. In addition, applicants have the option of reporting multiple races. Using these detailed data to construct standard groups, such as “Black,” “Hispanic,” and “White,” requires subjective decisions on how to aggregate applications. This study uses a data-driven approach to classify applications, minimizing subjectivity. Using HMDA data, as well as data from a recent examination conducted by the Office of the Comptroller of the Currency, we disaggregate applications into the most basic subsets possible. Our objectives are to better understand the characteristics of applicants, analyze variation in denial rates across the underlying subsets of applications, and develop a data-driven classification strategy feasible for fair lending analyses.