Researchers assessing the efficacy of educational programs often face two challenges: participants are not randomly assigned, and subjects nested within the same group are likely to be dependent. The purpose of this study was to demonstrate a rigorous research methodology, hierarchical propensity score matching, that can be used in contexts where randomization is not feasible and dependence between subjects is a concern. Although propensity score matching is not new as a tool for building quasi-experimental models, many studies limit matching to student-level variables. To address this limitation in educational research, this study extends propensity score matching to the next level of the hierarchy so that hierarchical modeling techniques can be used to reduce error arising from dependence between nested students. A large-scale educational program that targets first-semester freshmen was used to illustrate the utility and value of the methodology. Programs of this type are typical in higher education, where student self-selection makes it difficult to assess their true effects on student achievement; with a rigorous methodology, however, administrators can have greater confidence when making programmatic and budgetary decisions.