The Content of Their Coursework: Understanding Course-Taking Patterns at Community Colleges by Clustering Student Transcripts

Matthew Zeidenberg, Marc Scott

doi:10.7916/d8pc39hm

Abstract

Community college students typically have access to a large selection of courses and programs, so the student transcripts at any one college or college system tend to be very diverse. As a result, it is difficult for faculty, administrators, and researchers to understand students' course-taking patterns and to determine which programs of study students appear to be pursuing. Examining these patterns and then comparing them with listed program requirements would be very time-consuming. The most common way of assigning a program of study to a student (picking the subject in which she has taken the most courses) is overly simple, because many programs require courses across several subjects. However, because students with similar course-taking patterns, in terms of both the subjects and the particular courses taken, are likely to be in similar programs, clustering can be a useful way to make sense of the relevant data. Clustering groups similar items into clusters and relies only on a measure of the similarity between those items. In this paper, we apply a clustering algorithm to the problem of understanding college transcripts, which serve as the items to be clustered. To our knowledge, this is the first effort to organize transcripts by their course content using clustering. We base the similarity measure on the proportion of curricular subjects that each transcript has in common with every other transcript. Our data are community and technical college transcripts for a cohort of students who first entered the Washington State system in the fall of the 2005–06 academic year and who had no prior postsecondary experience. We used our clustering algorithm to cluster liberal arts students and career-technical students separately, and we found that it did a good job of clustering each of these groups.
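The abstract says only that similarity is based on the proportion of subjects two transcripts share; it does not give the exact formula or name the clustering algorithm. The following is a minimal sketch under our own assumptions. Each transcript is a list of subject codes with one entry per course. Similarity is the sum, over shared subjects, of the smaller of the two transcripts' subject shares. Clusters come from average-linkage agglomerative clustering to a fixed number of clusters. The function names, the toy subject codes, and the choice of average linkage are hypothetical illustrations, not the authors' method. The modal-subject baseline criticized above is included for comparison.

```python
from collections import Counter
from itertools import combinations


def subject_proportions(transcript):
    """Turn a transcript (one subject code per course taken) into subject shares."""
    counts = Counter(transcript)
    total = sum(counts.values())
    return {subject: n / total for subject, n in counts.items()}


def overlap_similarity(p, q):
    """Assumed similarity: the share of coursework two transcripts have in common,
    i.e. the sum over shared subjects of the smaller of the two shares.
    Identical subject mixes score 1.0; transcripts with no subject in common score 0.0."""
    return sum(min(p[s], q[s]) for s in p.keys() & q.keys())


def modal_subject(transcript):
    """The simple baseline: the subject with the most courses.
    Ties go to the subject encountered first in the transcript."""
    return Counter(transcript).most_common(1)[0][0]


def average_linkage(transcripts, n_clusters):
    """Agglomerative clustering with average linkage (an illustrative choice):
    start from singleton clusters and repeatedly merge the pair of clusters with
    the highest mean pairwise similarity until n_clusters remain."""
    props = [subject_proportions(t) for t in transcripts]
    sim = {
        (i, j): overlap_similarity(props[i], props[j])
        for i, j in combinations(range(len(props)), 2)
    }

    def mean_similarity(a, b):
        # Members of different clusters are distinct, so (min, max) is always a key.
        return sum(sim[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(props))]
    while len(clusters) > n_clusters:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: mean_similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because combinations() yields x < y
    return clusters


# Toy data: hypothetical subject codes, one entry per course.
transcripts = [
    ["ENGL", "MATH", "HIST", "PSYC", "ENGL"],
    ["ENGL", "HIST", "PSYC", "SOC", "MATH"],
    ["NURS", "NURS", "BIOL", "CHEM", "NURS"],
    ["NURS", "BIOL", "BIOL", "NURS", "PSYC"],
    ["WELD", "WELD", "WELD", "MATH"],
]
print(average_linkage(transcripts, n_clusters=3))  # [[0, 1], [2, 3], [4]]
print([modal_subject(t) for t in transcripts])  # ['ENGL', 'ENGL', 'NURS', 'NURS', 'WELD']
```

On this toy data the two general-education transcripts cluster together even though the second has no single dominant subject, whereas the modal-subject rule assigns it "ENGL" only because of tie-breaking. Average linkage is used here because it is short to write from scratch. A real analysis would more likely rely on a library implementation and a similarity measure chosen for the data at hand.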
The clusters roughly corresponded to programs of study, so we were able to estimate how many students were pursuing each program and which subjects the students in each cluster were studying. We were also able to examine the demographics and the completion and transfer rates of the students in each cluster, to get a sense of what types of students were in each program of study and how successful they seemed to be in college. We found substantial variation on these dimensions, as well as in the extent to which students' programs were concentrated in a single subject or spread across several subjects. Clustering is a powerful way to understand students' course-taking patterns and to assign programs of study. It makes few prior assumptions about the data; rather, it allows the data to organize themselves based on a similarity measure. It relieves the analyst of having to decide in advance what the program categories should be. It can also detect patterns of activity across subjects within student transcripts. Note that although we have applied this method to community college students, it is applicable at all levels of postsecondary education. We conclude that this method would be useful to researchers throughout education who are trying to understand student course-taking patterns and programs of study, and who need to organize large amounts of transcript data.
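This is a minimal sketch of the kind of per-cluster description reported above: cluster sizes, an outcome rate, and whether coursework is concentrated in one subject or spread across several. It assumes that each student has a transcript (a list of subject codes), a boolean completion flag, and a cluster assignment given as lists of student indices. The concentration score, a Herfindahl-style sum of squared subject shares, is our illustrative choice, since the abstract does not say how concentration was measured. All names and data are hypothetical.

```python
from collections import Counter


def concentration(transcript):
    """Herfindahl-style index (an assumed measure): sum of squared subject shares.
    Equals 1.0 when every course is in one subject and falls as courses spread out."""
    counts = Counter(transcript)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def summarize_clusters(transcripts, completed, clusters):
    """Describe each cluster: size, share of all students, completion rate,
    mean subject concentration, and its three most common subjects."""
    rows = []
    for label, members in enumerate(clusters):
        subjects = Counter(s for i in members for s in transcripts[i])
        rows.append({
            "cluster": label,
            "size": len(members),
            "share": len(members) / len(transcripts),
            "completion_rate": sum(1 for i in members if completed[i]) / len(members),
            "mean_concentration": sum(concentration(transcripts[i]) for i in members) / len(members),
            "top_subjects": [s for s, _ in subjects.most_common(3)],
        })
    return rows


# Hypothetical inputs: transcripts, completion flags, and a cluster assignment.
transcripts = [
    ["ENGL", "MATH", "HIST", "PSYC", "ENGL"],
    ["ENGL", "HIST", "PSYC", "SOC", "MATH"],
    ["NURS", "NURS", "BIOL", "CHEM", "NURS"],
    ["NURS", "BIOL", "BIOL", "NURS", "PSYC"],
]
completed = [True, False, True, True]
clusters = [[0, 1], [2, 3]]

for row in summarize_clusters(transcripts, completed, clusters):
    print(row)
```

In this toy example, the nursing-like cluster has a higher mean concentration (about 0.40) than the general-education cluster (about 0.24), which is the sort of contrast between concentrated and spread-out programs that the abstract describes. Transfer rates or demographic shares could be added as further columns in the same way as the completion rate.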
