Automatically assembling a full census of an academic field.

Allison C. Morgan, Aaron Clauset, Samuel F. Way

doi:10.1371/journal.pone.0202223

Open Access

Journal: PLOS ONE
Publication Date: Aug 29, 2018
Citations: 6
License type: CC BY 4.0

Affiliation: University of Colorado Boulder, Santa Fe Institute

Abstract

The composition of the scientific workforce shapes the direction of scientific research, directly through the selection of questions to investigate, and indirectly through its influence on the training of future scientists. In most fields, however, complete census information is difficult to obtain, complicating efforts to study workforce dynamics and the effects of policy. This is particularly true in computer science, which lacks a single, all-encompassing directory or professional organization. A full census of computer science would serve many purposes, not the least of which is a better understanding of the trends and causes of unequal representation in computing. Previous academic census efforts have relied on narrow or biased samples, or on professional society membership rolls. A full census can be constructed directly from online departmental faculty directories, but doing so by hand is expensive and time-consuming. Here, we introduce a topical web crawler for automating the collection of faculty information from web-based department rosters, and demonstrate the resulting system on the 205 PhD-granting computer science departments in the U.S. and Canada. This method can quickly construct a complete census of the field, and achieve over 99% precision and recall. We conclude by comparing the resulting 2017 census to a hand-curated 2011 census to quantify turnover and retention in computer science, in general and for female faculty in particular, demonstrating the types of analysis made possible by automated census construction.
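The abstract describes a topical web crawler that starts from a department's site and moves toward its faculty roster. As a rough illustration of the general technique (not the authors' implementation), the sketch below runs a best-first crawl: links whose anchor text contains roster-related words are explored first, and the crawl stops at the first page a classifier accepts. The keyword list, the `links` dictionary standing in for fetched pages, the `is_directory` classifier, and the example URLs are all hypothetical assumptions made so the example runs offline.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical keywords suggesting a link leads toward a faculty roster.
KEYWORDS = ("faculty", "people", "directory", "staff")


def link_score(anchor_text: str) -> int:
    """Count roster-related keywords in a link's anchor text (higher is more promising)."""
    text = anchor_text.lower()
    return sum(word in text for word in KEYWORDS)


def topical_crawl(
    start: str,
    links: Dict[str, List[Tuple[str, str]]],
    is_directory: Callable[[str], bool],
    max_pages: int = 50,
) -> Optional[str]:
    """Best-first crawl from `start`, returning the first page classified as a directory.

    `links` maps each URL to (target_url, anchor_text) pairs; it stands in for
    fetching and parsing live pages so the sketch runs offline.
    """
    # Max-heap via negated scores; the counter breaks ties in insertion order.
    frontier = [(0, 0, start)]
    seen = {start}
    order = 1
    visited = 0
    while frontier and visited < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited += 1
        if is_directory(url):
            return url
        for target, anchor in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-link_score(anchor), order, target))
                order += 1
    return None


# Toy site: the crawler should follow "People" before "News" or "About".
site = {
    "cs.example.edu": [
        ("cs.example.edu/news", "News"),
        ("cs.example.edu/people", "People"),
        ("cs.example.edu/about", "About"),
    ],
    "cs.example.edu/people": [
        ("cs.example.edu/people/faculty", "Faculty Directory"),
    ],
}
found = topical_crawl("cs.example.edu", site, lambda u: u.endswith("/faculty"))
print(found)  # cs.example.edu/people/faculty
```

A real crawler would replace the dictionary with HTTP fetches and HTML link extraction, and would likely use a trained classifier rather than a URL suffix test; the priority-queue frontier is what makes the crawl "topical" rather than breadth-first.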

Highlights

Tenured and tenure-track university faculty play a special role in determining the speed and direction of scientific progress, both directly through their research and indirectly through their training of new researchers.
We evaluate the performance of the entire system, applied to the full set of 205 computer science departments.
The novel system we describe here, which uses a topical web crawler to automatically assemble an academic census from semi-structured web-based data, is both accurate and efficient.
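The evaluation reports precision and recall above 99%. One plausible way to compute these for a single department is sketched below, assuming the extracted roster and a hand-curated roster are lists of name strings matched after simple normalization. The `normalize` and `precision_recall` helpers and the toy names are hypothetical; real name matching across sources is usually messier (initials, diacritics, nicknames).

```python
from typing import Iterable, Tuple


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def precision_recall(extracted: Iterable[str], truth: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of an extracted roster against a hand-curated one."""
    found = {normalize(n) for n in extracted}
    actual = {normalize(n) for n in truth}
    true_pos = len(found & actual)
    # Precision: share of extracted entries that are real faculty.
    precision = true_pos / len(found) if found else 0.0
    # Recall: share of real faculty that were extracted.
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


# Toy data: one spurious entry extracted, two real faculty missed.
extracted = ["Jane Doe", "john  smith", "Webmaster"]
truth = ["Jane Doe", "John Smith", "Mary Major", "Richard Roe"]
p, r = precision_recall(extracted, truth)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Pooling true positives, extracted entries, and ground-truth entries across all departments before dividing would give field-wide figures comparable in spirit to those reported for the 205 departments.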

Summary

Introduction

Tenured and tenure-track university faculty play a special role in determining the speed and direction of scientific progress, both directly through their research and indirectly through their training of new researchers. Yet many fields, including computer science, lack a single all-encompassing organization; membership information is instead distributed across many disjoint lists, such as the web-based faculty directories of individual departments. Because assembling such a full census is difficult, past studies have tended to avoid this task and have instead used samples of researchers [8,9,10,11], usually specific to a particular field [12,13,14,15,16], and often focused on the scientific elite [17, 18]. The second part of the paper provides an example of the type of research enabled by our system, using the 2011 and 2017 censuses to investigate the “leaky pipeline” problem in faculty retention.
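Comparing two censuses to study retention reduces, in its simplest form, to set operations on the people present in each year. The sketch below assumes each census is a dictionary from a stable person identifier to a group label (for example, gender), which is a simplifying assumption: linking the same person across censuses taken years apart is itself a matching problem. The function names, IDs, and labels are hypothetical toy values, not data from the paper.

```python
from collections import Counter
from typing import Dict, Tuple


def turnover(old: Dict[str, str], new: Dict[str, str]) -> Tuple[int, int, int]:
    """Counts of (retained, departed, newly appearing) people between two censuses."""
    retained = old.keys() & new.keys()
    departed = old.keys() - new.keys()
    arrived = new.keys() - old.keys()
    return len(retained), len(departed), len(arrived)


def retention_by_group(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each group in `old` that still appears in `new`."""
    totals = Counter(old.values())
    kept = Counter(group for person, group in old.items() if person in new)
    return {group: kept[group] / n for group, n in totals.items()}


# Toy censuses keyed by hypothetical person IDs.
census_2011 = {"p1": "F", "p2": "M", "p3": "M", "p4": "F"}
census_2017 = {"p1": "F", "p2": "M", "p3": "M", "p5": "F"}
print(turnover(census_2011, census_2017))            # (3, 1, 1)
print(retention_by_group(census_2011, census_2017))  # {'F': 0.5, 'M': 1.0}
```

A lower retention rate for one group than another, as in this toy output, is the kind of signal a "leaky pipeline" analysis looks for, though a real study would also need to separate departures from the field from moves between departments.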

Background

Problem formalization

Results

Navigate and classify

Parse and filter

Deploying and evaluating the crawler

Extending to other fields

Retention in computer science

Conclusion

Similar Papers

Approaches to Biology Teaching and Learning: On Integrating Pedagogical Training into the Graduate Experiences of Future Science Faculty
Kimberly Tanner ... Deborah Allen
CBE—Life Sciences Education | VOL. 5
01 Mar 2006

From faculty for undergraduate neuroscience: encouraging innovation in undergraduate neuroscience education by supporting student research and faculty development.
Jean C Hardwick ... Eric P Wiertelak
CBE life sciences education | VOL. 5
01 Jun 2006

British Food Journal Volume 12 Issue 6 1910
British Food Journal | VOL. 12
01 Jun 1910

Considering a Society of Environmental Health Science
David A Schwartz
Environmental Health Perspectives | VOL. 114
01 Mar 2006