The electroencephalogram (EEG) is an excellent tool for probing neural function in both clinical and research environments, owing to its low cost, non-invasive nature, and pervasiveness. In the clinic, the EEG is the standard test for diagnosing and characterizing epilepsy and stroke, as well as a host of other trauma- and pathology-related conditions (Tatum et al., 2007; Yamada and Meng, 2009). In research laboratories, EEG is used to study neural responses to external stimuli, motor planning and execution, and brain-computer interfaces (Lebedev and Nicolelis, 2006; Wang et al., 2013). While human interpretation is still the gold standard for EEG analysis in the clinic, a host of software tools exist to facilitate the process or to perform predictive analyses such as seizure prediction. Recently, a confluence of events has underscored the need for robust EEG tools. First, there has been a renewed push via the White House BRAIN initiative to understand neural function and disease (Weiss, 2013). Second, there is an increased awareness of brain injury owing to both the influx of injured warfighters and numerous high-profile athletes found to have chronic brain damage (McKee et al., 2009; Stern et al., 2011). Third, a wave of consumer-grade scalp sensors has entered the market, allowing end users to monitor sleep, arousal, and mood (Liao et al., 2012). In all these applications, there is a need for robust signal processing tools to analyze the EEG data. Historically, EEG signal processing tools have been devised either by using ad hoc heuristic methods or by training pattern recognition engines on small data sets (Gotman, 1982). These methods have yielded limited results, owing mostly to the fact that brain signals (and EEG in particular) are characterized by great variability, which can only be properly interpreted by building statistical models using massive amounts of data (Alotaiby et al., 2014; Ramgopal et al., 2014).
Unfortunately, despite EEG being perhaps the most pervasive modality for acquiring brain signals, there is a severe lack of data in the public domain. For example, the “EEG Motor Movement/Imagery Dataset” (http://www.physionet.org/pn4/eegmmidb/) contains ~1500 recordings, each 1 or 2 min long, from 109 subjects (Goldberger et al., 2000; Schalk et al., 2004). The CHB-MIT database contains data from 22 subjects, mostly pediatric (Shoeb, 2009). A database from Karunya University contains 175 16-channel EEGs, each 10 s long (Selvaraj et al., 2014). One of the most extensive databases for supporting epilepsy research is the European Epilepsy Database (http://epilepsy-database.eu/), which contains 250 datasets from 30 unique patients, but sells for €3000. Other databases, such as ieee.org, contain a wealth of data from more invasive modalities such as electrocorticogram, but little or no EEG. This lack of publicly available data is ironic considering that hundreds of thousands of EEGs are administered annually in clinical settings around the world. Relatively little of this data is available to the research community in a form that is useful to machine learning research. Massive amounts of EEG data would allow the use of state-of-the-art machine learning algorithms to discover new diagnostics and validate clinical practice. Furthermore, it is desirable that such data be collected in clinical settings, as opposed to tightly controlled research environments, since “clinical-grade” data is inherently more variable with respect to parameters such as electrode location, clinical environment, equipment, and noise. Capturing this variability is critical to the development of robust, high-performance technology that has real-world impact. In this work, we describe a new corpus, the TUH-EEG Corpus, an ongoing data collection effort that has recently released 14 years of clinical EEG data collected at Temple University Hospital.
The records have been curated, organized, and paired with textual clinician reports that describe the patients and scans. The corpus is publicly available from the Neural Engineering Data Consortium (www.nedcdata.org) (Picone and Obeid, 2016).