Risk-stratification tools are routinely used in obstetrics to help care teams assess and communicate the risks associated with delivery. Electronic health record data and machine learning methods may offer a novel opportunity to improve and automate risk assessment. This study aimed to compare the predictive performance of natural language processing (NLP) of clinician documentation with that of a previously validated tool for identifying individuals at high risk for maternal morbidity.
This retrospective diagnostic study was conducted at Brigham and Women's Hospital and Massachusetts General Hospital, Boston, Massachusetts, and included individuals admitted for delivery at Brigham and Women's Hospital from July 1, 2016, to February 29, 2020. A subset of these encounters (admissions from February to December 2018) was part of a previous prospective validation study of the Obstetric Comorbidity Index (OB-CMI), a comorbidity-weighted score used to stratify the risk of severe maternal morbidity (SMM). The exposures were NLP of clinician documentation and OB-CMI scores. Natural language processing of clinician-authored admission notes was used to predict SMM in individuals who delivered at the same institution but were not included in the prospective OB-CMI study. The NLP model was then compared with the OB-CMI in the subset with a known OB-CMI score. The discrimination of the 2 approaches was compared using the DeLong test, and sensitivity and positive predictive value for identifying the individuals at highest risk were prioritized as the characteristics of interest.
This study included 19 794 individuals: 4034 (20.4%) had been included in the original prospective validation study of the OB-CMI and formed the testing set, and the remaining 15 760 (79.6%) formed the training set. Mean (SD) age was 32.3 (5.2) years in the testing cohort and 32.2 (5.2) years in the training cohort. A total of 115 individuals in the testing cohort (2.9%) and 468 in the training cohort (3.0%) experienced SMM.
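The DeLong test compares 2 areas under the receiver operating characteristic curve (AUROCs) estimated on the same patients while accounting for the correlation between the paired scores. The following is a minimal, dependency-free Python sketch of that comparison, not the study's analysis code (the abstract does not state which software was used); all function and variable names are hypothetical. It uses the direct O(m·n) placement-value formulation, which is adequate for a testing set of a few thousand encounters, and scores tied ranks as one-half, which matters for an integer score such as the OB-CMI. In practice, an established implementation such as `roc.test(..., method = "delong")` in the R package pROC would be the usual choice.

```python
import math


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the positive case outranks the negative case, 0.5 for a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _placements(pos, neg):
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    """Sample variance with an n - 1 denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for 2 correlated AUROCs computed on the same cases.

    labels: 1 if the outcome (e.g., SMM) occurred, 0 otherwise.
    scores_a, scores_b: risk scores for the same cases; higher means riskier.
    Returns (auc_a, auc_b, z, p_value).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least 2 positive and 2 negative cases")

    v10_a, v01_a = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10_b, v01_b = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a = sum(v10_a) / m  # the mean of V10 equals the Mann-Whitney AUROC
    auc_b = sum(v10_b) / m

    # Var(AUC_a - AUC_b) = Var(V10_a - V10_b) / m + Var(V01_a - V01_b) / n
    var = (
        _sample_var([a - b for a, b in zip(v10_a, v10_b)]) / m
        + _sample_var([a - b for a, b in zip(v01_a, v01_b)]) / n
    )
    if var <= 0:
        raise ValueError("AUROC difference has zero variance (identical rankings?)")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal P value
    return auc_a, auc_b, z, p_value
```

Called with the observed SMM labels plus each patient's NLP-predicted probability and OB-CMI score, the function returns both AUROCs, the z statistic for their difference, and a two-sided P value.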
The NLP model was built from a pruned vocabulary of 2783 unique words drawn from the 15 760 admission notes of individuals in the training set. In the testing set, the AUROC of the NLP-based model for predicting SMM was 0.76 (95% CI, 0.72-0.81), which did not differ significantly from that of the OB-CMI (0.74; 95% CI, 0.70-0.79) (P = .53, DeLong test). The high-risk designations of the 2 approaches also had comparable sensitivity (NLP, 28.7%; OB-CMI, 24.4%) and positive predictive value (NLP, 19.4%; OB-CMI, 17.6%) for predicting SMM.
In this study, the NLP method and a validated risk-stratification tool had a similar ability to identify patients at high risk of SMM. Future prospective research is needed to validate the NLP approach in clinical practice and to determine whether it could augment or replace tools that require manual user input.
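The abstract does not describe how the vocabulary was pruned or which classifier was trained on it. One common approach, assumed here purely for illustration, is to drop words that appear in too few notes (rare, noisy terms) or in too large a fraction of notes (uninformative terms) and then represent each note as a binary bag of words that any classifier could consume. The tokenizer and the thresholds `min_df` and `max_df_frac` are hypothetical choices, not the study's settings; scikit-learn's `CountVectorizer` exposes comparable `min_df`, `max_df`, and `binary` parameters.

```python
import re
from collections import Counter


def _tokens(note):
    """Hypothetical tokenizer: lowercase alphabetic runs, deduplicated per note."""
    return set(re.findall(r"[a-z]+", note.lower()))


def build_pruned_vocabulary(notes, min_df=5, max_df_frac=0.9):
    """Keep words whose document frequency is >= min_df and <= max_df_frac of notes.

    min_df and max_df_frac are illustrative defaults, not values from the study.
    """
    doc_freq = Counter()
    for note in notes:
        doc_freq.update(_tokens(note))  # count each word at most once per note
    n_docs = len(notes)
    return sorted(
        word
        for word, df in doc_freq.items()
        if df >= min_df and df / n_docs <= max_df_frac
    )


def vectorize(note, vocab):
    """Binary bag-of-words vector: 1 if the vocabulary word appears in the note."""
    present = _tokens(note)
    return [1 if word in present else 0 for word in vocab]
```

With 4 toy notes, `min_df=2`, and `max_df_frac=0.6`, a word such as "patient" that appears in every note is pruned as uninformative, while words such as "chronic" and "hypertension" that appear in 2 notes are kept.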
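Sensitivity is the share of individuals who experienced SMM that a tool flagged as high risk, TP / (TP + FN), and positive predictive value is the share of flagged individuals who went on to experience SMM, TP / (TP + FP). With 115 SMM cases in the testing set, a sensitivity of 28.7% corresponds to roughly 33 correctly flagged cases (and 24.4% to roughly 28 for the OB-CMI). The sketch below computes both metrics for a high-risk flag defined by a score cutoff; the cutoff is a hypothetical parameter, since the abstract does not state how either high-risk designation was set.

```python
def high_risk_metrics(labels, scores, threshold):
    """Sensitivity and PPV of a 'high risk' flag defined as score >= threshold.

    labels: 1 if the outcome occurred, 0 otherwise.
    threshold: hypothetical cutoff chosen for illustration.
    Returns (sensitivity, ppv); a metric is NaN when its denominator is 0.
    """
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv
```

Because both metrics depend on where the cutoff sits, 2 tools are compared fairly only when their high-risk groups are defined in a consistent way.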