Our objective in this paper was to visualize the evolution of clinical language and sentiment with respect to several common population-level categories including: time in the hospital, age, mortality, gender and race. Our analysis utilized seven years of unstructured free text notes from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) database. The text data was partitioned by category and used to generate several high dimensional vector space representations. We generated visualizations of the vector spaces using Distributed Stochastic Neighbor Embedding (tSNE) and Principal Component Analysis (PCA). We also investigated representative words from clusters in the vector space. Lastly, we inferred the general sentiment of the clinical notes toward each parameter by gauging the average distance between positive and negative keywords and all other terms in the space. We found intriguing differences in the sentiment of clinical notes over time, outcome, and demographic features. We noted a decrease in the homogeneity and complexity of clusters over time for patients with poor outcomes. We also found greater positive sentiment for females, unmarried patients, and patients of African ethnicity.
Read full abstract