Abstract
Background: Deep phenotyping is the precise and comprehensive analysis of phenotypic features, in which the individual components of the phenotype are observed and described. In UK mental health clinical practice, most clinically relevant information is recorded as free text in the Electronic Health Record (EHR), and this text offers a granularity of information beyond that expressed in most medical knowledge bases. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. Methods: Vector space models of language seek to represent the relationships between words in a corpus as cosine distances between the vectors assigned to those words. Using a large corpus of healthcare data, combined with appropriate clustering techniques and manual curation, we explore how such models can be used to discover vocabulary relevant to phenotyping Serious Mental Illness (SMI) with only a small amount of prior knowledge. Results: 20 403 n-grams were derived and curated via a two-stage methodology. The list was reduced to 557 putative concepts by eliminating redundant information content. These were then organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. 235 (42%) concepts were found to be depictions of putative clinical significance. Of these, 53 (10%) were identified as having novel synonymy with existing SNOMED CT concepts, and 106 (19%) had no mapping to SNOMED CT. Conclusions: We demonstrate a scalable approach to discovering new depictions of SMI symptomatology based on real world clinical observation.
Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than are typically assessed via current diagnostic frameworks, and could help enhance nomenclatures such as SNOMED CT with real world depictions.
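To make the Methods description concrete, the following is a minimal Python sketch of how cosine distance ranks terms in a vector space model. The four terms and their 3-dimensional vectors are invented for illustration only; they are not drawn from the study's corpus or model, and real word embeddings typically have hundreds of dimensions. The helpers `cosine_distance` and `nearest_terms` are hypothetical names, not part of any library, and the clustering and manual curation steps are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance: 0.0 for identical directions, larger for less related terms."""
    return 1.0 - cosine_similarity(u, v)


def nearest_terms(query, vectors, k=2):
    """Return the k terms closest to `query` by cosine distance (hypothetical helper)."""
    q = vectors[query]
    scored = [
        (term, cosine_distance(q, vec))
        for term, vec in vectors.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical toy embeddings, invented for illustration only.
toy_vectors = {
    "paranoid": [0.9, 0.1, 0.0],
    "suspicious_of_others": [0.8, 0.2, 0.1],
    "hearing_voices": [0.1, 0.9, 0.2],
    "auditory_hallucinations": [0.2, 0.8, 0.1],
}

if __name__ == "__main__":
    for term, dist in nearest_terms("hearing_voices", toy_vectors):
        print(f"{term}: {dist:.3f}")
```

With these toy vectors, `auditory_hallucinations` ranks closest to `hearing_voices`; in a workflow like the one described, such candidate neighbours would still need manual review before being treated as synonyms of a SNOMED CT concept.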
Highlights
The dramatic decrease in genetic sequencing costs, coupled with the growth of our understanding of the molecular basis of diseases, has led to the identification of increasingly granular subsets of disease populations that were once thought of as homogeneous groups
Precision medicine has arisen in response to the fact that the ‘real world’ application of many treatments has a lower efficacy and a differential safety profile compared with clinical trials, most likely due to genetic and environmental differences in the disease population
Although a complete description of the structure and challenges of SNOMED CT is beyond the scope of this paper, we describe how aspects of these problems manifest themselves in the task of phenotyping serious mental illness (SMI) from a real world Electronic Health Record (EHR) system
Summary
The dramatic decrease in genetic sequencing costs, coupled with the growth of our understanding of the molecular basis of diseases, has led to the identification of increasingly granular subsets of disease populations that were once thought of as homogeneous groups. Precision medicine seeks to obtain deeper genotypic and phenotypic knowledge of the disease population, in order to offer tailored care plans with evidence-based outcomes. The SNOMED CT nomenclature potentially offers the means to model such information at scale, yet given a sufficiently large body of clinical text collected over many years, it is difficult to identify the language that clinicians favour to express concepts. A list of 20 403 n-grams was reduced to 557 putative concepts by eliminating redundant information content. These were organised into 9 distinct categories pertaining to different aspects of psychiatric assessment. We demonstrate a scalable approach to discovering new depictions of SMI symptomatology based on real world clinical observation. Such approaches may offer the opportunity to consider broader manifestations of SMI symptomatology than are typically assessed via current diagnostic frameworks.
Version 2