Abstract
Background: Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. DUP estimation requires knowledge about when psychosis symptoms first started (symptom onset), and when psychosis treatment was initiated. Electronic health records (EHRs) represent a useful resource for retrospective clinical studies on DUP, but the core information underlying this construct is most likely to lie in free text, meaning it is not readily available for clinical research. Natural Language Processing (NLP) is a means of addressing this problem by automatically extracting relevant information in a structured form. As a first step, it is important to identify appropriate documents, i.e., those that are likely to include the information of interest. Next, temporal information extraction methods are needed to identify time references for early psychosis symptoms. This NLP challenge requires solving three different tasks: time expression extraction, symptom extraction, and temporal “linking”. In this study, we focus on the first step, using two relevant EHR datasets.
Results: We applied a rule-based NLP system for time expression extraction that we had previously adapted to a corpus of mental health EHRs from patients with a diagnosis of schizophrenia (first referrals). We extended this work by applying this NLP system to a larger set of documents and patients, to identify additional texts that would be relevant for our long-term goal, and developed a new corpus from a subset of these new texts (early intervention services). Furthermore, we added normalized value annotations (“2011-05”) to the annotated time expressions (“May 2011”) in both corpora. The finalized corpora were used for further NLP development and evaluation, with promising results (normalization accuracy 71–86%). To highlight the specificities of our annotation task, we also applied the final adapted NLP system to a different temporally annotated clinical corpus.
Conclusions: Developing domain-specific methods is crucial to address complex NLP tasks such as symptom onset extraction and retrospective calculation of duration of a preclinical syndrome. To the best of our knowledge, this is the first clinical text resource annotated for temporal entities in the mental health domain.
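The normalization step described in the Results (mapping a surface expression such as “May 2011” to the value “2011-05”) can be illustrated with a minimal sketch. This is not the SUTime-based system used in the study: the function name `normalize_month_year` and the single “Month YYYY” pattern are assumptions made for illustration only, and a real system would cover many more expression types.

```python
import re

# Hypothetical illustration only; the study adapted SUTime, not this code.
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches a full month name followed by a four-digit year, e.g. "May 2011".
MONTH_YEAR = re.compile(
    r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{4})\b", re.IGNORECASE
)


def normalize_month_year(text):
    """Return (surface expression, normalized value) pairs found in text."""
    results = []
    for match in MONTH_YEAR.finditer(text):
        month = MONTHS[match.group(1).lower()]
        year = int(match.group(2))
        results.append((match.group(0), f"{year:04d}-{month:02d}"))
    return results
```

For example, `normalize_month_year("First heard voices in May 2011.")` yields the pair `("May 2011", "2011-05")`. Normalization accuracy, as reported above, would then be the share of annotated expressions for which such a value matches the gold annotation.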
Highlights
Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes.
Relevant temporal information on DUP is not always well represented by temporal models relying on TimeML. To further investigate this aspect, we annotated a corpus of mental health documents for time expression spans and types, with a specific focus on patients with a diagnosis of schizophrenia [20]. Comparing this annotated corpus with two related works, we found that mental health documents are much longer, with an average of 3974 tokens per document, and contain a larger variety of temporal references.
Our long-term goal is to automatically extract from mental health notes all the elements needed for the generation of DUP data on a large patient cohort. To this end, we have previously developed a corpus annotated with time expressions and adapted a time expression extraction system (SUTime) [20] to be used for temporal Natural Language Processing (NLP) development in the mental health domain, in particular to support DUP extraction [21].
Summary
Duration of untreated psychosis (DUP) is an important clinical construct in the field of mental health, as longer DUP can be associated with worse intervention outcomes. Investigating the duration of untreated symptoms in relation to intervention outcomes is an important research topic [1]. Relevant information on DUP is documented mainly in text fields and cannot be analyzed automatically. Natural Language Processing (NLP) methods can be used to make this information available for computational analysis and clinical research [4, 5].
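Once symptom onset and treatment start are available as normalized values, DUP is the interval between them. The sketch below shows that arithmetic under stated assumptions: the helper names are hypothetical, and resolving a partial value such as “2011-05” to the first day of the month is an illustrative convention, not one taken from the study.

```python
from datetime import date


def parse_partial_date(value):
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date.

    Assumption for illustration: a missing month or day defaults to 1.
    """
    parts = [int(part) for part in value.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def dup_in_days(symptom_onset, treatment_start):
    """Days between symptom onset and the start of psychosis treatment."""
    onset = parse_partial_date(symptom_onset)
    start = parse_partial_date(treatment_start)
    if start < onset:
        raise ValueError("treatment start precedes symptom onset")
    return (start - onset).days
```

With an onset of “2011-05” and treatment starting on “2011-11-15”, `dup_in_days` returns 198. The choice of how to resolve coarse-grained onset values affects the estimate, which is one reason the granularity of the extracted time expressions matters.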