Abstract
Background: Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes. However, details of age and temporally-specified clinical events are not well captured, consistently codified, and readily available to research databases for study.
Methods: We expanded upon existing annotation schemes to capture additional age and temporal information, conducted an annotation study to validate our expanded schema, and developed a prototypical, rule-based Named Entity Recognizer to extract our novel clinical named entities (NE). The annotation study was conducted on 138 discharge summaries from the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus. In addition to existing NE classes (TIMEX3, SUBJECT_CLASS, DISEASE_DISORDER), our schema proposes 3 additional NEs (AGE, PROCEDURE, OTHER_EVENTS). We also propose new attributes, e.g., “degree_relation” which captures the degree of biological relation for subjects annotated under SUBJECT_CLASS. As a proof of concept, we applied the schema to 49 H&P notes to encode pertinent history information for a lung cancer cohort study.
Results: An abundance of information was captured under the new OTHER_EVENTS, PROCEDURE and AGE classes, with 23%, 10% and 8% of all annotated NEs belonging to the above classes, respectively. We observed high inter-annotator agreement of >80% for AGE and TIMEX3; the automated NLP system achieved F1 scores of 86% (AGE) and 86% (TIMEX3). Age and temporally-specified mentions within past medical, family, surgical, and social histories were common in our lung cancer data set; annotation is ongoing to support this translational research study.
Conclusions: Our annotation schema and NLP system can encode historical events from clinical notes to support clinical and translational research studies.
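To make the idea of a rule-based recognizer for the AGE class concrete, the following is a minimal sketch in Python. It is not the authors' system: the abstract does not list its rules, so the patterns in `AGE_PATTERNS` and the function name `extract_age_mentions` are hypothetical and cover only a few common surface forms.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not
# describe the rules used by the authors' recognizer.
AGE_PATTERNS = [
    # "67-year-old", "67 year old"
    re.compile(r"\b\d{1,3}[- ]year[- ]old\b", re.IGNORECASE),
    # "at age 45", "age 45", "aged 45", "age of 45"
    re.compile(r"\b(?:at\s+)?age(?:d)?\s+(?:of\s+)?\d{1,3}\b", re.IGNORECASE),
    # "in her 50s", "in his late 60s"
    re.compile(
        r"\bin\s+(?:his|her|their)\s+(?:early\s+|mid\s+|late\s+)?\d0s\b",
        re.IGNORECASE,
    ),
]


def extract_age_mentions(text):
    """Return (start, end, surface) tuples for AGE-like mentions, sorted by offset."""
    spans = []
    for pattern in AGE_PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


note = "Mother diagnosed with breast cancer at age 45. Patient is a 67-year-old male."
mentions = extract_age_mentions(note)
for start, end, surface in mentions:
    print(start, end, surface)
```

A practical recognizer would also need to link each mention to its subject (patient or relative) and to the associated event, which this sketch does not attempt.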
Highlights
Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes
Annotator agreement generally decreases as match criteria become stricter
Agreement for the AGE and TIMEX3 classes remains unchanged even after attributes and relationships are added. This indicates that the annotation schema is well-designed for these classes and/or that these classes are easier to annotate
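The effect of match criteria on agreement can be illustrated with a small sketch. It assumes agreement is scored as a pairwise F1 between two annotators, which this text does not confirm, and it uses invented spans of the form `(start, end, label)`. The helper names `exact`, `overlaps`, and `pairwise_f1` are hypothetical.

```python
def exact(a, b):
    """Strict criterion: same offsets and same label."""
    return a == b


def overlaps(a, b):
    """Relaxed criterion: same label and any character overlap."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def pairwise_f1(ann_a, ann_b, match):
    """F1 between two annotators, treating annotator A as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    matched_a = sum(any(match(a, b) for b in ann_b) for a in ann_a)
    matched_b = sum(any(match(b, a) for a in ann_a) for b in ann_b)
    precision = matched_b / len(ann_b)
    recall = matched_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented annotations for illustration; not data from the study.
annotator_a = [(0, 11, "AGE"), (20, 30, "TIMEX3"), (40, 52, "PROCEDURE")]
annotator_b = [(0, 11, "AGE"), (18, 30, "TIMEX3"), (60, 70, "OTHER_EVENTS")]

strict = pairwise_f1(annotator_a, annotator_b, exact)
relaxed = pairwise_f1(annotator_a, annotator_b, overlaps)
print(f"strict F1 = {strict:.2f}, relaxed F1 = {relaxed:.2f}")
```

In this toy case the TIMEX3 spans differ only in their start offset, so they count as agreement under the relaxed criterion but not under the strict one.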
Summary
Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes. Clinical histories contained within the electronic health record (EHR) document pertinent age and temporal information that could be useful for determining a patient’s disease risk, understanding the course of a disease phenotype, and predicting patient health. Studies suggest that patients have an elevated cancer risk if one or more family members have cancer, if these cancers occur significantly earlier in life than in people with sporadic cancer in the general population, or if the patient has a personal history of other prior cancers [1, 4]. Patient clinical histories play an important role in explaining the risk of developing lung cancer. Better characterization of lung cancer risk may lead to improved and better-targeted screening efforts, which can potentially save patient lives because earlier detection of lung cancer is known to improve survival [11].