Abstract

Background: A shareable repository of clinical notes is critical for advancing natural language processing (NLP) research. Many NLP researchers therefore aim to create such a repository with breadth (notes from multiple institutions) as well as depth (as much individual data as possible).

Methods: We aimed to assess the degree to which individuals would be willing to contribute their health data to such a repository. A compact e-survey probed willingness to share demographic and clinical data categories. Participants were faculty, staff, and students at two geographically diverse major medical centers (Utah and New York). Such a sample could be expected to respond like a typical potential participant from the general public who has been fully informed, through the consent process, about the pros and cons of participating in a research study.

Results: A total of 2140 respondents completed the survey. Of these, 56% were “somewhat/definitely willing” to share clinical data with identifiers, while 89% were “somewhat willing” (17%) or “definitely willing” (72%) to share without identifiers. Results were consistent across gender, age, and education, but there were some differences by geographical region. Individuals were most reluctant (50–74%) to share mental health, substance abuse, and domestic violence data.

Conclusions: We conclude that a substantial fraction of potential patient participants, once educated about risks and benefits, would be willing to donate de-identified clinical data to a shared research repository. A slight majority would even be willing to share without de-identification, suggesting that the perceived risk of data misuse is not a major concern. Such a repository of clinical notes should be invaluable for clinical NLP research and advancement.

Highlights

  • A shareable repository of clinical notes is critical for advancing natural language processing (NLP) research, and a goal of many NLP researchers is to create one that has breadth as well as depth

  • Depending on how syntactic structures are formed, current parsers can be categorized into two major types: constituency parsers, which depend on constituency grammars to distinguish between terminal and non-terminal nodes [1]; and dependency parsers, which generate simplified parse trees of only terminal nodes without considering the interior constituents [2]

  • The Bist-parser achieved the best performance when trained with the default Penn TreeBank and Gigaword word embeddings, while the jPTDP parser obtained the lowest performance: 77.59% unlabeled attachment score (UAS), 83.60% LA, and 71.58% labeled attachment score (LAS)

Summary

Introduction

A shareable repository of clinical notes is critical for advancing natural language processing (NLP) research, and a goal of many NLP researchers is to create one that has breadth (from multiple institutions) as well as depth (as much individual data as possible). Parsing is an NLP task that assigns syntactic structures to sentences according to a grammar. Depending on how syntactic structures are formed, current parsers can be categorized into two major types: constituency parsers, which depend on constituency grammars to distinguish between terminal (word) and non-terminal (e.g., phrase) nodes [1]; and dependency parsers, which generate simplified parse trees of only terminal nodes without considering the interior constituents [2]. Dependency parsers label the shallow semantic relations between pairs of terminal nodes as dependency relations. Many downstream NLP tasks, such as relation extraction [3,4,5] and machine translation [6], rely heavily on the dependencies between syntactic components. Dependency parsers are widely applied in many NLP applications, including in the medical domain.
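To make the dependency representation and the attachment scores mentioned in the highlights more concrete, here is a minimal sketch. It assumes the standard definitions (UAS is the fraction of tokens whose predicted head is correct; LAS additionally requires the correct relation label). The toy sentence, the relation labels, and the names `Arc` and `attachment_scores` are illustrative assumptions, not taken from the study or from any of the parsers it evaluates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One dependency arc for a token (illustrative structure, not a parser API)."""
    head: int   # 1-based index of the head token; 0 denotes the artificial root
    label: str  # dependency relation between the head and this token


# Hypothetical toy sentence: every token gets exactly one head, so the parse
# contains only terminal (word) nodes and no phrase-level constituents.
tokens = ["Patient", "denies", "chest", "pain"]

gold = [Arc(2, "nsubj"), Arc(0, "root"), Arc(4, "compound"), Arc(2, "obj")]
pred = [Arc(2, "nsubj"), Arc(0, "root"), Arc(2, "compound"), Arc(2, "iobj")]


def attachment_scores(gold_arcs, pred_arcs):
    """Return (UAS, LAS) as fractions between 0 and 1."""
    if len(gold_arcs) != len(pred_arcs):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold_arcs:
        raise ValueError("cannot score an empty sentence")
    n = len(gold_arcs)
    # UAS counts tokens whose head is correct, ignoring the label.
    head_ok = sum(g.head == p.head for g, p in zip(gold_arcs, pred_arcs))
    # LAS counts tokens whose head and label are both correct.
    both_ok = sum(g == p for g, p in zip(gold_arcs, pred_arcs))
    return head_ok / n, both_ok / n


uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

In this toy case three of four heads are correct and two of four arcs are fully correct, so LAS can never exceed UAS. Published evaluations usually aggregate over all tokens in a test corpus and may exclude punctuation; this sketch does neither.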

