CAS: corpus of clinical cases in French

Natalia Grabar, Vincent Claveau, Clément Dalloux

doi:10.1186/s13326-020-00225-x

Abstract

Background: Textual corpora are extremely important for various NLP applications as they provide information necessary for creating, setting and testing those applications and the corresponding tools. They are also crucial for designing reliable methods and reproducible results. Yet, in some areas, such as the medical area, due to confidentiality or to ethical reasons, it is complicated or even impossible to access representative textual data. We propose the CAS corpus built with clinical cases, such as they are reported in the published scientific literature in French.

Results: Currently, the corpus contains 4,900 clinical cases in French, totaling nearly 1.7M word occurrences. Some clinical cases are associated with discussions. A subset of the whole set of cases is enriched with morpho-syntactic (PoS-tagging, lemmatization) and semantic (the UMLS concepts, negation, uncertainty) annotations. The corpus is being continuously enriched with new clinical cases and annotations. The CAS corpus has been compared with similar clinical narratives. When computed on tokenized and lowercase words, the Jaccard index indicates that the similarity between clinical cases and narratives reaches up to 0.9727.

Conclusion: We assume that the CAS corpus can be effectively exploited for the development and testing of NLP tools and methods. Besides, the corpus will be used in NLP challenges and distributed to the research community.
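The Jaccard index mentioned in the Results can be illustrated with a short sketch. The abstract states only that the index is computed on tokenized and lowercase words; the regex tokenizer, the choice to compare vocabulary sets, the function names and the two French sentences below are assumptions made for illustration, not the authors' actual pipeline or data.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The regex tokenizer is an assumption; the paper's tokenizer may differ.
    """
    return re.findall(r"\w+", text.lower())


def jaccard_index(text_a, text_b):
    """Jaccard index between the vocabularies (sets of word types) of two texts."""
    vocab_a = set(tokenize(text_a))
    vocab_b = set(tokenize(text_b))
    if not vocab_a and not vocab_b:
        # Convention chosen here: two empty vocabularies count as identical.
        return 1.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


# Hypothetical example sentences, not taken from the CAS corpus.
case = "Le patient ne présente pas de fièvre."
narrative = "La patiente présente une fièvre modérée."
print(round(jaccard_index(case, narrative), 4))
```

The index is the size of the shared vocabulary divided by the size of the combined vocabulary, so it ranges from 0 (no words in common) to 1 (identical vocabularies). On whole corpora the same function would be applied to the concatenated texts of each collection.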

Highlights

Textual corpora are extremely important for various Natural Language Processing (NLP) applications, as they provide the information necessary for creating, setting and testing those applications and the corresponding tools.
The purpose of our work is to introduce the CAS corpus, which contains clinical cases in French such as those published in the scientific literature or used in the education and training of medical students.
We present the methods used for building, annotating and analyzing the CAS corpus of clinical cases in French (“Methods”).

Summary

Introduction

Textual corpora are extremely important for various NLP applications, as they provide the information necessary for creating, setting and testing those applications and the corresponding tools. They are crucial for designing reliable methods and reproducible results. In some areas, such as the medical area, it is complicated or even impossible to access representative textual data for confidentiality or ethical reasons. We propose the CAS corpus, built from clinical cases as they are reported in the published scientific literature in French.

Journal: Journal of Biomedical Semantics
Publication Date: Aug 6, 2020
Citations: 10
License type: open-access

Similar Papers

CAS: French Corpus with Clinical Cases
Natalia Grabar ... Clément Dalloux
01 Jan 2018

European Clinical Case Corpus
Bernardo Magnini ... Roberto Zanoli
02 Nov 2022

Online teaching of inflammatory skin pathology by a French-speaking International University Network
Emilie Perron ...
Diagnostic Pathology | VOL. Suppl 9 1
01 Dec 2014

Retrieval of Similar Electronic Health Records Using UMLS Concept Graphs
Laura Plaza ... Alberto Díaz
01 Jan 2009
