Abstract

The Common Data Elements (CDEs) standard of the International Organization for Standardization (ISO) 11179 is commonly used in the field of clinical data processing. The Biomedical Research Integrated Domain Group (BRIDG) model is the framework for biomedical and clinical research. Mapping CDEs to BRIDG classes (also known as CDE classification) would help with interoperability and data analysis in clinical research. However, manually mapping each CDE to its corresponding BRIDG class is time-consuming and labor-intensive. In this paper, we present a new classification algorithm along with a new oversampling method. Our algorithm uses Term Frequency-Inverse Document Frequency (TF-IDF) as the feature representation method. By assigning different weights to the attributes, we allow the more important attributes to play a larger role in the mapping process. In addition, the oversampling method generates every new attribute in the minority class by picking the length of the new attribute and setting its words according to the existing training set. Our research contributes to the field in the following ways: (1) a new CDE classification algorithm that outperforms existing algorithms in the literature, including the Random Forest Classifier, Linear Support Vector Classification (SVC), Multinomial Naive Bayes (NB), Logistic Regression, and Long Short-Term Memory (LSTM) networks, in terms of accuracy, precision, recall, and F1 score; (2) a new oversampling method that improves CDE classification accuracy for Random Forest and Multinomial NB; and (3) two novel attributes employed by our classification algorithm, namely “Data Element Preferred Definition” and “Document,” which are more efficient at classifying CDEs than the six attributes traditionally selected by domain experts.
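
The two ideas described above, attribute-weighted TF-IDF matching and word-level oversampling of the minority class, can be illustrated with a minimal Python sketch. The abstract does not specify the details, so everything here is an assumption made for illustration: the nearest-match decision rule, the weights 0.7 and 0.3, the placeholder labels "ClassA" and "ClassB", the toy records, and all function names. It should not be read as the authors' implementation.

```python
import math
import random
from collections import Counter


def fit_idf(texts):
    """Smoothed inverse document frequency over whitespace tokens."""
    n = len(texts)
    df = Counter(w for t in texts for w in set(t.lower().split()))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; words unseen during fitting are dropped."""
    tf = Counter(text.lower().split())
    return {w: c * idf[w] for w, c in tf.items() if w in idf}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(query, train, weights):
    """Return the label of the training record with the highest weighted
    sum of per-attribute cosine similarities (assumed decision rule)."""
    idfs = {a: fit_idf([rec[a] for rec, _ in train]) for a in weights}
    best_label, best_score = None, -1.0
    for rec, label in train:
        score = sum(
            w * cosine(vectorize(query[a], idfs[a]), vectorize(rec[a], idfs[a]))
            for a, w in weights.items()
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def oversample(minority, n_new, rng):
    """Synthesize minority-class records: for each attribute, copy the length
    of a randomly chosen existing value, then fill it with words drawn from
    that attribute's existing minority-class values."""
    synthetic = []
    for _ in range(n_new):
        rec = {}
        for attr in minority[0]:
            values = [m[attr].split() for m in minority]
            length = len(rng.choice(values))
            pool = [w for v in values for w in v]
            rec[attr] = " ".join(rng.choice(pool) for _ in range(length))
        synthetic.append(rec)
    return synthetic


# Toy records (invented); the keys stand in for the "Data Element Preferred
# Definition" and "Document" attributes named in the abstract.
train = [
    ({"definition": "systolic blood pressure measured at rest",
      "document": "vital signs form"}, "ClassA"),
    ({"definition": "diastolic blood pressure value",
      "document": "vital signs form"}, "ClassA"),
    ({"definition": "date the subject was born",
      "document": "demographics form"}, "ClassB"),
    ({"definition": "subject sex at birth",
      "document": "demographics form"}, "ClassB"),
]
weights = {"definition": 0.7, "document": 0.3}  # hypothetical weights

query = {"definition": "blood pressure at rest", "document": "vital signs form"}
predicted = classify(query, train, weights)

minority = [rec for rec, label in train if label == "ClassB"]
synthetic = oversample(minority, 3, random.Random(0))
```

In this sketch a larger weight lets an attribute contribute more to the similarity score, and each synthetic record reuses only lengths and words already present in the minority class, so no new vocabulary is introduced.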
