Original Data Sources Research Articles

Machine learning has become increasingly important in biomechanics. It allows researchers to unveil hidden patterns in large and complex data, which leads to a more comprehensive understanding of biomechanical processes and deeper insights into human movement. However, machine learning models are often trained on a single dataset with a limited number of participants, which negatively affects their robustness and generalizability. Combining data from multiple existing sources provides an opportunity to overcome these limitations without spending more time on recruiting participants and recording new data. It is also an opportunity for researchers who lack the funding or laboratory equipment to conduct expensive motion capture studies themselves. At the same time, subtle interlaboratory differences can be problematic in an analysis because of the bias they introduce. In our study, we investigated differences between motion capture datasets in the context of machine learning, for which we combined overground walking trials from four existing studies. Specifically, our goal was to examine whether a machine learning model could predict the original data source from marker and ground reaction force (GRF) trajectories of single strides, and how different scaling methods and pooling procedures affected the outcome. Layer-wise relevance propagation was applied to understand which factors were influential in distinguishing the original data sources. We found that the model could predict the original data source with very high accuracy (up to >99%), which decreased by about 15 percentage points when we scaled every dataset individually prior to pooling. However, none of the proposed scaling methods could fully remove the dataset bias. Layer-wise relevance propagation revealed that no single factor differed between all datasets. Instead, every dataset had unique characteristics that were picked up by the model. These variables differed between the scaling and pooling approaches but were mostly consistent between trials belonging to the same dataset. Our results show that motion capture data are sensitive even to small deviations in marker placement and experimental setup, and that small inter-group differences should not be overinterpreted during data analysis, especially when the data were collected in different labs. Furthermore, we recommend scaling datasets individually prior to pooling them, because this led to the lowest source-prediction accuracy. We want to raise awareness that differences between datasets always exist and are recognizable by machine learning models. Researchers should thus consider how these differences might affect their results when combining data from different studies.
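The difference between the two pooling procedures described in the abstract can be illustrated with a minimal sketch. The abstract does not say which scaling methods were used, so z-score standardization is an assumption here, and the two "labs" are synthetic stand-ins rather than real marker or GRF data. All function names are hypothetical.

```python
import numpy as np

def zscore(x, eps=1e-12):
    """Standardize each feature (column) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def pool_then_scale(datasets):
    """Concatenate all datasets first, then scale with pooled statistics."""
    return zscore(np.concatenate(datasets, axis=0))

def scale_then_pool(datasets):
    """Scale every dataset with its own statistics, then concatenate."""
    return np.concatenate([zscore(d) for d in datasets], axis=0)

# Synthetic stand-ins for two labs with a systematic offset (assumed values).
rng = np.random.default_rng(0)
lab_a = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
lab_b = rng.normal(loc=1.0, scale=1.2, size=(150, 3))

pooled_first = pool_then_scale([lab_a, lab_b])
scaled_first = scale_then_pool([lab_a, lab_b])

# Absolute gap between per-lab feature means after each procedure.
n_a = len(lab_a)
gap_pooled = np.abs(pooled_first[:n_a].mean(axis=0) - pooled_first[n_a:].mean(axis=0))
gap_scaled = np.abs(scaled_first[:n_a].mean(axis=0) - scaled_first[n_a:].mean(axis=0))
```

In this sketch, scaling each dataset individually removes the offset in the per-lab means, while scaling after pooling leaves it in place. It only addresses first- and second-order statistics, which is consistent with the abstract's finding that no scaling method fully removed the dataset bias.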


Abstract Introduction: The promise of achieving cancer control both nationally and globally rests on rigorous data collection, data harmonization and data sharing to catalyze scientific knowledge and understanding, and to advance more quickly our progress toward addressing cancer disparities. In response, the LIFE Project aimed to establish a robust, functional, cancer-focused platform for data harmonization and sharing across national and international studies, with a focus on low-resourced settings and low- and middle-income countries (LMIC). Methods: Established in 2022, the integrated database platform hosts data collected from the ongoing Jamaican LIFE and US CAP3 projects. Both studies focus on investigating biologic and socioecologic risk factors for non-communicable chronic diseases (NCDs) and cancers. Participants self-reported data via REDCap. A one-to-one mapping of the variables from both questionnaires was conducted and stored in the integrated REDCap file. An automated REDCap Application Programming Interface (API) script imports new entries into the integrated database weekly. To ensure data accuracy, data importation and data integration are monitored on a weekly schedule and assessed by random sampling of original source data entries and the corresponding entries in the integrated database. Any errors found in data integration are resolved in consultation with the programming team. Results: As of June 2023, 3,369 participants were enrolled: 2,561 from the LIFE study and 808 from the CAP3 study. Overall, the combined participants were 89% Afro-Caribbean or African American, and comprised 40% males with a median age of 54 years and 60% females with a median age of 51 years. The median age is 50 years for the LIFE study and 55 years for the CAP3 study. The prevalence of self-reported health conditions is as follows: diabetes - 13% LIFE, 13% CAP3; hypertension - 37% LIFE, 19% CAP3; dyslipidemia - 15% LIFE, 21% CAP3.
Conclusion: With the continuing need for precision public health, there is a dire need to address disparities through innovative ways that promote comprehensive data linkage and analyses. This platform can be used in mapping individual cohort data, with demonstrated success for the LIFE and CAP3 studies. Our web-based query tool, which interfaces with our REDCap database, also allows us to query the available variables in both the separate and combined cohort databases and will facilitate hypothesis development. The impact of our study goes beyond the data harmonization and the robust data integration platform that links the LIFE and CAP3 studies. This allows for US comparative studies. Novel. We will also have the ability to harmonize with large US cohort studies, as well as interrogate research questions examining commonalities and differences among African Americans, Afro-Caribbean Americans and Afro-Caribbeans in the Caribbean. These comparative studies will help to elucidate the causes of increased health risks and accelerate the science to remedy the global Black NCD and cancer disparities. Citation Format: Camille C. R. Ragin, Janeil Williams, Olga Tchuvatkina, Joette Mckenzie, Kimlin Ashing, Marshall Tulloch-Reid. Harmonization and integration of regional based prospective cohorts: A preliminary report of the African Caribbean Cancer Consortium [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB141.
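The Methods of this abstract describe a one-to-one variable mapping followed by random-sample checks of source entries against the integrated database. A minimal sketch of that idea follows. It does not use the REDCap API, and every field name, record, and function is hypothetical and chosen only for illustration.

```python
import random

# Hypothetical one-to-one mappings from each study's field names to shared names.
LIFE_MAP = {"age_yrs": "age", "sex_cd": "sex", "dm_self": "diabetes"}
CAP3_MAP = {"age": "age", "gender": "sex", "diabetes_hx": "diabetes"}

def check_one_to_one(mapping):
    """Return True if no two source fields map to the same target field."""
    return len(set(mapping.values())) == len(mapping)

def harmonize(record, mapping, study):
    """Rename a source record's fields to the shared names and tag its study."""
    out = {target: record[source] for source, target in mapping.items()}
    out["study"] = study
    return out

def spot_check(source_records, integrated_records, mapping, k, seed=0):
    """Compare k randomly sampled source records with their integrated
    counterparts and return the sorted indices of records that differ."""
    rng = random.Random(seed)
    sampled = rng.sample(range(len(source_records)), k)
    mismatches = []
    for i in sampled:
        for source, target in mapping.items():
            if source_records[i][source] != integrated_records[i][target]:
                mismatches.append(i)
                break
    return sorted(mismatches)

# Made-up records, not study data.
life_source = [
    {"age_yrs": 50, "sex_cd": "F", "dm_self": False},
    {"age_yrs": 61, "sex_cd": "M", "dm_self": True},
]
life_integrated = [harmonize(r, LIFE_MAP, "LIFE") for r in life_source]
mismatches = spot_check(life_source, life_integrated, LIFE_MAP, k=2)
```

A fixed seed makes the sample reproducible for auditing. A production check would presumably draw a fresh sample each week, as the abstract describes.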


Articles published on Original Data Sources

Sequential Manipulation Against Rank Aggregation: Theory and Algorithm.

Exploring Dataset Bias and Scaling Techniques in Multi-Source Gait Biomechanics: An Explainable Machine Learning Approach

Danish and Swedish National Data Collections for Cancer – Solutions for Radiotherapy

Psychological and pharmacological treatments of intermittent explosive disorder: a meta-analysis protocol

Wind loading on gable and multi-span roof buildings: Comparison between field monitoring, wind tunnel experiments, and design code provisions

ANALYSIS OF PARAPRASHING STRATEGIES IN THESIS OF EXEMPLARY STUDENTS’ ENGLISH EDUCATION STUDY PROGRAM UNIVERSITAS MUHAMMADIYAH KOTABUMI

Dynamically documenting archaeological excavations based on 3D modeling: a case study of the excavation of the #3 fossil of hominin cranium from Yunxian, Hubei, China

The Corporate Chief of Staff: Strategic Leadership Influence From Outside the Spotlight

FetalAI: A deep learning web-based application for predicting birthweight from prenatal ultrasound measurements

A bi-fidelity DeepONet approach for modeling hysteretic systems under uncertainty

Ethnobotanical survey of medicinal plants used in north-central Morocco as natural analgesic and anti-inflammatory agents

Two-Stage Training Framework Using Multicontrast MRI Radiomics for IDH Mutation Status Prediction in Glioma.

Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous Data

Abstract LB141: Harmonization and integration of regional based prospective cohorts: A preliminary report of the African Caribbean Cancer Consortium

DRC-EDI: An integrity protection scheme based on data right confirmation for mobile edge computing

Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data.

The Attitude of The Chinese and Vietnamese Ruling Class Towards Western Astronomy From the 16th to the 18th Centuries

Workarounds produce pseudo-data quality: Insights from case studies.

Helicopter Aerial Work: Technology to Meet Growing Needs in Critical Missions

A dataset of carbon storage distribution in Urumqi from 2000 to 2020 based on the InVEST model
