Internal validation protocol for large collaborative clinical data sets: assessment of the CONGRESS database.

Abstract

Multicentre clinical research collaboratives collect large, generalisable data sets. However, data are often collected by trainees who may lack clinical or academic experience, raising concerns about data quality and potential reporting bias. Validation practices in such studies vary. This study outlines the methods, feasibility, and outcomes of internal data validation using the CONGRESS database. The multicentre CONGRESS data set of early oesophagogastric cancer was assessed. A random 20% sample of patients was selected to meet a target validation size of >15%. Patient, disease and outcome data were re-abstracted from medical records and entered into a validation data set, which was compared with the original database. Cohen's kappa coefficient (κ) and Pearson's correlation coefficient (r) were calculated to quantify agreement for categorical and continuous variables, respectively. In total, 302 patients (18.1%) from the original CONGRESS database were included in the validation data set, and 3,320 data points were compared between data sets (6,640 in total). The percentage of exact agreement for variables ranged from 82.5% to 98.7% (median 92.3%, interquartile range 86.3%-95.7%). Nine variables (1,645 of 2,946 data points, 55.8%) showed 'almost perfect' agreement (κ or r > 0.8), and five (1,301 of 2,946, 44.2%) showed substantial agreement (0.6 < κ ≤ 0.8). None showed weak or poor agreement. This study proposes a reproducible framework and benchmarks for validating large collaborative clinical data sets, using the national CONGRESS data set as an example. This approach offers a standard for ensuring reliable, high-quality research outcomes across multicentre databases.
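
The agreement statistics named above can be sketched in Python as follows. This is a minimal, standard-library illustration, not the CONGRESS team's code: the function names, the paired-list data layout, the seeded sampling helper, and the "below substantial" label for coefficients of 0.6 or less are assumptions made for the example.

```python
import random
from collections import Counter
from math import sqrt


def select_validation_sample(patient_ids, fraction=0.20, seed=None):
    """Simple random sample of patient IDs for re-abstraction (hypothetical helper)."""
    rng = random.Random(seed)
    k = round(len(patient_ids) * fraction)
    return rng.sample(list(patient_ids), k)


def percent_agreement(original, revalidated):
    """Percentage of paired data points that match exactly."""
    if not original or len(original) != len(revalidated):
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(original, revalidated))
    return 100.0 * matches / len(original)


def cohens_kappa(original, revalidated):
    """Cohen's kappa for a categorical variable abstracted twice."""
    n = len(original)
    if n == 0 or n != len(revalidated):
        raise ValueError("need two non-empty sequences of equal length")
    p_observed = sum(a == b for a, b in zip(original, revalidated)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a, counts_b = Counter(original), Counter(revalidated)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined: both data sets use one identical category
    return (p_observed - p_expected) / (1.0 - p_expected)


def pearson_r(x, y):
    """Pearson's correlation coefficient for a continuous variable."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def agreement_band(coefficient):
    """Map kappa or r onto the thresholds quoted in the abstract."""
    if coefficient > 0.8:
        return "almost perfect"
    if coefficient > 0.6:
        return "substantial"
    return "below substantial"  # assumption: lower bands are not named here


if __name__ == "__main__":
    orig = ["yes", "yes", "no", "no"]
    reval = ["yes", "no", "no", "no"]
    print(percent_agreement(orig, reval))  # 75.0
    print(cohens_kappa(orig, reval))       # 0.5
    print(agreement_band(pearson_r([1, 2, 3, 4], [2, 4, 6, 8])))  # almost perfect
```

In the toy example, 3 of 4 values agree (p_o = 0.75) and chance agreement from the marginals is 0.5, so κ = (0.75 - 0.5) / (1 - 0.5) = 0.5. This shows why κ can be well below raw percentage agreement.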

Similar Papers
  • Research Article
  • 10.1093/bjs/znaf270.145
75 Internal Validity for Large Collaborative Clinical Datasets: Assessment of the CONGRESS Database
  • Dec 29, 2025
  • British Journal of Surgery
  • Kirsty Cole + 12 more

Aim Multi-centre clinical research collaboratives generate large, generalisable datasets, but concerns remain regarding data quality due to variability in validation practices and the involvement of trainees with limited clinical or academic experience. This study aims to evaluate the feasibility, methodology, and effectiveness of internal data validation within the CONGRESS database, to inform best practices for ensuring data integrity in collaborative research settings. Method The multicentre CONGRESS dataset of early oesophago-gastric cancer was assessed. A random 20% sample of patients was selected to meet a >15% target validation size. Patient, disease and outcome data were re-abstracted from medical records and entered into a validation dataset, which was compared with the original database. Cohen’s kappa coefficient (κ) and Pearson's correlation coefficient (r) were calculated to quantify agreement for categorical and continuous variables, respectively. Results In total, 302 patients (18.1%) from the original CONGRESS database were included in the validation dataset, and 3320 data points were compared between datasets (6640 total). The percentage of exact agreement for variables ranged from 82.5-98.7% (median 92.3%, IQR 86.3-95.7%). Nine variables (1645/2946 data points, 55.8%) showed “almost-perfect” agreement (κ or r >0.8), and five (1301/2946, 44.2%) showed substantial agreement (κ > 0.6). None showed weak or poor agreement. Conclusions This study provides strong evidence of internal validity for the CONGRESS collaborative clinical database. It presents key learning points and a methodology for data validation, which can be applied to other large collaborative databases. This approach aims to enhance confidence in the quality and reliability of research conducted through these platforms.

  • Research Article
  • Cited by 100
  • 10.1111/j.1365-2044.2005.04121.x
Assessing the applicability of scoring systems for predicting postoperative nausea and vomiting
  • Mar 14, 2005
  • Anaesthesia
  • J E Van Den Bosch + 6 more

We have validated two scoring systems for predicting postoperative nausea and vomiting, derived by Apfel et al. and Koivuranta et al. from 1388 adult inpatients undergoing a wide range of surgical procedures. The predictive accuracy of the scoring systems was evaluated in terms of the ability to discriminate between patients with and without postoperative nausea and vomiting (discrimination) and agreement between observed and predicted outcomes (calibration). Discrimination and calibration were less than expected based on previous reports, with both scoring systems providing risk predictions that were too extreme. The area under the ROC curve was 0.63 for Apfel et al.'s scoring system and 0.66 for Koivuranta et al.'s scoring system. Neither of the scoring systems provided a risk threshold for administering anti-emetic prophylaxis that yielded satisfying results in terms of predictive values, sensitivity and specificity. Hence, in their original forms, the scoring systems do not guarantee accurate prediction of the risk of postoperative nausea and vomiting in other patient populations. Koivuranta et al.'s scoring system appears to be more robust across different populations.

  • Research Article
  • Cited by 7
  • 10.3389/fmed.2023.1188542
Deep learning system for distinguishing optic neuritis from non-arteritic anterior ischemic optic neuropathy at acute phase based on fundus photographs.
  • Jun 29, 2023
  • Frontiers in Medicine
  • Kaiqun Liu + 11 more

To develop a deep learning system to differentiate demyelinating optic neuritis (ON) from non-arteritic anterior ischemic optic neuropathy (NAION), which have overlapping clinical profiles in the acute phase. We developed a deep learning system (ONION) to distinguish ON from NAION in the acute phase. Color fundus photographs (CFPs) from 871 eyes of 547 patients were included: 396 ON eyes from 232 patients and 475 NAION eyes from 315 patients. EfficientNet-B0 was used to train the model, and performance was measured by sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Cohen's kappa coefficients were also obtained to compare the system's performance with that of different ophthalmologists. In the validation data set, the ONION system distinguished acute ON from NAION with the following mean performance: time taken 23 s, AUC 0.903 (95% CI 0.827-0.947), sensitivity 0.796 (95% CI 0.704-0.864), and specificity 0.865 (95% CI 0.783-0.920). In the testing data set: time taken 17 s, AUC 0.902 (95% CI 0.832-0.944), sensitivity 0.814 (95% CI 0.732-0.875), and specificity 0.841 (95% CI 0.762-0.897). The system's performance (κ = 0.805) was comparable to that of a retinal expert (κ = 0.749) and better than that of the other four ophthalmologists (κ = 0.309-0.609). The ONION system performed satisfactorily in distinguishing ON from NAION in the acute phase and might greatly benefit this challenging differentiation.
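
Sensitivity, specificity and AUC, as reported in this abstract, are standard measures. The sketch below shows one common way to compute them from labels and model scores. It is not the ONION authors' code: the label strings, the example scores, and the function names are illustrative assumptions, and the abstract's confidence intervals are not reproduced.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity and specificity, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def auc_rank(y_true, scores, positive):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    truth = ["ON", "ON", "NAION", "NAION"]
    predicted = ["ON", "ON", "ON", "NAION"]
    prob_on = [0.9, 0.4, 0.6, 0.1]  # hypothetical model scores for "ON"
    print(sensitivity_specificity(truth, predicted, positive="ON"))  # (1.0, 0.5)
    print(auc_rank(truth, prob_on, positive="ON"))                   # 0.75
```

The pairwise loop is chosen for clarity rather than speed. For thousands of images, a rank-based implementation or a library routine would be more practical.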

  • Research Article
  • Cited by 34
  • 10.2169/internalmedicine.51.6718
Validity and Reliability Assessment of a Japanese Version of the Snaith-Hamilton Pleasure Scale
  • Jan 1, 2012
  • Internal Medicine
  • Hiroshi Nagayama + 18 more

Anhedonia is one of the main non-motor symptoms in Parkinson's disease (PD); it is assessed using the Snaith-Hamilton pleasure scale (SHAPS). To assess anhedonia in the Japanese population, we prepared a Japanese language version of SHAPS (SHAPS-J), and evaluated its validity and reliability in 8 neurological centers. Seventy subjects (48 patients with PD and 22 healthy subjects) were enrolled in this study. The validity of the test was assessed by the correlation between SHAPS-J and the apathy scale, based on the fact that anhedonia is considered a symptom of apathy syndrome. Test-retest reliability and internal consistency were assessed by Cohen's kappa and Cronbach's alpha coefficients, respectively. In the evaluation of validity, the total scores obtained on SHAPS-J during the test and retest significantly correlated with scores on Item 4 in Part 1 of the unified Parkinson's disease rating scale (p<0.0008 and p<0.0036, respectively). Cohen's kappa coefficient was >0.3 on all items (p<0.0005 on all items). Cronbach's alpha coefficient was 0.90 at the baseline and 0.88 at the retest. These results indicate that SHAPS-J has good validity, test-retest reliability, and internal consistency, thus establishing an available measure of anhedonia in Japanese.
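
Cronbach's alpha, used in this study for internal consistency, has a closed form that is easy to sketch. The function below is a minimal Python illustration, not the SHAPS-J authors' analysis. The respondent-by-item matrix layout and the example scores are assumptions. Sample variances are used for both items and totals, so the n - 1 denominators cancel in the ratio.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    items = list(zip(*responses))  # one tuple per item (column)
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical scores: 3 respondents answering 2 items.
    print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```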

  • Research Article
  • Cited by 39
  • 10.1016/j.jadohealth.2018.08.015
Parent and Adolescent Attitudes Towards Preventive Care and Confidentiality
  • Nov 3, 2018
  • Journal of Adolescent Health
  • Xiaoyu Song + 8 more

  • Research Article
  • Cited by 1
  • 10.1016/j.arcped.2005.04.004
International adoption: the perspective of two Quebec paediatricians
  • Apr 30, 2005
  • Archives de pédiatrie
  • L Auger + 1 more

  • Research Article
  • Cited by 6
  • 10.1088/1741-2552/ab4af3
Consistency of quantitative electroencephalography features in a large clinical data set
  • Nov 12, 2019
  • Journal of Neural Engineering
  • David O Nahmias + 3 more

  • Research Article
  • Cited by 12
  • 10.4040/jkan.2006.36.4.652
Knowledge Discovery in Nursing Minimum Data Set Using Data Mining
  • Jan 1, 2006
  • Journal of Korean Academy of Nursing
  • Myonghwa Park + 4 more

The purposes of this study were to apply a data mining tool to the nursing-specific knowledge discovery process and to identify how data mining skills can be used for clinical decision making. Data mining based on the rough set model was conducted on a large clinical data set containing NMDS elements. Data from 1,000 randomly selected patients were drawn from the 1998 database; each patient had at least one of the five most frequently used nursing diagnoses. Patient characteristics and care service characteristics, including nursing diagnoses, interventions and outcomes, were analyzed to derive meaningful decision rules. Number of comorbidities, marital status, the nursing diagnosis related to risk for infection, the nursing intervention related to infection protection, and discharge status were the predictors that could determine length of stay. Four variables (age, impaired skin integrity, pain, and discharge status) were identified as valuable predictors for the nursing outcome of relieved pain. Five variables (age, pain, potential for infection, marital status, and primary disease) were identified as important predictors for mortality. This study demonstrated the use of data mining on a large data set with a standardized language format to identify the contribution of nursing care to patients' health.

  • Book Chapter
  • Cited by 1
  • 10.1007/978-3-642-32183-2_82
Data Mining in Uniform Hospital Discharge Data Set Using Rough Set Model
  • Jan 1, 2013
  • M Park

Purpose: The purpose of this study was to apply the rough set model to the nursing knowledge discovery process. Method: Data mining based on the rough set model was conducted on a large clinical data set containing Nursing Minimum Data Set elements. Randomly selected patient records with frequently used nursing diagnoses were drawn from the Uniform Hospital Discharge Data Set. Patient and care characteristics, including nursing diagnoses, interventions and outcomes, were analyzed to derive decision rules. Results: Number of comorbidities, marital status, the nursing diagnosis related to risk for infection, the nursing intervention related to infection protection, and discharge status were the predictors that determined length of stay. Age, impaired skin integrity, pain, and discharge status were identified as valuable predictors for the nursing outcome of relieved pain. Age, pain, potential for infection, marital status, and primary disease were identified as important predictors for mortality. Conclusion: This study demonstrated the use of the rough set model on a large data set with a standardized language format to identify the contribution of specific care to patients’ health.

  • Research Article
  • Cited by 64
  • 10.1016/j.ebiom.2020.103146
Development and validation of a real-time artificial intelligence-assisted system for detecting early gastric cancer: A multicentre retrospective diagnostic study.
  • Nov 27, 2020
  • eBioMedicine
  • Dehua Tang + 12 more

  • Research Article
  • 10.1118/1.2962239
SU‐GG‐T‐490: Analysis of Clinical Data Sets by Comparative DVH Analysis
  • Jun 1, 2008
  • Medical Physics
  • W Bice

Analysis of large clinical dosimetry data sets has been limited to evaluation of single-number dosimetric quantifiers (such as V20 or D90). These quantifiers are linked with clinical outcomes, and conclusions are drawn from comparative statistical tests. The choice of quantifier is often a matter of guesswork, at best guided by the recommendations of investigators who have performed similar analyses. A tool for comparative analysis of the entire dose volume histogram (DVH) for large data sets has been developed and is described. The tool has been used as an interface to several treatment planning systems, for both external beam and brachytherapy. Structure DVHs are downloaded, and data set analysis and comparison are performed within the tool. As an example of this process, a series of implants performed at multiple institutions was analyzed following import from Variseed (Varian Medical Systems). Ten patients who had RTOG Grade 2 rectal complications following the implant were compared with twenty-three patients who did not experience any rectal complications. A similar analysis has been performed previously, but that work was limited to analyzing rectal dosimetry in terms of 2 or 3 rectal dose quantifiers. While the previous effort was able to show statistical significance with R100, it is apparent from the present work that rectal complications are much more likely to be associated with rectal doses well below the prescription dose, in the range of 40–50 Gy. This is verified by the improved significance of the correlation between the two arms of the cohort at these dose levels. The monotherapy data set demonstrates the utility of the tool and the associated process. The tool has also been used to analyze external beam data sets to demonstrate the dosimetric improvement of inverse planning over conventional planning for external beam radiotherapy.

  • Research Article
  • Cited by 24
  • 10.1007/s10815-018-1396-x
A comparison of morphokinetic markers predicting blastocyst formation and implantation potential from two large clinical data sets.
  • Jan 22, 2019
  • Journal of Assisted Reproduction and Genetics
  • N Zaninovic + 6 more

To determine whether the standard morphokinetic markers used for embryo selection have a similar relationship to blastocyst formation and implantation in two large clinical data sets. This retrospective cohort analysis addressed two distinct questions using data sets from two large IVF clinics. Blastocysts (BL) and implanted blastocysts (I) in both clinics, IVI-Valencia (BL = 11,414; I = 479) and Weill Cornell Medicine (WCM; BL = 15,902; I = 337), were cultured in a time-lapse system (EmbryoScope, Vitrolife, Sweden). The study was designed to assess the relationship between early morphokinetic hallmarks and BL development, with a secondary analysis of implantation rates following single-embryo day 3 and day 5 transfers. We performed a detailed graphical analysis of t3, t5, the duration of the second cell cycle (cc2) (t3-t2), and the ratio (t5-t3)/(t5-t2). The t5 timing did not differ between the clinics. However, in the WCM data, the proportions differed significantly between embryos that did and did not form blastocysts. Blastocyst formation decreased significantly with longer t5 in the WCM data, whereas t5 was more informative for implantation rate in the IVI data set. Morphokinetic intervals for early cleavages were distributed differently between the clinics. Incorporation of embryo-selection algorithms depends on each clinic's selected developmental hallmarks, all of which must be validated before incorporation into clinical practice.

  • Research Article
  • Cited by 1
  • 10.2166/wst.2020.299
Bayesian network-based methodology for selecting a cost-effective sewer asset management model.
  • Jun 23, 2020
  • Water Science and Technology
  • Julián Guzmán-Fierro + 6 more

This paper presents a methodology based on Bayesian networks (BN) to prioritize and select the minimal number of variables needed to predict the structural condition of sewer assets, supporting proactive management strategies. Integrating BN models, a statistical measure of agreement (Cohen's kappa coefficient) and a statistical test (Wilcoxon test) enabled a robust and straightforward selection of a minimum number of variables (qualitative and quantitative) that ensure a suitable level of prediction of the structural condition of sewer pipes. Applying the methodology to a case study (Bogotá's sewer network, Colombia) showed that a model with only two variables (age and diameter) could achieve the same predictive capacity (Cohen's kappa coefficient = 0.43) as a model considering several variables. The methodology also identifies the calibration and validation subset percentages that best fit the model (80% for calibration and 20% for validation in the case study), increasing predictive capacity with low variation. Furthermore, a model considering only pipes in critical and excellent condition increased the capacity for successful predictions (Cohen's kappa coefficient from 0.2 to 0.43) in the case study.

  • Research Article
  • Cited by 42
  • 10.1371/journal.pone.0146474
Estrogen-Receptor, Progesterone-Receptor and HER2 Status Determination in Invasive Breast Cancer. Concordance between Immuno-Histochemistry and MapQuant™ Microarray Based Assay.
  • Feb 1, 2016
  • PLOS ONE
  • D Mouttet + 10 more

Background: Hormone receptor status and HER2 status are of critical interest in determining the prognosis of breast cancer patients. Their status is routinely assessed by immunohistochemistry (IHC), which is subject to intra-laboratory and inter-laboratory variability. The aim of our study was to compare estrogen receptor (ER), progesterone receptor (PR) and HER2 status as determined by the MapQuant™ test with routine immunohistochemical tests in early-stage invasive breast cancer in a large comprehensive cancer center. Patients and Methods: We retrospectively studied 163 invasive early-stage breast carcinomas with standard IHC status. Genomic status was determined using the MapQuant™ test, which provides the genomic grade index. Results: We found only 4 of 161 tumours (2.5%) with discrepant IHC and genomic results for ER status. The concordance rate between the two methods was 97.5% and Cohen’s kappa coefficient was 0.89. Comparison of MapQuant™ PR status with IHC PR status showed more discrepancies: the concordance rate between the two methods was 91.4% and Cohen’s kappa coefficient was 0.74. The HER2 MapQuant™ test was classified as "undetermined" in 2 of 163 cases (1.2%). One HER2 IHC-negative tumour was found positive with a high HER2 MapQuant™ genomic score. The concordance rate between the two methods was 99.3% and Cohen’s kappa coefficient was 0.86. Conclusion: Our results show that the MapQuant™ assay, based on mRNA expression, provides an objective and quantitative assessment of estrogen receptor, progesterone receptor and HER2 status in invasive breast cancer.

  • Research Article
  • Cited by 41
  • 10.3389/fimmu.2019.00376
Reliability of Lupus Anticoagulant and Anti-phosphatidylserine/prothrombin Autoantibodies in Antiphospholipid Syndrome: A Multicenter Study
  • Mar 5, 2019
  • Frontiers in Immunology
  • Savino Sciascia + 15 more

Background: It is well known that one of the major drawbacks of Lupus Anticoagulant (LA) testing is its sensitivity to anticoagulant therapy, owing to its coagulation-based principle. In this study we aimed to assess the reproducibility of LA testing and to evaluate the performance of a solid-phase assay for phosphatidylserine/prothrombin (aPS/PT) antibodies. Methods: We included 60 patients who fulfilled the following inclusion criteria: (I) diagnosis of thrombotic antiphospholipid syndrome (APS); (II) patients with thrombosis and (a) inconstant previous LA positivity and/or (b) positivity for antiphospholipid antibodies (aPL) at low-medium titers [defined as levels of anti-β2Glycoprotein-I or anticardiolipin (IgG/IgM) of 10–30 GPL/MPL] with no previous evidence of LA positivity. aPL testing was performed blindly in 4 centers undertaking periodic external quality assessment. Results: The 60 enrolled patients were distributed as follows: 43 (71.7%) with thrombotic APS, 7 (11.7%) with thrombosis and inconstant LA positivity, and 10 (16.7%) with low-medium aPL titers. Categorical agreement for LA among the centers ranged from 0.41 to 0.60 (Cohen's kappa coefficient; moderate agreement). The correlation across the 4 sites for aPS/PT was strong, both quantitatively (Spearman rho 0.84) and when dichotomized (Cohen's kappa coefficients = 0.81 to 1.0). Discordant (defined as lack of agreement in ≥3 laboratories) or inconclusive LA results were observed in 27/60 (45%) cases; when the analysis was limited to patients receiving vitamin K antagonists (VKA), the proportion of discordant LA results was as high as 75% (15/20). Conversely, aPS/PT testing showed an overall agreement of 83% (up to 90% in patients receiving VKA), an overall increase in test reproducibility of +28% compared with LA, which was even more evident (+65%) in patients on VKA.
In patients treated with VKA, we observed good correlation for aPS/PT IgG testing (Cohen's kappa coefficients = 0.81–1; Spearman rho 0.86). Conclusion: Despite progress in the standardization of aPL testing, we observed up to 45% overall discrepant results for LA, and even more in patients on VKA. The introduction of aPS/PT testing might represent a further diagnostic tool, especially when LA testing is not available or its results are uncertain.
