Assessing the performance of emerging and existing continuous monitoring solutions under a single-blind controlled testing protocol
Continuous monitoring (CM) solutions can facilitate faster detection and repair of emissions compared to traditional survey methods. This study tested 13 CM solutions over 12 weeks using single-blind controlled testing. Controlled release rates ranged from 0.08 to 6.75 kg CH4 h−1 and lasted 18 min to 8 h. Six solutions demonstrated 90% method detection limits (DL90s) ranging from 0.5 [0.3, 0.6] kg CH4 h−1 to 6.7 [5.9, 8.0] kg CH4 h−1. Of the 6 solutions, 5 had low False Positive (FP) rates (7.8%–18.9%), and 4 had low False Negative (FN) rates (8.0%–34.1%). Similar to Ilonze et al., the results show that the tested solutions balance method sensitivity with low FP and FN rates. All scanning/imaging solutions achieved high localization precision and accuracy (≥40%) at the equipment unit level. Single quantification estimates exhibited high relative quantification errors, ranging from 33 [0.9, 66]%, 95% confidence interval (CI) to 1326 [1003, 1648]%, 95% CI for small emissions (between 0.1 and 1 kg CH4 h−1) and from 3 [−20, 26]%, 95% CI to 3578 [−2832, 9988]%, 95% CI for large emissions (>1 kg CH4 h−1). The mean detection time for all solutions ranged from 5 h to 5 days. Relative to previous studies, errors in quantification estimates decreased, as did FN and FP rates, with improved DL90s for 2 of the 4 retested solutions. However, the mean detection times increased for 2 solutions, remained constant for one solution, and decreased for 1 of the 4 retested solutions. These findings highlight that continuous, rigorous testing enhances solution performance, with notable improvements observed across multiple testing programs using the same test protocol.
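The relative quantification errors quoted above can be read as signed percentage differences between a solution's estimate and the controlled release rate. A minimal sketch of that calculation follows, assuming this common definition; the function name and the sample pairs are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_error_pct(estimated_kg_h: float, released_kg_h: float) -> float:
    """Signed relative error of one estimate, as a percentage of the released rate."""
    if released_kg_h <= 0:
        raise ValueError("released rate must be positive")
    return 100.0 * (estimated_kg_h - released_kg_h) / released_kg_h


# Hypothetical (estimated, released) pairs in kg CH4/h; illustrative only.
pairs = [(0.9, 0.5), (0.3, 0.5), (2.4, 2.0), (1.0, 2.0)]

errors = [relative_error_pct(est, rel) for est, rel in pairs]
mean_error = mean(errors)  # positive values indicate overestimation on average
```

A mean close to zero can hide large individual errors of opposite sign, which is why the abstract reports confidence intervals alongside each mean.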
- Research Article
8
- 10.1007/s00180-016-0690-2
- Oct 22, 2016
- Computational Statistics
We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors and the impact of predictor selection on parameter inference for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1), or by refitting selected predictors with standard regression (Algorithm 2). For linear models, penalized linear regression, elastic net, smoothly clipped absolute deviation (SCAD), least angle regression and LASSO had low false negative (FN) predictor selection rates but false positive (FP) rates above 20 % for all sample and effect sizes. Partial least squares regression had few FPs but many FNs. Only relaxo had low FP and FN rates. For logistic models, LASSO and penalized logistic regression had many FPs and few FNs for all sample and effect sizes. SCAD and adaptive logistic regression had low or moderate FP rates but many FNs. 95 % confidence interval coverage of predictors with null effects was approximately 100 % for Algorithm 1 for all methods, and 95 % for Algorithm 2 for large sample and effect sizes. Coverage was low only for penalized partial least squares (linear regression). For outcome-associated predictors, coverage was close to 95 % for Algorithm 2 for large sample and effect sizes for all methods except penalized partial least squares and penalized logistic regression. Coverage was sub-nominal for Algorithm 1. In conclusion, many methods performed comparably, and while Algorithm 2 is preferred to Algorithm 1 for estimation, it yields valid inference only for large effect and sample sizes.
- Research Article
1
- 10.1167/iovs.66.1.4
- Jan 2, 2025
- Investigative ophthalmology & visual science
When treating amblyopia, it is important to define when visual acuity (VA) is no longer improving (i.e., stable) because treatment decisions may be altered based on this determination. Simulated observed VAs, incorporating measurement error, were compared with simulated true VAs to determine false-positive and false-negative rates for stable VA for six rules (using single VA or test/retest measurements, with or without averaging, over two or three visits). Four HOTV VA profiles were modeled: stable or improving VA over time with each of patching and spectacles. Across six rules and two treatments, when true VA was stable, false-negative rates for stability ranged from 26% to 67%; when true VA was improving, false-positive rates for stability ranged from 0% to 38%. Single VA measurements at consecutive visits had a false-negative rate of 30% with patching and 29% with spectacles, and a false-positive rate of 38% with patching and 35% with spectacles. Averaging two VA tests at each visit slightly increased the false-negative rate (35% with patching and 36% with spectacles), while reducing the false-positive rate (22% with patching and 21% with spectacles). Comparing false-negative and false-positive rates for stability across rules allows selection of the most appropriate rule for clinical practice or research. When considering less desirable treatments, a rule with a lower false-negative rate is preferable, whereas a rule with a lower false-positive rate would be preferred when it is important to correctly classify improving VA.
- Research Article
5
- 10.1016/s2589-7500(24)00243-7
- Jan 1, 2025
- The Lancet. Digital health
A prospectively deployed deep learning-enabled automated quality assurance tool for oncological palliative spine radiation therapy.
- Research Article
63
- 10.3310/hta10470
- Nov 1, 2006
- Health Technology Assessment
To review for acute abdominal pain (AAP), the diagnostic accuracies of combining decision tools (DTs) and doctors aided by DTs compared with those of unaided doctors. Also to evaluate the impact of providing doctors with an AAP DT on patient outcomes, clinical decisions and actions, what factors are likely to determine the usage rates and usability of a DT and the associated costs and likely cost-effectiveness of these DTs in routine use in the UK. Electronic databases were searched up to 1 July 2003. Data from each eligible study were extracted. Potential sources of heterogeneity were extracted for both questions. For the accuracy review, meta-analysis was conducted. Among studies comparing diagnostic accuracies of DTs with unaided doctors, error rate ratios provided estimates of the differences between the false-negative and false-positive rates of the DT and unaided doctors' performance. Pooled error rate ratios and 95% confidence intervals (CIs) for false-negative rates and false-positive rates were computed. Metaregression was used to explore heterogeneity. Thirty-two studies from 27 articles, all based in secondary care, were eligible for the review of DT accuracies, while two were eligible for the review of the accuracy of hospital doctors aided by DTs. Sensitivities and specificities for DTs ranged from 53 to 99% and from 30 to 99%, respectively. Those for unaided doctors ranged from 64 to 93% and from 39 to 91%, respectively. Thirteen studies reported false-positive and false-negative rates for both DTs and unaided doctors, enabling a direct comparison of their performance. In random effects meta-analyses, DTs had significantly lower false-positive rates (error rate ratio 0.62, 95% CI 0.46 to 0.83) than unaided doctors. DTs may have higher false-negative rates than unaided doctors (error rate ratio 1.34, 95% CI 0.93 to 1.93). Significant heterogeneity was present. Two studies compared the diagnostic accuracies of doctors aided by DTs to unaided doctors. 
In a multiarm cluster randomised controlled trial (n = 5193), the diagnostic accuracy of doctors not given access to DTs was not significantly worse (sensitivity 28.4% and specificity 96.0%) than that of three groups of aided doctors (sensitivities of 42.4-47.9%, and specificities of 95.5-96.5%, respectively). In an uncontrolled before-and-after study (n = 1484), the sensitivities and specificities of aided and unaided doctors were 95.5% and 91.5% (p = 0.24) and 78.1% and 86.4% (p < 0.001), respectively. The metaregression of DTs showed that prospective test-set validation at the site of the tool's development was associated with considerably higher diagnostic accuracy than prospective test-set validation at an independent centre [relative diagnostic odds ratio (RDOR) 8.2; 95% CI 3.1 to 14.7]. It also showed that the earlier in the year the study was performed the higher the performance (RDOR 0.88, 0.83 to 0.92), that when developers evaluated their own DT there was better performance than when independent evaluators carried out the study (RDOR = 3.0, 1.3 to 6.8), and that there was no evidence of association between other quality indicators and DT accuracy. The one eligible study of the impact study review, a four-arm cluster randomised trial (n = 5193), showed that hospital admission rates of patients by doctors not allocated to a DT (42.8%) were significantly higher than those by doctors allocated to three combinations of decision support (34.2-38.5%) (p < 0.001). There was no evidence of a difference between perforation rates (p = 0.19) and negative laparotomy rates in the four trial arms (p = 0.46). Usage rates of DTs by doctors in accident and emergency departments ranged from 10 to 77% in the six studies that reported them. Possible determinants of usability include the reasoning method used, the number of items used and the output format. 
A deterministic cost-effectiveness comparison demonstrated that a paper checklist is likely to be 100-900 times more cost-effective than a computer-based DT, under stated assumptions. With their significantly greater specificity and lower false-positive rates than doctors, DTs are potentially useful in confirming a diagnosis of acute appendicitis, but not in ruling it out. The clinical use of well-designed, condition-specific paper or computer-based structured checklists is promising as a way to improve impact on patient outcomes, subject to further research.
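The error rate ratios in this review compare a decision tool's false-negative or false-positive rate with that of unaided doctors, so a ratio below 1 favours the tool. The sketch below shows the arithmetic for a single study, assuming the rates are derived as one minus sensitivity and one minus specificity; the function names and the input values are hypothetical, and the pooling across studies (random effects meta-analysis) is not reproduced here.

```python
def error_rates(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate) for one test."""
    return 1.0 - sensitivity, 1.0 - specificity


def error_rate_ratio(rate_tool: float, rate_doctor: float) -> float:
    """Ratio of the tool's error rate to the unaided doctors' error rate."""
    if rate_doctor == 0:
        raise ValueError("doctor error rate must be non-zero")
    return rate_tool / rate_doctor


# Hypothetical single-study values; not taken from the review.
tool_fn, tool_fp = error_rates(sensitivity=0.80, specificity=0.90)
doc_fn, doc_fp = error_rates(sensitivity=0.85, specificity=0.80)

fn_ratio = error_rate_ratio(tool_fn, doc_fn)  # > 1: the tool misses more cases
fp_ratio = error_rate_ratio(tool_fp, doc_fp)  # < 1: the tool raises fewer false alarms
```

With these illustrative inputs the pattern matches the direction reported in the abstract: fewer false positives for the tool, but possibly more false negatives.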
- Conference Article
18
- 10.1109/ccece.2004.1345313
- May 2, 2004
We present an application of a honeypot that works in collaboration with an intrusion detection system. We have designed and implemented a honeypot port-scan detection system, which can work as a module of the intrusion detection system and can also run independently. Intrusion detection systems now face more challenges, such as data overload, high false positive and false negative rates, and an inability to interpret encrypted or IPv6 packets. We introduce new data structures (such as a new link structure for slow scans) and new event mechanisms in our system, and present a new method that addresses some weaknesses in known techniques, so our system can provide an early scan warning and detect some new attacks. Our tests on this system in a typical network environment show that the system has very low false positives and false negatives.
- Research Article
127
- 10.1101/gr.192502
- Nov 1, 2002
- Genome Research
Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for ~100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ~10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.
- Research Article
- 10.1203/00006450-197704000-00788
- Apr 1, 1977
- Pediatric Research
229 urines obtained from college coeds with acute urinary tract infections were examined by comparing the standard calibrated loop culture method with a combination of a qualitative Griess test (Bac-U-Dip) and a semiquantitative (Bacturcult) method. Analysis of the data demonstrated a small false positive (F.P.) but a marked false negative (F.N.) rate for Bac-U-Dip (BUD). The false negative rate for Bacturcult (B/C) was zero and the false positive rate, when compared to a no-growth culture (NG), was also low (1/70). The combination of Bacturcult and Bac-U-Dip provides both low false positive and false negative rates.
- Research Article
21
- 10.1136/bjophthalmol-2018-312385
- Oct 11, 2018
- The British Journal of Ophthalmology
Background: Glaucoma referral filtering schemes have operated in the UK for many years. However, there is a paucity of data on the false-negative (FN) rate. This study evaluated the clinical effectiveness...
- Conference Article
3
- 10.1117/12.2613222
- Apr 4, 2022
PURPOSE: As medical education adopts a competency-based training method, experts are spending substantial amounts of time instructing and assessing trainees’ competence. In this study, we look to develop a computer-assisted training platform that can provide instruction and assessment of open inguinal hernia repairs without needing an expert observer. We recognize workflow tasks based on the tool-tissue interactions, suggesting that we first need a method to identify tissues. This study aims to train a neural network in identifying tissues in a low-cost phantom as we work towards identifying the tool-tissue interactions needed for task recognition. METHODS: Eight simulated tissues were segmented throughout five videos from experienced surgeons who performed open inguinal hernia repairs on phantoms. A U-Net was trained using leave-one-user-out cross validation. The average F-score, false positive rate and false negative rate were calculated for each tissue to evaluate the U-Net’s performance. RESULTS: Higher F-scores and lower false negative and positive rates were recorded for the skin, hernia sac, spermatic cord, and nerves, while slightly lower metrics were recorded for the subcutaneous tissue, Scarpa’s fascia, external oblique aponeurosis and superficial epigastric vessels. CONCLUSION: The U-Net performed better in recognizing tissues that were relatively larger in size and more prevalent, while struggling to recognize smaller tissues only briefly visible. Since workflow recognition does not require perfect segmentation, we believe our U-Net is sufficient in recognizing the tissues of an inguinal hernia repair phantom. Future studies will explore combining our segmentation U-Net with tool detection as we work towards workflow recognition.
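The per-tissue F-score, false positive rate and false negative rate mentioned in the methods can be computed by treating each tissue label as a one-vs-rest binary problem over pixels. The sketch below assumes flattened label maps and standard definitions of the three metrics; the function name, the label encoding and the tiny example arrays are hypothetical, and the authors' exact evaluation code is not known.

```python
def tissue_metrics(predicted: list[int], truth: list[int], label: int) -> dict[str, float]:
    """One-vs-rest F-score, false positive rate and false negative rate for one label."""
    tp = fp = fn = tn = 0
    for p, t in zip(predicted, truth):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    f_denominator = 2 * tp + fp + fn
    return {
        "f_score": 2 * tp / f_denominator if f_denominator else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical flattened label maps: 0 = background, 1 and 2 = two tissue classes.
truth_mask = [1, 1, 1, 0, 0, 2, 2, 0]
predicted_mask = [1, 1, 0, 0, 1, 2, 0, 0]

metrics_label_1 = tissue_metrics(predicted_mask, truth_mask, label=1)
metrics_label_2 = tissue_metrics(predicted_mask, truth_mask, label=2)
```

Small or briefly visible tissues contribute few positive pixels, so a handful of missed pixels moves their false negative rate sharply, which is consistent with the pattern described in the results.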
- Research Article
4
- 10.1016/j.cgh.2025.02.018
- Nov 1, 2025
- Clinical gastroenterology and hepatology : the official clinical practice journal of the American Gastroenterological Association
Effectiveness of Six International Guidelines Using Fibrosis-4 and FibroScan for Risk Stratification of Metabolic Dysfunction-associated Steatotic Liver Disease in Type 2 Diabetes.
- Abstract
2
- 10.1016/j.ultrasmedbio.2011.05.732
- Jul 26, 2011
- Ultrasound in Medicine & Biology
The Diagnosis Performance of Ultrasonic Transient Elastography for Noninvasive Assessment of Liver Fibrosis in 1138 Chronic Hepatitis C Patients
- Research Article
7
- 10.1176/appi.ps.61.9.923
- Sep 1, 2010
- Psychiatric Services
Validation of Brief Screening Tools for Mental Disorders Among New Zealand Prisoners
- Research Article
25
- 10.1021/acs.est.3c08511
- Jun 12, 2024
- Environmental science & technology
The recent regulatory spotlight on continuous monitoring (CM) solutions and the rapid development of CM solutions have demanded the characterization of solution performance through regular, rigorous testing using consensus test protocols. This study is the second known implementation of such a protocol involving single-blind controlled testing of 9 CM solutions. Controlled releases at rates of 6-7100 g CH4/h, with durations of 0.4-10.2 h and wind speeds of 0.7-9.9 m/s, were conducted over 11 weeks. Results showed that 4 solutions achieved method detection limits (DL90s) within the tested emission rate range, with all 4 solutions having both the lowest DL90s (3.9 [3.0, 5.5] kg CH4/h to 6.2 [3.7, 16.7] kg CH4/h) and false positive rates (6.9-13.2%), indicating efforts at balancing low sensitivity with a low false positive rate. These results are likely best-case scenario estimates since the test center represents a near-ideal upstream field natural gas operation condition. Quantification results showed wide individual estimate uncertainties, with emissions underestimation and overestimation by factors up to >14 and 42, respectively. Three solutions had >80% of their estimates within a quantification factor of 3 for controlled releases in the ranges of [0.1-1] kg CH4/h and > 1 kg CH4/h. Relative to the study by Bell et al., the performance of current solutions, as a group, generally improved, primarily due to solutions from the study by Bell et al. that were retested. This result highlights the importance of regular quality testing to the advancement of CM solutions for effective emissions mitigation.
- Supplementary Content
7
- 10.1111/tmi.13193
- Jan 8, 2019
- Tropical Medicine & International Health
To evaluate three non-invasive assays for the diagnosis of schistosomiasis mansoni in an Egyptian village. Urine was collected for the detection of circulating cathodic antigen (CCA) and cell-free parasite DNA (cfpd) by Point-of-contact (POC)-cassette assay and PCR, respectively. These tests were compared to Kato-Katz (KK) faecal thick smear for detection of Schistosoma mansoni eggs. Disease prevalence by POC-CCA assay was 86%; by PCR it was 39% vs. 27% by KK. Compared to KK, the sensitivity of POC-CCA reached 100%, but its specificity was only 19.2% with 41% accuracy. Sensitivity of the PCR assay for cfpd was 55.56%, and specificity was 67.12% with 64% accuracy. A new end point was calculated for combined analysis of KK, POC-CCA assay and PCR. Sensitivity for the three tests was 52.94%, 90.2% and 76.47%; specificity was 100% for KK and PCR and 18.37% for POC-CCA. The accuracy calculated for the three tests at the end point was 76% for KK, 55% for POC-CCA assay and 88% for PCR. Conventional PCR assay for detection of cfpd provides a potential screening tool for intestinal schistosomiasis with reliable specificity, reasonable accuracy and affordable financial and technical cost.
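The accuracy figures reported against Kato-Katz can be cross-checked from sensitivity, specificity and prevalence, since accuracy is the prevalence-weighted average of the two. The sketch below is a consistency check only, assuming the 27% KK prevalence applies to the same sample as the sensitivity and specificity values; the function name is hypothetical and this is not claimed to be the authors' calculation.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


KK_PREVALENCE = 0.27  # prevalence by Kato-Katz, as stated in the abstract

# Sensitivity and specificity versus KK, as stated in the abstract.
poc_cca_accuracy = accuracy_from_rates(1.00, 0.192, KK_PREVALENCE)
pcr_accuracy = accuracy_from_rates(0.5556, 0.6712, KK_PREVALENCE)

poc_cca_pct = round(100 * poc_cca_accuracy)
pcr_pct = round(100 * pcr_accuracy)
```

Under that assumption the rounded values agree with the 41% and 64% accuracies quoted for POC-CCA and PCR, and they show how a test with perfect sensitivity can still have low accuracy when its specificity is poor and most of the sample is negative by the reference test.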
- Research Article
4
- 10.1002/agm2.12254
- May 19, 2023
- Aging medicine (Milton (N.S.W))
This methodological research aimed to investigate and compare the sensitivity and specificity of conventional and new face validation in identifying incomprehensible items empirically. A purposive sample of 15 older people living in three residential care homes (RCHs) in Hong Kong was used to evaluate 106 newly developed items covering seven quality-of-life dimensions. The abbreviated Mental Test (Hong Kong version; AMT) was used as a screening tool for excluding those with impaired cognition. The interview was audiotaped, and incomprehensible items were identified by the research panel accordingly (served as the gold standard). The socio-demographics of the respondents were described. Understandability (yes/no, conventional face validation method) and interpretability (4-point Likert scale, new method) were compared and used to compute the Kappa value (representing chance-corrected agreement), sensitivity, and specificity. Fifteen older people were interviewed and responded to the structured interview of 106 items regarding understandability and interpretability. 61 items (57%) obtained 100% positive understandability while only 35 items (33%) obtained 100% correct interpretability. The Kappa coefficient for agreement between understandability and interpretability was 0.388 (P < 0.001). The panel confirmed that 32% of items required revision (i.e., incomprehensible items). The false negative rate of using the conventional approach was up to 70.59% while both the false positive and negative rates of using the new approach were low (0%-5.88%). This empirical evidence indicated that the conventional approach of face validation for checking incomprehensible items by older people encountered a high false negative rate. On the contrary, the new approach was recommended because it demonstrated high sensitivity and specificity and low false positive and negative rates in identifying incomprehensible items.
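The Kappa coefficient reported above measures agreement between two classifications after correcting for the agreement expected by chance. A minimal sketch of Cohen's kappa for two binary ratings follows, assuming each item is coded 1 (passes) or 0 (flagged) under each method; the function name and the ten example codes are hypothetical and do not reproduce the study's 106 items or its dichotomisation of the Likert scale.

```python
def cohens_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category by both methods.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement: product of marginal proportions, summed over categories.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical item-level codes: 1 = item passes, 0 = item flagged.
understandability = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
interpretability = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

kappa = cohens_kappa(understandability, interpretability)
```

In this toy example the two methods agree on 7 of 10 items, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.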