A trait-based rapid assessment framework to estimate fire impacts on data-poor Australian invertebrate taxa.
Following large-scale threatening events, a key challenge is to rapidly establish which species have been most affected and are in need of urgent conservation. For data-poor taxa, such assessments are especially difficult. In Australia, invertebrates represent over 90% of faunal diversity and are critical for ecosystem function, yet most are undescribed, and most described species are poorly known. It is therefore important to have a way to estimate the susceptibility of data-deficient taxa to major disturbances. We developed a novel trait-based method for assessing the impact of a major wildfire on invertebrates and applied it to 1220 species that showed high distributional overlap with the 2019-2020 Australian megafires. We estimated susceptibility based on the microhabitats species occupy, their life-history and ecological traits, and mechanisms that account for key data uncertainties (number of usable occurrence records, availability of trait data, and recency of taxonomic work). We found 748 species likely to be of potential conservation concern following the megafires; 169, 579, and 454 species were highly, moderately, and mildly threatened by a major fire, respectively. Most species (867) were associated with poor or very poor data quality, and 97 of these poorly known species were most at risk from a major fire. Our approach is generalizable to other data-deficient taxa and to major disturbance events globally, and it can be used to improve the representation of poorly known species in conservation assessments and threat mitigation decisions. Addressing the uncertainties and knowledge gaps we identified would likely improve risk prediction.
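A minimal sketch of how such a trait-based susceptibility score might be composed; the trait names, weights, record threshold, and scoring logic below are hypothetical illustrations, not the scheme the authors used:

```python
# Hypothetical sketch of a trait-based fire-susceptibility score.
# Trait names, weights, and the record cut-off are illustrative only;
# they are not the scoring scheme used in the paper.

def susceptibility_score(species):
    """Combine microhabitat, life-history, and ecological traits into
    a 0-1 fire-susceptibility score, paired with a data-quality confidence."""
    weights = {
        "ground_dwelling": 0.35,   # exposed microhabitat
        "flightless": 0.25,        # limited escape/recolonisation capacity
        "narrow_range": 0.25,      # high distributional overlap with fire
        "short_lived": 0.15,       # little capacity to ride out bad years
    }
    raw = sum(w for trait, w in weights.items() if species.get(trait))
    # Discount confidence, not the score itself, when data are poor:
    # few usable records widen the uncertainty around the estimate.
    records = species.get("n_records", 0)
    confidence = min(records / 20, 1.0)  # hypothetical saturation at 20 records
    return raw, confidence

species = {"ground_dwelling": True, "flightless": True,
           "narrow_range": False, "short_lived": True, "n_records": 6}
score, conf = susceptibility_score(species)
print(f"susceptibility={score:.2f}, confidence={conf:.2f}")
```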
- Research Article
- Feb 27, 2017
- Annals of Infectious Disease and Epidemiology
Introduction: Effective allocation of resources and investments relies heavily on good quality data. As global investments in vaccines increase, particularly by organisations such as Gavi, The Vaccine Alliance (Switzerland), the demand for accurate and representative data is urgent. Understanding what causes poor immunisation data and how to address these problems is therefore key to maximizing investments, improving coverage and reducing the risk of outbreaks. Objective: To identify the root causes of poor immunisation data quality and proven solutions for guiding future data quality interventions. Methods and Results: A qualitative systematic review of both scientific and grey literature using key words on immunisation and health information systems. Once screened, articles were classified either as identifying root causes of poor data quality or as describing an intervention to improve data quality. A total of 8,646 articles were initially identified, which were screened and reduced to 26. Results were heterogeneous in methodology, settings and conclusions, with a variety of outcomes. Key themes were underperformance in health facilities and limited human resource (HR) capacity at the peripheral level leading to data of poor quality. Repeated reference to a “culture” of poor data collection, reporting and use in low-income countries implies that it is the attitudes and subsequent behaviour of staff that prevent good quality data. Documented interventions mainly involved implementing information and communication technology (ICT) at the district level; however, without changes in HR capacity, the skills and practices of staff remain a key impediment to ICT reaching its full impact. Discussion: There was a clear mismatch between the identified root causes, which were mainly behavioural and organizational, and interventions that introduced predominantly technical factors. More emphasis should be placed on interventions that build on current practices and skills in a gradual process, so that they are more readily adopted by health workers. Major gaps in the literature exist, chiefly the lack of assessment at central and intermediate levels, the unexamined association between inaccurate target setting from outdated census data and poor data quality, and the limited documentation of interventions that target behaviour or policy change. These gaps prevent informed decisions on the best methodology for improving data quality.
- Research Article
114
- 10.3926/jiem.2011.v4n2.p168-193
- Jul 14, 2011
- Journal of Industrial Engineering and Management
Purpose: Technological developments mean that companies store ever more data. However, data quality maintenance work is often neglected, and poor quality business data constitute a significant cost factor for many companies. This paper argues that perfect data quality should not be the goal; instead, data quality should be improved only to a certain level. The paper focuses on how to identify the optimal data quality level. Design/methodology/approach: The paper starts with a review of the data quality literature. On this basis, it proposes a definition of the optimal data maintenance effort and a classification of the costs inflicted by poor quality data. These propositions are investigated by a case study. Findings: The paper proposes (1) a definition of the optimal data maintenance effort and (2) a classification of costs inflicted by poor quality data. A case study illustrates the usefulness of these propositions. Research limitations/implications: The paper provides definitions relating to the costs of poor quality data and the data quality maintenance effort. Future research may build on these definitions; to further develop the contributions of the paper, more studies are needed. Practical implications: As the case study illustrates, the definitions provided by this paper can be used to determine the right data maintenance effort and the costs inflicted by poor quality data. In many companies, such insights may lead to significant savings. Originality/value: The paper clarifies what the costs of poor quality data are and defines their relation to the data quality maintenance effort. This represents an original contribution of value to future research and practice.
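A hypothetical illustration of the trade-off the paper formalizes: if maintenance costs rise steeply near perfect quality while the costs inflicted by poor data fall, the cost-minimizing quality level sits below 1.0. Both cost curves below are invented shapes, not figures from the paper:

```python
# Hypothetical illustration of the optimum the paper argues for:
# total cost = cost of maintaining data quality + cost inflicted by
# poor-quality data. Neither curve comes from the paper; both are
# made-up shapes chosen only to show that the minimum of their sum
# typically sits below perfect quality.

def maintenance_cost(q):      # rises steeply as quality approaches 1.0
    return 100 * q / (1.001 - q)

def poor_data_cost(q):        # falls as quality improves
    return 5000 * (1 - q) ** 2

levels = [i / 100 for i in range(50, 100)]
optimal = min(levels, key=lambda q: maintenance_cost(q) + poor_data_cost(q))
print(f"cost-minimizing quality level ~ {optimal:.2f}, not 1.0")
```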
- Research Article
- 10.1158/1538-7445.am2025-2489
- Apr 21, 2025
- Cancer Research
High-quality RNA is crucial for obtaining reliable RNA sequencing (RNA-seq) data, and metrics like the RNA Integrity Number (RIN) are commonly used to assess it. However, these metrics, while effective for evaluating RNA integrity, do not always correlate with RNA-seq data quality, especially at the transcript level. This gap is particularly evident in total RNA-seq, where existing measures such as the coefficient of variation (CV) for read coverage fail to fully capture data quality and are influenced by confounding factors such as read coverage depth. To address this, we developed a novel method to assess RNA-seq data quality by quantifying nonuniformity in read coverage while minimizing the influence of read coverage depth. In this study we introduce a new metric called windowCV (wCV) and apply it to a diverse range of RNA-seq datasets, including fresh frozen (FF) and FFPE total RNA-seq data, as well as poly(A)-enriched mRNA-seq data. In mRNA-seq, our method captured 3' read coverage bias, a hallmark of RNA degradation, particularly in longer transcripts. For total RNA-seq data, we identified noisy coverage patterns associated with poor data quality, even in samples with high RIN values. By fitting regression lines between wCV and mean coverage depth (MCD) and calculating the area under the curve (wCVAUC), we refined the assessment to account for RNA quality variability. Using the TCGA pilot study and our own datasets, we demonstrated that wCVAUC reliably identified low quality RNA-seq data and highlighted its impact on downstream analyses, including gene expression quantification and clustering. Importantly, we observed that low RIN values do not always predict poor RNA-seq data quality, as some samples with RIN values below 7 exhibited high-quality RNA-seq data based on wCVAUC. Our analyses thus showed that wCVAUC effectively distinguished high-quality from low-quality samples, including cases where traditional metrics like RIN and CV were insufficient. Additionally, our investigation into the relationships between nonuniformity of read coverage, exon GC content, and RNA localization revealed that the transcript-level RNA-seq data quality of lncRNA genes in FFPE samples is influenced by low exon GC content and nuclear localization. In conclusion, our method provides robust, transcript-level metrics for assessing RNA-seq data quality across platforms, enabling more accurate identification of low-quality data and minimizing biases in downstream analyses. This approach offers a new standard for integrating RNA-seq data quality with sample variability, particularly for challenging datasets such as FFPE and total RNA-seq. Citation Format: Wonyoung Choi, Miyeon Yeon, Jay Lee, Hyo Young Choi, David Neil Hayes. The new approach for measuring nonuniformity of read coverages reveals the quality of RNA-seq data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2489.
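The abstract does not spell out how wCV is computed, so the sketch below shows one plausible reading, assuming a per-window coefficient of variation of read depth averaged along the transcript; the window size and coverage vectors are invented:

```python
# The abstract does not define wCV precisely; this is one plausible
# reading: compute the coefficient of variation of per-base read depth
# within fixed windows along a transcript, then average across windows,
# so local nonuniformity is captured somewhat independently of depth.
from statistics import mean, pstdev

def window_cv(coverage, window=50):
    """Hypothetical windowed CV over a per-base coverage vector."""
    cvs = []
    for start in range(0, len(coverage) - window + 1, window):
        w = coverage[start:start + window]
        m = mean(w)
        if m > 0:
            cvs.append(pstdev(w) / m)  # CV = sd / mean within the window
    return mean(cvs) if cvs else float("nan")

uniform = [30] * 500                                      # flat coverage -> ~0
noisy = [30 + (15 if i % 7 == 0 else -5) for i in range(500)]  # choppy coverage
print(window_cv(uniform), window_cv(noisy))
```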
- Book Chapter
6
- 10.1016/b978-0-12-373717-5.00001-4
- Nov 18, 2010
- The Practitioner's Guide to Data Quality Improvement
Chapter 1 - Business Impacts of Poor Data Quality
- Research Article
2
- 10.1016/0895-7177(92)90048-p
- Mar 1, 1992
- Mathematical and Computer Modelling
Fitting straight lines to poor quality (x, y) data
- Conference Article
11
- 10.1109/ictai.2015.39
- Nov 1, 2015
Performing sentiment analysis of tweets by training a classifier is a challenging and complex task, requiring a classifier that can correctly and reliably identify the emotional polarity of a tweet. Poor data quality, due to class imbalance or mislabeled instances, may negatively impact classification performance. Ensemble learning techniques combine multiple models in an attempt to improve classification performance, especially on poor quality or imbalanced data; however, these techniques do not address the high dimensionality of tweet sentiment data and may require a prohibitive amount of resources to train on high dimensional data. This work addresses these issues by studying bagging and boosting combined with feature selection. These two techniques, denoted Select-Bagging and Select-Boost, seek to address both poor data quality and high dimensionality. We compare the performance of Select-Bagging and Select-Boost against feature selection alone. These techniques are tested with four base learners, two datasets and ten feature subset sizes. Our results show that Select-Boost offers the highest performance, is significantly better than using no ensemble technique, and is significantly better than Select-Bagging for most learners on both datasets. To the best of our knowledge, this is the first study to focus on the effects of using ensemble learning in combination with feature selection for the purpose of tweet sentiment classification.
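A hedged sketch of the Select-Bagging/Select-Boost idea using scikit-learn, assuming feature ranking followed by a bagged or boosted ensemble on the reduced feature space; the ranker (chi-squared), subset size, base learners, and synthetic data are illustrative, not the paper's exact setup:

```python
# Hypothetical sketch of the Select-Bagging / Select-Boost idea:
# rank features, keep a small subset, then train a bagged or boosted
# ensemble on the reduced space. Feature counts, learners, and the
# chi2 ranker are illustrative; the paper's exact choices may differ.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=200, n_informative=15,
                           weights=[0.8, 0.2], random_state=0)  # imbalanced

for name, ensemble in [("Select-Bagging", BaggingClassifier(random_state=0)),
                       ("Select-Boost", AdaBoostClassifier(random_state=0))]:
    model = make_pipeline(MinMaxScaler(),           # chi2 needs non-negative X
                          SelectKBest(chi2, k=50),  # hypothetical subset size
                          ensemble)
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC={score:.3f}")
```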
- Research Article
- 10.1007/s11414-023-09875-y
- Dec 28, 2023
- The Journal of Behavioral Health Services & Research
Child welfare decisions have life-impacting consequences which are often underpinned by limited, inadequate, or poor-quality data. Though research on data quality has gained popularity and made advancements in various practical areas, it has not made significant inroads into child welfare fields or data systems. Poor data quality can hinder service decision-making, impacting child behavioral health and well-being and increasing unnecessary expenditure of time and resources. Poor data quality can also undermine the validity of research and slow policymaking processes. The purpose of this commentary is to summarize the data quality research base in other fields, describe the obstacles and unique challenges of improving data quality in child welfare, and propose the steps in scientific research and practical implementation that would enable researchers and practitioners to improve the quality of child welfare services based on enhanced quality of data.
- Conference Article
- 10.1109/iscc-c.2013.48
- Dec 1, 2013
To address the problem of reconstructed images suffering from poor image quality and long data scan times, a 3D-EVDRS algorithm is proposed in this paper. By introducing a 3D sampling trajectory into the EVDRS algorithm, 3D-EVDRS tackles both shortcomings at once. For illustration, medical spine diagnosis data are used to show the feasibility of the algorithm. Experiments show that 3D-EVDRS is an efficient algorithm for this class of reconstruction problems, effectively improving both image quality and imaging speed.
- Research Article
17
- 10.1145/2641575
- Mar 2, 2015
- Journal of Data and Information Quality
The data extracted from electronic archives is a valuable asset; however, the issue of (poor) data quality should be addressed before performing data analysis and decision-making activities. Poor-quality data are frequently cleansed using business rules derived from domain knowledge. Unfortunately, designing and implementing cleansing activities based on business rules requires considerable effort. In this article, we illustrate a model-based approach for performing inconsistency identification and corrective interventions, thus simplifying the process of developing cleansing activities. The article shows how the cleansing activities required to perform a sensitivity analysis can be easily developed using the proposed model-based approach. The sensitivity analysis provides insights into how the cleansing activities can affect the results of indicator computation. The approach has been successfully used on a database describing the working histories of the population of an Italian area. A model formalizing how data should evolve over time in this domain (i.e., a data consistency model) was created by means of formal methods and used to perform the cleansing and sensitivity analysis activities.
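A minimal sketch of the flavor of rule-based inconsistency detection described, assuming a consistency model expressed as allowed event transitions over time; the events and rules are invented for illustration, whereas the paper derives its model formally for working-history data:

```python
# Hypothetical sketch of rule-based inconsistency detection in the
# spirit of the article's consistency model: encode how records may
# evolve over time as rules and flag sequences that violate them.
# The events and rules below are invented; the paper's model is
# derived formally for Italian working-history data.
from datetime import date

VALID_NEXT = {                       # hypothetical consistency model
    "hired":      {"promoted", "terminated"},
    "promoted":   {"promoted", "terminated"},
    "terminated": {"hired"},
}

def find_inconsistencies(history):
    """Return (index, reason) pairs where the history breaks the rules."""
    issues = []
    for i in range(1, len(history)):
        prev_date, prev_event = history[i - 1]
        cur_date, cur_event = history[i]
        if cur_date < prev_date:
            issues.append((i, "out-of-order date"))
        if cur_event not in VALID_NEXT.get(prev_event, set()):
            issues.append((i, f"'{prev_event}' -> '{cur_event}' not allowed"))
    return issues

history = [(date(2019, 1, 1), "hired"),
           (date(2020, 6, 1), "terminated"),
           (date(2020, 3, 1), "promoted")]   # inconsistent on both counts
print(find_inconsistencies(history))
```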
- Research Article
8
- 10.3390/electronics10172049
- Aug 25, 2021
- Electronics
Nowadays, the internet of things (IoT) is used to generate data in several application domains, and logistic regression, a standard machine learning algorithm with a wide application range, is often built on such data. Nevertheless, building a powerful and effective logistic regression model requires large amounts of data, so collaboration between multiple IoT participants has often been the go-to approach. However, privacy concerns and poor data quality are two challenges that threaten the success of such a setting. Several studies have proposed different methods to address the privacy concern, but to the best of our knowledge, little attention has been paid to addressing the poor data quality problem in the multi-party logistic regression model. In this study, we therefore propose a multi-party privacy-preserving logistic regression framework with poor quality data filtering for IoT data contributors that addresses both problems. Specifically, we propose a new metric, gradient similarity, in a distributed setting, which we employ to filter out parameters from data contributors with poor quality data. To solve the privacy challenge, we employ homomorphic encryption. Theoretical analysis and experimental evaluations using real-world datasets demonstrate that our proposed framework is privacy-preserving and robust against poor quality data.
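A hedged sketch of the gradient-filtering idea, assuming similarity is measured as cosine similarity between each contributor's gradient and the mean gradient; the paper's exact metric and its homomorphic-encryption layer are not reproduced here:

```python
# Hypothetical sketch of gradient-similarity filtering: compare each
# party's submitted gradient to the average of all gradients and drop
# outliers before aggregation. The cosine measure and the 0.5 cutoff
# are illustrative, not the paper's definition.
import numpy as np

def filter_by_gradient_similarity(grads, threshold=0.5):
    """Keep gradients whose cosine similarity to the mean exceeds threshold."""
    mean = np.mean(grads, axis=0)
    keep = []
    for g in grads:
        sim = g @ mean / (np.linalg.norm(g) * np.linalg.norm(mean) + 1e-12)
        if sim > threshold:
            keep.append(g)
    return np.array(keep)

rng = np.random.default_rng(0)
good = [rng.normal(1.0, 0.1, 4) for _ in range(5)]   # consistent parties
bad = [rng.normal(-1.0, 0.1, 4)]                     # poor-quality contributor
kept = filter_by_gradient_similarity(good + bad)
print(f"kept {len(kept)} of {len(good) + len(bad)} gradients")
```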
- Research Article
8
- 10.1071/wf17048
- Jun 4, 2018
- International Journal of Wildland Fire
Intense fire is a key threatening process for the endangered Blue Mountains water skink, Eulamprus leuraensis. This species is restricted to isolated, densely vegetated and waterlogged peat swamps in montane south-eastern Australia. We surveyed 11 swamps (5 unburnt, 6 burnt) over 2 years, before and after the intense spring bushfires of 2013, to quantify the fires’ impacts on these skinks, other lizards and the habitat upon which they depend. Trapping revealed no direct effect of fire on E. leuraensis populations, with skinks persisting in all burnt swamps. Fire modified ground vegetation, virtually eliminating live plants and the dense understorey. Despite the conflagration, vegetation regrowth was rapid, with swamp habitat largely recovering in just over 1 year post-fire. Fire thus had only a transitory effect on lizard habitat and a non-significant impact on E. leuraensis numbers. Nonetheless, broader-scale analyses suggest a different story: skinks were more abundant in swamps that had experienced a longer time since major fire. Although the ability of this endangered reptile to survive even intense wildfires is encouraging, fire during prolonged dry periods or an intensified fire regime might imperil skink populations.
- Research Article
97
- 10.1097/tp.0000000000001710
- Jul 1, 2017
- Transplantation
Cardiovascular events represent a major source of morbidity and mortality after liver transplantation and will likely increase given the aging population and nonalcoholic fatty liver disease becoming a leading indication for transplant. The optimal cardiovascular risk stratification approach in this evolving patient population remains unclear. The aims of this systematic review are to (1) refine the definition, (2) characterize the incidence, and (3) identify risk factors for cardiovascular events post-liver transplantation. Additionally, we evaluated the performance characteristics of different cardiac testing modalities. MEDLINE via PubMed, EMBASE, Web of Science, and Scopus were searched for studies published between 2002 and 2016 (the Model for End-Stage Liver Disease era). Two authors independently reviewed articles to select eligible studies and performed data abstraction. Twenty-nine studies representing 57,493 patients from 26 unique cohorts were included. Definitions of cardiovascular outcomes were highly inconsistent. Incidence rates were widely variable: 1% to 41% for outcomes of 6 months or shorter and 0% to 31% for outcomes longer than 6 months. Multivariate analyses demonstrated that older age and history of cardiac disease were the most consistent predictors of cardiovascular events posttransplant (significant in 8/23 and 7/22 studies, respectively). The predictive capacity of various cardiac imaging modalities was also discrepant. The true incidence of cardiovascular outcomes post-liver transplant remains unknown, in large part due to the lack of consensus regarding outcome definition. Overall, poor data quality and gaps in knowledge limit the ability to clearly identify predictors of outcomes, but existing data support a more aggressive risk stratification protocol for patients of advanced age and/or with preexisting cardiac disease.
- Research Article
4
- 10.1093/jncimonographs/lgad032
- Nov 8, 2023
- Journal of the National Cancer Institute Monographs
Despite significant progress in cancer research and treatment, a persistent knowledge gap exists in understanding and addressing cancer care disparities, particularly among populations that are marginalized. This knowledge deficit has led to a "data divide," where certain groups lack adequate representation in cancer-related data, hindering their access to personalized and data-driven cancer care. This divide disproportionately affects marginalized and minoritized communities such as the U.S. Black population. We explore the concept of "data deserts," wherein entire populations, often based on race, ethnicity, gender, disability, or geography, lack comprehensive and high-quality health data. Several factors contribute to data deserts, including underrepresentation in clinical trials, poor data quality, and limited access to digital technologies, particularly in rural and lower-socioeconomic communities. The consequences of data divides and data deserts are far-reaching, impeding equitable access to precision medicine and perpetuating health disparities. To bridge this divide, we highlight the role of the Cancer Intervention and Surveillance Modeling Network (CISNET), which employs population simulation modeling to quantify cancer care disparities, particularly among the U.S. Black population. We emphasize the importance of collecting quality data from various sources to improve model accuracy. CISNET's collaborative approach, utilizing multiple independent models, offers consistent results and identifies gaps in knowledge. It demonstrates the impact of systemic racism on cancer incidence and mortality, paving the way for evidence-based policies and interventions to eliminate health disparities. We suggest the potential use of voting districts/precincts as a unit of aggregation for future CISNET modeling, enabling targeted interventions and informed policy decisions.
- Research Article
5
- 10.1007/s13165-016-0147-5
- Jan 29, 2016
- Organic Agriculture
Members of the organic supply chain need high-quality data to make correct investment decisions, but data with sufficient depth and quality are not widely available in Europe. The quality of available data is a key concern for both data collectors and data users. The aim of this study is to identify whether the commonly used quality attributes (accuracy, coherence, comparability, timeliness, punctuality, accessibility, relevance), which have been developed from the perspective of data collectors, are also appropriate from the perspective of end users of organic market data. A further aim is to assess whether the data quality needs of end users are being met by the existing data. The results of two surveys carried out in Europe, one of data collectors and one of end users, are presented. Sales data at retail level (values and volumes) are used as an illustrative example, and the perceptions of end users are compared with the reported data collection approaches, quality checks and availability of data. Correlation analysis and principal component analysis were used to investigate the relationship between users’ perceptions of the data quality attributes and their overall perceptions of data quality. The findings suggest that data quality checks do help to improve the quality of data as perceived by end users, but that people will use whatever data they can get, even if it is of poor quality. This could have potentially negative consequences, such as a lack of confidence in the organic market, if important decisions are based on poor quality data. The analysis also suggests that the commonly used attributes represent two dimensions of data quality: ‘fitness for use’, which encompasses accuracy, relevance, comparability and punctuality; and ‘convenience’, which encompasses affordability, comparability, timeliness and accessibility. The attribute of comparability belongs to both dimensions, as it contributes to both fitness for use and convenience. Data collectors wishing to improve the quality of their data should focus first on enhancing fitness for use and then on the convenience of their data for users.
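A minimal sketch of the analysis style described, assuming attribute ratings decomposed with PCA to see whether they group into a 'fitness for use' and a 'convenience' dimension; the survey data below are fabricated, so only the method, not the result, mirrors the paper:

```python
# Hypothetical sketch: score perceived quality attributes, then use PCA
# to see whether they group into two dimensions. The ratings are
# simulated from two made-up latent factors purely for illustration.
import numpy as np
from sklearn.decomposition import PCA

attributes = ["accuracy", "relevance", "comparability", "punctuality",
              "affordability", "timeliness", "accessibility"]
rng = np.random.default_rng(1)
f = rng.normal(size=(40, 1))          # latent "fitness for use" factor
c = rng.normal(size=(40, 1))          # latent "convenience" factor
loadings = np.array([[1, 0], [1, 0], [0.7, 0.7], [1, 0],
                     [0, 1], [0, 1], [0, 1]])   # comparability loads on both
ratings = np.hstack([f, c]) @ loadings.T + rng.normal(0, 0.3, (40, 7))

pca = PCA(n_components=2).fit(ratings)
for attr, load in zip(attributes, pca.components_.T):
    print(f"{attr:>13}: PC1={load[0]:+.2f} PC2={load[1]:+.2f}")
```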