QUALITY ASSESSMENT OF SINGLE-PHOTON LIDAR DATA
This publication assesses the quality of airborne laser scanning data acquired with single-photon LiDAR technology over Pamplona, located in Navarre, northern Spain. The paper uses reference data sets acquired with traditional (linear) scanners to compare the new single-photon method with the multi-photon technology that has been established on the market for many years. The main aim of the research was to evaluate the quality of the new single-photon scanner with respect to the accuracy and parameters obtained by scanners representing the older and better-known multi-photon method. The single-photon data set was subjected to a detailed evaluation of accuracy and quality in terms of vegetation penetration, bathymetric measurement capability, and the quality of the derived height models compared with the reference data. The research supported several conclusions that establish the standing of the studied single-photon data and demonstrate their strengths and advantages. The analyses also indicate the range of applications in which typical linear (multi-photon) scanners still hold an advantage.
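A minimal sketch of the kind of height-model comparison the abstract mentions: computing bias and RMSE between a single-photon digital elevation model and a reference model on a common grid. The function name, the synthetic rasters, and the choice of statistics are assumptions for illustration, not the paper's actual evaluation protocol.

```python
import numpy as np

def dem_quality(dem_test: np.ndarray, dem_ref: np.ndarray) -> dict:
    """Compare a test DEM against a reference DEM on the same grid.

    Both inputs are 2-D elevation rasters; NaN marks no-data cells.
    """
    diff = dem_test - dem_ref
    d = diff[~np.isnan(diff)]                    # ignore no-data cells
    return {
        "bias": float(d.mean()),                 # systematic height offset
        "rmse": float(np.sqrt((d ** 2).mean())), # overall vertical error
        "std": float(d.std()),                   # spread after bias removal
    }

# Hypothetical usage with synthetic rasters (heights in metres):
rng = np.random.default_rng(0)
ref = rng.normal(450.0, 5.0, size=(100, 100))
test = ref + rng.normal(0.05, 0.10, size=ref.shape)
print(dem_quality(test, ref))
```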
- Research Article
2
- 10.1111/jpn.13603
- Jul 14, 2021
- Journal of animal physiology and animal nutrition
The genomic breed composition (GBC) reflects the genetic relationship between individual animals and ancestor breeds in composite or hybrid breeds; it can also be used to estimate the genomic contribution of each ancestral breed to the genome of each individual animal. Using genomic SNP information to estimate the GBC of Ningxiang pigs is therefore of great significance. First, GBC has been widely used in cattle with good results, but there is almost no experience of its use in Chinese endemic pig breeds. Second, high-density SNP panels are expensive, but costs can be reduced by deploying a relatively small number of highly informative SNPs scattered evenly across the genome. Moreover, the impact of low-density SNP selection strategies on estimating the GBC of individual animals has not been fully explored. Using SNP data from different databases and organizations, we established reference (N=2015) and verification (N=302) data sets. Twelve SNP panels of four sizes (500, 1K, 5K and 10K) were built from the SNPs in the reference data by three selection methods (uniform distribution, maximized Euclidean distance (MED) and random distribution). For each panel, the GBC of Ningxiang pigs in the reference dataset was estimated. Then, combining Shannon entropy with the GBC results, the optimal panel (the 10K SNP panel constructed by the MED method) was selected to estimate the GBC of the verification Ningxiang pigs; 230 individuals were identified as purebred Ningxiang pigs, while the remaining 72 admixed individuals in the verification population carried 6.44% ancestry related to Rongchang pigs and 4.09% related to Bamaxiang pigs. Finally, the genetic structure of the verification population was analysed by combining the GBC results with multi-dimensional scaling (MDS) analysis and hierarchical cluster analysis. These results showed: (a) GBC can accurately identify purebred Ningxiang pigs and quantify the genomic contribution of each breed to each hybrid animal; (b) GBC can characterize population genetic structure and clarify the genetic background of Ningxiang pigs. Such findings highlight a variety of opportunities to better protect and identify other endangered local breeds in China facing the same situation as the Ningxiang pig, and provide more accurate, economical and efficient technical support for GBC estimation in breeding work.
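One standard way to estimate GBC from SNP data, sketched below, is to regress an animal's allele dosages on the reference breeds' allele frequencies under a non-negativity constraint, then normalize the coefficients into breed proportions. The simulated data, the panel size, and the nnls-based formulation are illustrative assumptions, not necessarily the estimator used in the paper.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_gbc(geno: np.ndarray, breed_freq: np.ndarray) -> np.ndarray:
    """Estimate genomic breed composition by non-negative regression.

    geno       : (n_snps,) allele dosages {0, 1, 2} for one animal.
    breed_freq : (n_snps, n_breeds) reference allele frequencies per breed.
    Returns non-negative breed proportions summing to 1.
    """
    coef, _ = nnls(2.0 * breed_freq, geno.astype(float))
    total = coef.sum()
    return coef / total if total > 0 else coef

# Hypothetical example: 1,000-SNP panel, 3 candidate ancestral breeds
rng = np.random.default_rng(1)
freq = rng.uniform(0.05, 0.95, size=(1000, 3))
truth = np.array([0.90, 0.06, 0.04])        # a nearly purebred animal
geno = rng.binomial(2, freq @ truth)        # simulated dosages
print(estimate_gbc(geno, freq).round(3))    # close to [0.90, 0.06, 0.04]
```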
- Research Article
1
- 10.56553/popets-2024-0031
- Jan 1, 2024
- Proceedings on Privacy Enhancing Technologies
Within the realm of privacy-preserving machine learning, empirical privacy defenses have been proposed as a solution to achieve satisfactory levels of training data privacy without a significant drop in model utility. Most existing defenses against membership inference attacks assume access to reference data, defined as an additional dataset coming from the same (or a similar) underlying distribution as training data. Despite the common use of reference data, previous works are notably reticent about defining and evaluating reference data privacy. As gains in model utility and/or training data privacy may come at the expense of reference data privacy, it is essential that all three aspects are duly considered. In this paper, we conduct the first comprehensive analysis of empirical privacy defenses. First, we examine the availability of reference data and its privacy treatment in previous works and demonstrate its necessity for fairly comparing defenses. Second, we propose a baseline defense that enables the utility-privacy tradeoff with respect to both training and reference data to be easily understood. Our method is formulated as an empirical risk minimization with a constraint on the generalization error, which, in practice, can be evaluated as a weighted empirical risk minimization (WERM) over the training and reference datasets. Although we conceived of WERM as a simple baseline, our experiments show that, surprisingly, it outperforms the most well-studied and current state-of-the-art empirical privacy defenses using reference data for nearly all relative privacy levels of reference and training data. Our investigation also reveals that these existing methods are unable to trade off reference data privacy for model utility and/or training data privacy, and thus fail to operate outside of the high reference data privacy case. Overall, our work highlights the need for a proper evaluation of the triad model utility / training data privacy / reference data privacy when comparing privacy defenses.
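On one reading of the WERM objective described above: the defense minimizes a convex combination of the empirical risks on the training and reference sets, with the weight controlling whose privacy is traded for utility. The sketch below is a minimal logistic-regression illustration of that weighted objective; the paper's exact constrained formulation and hyperparameters are not reproduced here.

```python
import numpy as np

def werm_logistic(X_tr, y_tr, X_ref, y_ref, lam=0.5, lr=0.1, steps=500):
    """Weighted ERM: minimize lam * L(train) + (1 - lam) * L(reference).

    lam near 1 leans on training data; lam near 0 leans on reference data.
    """
    w = np.zeros(X_tr.shape[1])

    def grad(X, y):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # logistic predictions
        return X.T @ (p - y) / len(y)      # mean log-loss gradient

    for _ in range(steps):
        w -= lr * (lam * grad(X_tr, y_tr) + (1 - lam) * grad(X_ref, y_ref))
    return w

# Hypothetical usage with synthetic training and reference samples
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X_tr, X_ref = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
y_tr = (X_tr @ w_true > 0).astype(float)
y_ref = (X_ref @ w_true > 0).astype(float)
w = werm_logistic(X_tr, y_tr, X_ref, y_ref, lam=0.7)
```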
- Book Chapter
13
- 10.1079/9780851994499.0127
- Jan 1, 2000
A model describing amino acid (AA) metabolism by the mammary glands of the lactating cow has been constructed (Hanigan et al., 2000a). Milk protein production was predicted using a mathematical procedure to determine which among histidine (His), lysine (Lys), methionine (Met), threonine (Thr) and tyrosine plus phenylalanine (TP) was most limiting for milk protein synthesis. The minimum protein synthetic flux determined the overall rate of protein synthesis. The ability of the model to predict substrate removal and milk protein output was assessed, using the parameterization data (reference data) and an independent data set assembled from the literature (literature data). When the reference data were simulated, the model generally fitted the uptake data well. However, the model predicted milk protein yields poorly. Of the four experiments contained in the reference data set, only one experiment (C6) contained complete data for all driving AA. The model accounted for 53% of the observed variation in milk protein yields for C6, suggesting that inadequate data were the cause of inaccurate simulations for the remaining experiments in the reference data set. The model explained 43% of the observed variation in milk protein yields when the literature data set was simulated. Adoption of an alternative representation of milk protein synthesis, wherein all five driving AA affected milk protein synthesis simultaneously in a linear additive manner, resulted in a reduction in the accuracy of predictions of milk protein yields when C6 or the literature data set was simulated. Use of a Michaelis-Menten equation form to describe milk protein synthesis resulted in slight improvements in accuracy when the C6 data set was simulated and a reduction in accuracy when the literature data set was simulated. After fitting sensitivity coefficients for a modified Michaelis-Menten equation to the literature data, the model described 60% of the observed variation in milk protein output. Attempts to derive sensitivity coefficients for the linear additive equation were unsuccessful, due to model instability caused by the equation. Based on the results herein, a modified version of the Michaelis-Menten equation appeared to represent the effects of essential AA on milk protein synthesis better than an equation considering a single limiting AA.
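For concreteness, the single-limiting-AA representation in the abstract can be written down directly: compute a Michaelis-Menten flux for each driving amino acid and let the minimum set the overall synthesis rate. The kinetic constants and concentrations below are invented for illustration only; only the model structure is taken from the abstract.

```python
# Hypothetical kinetic constants per driving amino acid (His, Lys, Met, Thr, TP)
VMAX = {"His": 1.1, "Lys": 1.3, "Met": 0.9, "Thr": 1.0, "TP": 1.2}     # flux units
KM = {"His": 0.04, "Lys": 0.10, "Met": 0.03, "Thr": 0.06, "TP": 0.08}  # mM

def mm_flux(aa: str, conc: float) -> float:
    """Michaelis-Menten flux for one amino acid: Vmax * C / (Km + C)."""
    return VMAX[aa] * conc / (KM[aa] + conc)

def synthesis_single_limiting(conc: dict) -> float:
    """Overall rate set by the most limiting amino acid (the minimum flux)."""
    return min(mm_flux(aa, c) for aa, c in conc.items())

conc = {"His": 0.05, "Lys": 0.12, "Met": 0.04, "Thr": 0.07, "TP": 0.09}
print(round(synthesis_single_limiting(conc), 3))
```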
- Preprint Article
1
- 10.5194/egusphere-egu21-10871
- Mar 4, 2021
Geo-Wiki is an online platform for involving citizens in the visual interpretation of very high-resolution satellite imagery to collect reference data on land cover and land use. Rather than running as an ongoing citizen science project, Geo-Wiki organizes short, intensive campaigns in which citizens participate. The advantage of this approach is that large amounts of data are collected in a short amount of time against a clearly defined data collection target. Participants can also schedule their time accordingly, and their past feedback indicates that this intensive approach is preferred. The reference data are then used in further scientific research to answer a range of questions such as: How much of the land's surface is wild or impacted by humans? What is the size of agricultural fields globally? The campaigns are organized as competitions with prizes that include Amazon vouchers and co-authorship on a scientific publication. The scientific publication is the mechanism by which the data are openly shared so that other researchers can use the reference data set in other applications. The publication usually takes the form of a data paper, which explains the campaign in detail along with the data set collected. The data are uploaded to a repository such as Pangaea, ZENODO or IIASA's own data repository, DARE. This workflow, from data collection to opening up the data to documentation via a scientific data paper, also ensures transparency in the data collection process. Several Geo-Wiki citizen science campaigns have been run over the last decade. Here we provide examples of experiences from five recent campaigns: (i) the Global Cropland mapping campaign to build a cropland validation data set; (ii) the Global Field Size campaign to characterize the size of agricultural fields around the world; (iii) the Human Impact on Forests campaign to produce the first global map of forest management; (iv) the Global Built-up Surface Validation campaign to collect data on built-up surfaces for validation of global built-up products such as the Global Human Settlement Layer (https://ghsl.jrc.ec.europa.eu/); and (v) the Drivers of Tropical Forest Loss campaign, which collected data on the main causes of deforestation in the tropics. In addition to outlining the campaigns, the data sets collected and the sharing of the data online, we provide lessons learned from these campaigns, built upon experience accumulated over the last decade. These include insights related to the quality and consistency of the volunteers' classifications, including different volunteer behaviors; best practices in creating control points for use in the gamification and quality assurance of the campaigns; different methods for training the volunteers in visual interpretation; difficulties in the interpretation of some features, which may need expert input instead, as well as the inability of some features to be recognized from satellite imagery; and limitations in the approach regarding change detection due to the temporal availability of open satellite imagery, among several others.
- Research Article
- 10.1158/1538-7445.am2024-80
- Mar 22, 2024
- Cancer Research
FDA anticipates that immune cell receptor profiling will become crucial to the Agency's evaluation of the efficacy and safety of emerging cancer immunotherapies and immunomodulatory drugs by advancing precision medicine and providing orthogonal, functional evidence. High-throughput sequencing of B-cell receptor (BCR) gene rearrangements and downstream analysis empowers researchers to define the B-cell clonal landscape, as well as identify biomarkers for minimal residual disease (MRD). Although this field has made vast strides over the last decade, it lacks standardized assay controls, adequate sensitivity and specificity in gene mapping/alignment, and appropriately qualified reference data sets, materials, and validation methods. Moreover, detailed comparisons of wet-bench analytical methods or bioinformatic procedures have not been published. The FDA-led B-cell receptor sequencing quality control (BCR-SEQC) consortium is establishing protocols for the development of reference materials and data sets for the evaluation of NGS-based BCR repertoire reconstruction. The consortium is also developing materials for performance benchmarking studies. To this end, we performed whole-genome sequencing, RNAseq, and BCR-seq on 50 B-cell lines to determine the clonotype, full-length transcripts, and expression abundance of BCR genes in each cell line. Two batches of reference materials (DNA, RNA, and cells) were generated from each of nine cell lines determined to be unambiguously monoclonal in BCR expression of both heavy and light chains. These materials were distributed to ten companies to test up to 15 BCR-seq products. Our benchmarking studies include six widely used sequencing technologies. The overarching objectives of this research were to elucidate current capabilities and limitations, address fundamental technical needs, provide reference materials and data sets, and establish actionable best practices for reconstructing B-cell receptor repertoires from NGS data. Citation Format: Wenming Xiao, BCR-SEQC consortium. Toward best practices for B-cell receptor repertoire profiling [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 80.
- Research Article
10
- 10.1007/s00198-005-1901-9
- Jun 4, 2005
- Osteoporosis International
Manufacturers of bone densitometry devices have been moving from manufacturer-specific reference values to data derived from larger population-based cohorts such as the National Health and Nutrition Examination Survey (NHANES) III. One bone densitometer manufacturer has released software that provides hip subregion T-score calculations based upon four slightly different versions of hip reference data. Our aim was to determine how changes in hip reference data affect diagnostic classification based on minimum T-scores in older women. We extracted results for lumbar spine and hip bone density measurements from the Manitoba Bone Density database for women aged 50 years or older who had baseline scans on the manufacturer's equipment (n=17,053). T-scores were calculated using manufacturer-specific non-NHANES data and three software implementations of NHANES reference data. One software version gave results at subregions of the hip that were significantly lower than with the three other sets of reference data from the same manufacturer (mean femoral neck T-score absolute difference 0.23-0.48, P<0.00001; mean trochanter T-score absolute difference 0.49-0.70, P<0.00001). As a result, the proportion of measurements with a T-score of -2.5 or lower almost doubled at the femoral neck (14.3 versus 27.7%, P<0.00001) and approximately tripled at the trochanter (8.1 versus 24.0%, P<0.00001). The final patient classification of osteoporosis based on a minimum T-score of -2.5 or lower from all four measured sites differed significantly between the four versions (absolute difference 7.9 to 10.4%, P<0.00001). Small changes in the reference data used in T-score calculations had large effects on patient categorization and the calculated prevalence of osteoporosis. The impact of changes in reference data needs to be carefully evaluated by users and manufacturers before widespread clinical dissemination.
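The arithmetic behind this finding is simple: a T-score is the number of reference-population standard deviations a measured bone mineral density (BMD) lies from the young-adult mean, so shifting the reference mean or SD moves every T-score. The sketch below uses invented reference values to show how the same scan can cross the -2.5 osteoporosis threshold under one reference data set but not another.

```python
def t_score(bmd: float, ref_mean: float, ref_sd: float) -> float:
    """T-score: standard deviations from the young-adult reference mean."""
    return (bmd - ref_mean) / ref_sd

# Hypothetical femoral-neck reference values from two software versions
references = {
    "manufacturer": (0.895, 0.100),   # (young-adult mean g/cm^2, SD)
    "nhanes_based": (0.858, 0.120),
}

bmd = 0.62  # one patient's measured femoral-neck BMD [g/cm^2]
for name, (mean, sd) in references.items():
    t = t_score(bmd, mean, sd)
    label = "osteoporosis" if t <= -2.5 else "not osteoporotic"
    print(f"{name}: T = {t:+.2f} -> {label}")
```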
- Research Article
- 10.1096/fasebj.29.1_supplement.381.5
- Apr 1, 2015
- The FASEB Journal
In the diverse US Affiliated Pacific, Centers for Disease Control (CDC) 2000 and 2006 World Health Organization (WHO) reference data are available to assess child growth. The WHO reference data include children from six countries who met recommended breastfeeding guidelines, while the CDC data describe the growth of US children, few of whom were breastfed. Baseline data from the CHL program in Hawai'i (n=941) were used to calculate BMI z-scores and percentiles to compare growth relative to the two reference data sets and to identify the role of breastfeeding history, age, sex, and ethnicity in growth assessment. Mean BMI z-scores calculated with CDC reference data were lower than with WHO data (z-score difference=-0.31, p<0.001). A general linear model of the BMI z-score differences against child age, sex, ethnicity and breastfeeding history showed that the CDC-based difference was greater in boys than girls (z-score difference=0.20, p<0.001). No difference was found by breastfeeding history, age or ethnicity. Kappa statistics showed strong agreement between the two references on CDC-defined BMI categories (weighted Kappa=0.50, p<0.001), with percentage agreement highest for healthy weight (92%), followed by obese (85%), then overweight (76%). The CDC and WHO growth reference samples differ by breastfeeding history and ethnicity; however, differences between BMI z-scores calculated from the two reference data sets were not related to either factor. Researchers and practitioners can choose either reference data set to analyze BMI and child growth in NHPI children, but should be cautious that CDC growth references may underestimate childhood overweight and obesity and thereby delay early intervention.
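Both reference systems compute BMI-for-age z-scores with the LMS method, where a measurement x is standardized against age- and sex-specific skewness (L), median (M) and coefficient-of-variation (S) parameters: z = ((x/M)^L - 1)/(L*S). The sketch below applies that published formula; the LMS parameter values themselves are invented for illustration.

```python
import math

def lms_zscore(x: float, L: float, M: float, S: float) -> float:
    """BMI-for-age z-score via the LMS method used by CDC/WHO growth charts."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)

# Hypothetical LMS parameters for one age/sex cell from two reference charts
cdc_lms = (-2.18, 16.0, 0.125)   # (L, M, S) -- illustrative values only
who_lms = (-1.80, 15.6, 0.110)

bmi = 18.5  # a child's measured BMI [kg/m^2]
print("CDC z:", round(lms_zscore(bmi, *cdc_lms), 2))   # lower z-score
print("WHO z:", round(lms_zscore(bmi, *who_lms), 2))
```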
- Research Article
12
- 10.1016/j.palaeo.2011.10.006
- Dec 1, 2011
- Palaeogeography, Palaeoclimatology, Palaeoecology
Precise timing of the Upper Taghanic Biocrisis, Geneseo Bioevent, in the Middle–Upper Givetian (Middle Devonian) boundary in Northern Spain using biostratigraphic and magnetic susceptibility data sets
- Research Article
2
- 10.1088/2632-2153/abe663
- Apr 22, 2021
- Machine Learning: Science and Technology
In recent years, the development of machine learning potentials (MLPs) has become a very active field of research. Numerous approaches have been proposed that allow one to perform extended simulations of large systems at a small fraction of the computational cost of electronic structure calculations. The key to the success of modern MLPs is a close-to-first-principles-quality description of the atomic interactions. This accuracy is reached by using very flexible functional forms in combination with high-level reference data from electronic structure calculations. These data sets can include up to hundreds of thousands of structures covering millions of atomic environments to ensure that all relevant features of the potential energy surface are well represented. The handling of such large data sets is nowadays becoming one of the main challenges in the construction of MLPs. In this paper we present a method, the bin-and-hash (BAH) algorithm, to overcome this problem by enabling the efficient identification and comparison of large numbers of multidimensional vectors. Such vectors emerge in multiple contexts in the construction of MLPs. Examples are the comparison of local atomic environments to identify and avoid unnecessary redundant information in the reference data sets, which is costly in terms of both the electronic structure calculations and the training process; the assessment of the quality of the descriptors used as structural fingerprints in many types of MLPs; and the detection of possibly unreliable data points. The BAH algorithm is illustrated for the example of high-dimensional neural network potentials using atom-centered symmetry functions for the geometrical description of the atomic environments, but the method is general and can be combined with any current type of MLP.
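The bin-and-hash idea can be read as: discretize each component of a descriptor vector into fixed-width bins, hash the resulting integer tuple, and restrict expensive pairwise comparisons to vectors sharing a hash bucket. The sketch below is a minimal illustration under that reading of the abstract, not the authors' reference implementation; the bin width and descriptor dimensions are assumptions.

```python
import numpy as np
from collections import defaultdict

def bin_and_hash(vectors: np.ndarray, bin_width: float = 0.01):
    """Group vectors by the hash of their binned coordinates.

    Vectors landing in the same bucket are candidate (near-)duplicates,
    so costly pairwise comparisons are confined to small buckets.
    """
    buckets = defaultdict(list)
    binned = np.floor(vectors / bin_width).astype(np.int64)
    for i, key in enumerate(map(tuple, binned)):
        buckets[hash(key)].append(i)
    return buckets

# Hypothetical atomic-environment descriptors (e.g. symmetry-function values)
rng = np.random.default_rng(2)
desc = rng.random((10_000, 32))
desc[123] = desc[7]                      # plant an exact duplicate
dupes = {k: v for k, v in bin_and_hash(desc).items() if len(v) > 1}
print(dupes)                             # -> one bucket containing [7, 123]
```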
- Research Article
4
- 10.3389/fgene.2021.584355
- Mar 23, 2021
- Frontiers in Genetics
Several studies have evaluated computational methods that infer haplotypes from population genotype data in European cattle populations. However, little is known about how well they perform in African indigenous and crossbred populations. This study investigates: (1) global and local ancestry inference; (2) heterozygosity proportion estimation; and (3) genotype imputation in West African indigenous and crossbred cattle populations. Principal component analysis (PCA), ADMIXTURE, and LAMP-LD were used to analyse a medium-density single nucleotide polymorphism (SNP) dataset from Senegalese crossbred cattle. Reference SNP data of East and West African indigenous and crossbred cattle populations were used to investigate the accuracy of imputation from low- to medium-density and from medium- to high-density SNP datasets using Minimac v3. The first two principal components differentiated Bos indicus from European Bos taurus and African Bos taurus from other breeds. Irrespective of assuming two or three ancestral breeds for the Senegalese crossbreds, breed proportion estimates from ADMIXTURE and LAMP-LD showed a high correlation (r ≥ 0.981). The observed ancestral origin heterozygosity proportion in putative F1 crosses was close to the expected value of 1.0, and clearly differentiated F1 from all other crosses. The imputation accuracies (estimated as correlations) between imputed and real data in crossbred animals ranged from 0.142 to 0.717 when imputing from low to medium density, and from 0.478 to 0.899 when imputing from medium to high density. The imputation accuracy was generally higher when the reference data came from the same geographical region as the target population, and when crossbred reference data were used to impute crossbred genotypes. The lowest imputation accuracies were observed for indigenous breed genotypes. This study shows that ancestral origin heterozygosity can be estimated with high accuracy and will be far superior to the use of observed individual heterozygosity for estimating heterosis in African crossbred populations. It was not possible to achieve high imputation accuracy in West African crossbred or indigenous populations based on reference data sets from East Africa, and population-specific genotyping with high-density SNP assays is required to improve imputation.
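The accuracy figures above are correlations between imputed and true genotypes. A minimal version of that metric, assuming true genotypes coded 0/1/2, imputed values as allele dosages, and averaging per-SNP Pearson correlations (one of several common conventions, not necessarily the one used in the study), is sketched below.

```python
import numpy as np

def imputation_accuracy(true_geno: np.ndarray, imputed: np.ndarray) -> float:
    """Mean per-SNP Pearson correlation between true and imputed genotypes.

    true_geno, imputed : (n_individuals, n_snps) arrays; true genotypes
    coded 0/1/2, imputed values as allele dosages in [0, 2].
    """
    accs = []
    for j in range(true_geno.shape[1]):
        t, d = true_geno[:, j], imputed[:, j]
        if t.std() > 0 and d.std() > 0:        # skip monomorphic SNPs
            accs.append(np.corrcoef(t, d)[0, 1])
    return float(np.mean(accs))
```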
- Research Article
32
- 10.1016/0169-7439(93)80080-2
- May 1, 1993
- Chemometrics and Intelligent Laboratory Systems
Reference data sets for chemometrical methods testing
- Book Chapter
3
- 10.1007/978-3-030-90998-7_6
- Jan 1, 2022
We present an approach that is widely used in the field of remote sensing for the validation of single LUC maps. Unlike other chapters in this book, where maps are validated by comparison with other maps with better resolution and/or quality, this approach requires a ground sample dataset, i.e. a set of sites where LUC can be observed in the field or interpreted from high-resolution imagery. Map error is assessed using techniques based on statistical sampling. In general terms, in this approach, the accuracy of single LUC maps is assessed by comparing the thematic map against the reference data and measuring the agreement between the two. When assessing thematic accuracy, three stages can be identified: the design of the sample, the design of the response, and the estimation and analysis protocols. Sample design refers to the protocols used to define the characteristics of the sampling sites, including sample size and distribution, which can be random or systematic. Response design involves establishing the characteristics of the reference data, such as the size of the spatial assessment units, the sources from which the reference data will be obtained, and the criteria for assigning labels to spatial units. Finally, the estimation and analysis protocols include the procedures applied to the reference data to calculate accuracy indices, such as user’s and producer’s accuracy, the estimated areas covered by each category and their respective confidence intervals. This chapter has two sections in which we present a couple of exercises relating to sampling and response design; the sample size will be calculated, the distribution of sampling sites will be obtained using a stratified random scheme, and finally, a set of reference data will be obtained by photointerpretation at the sampling sites (spatial units). The accuracy statistics will be calculated later in Sect. 5 in chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps” as part of the cross-tabulation exercises. The exercises in this chapter use fine-scale LUC maps obtained for the municipality of Marqués de Comillas in Chiapas, Mexico.
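As a preview of the estimation step handed off to the cross-tabulation chapter, user's accuracy, producer's accuracy and overall accuracy all fall directly out of the confusion matrix built from the reference sample. The sketch below uses an invented 3-class matrix; area-weighted estimators and confidence intervals are omitted for brevity.

```python
import numpy as np

def accuracy_indices(cm: np.ndarray):
    """User's, producer's and overall accuracy from a cross-tabulation matrix.

    cm[i, j] = number of sample sites mapped as class i whose reference
    label is class j (rows: map classes, columns: reference classes).
    """
    users = np.diag(cm) / cm.sum(axis=1)       # per map class (commission)
    producers = np.diag(cm) / cm.sum(axis=0)   # per reference class (omission)
    overall = np.trace(cm) / cm.sum()
    return users, producers, overall

# Hypothetical 3-class matrix (e.g. forest / cropland / built-up)
cm = np.array([[85,  5,  2],
               [10, 70,  8],
               [ 3,  6, 61]])
u, p, o = accuracy_indices(cm)
print("user's:", u.round(2), "producer's:", p.round(2), "overall:", round(o, 2))
```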
- Dissertation
2
- 10.11588/heidok.00016193
- Jan 1, 2013
This thesis investigates methods for the creation of reference datasets for image processing, especially for the dense correspondence problem. Three types of reference data can be identified: real datasets with dense ground truth, real datasets with sparse or missing ground truth, and synthetic datasets. For the creation of real datasets with ground truth, an existing method based on depth map fusion was evaluated. The described method is especially suited for creating large amounts of reference data with known accuracy. The creation of reference datasets with missing ground truth was examined using the example of multiple datasets for the automotive industry. The data were used successfully for verification and evaluation by multiple image processing projects. Finally, it was investigated how methods from computer graphics can be used for creating synthetic reference datasets. In particular, the creation of photorealistic image sequences using global illumination was examined for the task of evaluating algorithms. The results show that while such sequences can be used for evaluation, their creation is hindered by practicality problems. As an application example, a new simulation method for Time-of-Flight depth cameras was developed which can simulate all relevant error sources of these systems.
- Research Article
16
- 10.1118/1.3603198
- Jun 30, 2011
- Medical Physics
In computed tomography (CT), metal objects in the region of interest introduce data inconsistencies during acquisition. Reconstructing these data results in an image with star-shaped artifacts induced by the metal inconsistencies. To enhance image quality, the influence of the metal objects can be reduced by different metal artifact reduction (MAR) strategies. For an adequate evaluation of new MAR approaches, a ground truth reference data set is needed. In technical evaluations, where phantoms can be measured with and without metal inserts, ground truth data can easily be obtained by a second reference acquisition. Obviously, this is not possible for clinical data. Here, an alternative evaluation method is presented that does not require an additionally acquired reference data set. The proposed metric is based on an inherent ground truth for comparing metal artifacts as well as MAR methods, where no reference information in the form of a second acquisition is needed. The method is based on the forward projection of a reconstructed image, which is compared to the actually measured projection data. The new evaluation technique is performed on phantom and clinical CT data with and without MAR. The metric results are then compared with methods using a reference data set as well as with an expert-based classification. It is shown that the new approach is an adequate quantification technique for artifact strength in reconstructed metal or MAR CT images. The presented method works solely on the original projection data itself, which yields some advantages compared to distance measures in the image domain using two data sets. Besides this, no parameters have to be chosen manually. The new metric is a useful evaluation alternative when no reference data are available.
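The reference-free metric can be read as: forward-project the reconstructed volume and measure how far it deviates from the projections actually acquired. The sketch below illustrates that idea with a generic `forward_project` placeholder; the projector, the RMS error norm, and the optional masking of detector bins are assumptions, not the paper's exact formulation.

```python
import numpy as np

def projection_inconsistency(recon, measured_sino, forward_project, mask=None):
    """Reference-free artifact metric: reproject and compare with raw data.

    recon           : reconstructed image/volume.
    measured_sino   : projection data actually acquired by the scanner.
    forward_project : callable implementing the scanner's forward model.
    mask            : optional boolean array selecting detector bins to score
                      (e.g. excluding the metal trace).
    """
    residual = forward_project(recon) - measured_sino
    if mask is not None:
        residual = residual[mask]
    return float(np.sqrt(np.mean(residual ** 2)))   # RMS inconsistency
```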
- Research Article
27
- 10.3389/fmars.2021.643381
- Mar 25, 2021
- Frontiers in Marine Science
Our ability to completely and repeatedly map natural environments at a global scale has increased significantly over the past decade. These advances stem from the delivery of a range of online global satellite image archives and global-scale processing capabilities, along with satellite imagery of improved spatial and temporal resolution. The ability to accurately train and validate these global-scale mapping programs from what we will call "reference data sets" is challenging due to a lack of coordinated financial and personnel resourcing, and of standardized methods to collate reference datasets at global spatial extents. Here, we present an expert-driven approach for generating training and validation data on a global scale, with the view to mapping the world's coral reefs. Global reefs were first stratified into approximate biogeographic regions; then, per region, reference data sets were compiled that include existing point data or maps at various levels of accuracy. These reference data sets were compiled from new field surveys, literature review of published surveys, and individually sourced contributions from coral reef monitoring and management agencies. Reference data were overlaid on high spatial resolution satellite image mosaics (3.7 m × 3.7 m pixels; Planet Dove) for each region. Additionally, thirty to forty satellite image tiles (20 km × 20 km) were selected for which reference data and/or expert knowledge was available and which covered a representative range of habitats. The satellite image tiles were segmented into interpretable groups of pixels, which were manually labeled with a mapping category via expert interpretation. The labeled segments were used to generate points to train the mapping models, and to validate or assess accuracy. The workflow for desktop reference data creation that we present expands and up-scales traditional approaches of expert-driven interpretation for both manual habitat mapping and map training/validation. We apply the reference data creation methods in the context of global coral reef mapping, though our approach is broadly applicable to any environment. Transparent processes for training and validation are critical for usability as big data provide more opportunities for managers and scientists to use global mapping products for science and conservation of vulnerable and rapidly changing ecosystems.