QPredict: Using low quality volunteered geospatial data to evaluate high quality authority data

Abstract

High-quality, typically administrative, geospatial data should adhere to established measurement and representation practices and be protected from malicious attacks. However, because its production process is often prolonged, such data may be updated only infrequently compared to a source of volunteered geographic information such as OpenStreetMap. Existing approaches typically try to quality-assure geospatial data by comparing it to a reference dataset of perceived higher quality, often another administrative dataset facing a similar update cycle. In contrast, this article investigates whether changes in volunteered geographic information such as OpenStreetMap that reflect actual real-world changes, and therefore also need to be applied to an administrative dataset, can be identified automatically. To that end, we present QPredict, a machine learning approach that observes changes in volunteered geospatial data such as OpenStreetMap to predict issues with a target (administrative) dataset. The algorithm is trained on geospatial object characteristics, intrinsic and extrinsic quality metrics, and their respective changes over time. We evaluate the effectiveness of our approach on two datasets representing two mid-size cities in Germany and discuss our findings in terms of their applicability to use cases.
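
The abstract does not name the model or list the exact features, so the following is only a rough sketch of the general idea under stated assumptions: two snapshots of the same OpenStreetMap object are paired, current values and their changes over time are turned into a feature vector, and a plain logistic-regression classifier estimates whether the matching authority object needs an update. The `Snapshot` fields, the feature set, and the use of logistic regression are hypothetical stand-ins, not QPredict's actual design.

```python
import math
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics of one OSM object at one point in time."""
    version: int          # OSM version counter (edit activity)
    num_tags: int         # number of key=value tags (intrinsic metric)
    area_m2: float        # footprint area (object characteristic)
    dist_to_ref_m: float  # offset to the matched authority object (extrinsic metric)


def change_features(old: Snapshot, new: Snapshot) -> list[float]:
    """Combine current values with their change between two snapshots."""
    return [
        float(new.version - old.version),
        float(new.num_tags - old.num_tags),
        (new.area_m2 - old.area_m2) / max(old.area_m2, 1.0),  # relative area change
        new.dist_to_ref_m,
        new.dist_to_ref_m - old.dist_to_ref_m,
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x) -> float:
    """Estimated probability that the authority object needs an update."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In a realistic setting, features such as distances in metres would usually be standardized before training, and any of the "changes over time" features could be computed over more than two snapshots; the two-snapshot form simply keeps the example small.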

Similar Papers
  • Research Article
  • Cited by 54
  • 10.1016/j.amepre.2009.12.015
Administrative Data Sets and Health Services Research on Hemoglobinopathies: A Review of the Literature
  • Mar 20, 2010
  • American Journal of Preventive Medicine
  • Scott D Grosse + 3 more

  • Research Article
  • Cited by 14
  • 10.1016/s0897-1897(03)00040-5
The use of large administrative data sets in nursing research
  • Aug 1, 2003
  • Applied Nursing Research
  • Arlene Merne Smaldone + 1 more

  • Research Article
  • Cited by 127
  • 10.1177/1062860609339936
Assessing Surgical Quality Using Administrative and Clinical Data Sets: A Direct Comparison of the University HealthSystem Consortium Clinical Database and the National Surgical Quality Improvement Program Data Set
  • Jul 7, 2009
  • American Journal of Medical Quality
  • Daniel L Davenport + 2 more

The use of "clinical" versus "administrative" data sets for health care quality assessment continues to be debated. This study directly compares the University HealthSystem Consortium Clinical Database (UHC CDB) and the National Surgical Quality Improvement Program (NSQIP) in terms of their assessment of complications and death for 26 322 surgery patients using analyses of variance, correlation, and multivariable logistic regression. The NSQIP had more variables with significant correlation with outcomes. The NSQIP was better at predicting death (c-index 0.94 vs 0.90, P < .05) and complications (c-index 0.78 vs 0.76, P = .07), especially for higher risk patients. The UHC CDB missed and misclassified several major complications. The data sets are similar in their explanatory power relative to outcomes, but the clinical data set is better, particularly at identifying higher risk patients and specific complications. It should prove more useful for initiating and monitoring clinical process improvements because of more clinically relevant variables.

  • Abstract
  • 10.14309/01.ajg.0000798768.43978.95
P042 Early Versus Later Use of Vedolizumab In IBD: Patient Characteristics And Treatment Patterns In The Real World (RALEE).
  • Dec 1, 2021
  • American Journal of Gastroenterology
  • Maja Kuharic + 7 more

Pivotal trials in inflammatory bowel disease (IBD) demonstrate that earlier use of biologics is associated with greater likelihood of response/remission, but multiple studies have identified that in the real world, biologic treatment is often delayed, thereby limiting optimal effectiveness and increasing likelihood of adverse outcomes. Further assessment of patient, provider, and payor factors that contribute to therapy choice is needed. We assessed utilization of vedolizumab (VDZ) and performed a real-world assessment using administrative datasets. Here, we describe the different treatment patterns and demographics of patients who received VDZ. We identified VDZ-treated patients (aged ≥18 years) with Crohn's disease (CD) or ulcerative colitis (UC) in the MarketScan commercial and Medicare claims databases from 2017 to 2019 and included those who had continuous enrollment in the same health plan for ≥12 months prior to their initial IBD diagnostic claim, ≥1 VDZ claim after the initial IBD diagnosis, and continuous enrollment for ≥12 months prior to and after their initial UC or CD diagnosis. Patients exposed to VDZ, anti-TNF, or other biologic therapy in the 12-month pre-index period were excluded. We pre-defined 5 treatment pathways: (1) EARLY VDZ - VDZ within 30 days of first IBD diagnostic claim; (2) DELAYED VDZ 1 - immunomodulators and then switch to VDZ; (3) DELAYED VDZ 2 - corticosteroids with immunomodulators prior to VDZ; (4) DELAYED VDZ 3 - 5-ASA with corticosteroids prior to VDZ; or (5) DELAYED VDZ 4 - 5-ASA with corticosteroids and immunomodulators prior to VDZ. Differences in patient baseline characteristics among these treatment pathways were analyzed descriptively. 
We identified 136,315 patients with UC and 103,591 with CD, from which 1,342 patients with UC (median age 43 years; 51.0% male; 96.4% commercially insured; 86.4% diagnosed in 2017) and 964 with CD (median age 45 years; 43.6% male; 94.6% commercially insured; 88.6% diagnosed in 2017) received VDZ and met criteria. The proportions of patients by treatment pathway were (UC|CD): EARLY VDZ (6.6%|9.6%); DELAYED VDZ 1 (7.5%|19.0%); DELAYED VDZ 2 (14.8%|36.8%); DELAYED VDZ 3 (37.6%|19.0%); DELAYED VDZ 4 (33.4%|15.6%). Among patients with UC, EARLY VDZ vs DELAYED VDZ cohorts had median age of 40 vs 44 years and proportion of men of 46.1% vs 51.4%. Among patients with CD, EARLY VDZ vs DELAYED VDZ had median age of 43 vs 45 years and proportion of men of 39.8% vs 43.9%. For both indications, no meaningful differences among treatment groups by geographic region, payor type (i.e., commercial vs Medicare), and year of diagnosis were observed. In this administrative real-world dataset, fewer than 10% of patients with IBD were treated with VDZ within 30 days of diagnosis, and these patients were more likely to be younger and female. These findings differ from guidelines suggesting that VDZ may be used earlier or, owing to its safety profile, preferentially in older patients at higher risk for infection. Further analyses of safety and effectiveness outcomes are underway.

  • Research Article
  • Cited by 4
  • 10.1002/hep.28500
Quality of care metrics in chronic hepatitis B.
  • Apr 1, 2016
  • Hepatology
  • Jessica Mellinger + 1 more

  • Research Article
  • Cited by 6
  • 10.5555/3019046.3019052
Klimatic: a virtual data lake for harvesting and distribution of geospatial data
  • Nov 13, 2016
  • Tyler J Skluzacek + 2 more

Many interesting geospatial datasets are publicly accessible on web sites and other online repositories. However, the sheer number of datasets and locations, plus a lack of support for cross-repository search, makes it difficult for researchers to discover and integrate relevant data. We describe here early results from a system, Klimatic, that aims to overcome these barriers to discovery and use by automating the tasks of crawling, indexing, integrating, and distributing geospatial data. Klimatic implements a scalable crawling and processing architecture that uses an elastic container-based model to locate and retrieve relevant datasets and to extract metadata from headers and within files to build a global index of known geospatial data. In so doing, we create an expansive geospatial virtual data lake that records the location, formats, and other characteristics of large numbers of geospatial datasets while also caching popular data subsets for rapid access. A flexible query interface allows users to request data that satisfy supplied type, spatial, temporal, and provider specifications; in processing such queries, the system uses interpolation and aggregation to combine data of different types, data formats, resolutions, and bounds. Klimatic has so far incorporated more than 10,000 datasets from over 120 sources and has been demonstrated to scale well with data size and query complexity.
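
Klimatic's query interface is described only at a high level. As an illustration of the filtering step alone, the sketch below assumes an in-memory list of index records, each with a data type, provider, bounding box, and time range, and returns the records that satisfy every supplied constraint. `IndexEntry`, `query`, and the example URLs are hypothetical; the sketch ignores bounding boxes that cross the antimeridian and omits the interpolation and aggregation the abstract mentions.

```python
from dataclasses import dataclass
from datetime import date

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class IndexEntry:
    """One record in a hypothetical global index of crawled datasets."""
    url: str
    data_type: str
    provider: str
    bbox: BBox
    start: date
    end: date


def bbox_intersects(a: BBox, b: BBox) -> bool:
    # Two axis-aligned boxes overlap unless one lies entirely beside the other.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(index, data_type=None, provider=None, bbox=None, start=None, end=None):
    """Return entries matching every supplied constraint; None means 'any'."""
    hits = []
    for e in index:
        if data_type is not None and e.data_type != data_type:
            continue
        if provider is not None and e.provider != provider:
            continue
        if bbox is not None and not bbox_intersects(e.bbox, bbox):
            continue
        # Temporal overlap: [e.start, e.end] must intersect [start, end].
        if start is not None and e.end < start:
            continue
        if end is not None and e.start > end:
            continue
        hits.append(e)
    return hits
```

A linear scan is enough to show the matching logic; an index of more than 10,000 datasets would more plausibly sit behind a spatial index (for example an R-tree) so that the bounding-box test prunes candidates first.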

  • Research Article
  • 10.6100/ir597646
Remote plasma deposition of textured zinc oxide with focus on thin film solar cell applications : material properties, plasma processes and film growth
  • Nov 18, 2015
  • R Groenen

  • Research Article
  • Cited by 22
  • 10.1177/2150135114534274
The Impact of Differential Case Ascertainment in Clinical Registry Versus Administrative Data on Assessment of Resource Utilization in Pediatric Heart Surgery.
  • Jun 23, 2014
  • World Journal for Pediatric and Congenital Heart Surgery
  • David W Jantzen + 11 more

Resource utilization in congenital heart surgery is typically assessed using administrative data sets. Recent analyses have called into question the accuracy of coding of cases in administrative data; however, it is unclear whether miscoding impacts assessment of associated resource use. We merged data coded within both an administrative data set and clinical registry on children undergoing heart surgery (2004-2010) at 33 hospitals. The impact of differences in coding of operations between data sets on reporting of postoperative length of stay (PLOS) and total hospital costs associated with these operations was assessed. For each of the eight operations of varying complexity evaluated (total n = 57,797), there were differences in coding between data sets, which translated into differences in the reporting of associated resource utilization for the cases coded in either data set. There were statistically significant differences in PLOS and cost for seven of the eight operations, although most PLOS differences were relatively small with the exception of the Norwood operation and truncus repair (differences of two days, P < .001). For cost, there was a >5% difference for three of the eight operations and >10% difference for truncus repair (US$10,570; P < .01). Grouping of operations into categories of similar risk appeared to mitigate many of these differences. Differences in coding of cases in administrative versus clinical registry data can translate into differences in assessment of associated PLOS and cost for certain operations. This may be minimized through evaluating larger groups of operations when using administrative data or using clinical registry data to accurately identify operations of interest.

  • Research Article
  • Cited by 362
  • 10.1016/j.bdr.2015.01.003
Geospatial Big Data: Challenges and Opportunities
  • Jun 1, 2015
  • Big Data Research
  • Jae-Gil Lee + 1 more

  • Research Article
  • Cited by 13
  • 10.1109/jstars.2014.2315593
Enhancing Agricultural Geospatial Data Dissemination and Applications Using Geospatial Web Services
  • Nov 1, 2014
  • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Weiguo Han + 4 more

There are many important publicly available agricultural geospatial data products for agriculture-related research, applications, and educational outreach programs. The traditional data distribution method cannot fully meet users' on-demand geospatial data needs. This paper presents interoperable, standards-compliant Web services developed for geospatial data access, query, retrieval, statistics, mapping, and comparison. These standard geospatial Web services can be integrated into scientific workflows to accomplish specific tasks, or consumed over the Web by users to create new value-added geospatial applications. In addition, this paper demonstrates, via real-world use cases, applications of these services and their potential impact on facilitating geospatial Cropland Data Layer (CDL) retrieval, analysis, visualization, dissemination, and integration in agricultural industry, government, research, and educational communities. This paper also shows that the geospatial Web service approach helps improve the reusability, interoperability, dissemination, and utilization of agricultural geospatial data. It allows for integrating multiple online applications and different geospatial data sources, and enables automated retrieval and delivery of agricultural geospatial information for decision-making support.

  • Research Article
  • Cited by 2
  • 10.5194/isprsarchives-xl-3-w3-543-2015
RASTER DATA PARTITIONING FOR SUPPORTING DISTRIBUTED GIS PROCESSING
  • Aug 20, 2015
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • B Nguyen Thai + 1 more

Abstract. The big data concept has also had an impact on the geospatial sector. Several studies apply techniques originating in computer science to the GIS processing of huge amounts of geospatial data. Other studies consider geospatial data as if it had always been big data (Lee and Kang, 2015). Nevertheless, we can prove that data acquisition methods have improved substantially, increasing not only the amount of raw data but also its spectral, spatial, and temporal resolution. A significant portion of big data is geospatial data, and the size of such data is growing rapidly, by at least 20% every year (Dasgupta, 2013). Of this increasing volume of raw data, produced in different formats and representations and for different purposes, only the information derived from these datasets represents valuable results. However, computing capability and processing speed face limitations, even when semi-automatic or automatic procedures are applied to complex geospatial data (Kristóf et al., 2014). Recently, distributed computing has reached many interdisciplinary areas of computer science, including remote sensing and geographic information processing. Cloud computing in particular requires appropriate processing algorithms that can be distributed and can handle geospatial big data. The MapReduce programming model and distributed file systems have proven their capabilities for processing non-GIS big data. However, it is sometimes inconvenient or inefficient to rewrite existing algorithms in the MapReduce programming model, and GIS data cannot be partitioned like text-based data, by line or by byte. Hence, we seek an alternative solution for data partitioning, data distribution, and execution of existing algorithms without rewriting them or with only minor modifications.
This paper focuses on a technical overview of currently available distributed computing environments, as well as on GIS data (raster data) partitioning, distribution, and distributed processing of GIS algorithms. A proof-of-concept implementation has been made for raster data partitioning, distribution, and processing. The first performance results have been compared against the commercial software ERDAS IMAGINE 2011 and 2014. Partitioning methods depend heavily on application areas; therefore, data partitioning may be considered a preprocessing step before applying processing services to data. As a proof of concept, we have implemented a simple tile-based partitioning method that splits an image into smaller grids (NxM tiles), and we compared its processing time to existing methods using an NDVI calculation. The concept is demonstrated using our own open-source processing framework.
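
The tile-based partitioning described above can be sketched as follows, assuming bands are plain nested lists of floats: the raster is split into an N×M grid, the unchanged per-pixel NDVI function, (NIR − Red) / (NIR + Red), runs on each tile, and results are written back into place. `ndvi_tiled`, `tile_bounds`, and the thread pool are illustrative assumptions, not the authors' framework; threads here only demonstrate the scatter/gather pattern, whereas a distributed setup would ship tiles to separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor

Band = list[list[float]]


def ndvi(nir: Band, red: Band) -> Band:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where both bands are zero."""
    return [
        [(n - r) / (n + r) if n + r != 0 else 0.0 for n, r in zip(nrow, rrow)]
        for nrow, rrow in zip(nir, red)
    ]


def tile_bounds(height: int, width: int, n: int, m: int):
    """Split the raster into an n x m grid of near-equal (r0, r1, c0, c1) tiles."""
    rows = [i * height // n for i in range(n + 1)]
    cols = [j * width // m for j in range(m + 1)]
    return [(rows[i], rows[i + 1], cols[j], cols[j + 1])
            for i in range(n) for j in range(m)]


def crop(band: Band, r0: int, r1: int, c0: int, c1: int) -> Band:
    return [row[c0:c1] for row in band[r0:r1]]


def ndvi_tiled(nir: Band, red: Band, n: int, m: int, workers: int = 4) -> Band:
    """Scatter tiles to workers, run the unchanged ndvi() on each, gather results."""
    height, width = len(nir), len(nir[0])
    out = [[0.0] * width for _ in range(height)]

    def work(bounds):
        r0, r1, c0, c1 = bounds
        return bounds, ndvi(crop(nir, r0, r1, c0, c1), crop(red, r0, r1, c0, c1))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for (r0, r1, c0, c1), tile in pool.map(work, tile_bounds(height, width, n, m)):
            for i, row in enumerate(tile):
                out[r0 + i][c0:c1] = row  # write the tile back at its original offset
    return out
```

Because NDVI depends only on the pixel itself, tiles need no overlap; neighbourhood operations such as convolution filters would need tiles with a border of extra pixels so that results at tile edges match the whole-image computation.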

  • Research Article
  • Cited by 9
  • 10.1007/s12186-023-09338-7
Changing Occupations or Changing Companies—Predictors of Different Types of Premature Contract Terminations in Dual Vocational Education and Training Programs
  • Oct 4, 2023
  • Vocations and Learning
  • Stefanie Findeisen + 2 more

In Switzerland, access to non-academic occupations requires the completion of a vocational education and training (VET) program. Over two-thirds of adolescents choose to start a dual VET program after compulsory education. However, this path from school to work is not always linear, and changes can be a means of adjusting wrong career choices. In the context of dual VET, two types of adjustments that occur frequently can be distinguished: (1) change of occupations and (2) change of companies. The present study aims to examine the predictors of each of those two types of changes. First, we are interested in the link between individuals’ intentions to change their career paths and actual changes. When changes are intended by the trainee and aimed at correcting wrong career choices, actual changes can generally be expected to be predicted by change intentions. Second, we are interested in the role of person-job fit (P-J fit) as well as trainees’ socialization and performance indicators. Third, we examine to what extent trainees’ decisions to change occupations or companies can be predicted by pre-entry factors (perceived P-J fit and effort during compulsory education before the transition to VET). We used a longitudinal sample of adolescents at the end of compulsory school and at the end of their first year in a dual VET program in Switzerland. This data set is combined with government data on actual changes regarding individuals’ training companies and their occupations. The two types of adjustments were examined in separate structural equation models that compared trainees without any types of adjustments during their training program (1) to those who changed occupations (N = 417) and (2) to those who changed training companies (N = 378). 
The results show that actual occupational changes and actual company changes of trainees are affected by the same work-context predictors (negative effect of trainees’ self-perceived work performance) and pre-entry predictors (negative effect of effort during compulsory education). However, in contrast to changes of training companies, changes of occupations are significantly predicted by trainees’ intentions to change. Moreover, while P-J fit during the VET program is the only direct predictor of trainees’ intentions to change occupations, intentions to change companies are not significantly predicted by P-J fit. Intentions to change companies are negatively affected by companies’ socialization tactics and positively affected by adolescents’ pre-entry effort. Overall, the results call for a more differentiated assessment of changes/premature contract terminations in future studies. Whether change intentions are a valid proxy for actual change behavior seems to depend on the type of changes that trainees decide to make.

  • Research Article
  • Cited by 143
  • 10.1016/j.ahj.2010.08.010
Linking clinical registry data with administrative data using indirect identifiers: Implementation and validation in the congenital heart surgery population
  • Dec 1, 2010
  • American Heart Journal
  • Sara K Pasquali + 10 more

  • Research Article
  • Cited by 1
  • 10.5194/gmd-17-815-2024
GEO4PALM v1.1: an open-source geospatial data processing toolkit for the PALM model system
  • Jan 31, 2024
  • Geoscientific Model Development
  • Dongqi Lin + 4 more

Abstract. A geospatial data processing tool, GEO4PALM, has been developed to generate geospatial static input for the Parallelized Large-Eddy Simulation (PALM) model system. PALM is a community-driven large-eddy simulation model for atmospheric and environmental research. Throughout PALM's 20-year development, research interests have been increasing in its application to realistic conditions, especially for urban areas. For such applications, geospatial static input is essential. Although abundant geospatial data are accessible worldwide, geospatial data availability and quality are highly variable and inconsistent. Currently, the geospatial static input generation tools in the PALM community heavily rely on users for data acquisition and pre-processing. New PALM users face large obstacles, including significant time commitments, to gain the knowledge needed to be able to pre-process geospatial data for PALM. Expertise beyond atmospheric and environmental research is frequently needed to understand the data sets required by PALM. Here, we present GEO4PALM, which is a free and open-source tool. GEO4PALM helps users generate PALM static input files with a simple, homogenised, and standardised process. GEO4PALM is compatible with geospatial data obtained from any source, provided that the data sets comply with standard geo-information formats. Users can either provide existing geospatial data sets or use the embedded data interfaces to download geo-information data from free online sources for any global geographic area of interest. All online data sets incorporated in GEO4PALM are globally available, with several data sets having the finest resolution of 1 m. In addition, GEO4PALM provides a graphical user interface (GUI) for PALM domain configuration and visualisation. 
Two application examples demonstrate successful PALM simulations driven by geospatial input generated by GEO4PALM using different geospatial data sources for Berlin, Germany, and Ōtautahi / Christchurch, New Zealand. GEO4PALM provides an easy and efficient way for PALM users to configure and conduct PALM simulations for applications and investigations such as urban heat island effects, air pollution dispersion, renewable energy resourcing, and weather-related hazard forecasting. The wide applicability of GEO4PALM makes PALM more accessible to a wider user base in the scientific community.

  • Research Article
  • Cited by 2
  • 10.7764/ijanr.v48i2.2285
Factors that influence French consumer satisfaction in the preference for Chilean avocados (Persea Americana Mill.)
  • Jan 1, 2021
  • International Journal of Agriculture and Natural Resources
  • Ricardo Cabana

Most avocados consumed in France come from Peru or Chile. French suppliers vary based on the season; however, Chile is the main supplier of avocados during the winter season. The aim of this study is to analyze the key variables that influence French consumer satisfaction in the preference for Chilean avocados. This is substantiated using an exploratory multivariate analysis, which was performed on a causal model comprising the endogenous constructs perceived extrinsic and intrinsic quality, perceived risk, and perceived value. The sample consists of 346 French consumers of Chilean avocados in supermarkets of the Auvergne-Rhône-Alpes region. The main results show that the risk perceived by French consumers can only be related to the perceived extrinsic quality of the product. On the other hand, both perceived intrinsic and extrinsic quality are directly related to perceived value. Finally, it is concluded that French consumer satisfaction in the preference for Chilean avocados can be explained by the variables perceived intrinsic quality, perceived extrinsic quality, and perceived value.
