Transformation of RELAC Data to Group Level to Match with the A-Religion Data Set
- Research Article
7
- 10.1093/jee/54.5.859
- Oct 1, 1961
- Journal of Economic Entomology
The procedure used to determine the appropriate transformation of insect control data consisted of testing the untransformed data for fulfillment of the basic assumptions underlying the analysis of variance procedure and its related tests of significance. The data were then tested twice more, once after transformation to the square-root scale and once after transformation to the log scale. A nonparametric rank-correlation coefficient was computed between the ranges and means of each set of data as evidence of normality: a nonsignificant coefficient indicated that the ranges and means were independent of each other, and hence that the data were approximately normal. After normality had been established, Bliss and Calhoun’s modification of Hartley’s test for heterogeneity of variance and Tukey’s test for nonadditivity were computed for each set of data. A set of data was considered appropriate for use in the analysis of variance when it fulfilled the basic assumptions of the analysis procedure. For illustrative purposes this procedure was applied to two control experiments, one on the potato leafhopper (Empoasca fabae (Harris)) and one on the alfalfa snout beetle (Brachyrhinus ligustici (L.)) larva. The leafhopper data transformed to both the square-root and log scales were found to fulfill the assumptions, while the log transformation appeared to be the best scale for the snout beetle data.
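As a rough illustration of the screening step described above, a minimal Python sketch that rank-correlates per-treatment ranges with per-treatment means on the raw, square-root, and log scales; Spearman's coefficient stands in for the unnamed nonparametric statistic, the counts are invented, and the follow-up Hartley and Tukey tests are omitted.

```python
import numpy as np
from scipy.stats import spearmanr

def range_mean_check(groups, transform=None):
    """Rank-correlate per-group ranges with per-group means.

    A nonsignificant correlation suggests ranges and means are independent,
    which the procedure above takes as evidence of approximate normality
    on that scale.
    """
    if transform is not None:
        groups = [transform(np.asarray(g, dtype=float)) for g in groups]
    means = [np.mean(g) for g in groups]
    ranges = [np.ptp(g) for g in groups]      # per-group range (max - min)
    rho, p = spearmanr(ranges, means)
    return rho, p

# Hypothetical insect-count data: one list of replicate counts per treatment.
counts = [[12, 15, 9, 14], [30, 41, 28, 35], [3, 5, 2, 4], [55, 60, 48, 52]]

for name, f in [("untransformed", None),
                ("square root", np.sqrt),
                ("log(x + 1)", lambda x: np.log(x + 1))]:
    rho, p = range_mean_check(counts, f)
    print(f"{name:>13}: rho = {rho:+.2f}, p = {p:.3f}")
```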
- Research Article
122
- 10.1016/s0377-8398(97)00047-9
- Jun 1, 1998
- Marine Micropaleontology
Logratio transformation of compositional data: a resolution of the constant sum constraint
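The title refers to the log-ratio approach to compositional data (parts constrained to sum to a constant); as a minimal illustration, a sketch of the centered log-ratio (clr) transform under the assumption of strictly positive parts (not code from the paper itself):

```python
import numpy as np

def clr(composition):
    """Centered log-ratio transform of a composition with strictly positive parts.

    Dividing by the geometric mean before taking logs frees the data from the
    constant-sum constraint, so standard multivariate methods can be applied.
    """
    x = np.asarray(composition, dtype=float)
    x = x / x.sum()                      # close to a unit-sum composition
    g = np.exp(np.log(x).mean())         # geometric mean of the parts
    return np.log(x / g)

print(clr([60.0, 30.0, 10.0]))           # clr coordinates sum to ~0
```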
- Research Article
1
- 10.1682/jrrd.2010.08.0150
- Jan 1, 2010
- The Journal of Rehabilitation Research and Development
The Veterans Health Administration (VHA) is the largest integrated healthcare system in the United States, caring for nearly 6 million of the nation's 23 million veterans annually. VHA provides a full range of primary care, mental health, medical specialty, surgical, and rehabilitative services to enrolled veterans. It employs over 240,000 staff (including 65,000 health professionals) and manages over 1,400 sites of care (medical centers, outpatient clinics, nursing homes, and counseling centers). The critical need for a comprehensive electronic medical record system is demonstrated by the observation that approximately 130 million outpatient encounters, 600,000 hospitalizations, over 400 million laboratory tests and radiological procedures, and 170 million prescription fills are provided annually. In addition, nearly 1 billion clinically related free-text notes are recorded annually, representing a small fraction of the total electronic data collected over the past two decades. VHA's effective use of health information technology for patient care and healthcare administration, combined with expert data analyses by VHA researchers and managers, has been an important contributing factor to the success of VHA quality improvement efforts and its ability to advance evidence-based care [1-5]. Researchers have a long history of using the VHA's vast data resources to better understand the VHA's and the nation's healthcare system on individual patient, regional, and population levels. For example, recent publications have utilized VHA databases to address issues as diverse as the impact of resident duty hour reform on surgical and procedural patient safety indicators [6], the influence of obesity on quality of care [7], the impact of clopidogrel and proton pump inhibitor treatment on outcome of acute coronary syndrome [8], the impact of increasing medication copayments on medication adherence [9], and the proportion of patients at least 65 years of age hospitalized for pneumonia who are subsequently diagnosed with pulmonary malignancy [10]. The results of VHA data-based publications are often relevant to non-VHA practices and healthcare organizations. The research value of VHA databases depends on not only the quality and accessibility of the underlying data but also an understanding of the clinical and administrative processes that produce data, of how best to use data to measure important constructs, and of limitations inherent in using administrative and clinical data for research objectives. VHA researchers have greatly enhanced the value of VHA health information technology by the skilled use of data and analytic techniques to provide answers to important research questions. VHA researchers, by conducting systematic data evaluations, energize efforts to address data quality and clinical coding and thereby contribute to system-level improvements. Going forward, data quality evaluations inform efforts to improve the usefulness of VHA data for both research and management purposes. The results of formal analyses of data quality and completeness are disseminated through journal publications, web-based seminars on data quality that are subsequently archived in a readily retrievable format, and education of staff in the use and interpretation of data and analytic tools. 
Further, to facilitate access and analysis, VHA is implementing structured documentation using emerging clinical data standards to complement or replace unstructured text-based medical notes and reports, centralizing the most important clinical and administrative data sets, and evaluating the feasibility of using natural language processing to abstract clinically relevant concepts from its extensive free-text data. The success of the VHA healthcare system depends importantly on the transformation of data into information. …
- Research Article
- 10.21456/vol2iss1pp036-042
- Jan 27, 2014
- JURNAL SISTEM INFORMASI BISNIS
A company, especially a commercial (profit-oriented) one, needs to analyze its sales performance; by doing so, it can improve that performance. One way to analyze sales performance is to collect historical sales-related data and process it into information that shows the company's sales performance. A data warehouse is a collection of data that is subject oriented, time variant, integrated, and nonvolatile, and that supports company management in decision making. The design of the data warehouse starts with collecting sales-related data such as products, customers, sales areas, and sales transactions. The next steps are data extraction and transformation: extraction selects the data to be loaded into the data warehouse, and transformation changes the extracted data to make it more consistent. After transformation, the data are loaded into the data warehouse and processed with OLAP (On-Line Analytical Processing) to produce information in the form of charts and query reports. The chart reports include sales by cement type, sales by sales area, sales by plant, monthly and yearly sales, and customer feedback; the query reports cover sales by cement type, sales area, plant, and customer. Keywords: Data warehouse; OLAP; Sales performance analysis; Ready mix market
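A minimal, self-contained sketch of the extract-transform-load flow the abstract describes; the table and column names are invented for illustration and are not taken from the paper:

```python
import sqlite3
import pandas as pd

# Hypothetical operational source and warehouse (in-memory for illustration).
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE sales_transactions (
        order_id INTEGER, order_date TEXT, product_type TEXT,
        sales_area TEXT, plant TEXT, quantity REAL, amount REAL);
    INSERT INTO sales_transactions VALUES
        (1, '2013-01-05', ' opc ', 'North', 'P1', 10, 500.0),
        (2, '2013-01-20', 'PCC',   'South', 'P2',  8, 420.0),
        (3, '2013-02-02', 'OPC',   'North', 'P1', 12, NULL);
""")

# Extract: select only the sales-related rows needed for analysis.
sales = pd.read_sql_query("SELECT * FROM sales_transactions", source)

# Transform: make the extracted data consistent before loading.
sales["order_date"] = pd.to_datetime(sales["order_date"])
sales["product_type"] = sales["product_type"].str.strip().str.upper()
sales = sales.dropna(subset=["amount"])

# Load into the warehouse fact table, then aggregate the way an OLAP query would.
sales.to_sql("fact_sales", warehouse, if_exists="append", index=False)
monthly = sales.groupby([sales["order_date"].dt.to_period("M"), "product_type"])["amount"].sum()
print(monthly)
```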
- Research Article
1
- 10.1038/hdy.1997.66
- Apr 1, 1997
- Heredity
Sensitivity of segregation analysis to data structure and data transformation was studied using data from two trials in which mice were challenged at three months of age with a cloned isolate of Trypanosoma congolense and survival time was recorded. Data included records from three inbred strains (C57BL/6 (tolerant), A/J, and BALB/c (both susceptible)) and their crosses. Data were standardized and normalized using a modified power transformation. Segregation analysis was applied to both untransformed and transformed data to determine the genetic inheritance of trypanotolerance in these mice. Data from the two trials were analysed separately and combined. Four genetic models were compared: a one-locus model, a polygenic model, a mixed model with common variance, and a mixed model with different variances for each major genotype. Even though the separate data sets and the combined data set all supported the hypothesis of a major gene (or a tightly linked cluster of genes) with different variances within each genotype, parameter estimates were highly sensitive to data transformation, and several sets of parameter estimates gave similar likelihood values because of high dependency between parameters. Based on these results, segregation analysis can be very sensitive to data structure in a crossbreeding design and to data transformation, and interpretation of the results can be misleading if the entire parameter space is not studied carefully.
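The abstract standardizes and normalizes survival times with a modified power transformation; as a stand-in illustration (the paper's exact modification is not given here), a minimal sketch using the familiar Box-Cox power transformation on invented survival times:

```python
import numpy as np
from scipy import stats

# Hypothetical survival times (days), for illustration only.
survival = np.array([21.0, 34.0, 40.0, 55.0, 62.0, 90.0, 120.0, 150.0])

# Box-Cox power transformation: lambda chosen by maximum likelihood.
transformed, lam = stats.boxcox(survival)

# Standardize the transformed values to zero mean and unit variance.
standardized = (transformed - transformed.mean()) / transformed.std(ddof=1)

print(f"estimated lambda = {lam:.2f}")
print(np.round(standardized, 2))
```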
- Peer Review Report
1
- 10.7554/elife.78717.sa2
- Jul 26, 2022
Author response: Robust group- but limited individual-level (longitudinal) reliability and insights into cross-phases response prediction of conditioned fear
- Research Article
31
- 10.4025/actasciagron.v40i1.35300
- Mar 1, 2018
- Acta Scientiarum. Agronomy
Some researchers do not recommend data transformation, arguing that it causes problems in inference and mischaracterises data sets, which can hinder interpretation. Other researchers consider data transformation necessary to meet the assumptions of parametric models. Perhaps the largest group of researchers who use data transformation are concerned with experimental accuracy, which encourages misuse of this tool. Considering this, our paper offers a study of the most frequent situations related to data transformation and of how this tool can impact ANOVA assumptions and experimental accuracy. Our database was obtained from measurements of seed physiology and seed technology. The coefficient of variation cannot be used as an indicator for data transformation. Data transformation might itself violate the assumptions of the analysis of variance, so its use can compromise the inferences even when it does not improve the quality of the analysis. The decision about whether to use data transformation is dichotomous, but the criteria for this decision are many. The unit (percentage, day or seedlings per day), the experimental design and the possible robustness of the F-statistic to ‘small deviations’ from normality are among the main indicators for the choice of the type of transformation.
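As a rough illustration of the assumption checking discussed above, a minimal sketch that compares normality and homogeneity-of-variance tests on raw, square-root, and log scales; the tests and the invented counts are illustrative choices, not the paper's protocol:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical counts of seedlings per day for three treatments (illustrative only).
groups = [rng.poisson(lam, size=8).astype(float) for lam in (4, 12, 30)]

def check(groups, label):
    residuals = np.concatenate([g - g.mean() for g in groups])
    _, p_norm = stats.shapiro(residuals)          # normality of residuals
    _, p_var = stats.levene(*groups)              # homogeneity of variances
    print(f"{label:>12}: Shapiro p = {p_norm:.3f}, Levene p = {p_var:.3f}")

check(groups, "raw")
check([np.sqrt(g + 0.5) for g in groups], "sqrt(x+0.5)")
check([np.log(g + 1.0) for g in groups], "log(x+1)")
```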
- Conference Article
2
- 10.1109/hipc56025.2022.00038
- Dec 1, 2022
With the rise of machine learning and data analytics, the ability to process large and diverse sets of data efficiently has become crucial. Research has shown that data transformation is a key performance bottleneck for applications across a variety of domains, from data analytics to scientific computing. Custom hardware accelerators and GPU implementations targeting specific data transformation tasks can alleviate the problem, but suffer from narrow applicability and lack of generality. To tackle this problem, we propose a GPU-accelerated data transformation engine grounded on pushdown transducers. We define an extended pushdown transducer abstraction (effPDT) that allows expressing a wide range of data transformations in a memory-efficient fashion, and is thus amenable for GPU deployment. The effPDT execution engine utilizes a data streaming model that reduces the application’s memory requirements significantly, facilitating deployment on high- and low-end systems. We showcase our GPU-accelerated engine on a diverse set of transformation tasks covering data encoding/decoding, parsing and querying of structured data, and matrix transformation, and we evaluate it against publicly available CPU and GPU library implementations of the considered data transformation tasks. To understand the benefits of the effPDT abstraction, we extend our data transformation engine to also support finite state transducers (FSTs), we map the considered data transformation tasks on FSTs, and we compare the performance and resource requirements of the FST-based and the effPDT-based implementations.
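The effPDT abstraction is specific to the paper; as background only, a toy sketch of an ordinary pushdown transducer that uses a stack to check bracket nesting while emitting a transformed output stream:

```python
def pushdown_transduce(text):
    """Toy pushdown transducer: copies ordinary symbols to the output while a
    stack tracks '[' / ']' nesting; the brackets themselves are dropped.
    Raises ValueError on malformed input.
    """
    stack, output = [], []
    for ch in text:
        if ch == "[":
            stack.append(ch)          # push: enter a nested region
        elif ch == "]":
            if not stack:
                raise ValueError("unbalanced ']'")
            stack.pop()               # pop: leave the nested region
        else:
            output.append(ch)         # emit: ordinary symbols pass through
    if stack:
        raise ValueError("unclosed '['")
    return "".join(output)

print(pushdown_transduce("a[b[c]d]e"))   # -> "abcde"
```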
- Book Chapter
1
- 10.1016/b978-0-12-809633-8.20459-7
- May 16, 2018
- Reference Module in Life Sciences
Data Integration and Transformation
- Research Article
19
- 10.1016/j.eswa.2018.05.017
- May 26, 2018
- Expert Systems with Applications
Study of data transformation techniques for adapting single-label prototype selection algorithms to multi-label learning
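The paper's specific transformation techniques are not reproduced here; as an illustration of the general idea of turning a multi-label problem into a single-label one (so that single-label prototype selection can be applied), a sketch of the common label-powerset transformation:

```python
import numpy as np

def label_powerset(Y):
    """Map each distinct combination of labels to a single class id.

    Y is an (n_samples, n_labels) binary indicator matrix; the result is a
    1-D array of class labels usable by any single-label algorithm.
    """
    Y = np.asarray(Y, dtype=int)
    combos = {}
    classes = np.empty(len(Y), dtype=int)
    for i, row in enumerate(map(tuple, Y)):
        classes[i] = combos.setdefault(row, len(combos))
    return classes, combos

Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]
classes, mapping = label_powerset(Y)
print(classes)    # [0 1 0 2]
print(mapping)    # label combination -> class id
```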
- Conference Article
11
- 10.1145/1378773.1378784
- Jan 13, 2008
We are building a smart visual dialog system that aids users in investigating large and complex data sets. Given a user's data request, we automate the generation of a visual response that is tailored to the user's context. In this paper, we focus on the problem of data transformation, which is the process of preparing the raw data (e.g., cleaning and scaling) for effective visualization. Specifically, we develop an optimization-based approach to data transformation. Compared to existing approaches, which normally focus on specific transformation techniques, our work addresses how to dynamically determine proper data transformations for a wide variety of visualization situations. As a result, our work offers two unique contributions. First, we provide a general computational framework that can dynamically derive a set of data transformations to help optimize the quality of the target visualization. Second, we provide an extensible, feature-based model to uniformly represent various data transformation operations and visualization quality metrics. Our evaluation shows that our work significantly improves visualization quality and helps users to better perform their tasks.
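The paper's feature-based model and quality metrics are not reproduced here; as a toy illustration of the underlying idea of scoring candidate transformations against a visualization-quality proxy and keeping the best one, a sketch that picks the transform minimizing skewness of the displayed values:

```python
import numpy as np
from scipy import stats

# Hypothetical raw values to be visualized (heavily skewed, illustrative only).
rng = np.random.default_rng(1)
raw = rng.lognormal(mean=3.0, sigma=1.2, size=200)

# Candidate transformations and a simple visualization-quality proxy:
# lower skewness tends to spread points more evenly along an axis.
candidates = {
    "identity": lambda x: x,
    "sqrt":     np.sqrt,
    "log":      np.log,
}

scores = {name: abs(stats.skew(f(raw))) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> chosen transform:", best)
```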
- Book Chapter
2
- 10.5772/9165
- Apr 1, 2010
Large quantities of multivariate data generated in scientific, engineering, business and other fields have triggered exciting developments in information visualization, data mining and knowledge discovery, with the objective of identifying and describing properties of data sets. The self-organizing map (SOM) is an unsupervised neural network technique capable of analyzing large and complex multivariate data. It attempts to address the problems of high-dimensional data and identify the underlying patterns by reducing the dimensionality, achieved through grouping of similar objects and mapping them to a low-dimensional space, usually a two-dimensional surface also known as a topological map. The results of a SOM could be misinterpreted if taken out of context. For example, the distance between neighbouring weight vectors does not correspond to the physical location of those vectors on the matrix of output nodes, as described by Ultsch (Ultsch & Vetter, 1994). The widespread use of the algorithm is attributed to its simplicity. The analytic and graphical Kohonen SOMs and their variations have been successfully applied to the analysis of complex, large-dimensional data sets from diverse sources, including biomedical data (Durbin & Mitchison, 1990; Tamayo et al., 1999; Van Osdol et al., 1994), to name just a few. The quest for effective and efficient visualization techniques capable of displaying large numbers of high-dimensional records has been formally underway since 1987, when the National Science Foundation sponsored a workshop on Visualization in Scientific Computing, as Wong and Bergeron pointed out (Wong & Bergeron, 1997). The problem dates back to the first graphical representation of various types of data sets. Information visualization, as the field is often named, is summarized by the transformation of data, in whatever form, into pictures, with the pictures interpreted by a human being (Spence, 2007). The main advantage of visual displays is the ability to harness the human perceptual system, improving over tabular or other data representation forms. We utilize the benefits of the SOM and visual displays through a linked technique that integrates the SOM with two- and three-dimensional information visualization techniques (a.k.a. iNNfovis), which serves as a model for constrained self-organization. Properties of iNNfovis environments are harnessed through interactive analysis of large data sets for nontrivial feature extraction. Various iNNfovis configurations provide unique environments for …
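As background for the discussion above, a minimal sketch of the core SOM training loop: each input is matched to its closest node on a 2-D grid, and that node and its grid neighbours are pulled toward the input. This is an illustrative toy with invented data, not the iNNfovis system described in the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 5))                 # hypothetical 5-D records

grid_h, grid_w = 8, 8
weights = rng.normal(size=(grid_h, grid_w, 5))   # one codebook vector per node
ys, xs = np.indices((grid_h, grid_w))            # grid coordinates of each node

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    # Best-matching unit: the node whose weight vector is closest to the input.
    dists = np.linalg.norm(weights - x, axis=2)
    by, bx = np.unravel_index(dists.argmin(), dists.shape)
    # Learning rate and neighbourhood radius shrink over time.
    lr = 0.5 * (1 - t / n_steps)
    sigma = 3.0 * (1 - t / n_steps) + 0.5
    # Pull the BMU and its grid neighbours toward the input.
    grid_dist2 = (ys - by) ** 2 + (xs - bx) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    weights += lr * h[:, :, None] * (x - weights)

print("trained codebook shape:", weights.shape)  # (8, 8, 5) topological map
```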
- Book Chapter
6
- 10.1016/b978-012763560-6/50032-5
- Jan 1, 2001
- The Handbook of Organic Compounds, Three-Volume Set
29 - Review of Chemometrics Applied to Spectroscopy: Data Preprocessing
- Research Article
14
- 10.1007/s12065-019-00267-w
- Jul 27, 2019
- Evolutionary Intelligence
Many organizations focus on large datasets for automatic mining of necessary information from big medical data. The major issues with big medical data are its complex data sets and its volume, which is steadily increasing. This paper proposes a big data classification model for heart disease in health care that consists of several phases: (1) a MapReduce framework, (2) a support vector machine (SVM), and (3) an optimized decision tree (DT) classifier. Initially, the big data is supplied as input to the MapReduce framework, which reduces the data content through several major operations and uses principal component analysis to reduce the dimensionality of the data. The reduced data is passed to the SVM, which outputs the classes. The SVM output is then processed by a new contribution called ‘data transformation’, which paves the way for optimal rule generation in the decision tree classifier; an optimization step is involved in this process to tune the weight and integer used in the data transformation. For this purpose the paper introduces a new algorithm, divergence-based grey wolf optimization (DGWO). Finally, the transformed data is passed to the DT, where the classification takes place. The proposed DGWO model is compared with conventional methods such as the firefly algorithm, the artificial bee colony algorithm, particle swarm optimization, the genetic algorithm, and the grey wolf optimizer.
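A minimal sketch of the pipeline skeleton described above using scikit-learn on synthetic data; the MapReduce distribution, the paper's weighted data transformation, and the DGWO optimizer are all omitted, so this is only an assumed approximation of the overall flow:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a heart-disease dataset (illustrative only).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1 (stand-in for the MapReduce stage): PCA dimensionality reduction.
pca = PCA(n_components=8).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Step 2: SVM decision scores augment the reduced features, playing the role
# of the transformed input that feeds the decision tree in the abstract.
svm = SVC(kernel="rbf").fit(Z_tr, y_tr)
F_tr = np.column_stack([Z_tr, svm.decision_function(Z_tr)])
F_te = np.column_stack([Z_te, svm.decision_function(Z_te)])

# Step 3: decision tree classifier on the transformed features
# (the DGWO-optimized weighting from the paper is not reproduced here).
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(F_tr, y_tr)
print("test accuracy:", round(tree.score(F_te, y_te), 3))
```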