Large Multivariate Data Sets Research Articles

Using omics data to monitor the microbial flow of fresh meat products along a production line, and developing spoilage prediction tools from these data, are promising but challenging tasks. In this context, we produced a large multivariate dataset (over 600 samples) collected on the production lines of two similar types of fresh meat products (poultry and raw pork sausages). We describe a full analysis of this dataset in order to decipher how the spoilage microbial ecology of these two similar products may be shaped differently depending on production parameter characteristics. Our strategy involved a holistic approach integrating unsupervised and supervised statistical methods on multivariate data (OTU-based microbial diversity; metabolomic data of volatile organic compounds; sensory measurements; growth parameters), and a specific selection of potential uncontrolled (initial microbiota composition) or controlled (packaging type; lactate concentration) drivers. Our results demonstrate that the initial microbiota, which is shown to be very different between poultry and pork sausages, has a major impact on the spoilage scenarios and on the effect that a downstream parameter such as packaging type has on the overall evolution of the microbial community. Depending on the process, we also show that specific actions on the pork meat (such as deboning and defatting) elicit specific food spoilers such as Dellaglioa algida, which becomes dominant during storage. Finally, ecological network reconstruction allowed us to map six different metabolic pathways that contribute to the production of volatile organic compounds involved in spoilage. We were able to connect them to the different bacterial actors and to the influence of packaging type in an overall view. For instance, our results demonstrate a new role of Vibrionaceae in isopropanol production, and of Latilactobacillus fuchuensis and Lactococcus piscium in methanethiol/disulphide production. We also highlight a possible commensal behavior between Leuconostoc carnosum and Latilactobacillus curvatus around 2,3-butanediol metabolism. We conclude that our holistic approach combined with large-scale multi-omic data was a powerful strategy to prioritize the role of production parameters, already known in the literature, that shape the evolution and/or the implementation of different meat spoilage scenarios.

Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has progressed rapidly alongside the wide adoption of machine learning (ML) methods in the environmental sciences. Here, we present a scoping review of ML applications in wildfire science and management. Our overall objective is to improve awareness of ML methods among wildfire researchers and managers, and to illustrate the diverse and challenging range of problems in wildfire science available to ML data scientists. To that end, we first present an overview of popular ML approaches used in wildfire science to date and then review the use of ML in wildfire science as broadly categorized into six problem domains: (i) fuels characterization, fire detection, and mapping; (ii) fire weather and climate change; (iii) fire occurrence, susceptibility, and risk; (iv) fire behavior prediction; (v) fire effects; and (vi) fire management. Furthermore, we discuss the advantages and limitations of various ML approaches relating to data size, computational requirements, generalizability, and interpretability, and identify opportunities for future advances in the science and management of wildfires within a data science context. In total, to the end of 2019, we identified 300 relevant publications in which the most frequently used ML methods across problem domains included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. As such, there exist opportunities to apply more current ML methods (including deep learning and agent-based learning) in the wildfire sciences, especially in instances involving very large multivariate datasets. We must recognize, however, that despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods such as deep learning requires a dedicated and sophisticated knowledge of their application. Finally, we stress that the wildfire research and management communities play an active role in providing relevant, high-quality, and freely available wildfire data for use by practitioners of ML methods.

Articles published on Large Multivariate Data Sets

Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis

Holistic integration of omics data reveals the drivers that shape the ecology of microbial meat spoilage scenarios.

Anomaly Detection Paradigm for Multivariate Time Series Data Mining for Healthcare

Optimal Data Reduction of Training Data in Machine Learning-Based Modelling: A Multidimensional Bin Packing Approach

Variable targeting and reduction in large vector autoregressions with applications to workforce indicators

A semi-parametric, state-space compartmental model with time-dependent parameters for forecasting COVID-19 cases, hospitalizations and deaths.

Improving Subsurface Characterisation with ‘Big Data’ Mining and Machine Learning

Students’ informal statistical inferences through data modeling with a large multivariate dataset

Predictive geologic mapping from geophysical data using self-organizing maps: A case study from Baie Verte, Newfoundland, Canada

High Performance Multivariate Geospatial Statistics on Manycore Systems

A nonparametric algorithm for automatic classification of large multivariate statistical data sets and its application

Learning multivariate new physics

An integrative machine learning approach to discovering multi-level molecular mechanisms of obesity using data from monozygotic twin pairs.

A review of machine learning applications in wildfire science and management

Bilinear and trilinear modelling of three-way data obtained in two factor designed metabolomics studies

Insights from Self-Organizing Maps for Predicting Accessibility Demand for Healthcare Infrastructure

Enhancing the Radiometric Map of Australia

A Model for Large Multivariate Spatial Datasets

Self Organising Maps - A Case Study of Broken Hill

Computational method for discovery of biomarker signatures from large, complex data sets
