High-quality Training Datasets Research Articles

Artificial Neural Networks (ANNs) are trained to simulate two-phase capillary pressure and relative permeability data in bundles of capillary tubes with non-uniform, arbitrary wettability conditions and cross-sections shaped as irregular convex polygons. All polygons, with a variable number of corners, are randomly generated for a given range of inscribed radii, shape factors, and elongation factors. To generate the training data, the minimization of Helmholtz free energy is combined with the Mayer-Stowe-Princen (MS-P) method to find thermodynamically consistent threshold capillary pressures for two-phase flow. These capillary pressures are then used to determine the sequence of displacements in the different capillary tubes. We calculate saturations and phase conductances at each quasi-steady-state condition, where no further displacement can occur at the given capillary pressure. The resulting two-phase capillary pressure and relative permeability curves are then used to train the ANNs. We test different ANN designs to find the optimal workflow for training and for predicting petrophysical properties related to multiphase flow. In this work, we present the results of two different neural network structures. In the first structure, we use an ANN to predict the threshold capillary pressures of different capillary tubes during a drainage process (i.e., oil-to-water displacements). In the second structure, we predict capillary pressure and relative permeability curves for an arbitrary bundle of capillary tubes. The first structure simulates a fixed property of a given capillary tube, whereas the second simulates data in a time-series-like format (i.e., for a given bundle of capillary tubes, the calculated properties vary with saturation). To do so, we generated multiphase flow properties for two large datasets consisting of 40,000 and 60,000 capillary tubes, respectively.
High-quality training datasets are critical for training high-fidelity ANN models. Such models can then learn the impact of a wide variety of pore geometries (i.e., shape factors and elongations). Additionally, feature selection and preprocessing of the input data can significantly affect the ANN predictions. A multi-layer perceptron (MLP) neural network with three hidden layers and four outputs is adequate for predicting capillary pressure and relative permeability curves during drainage. This model is approximately an order of magnitude faster than conventional direct calculations on a desktop computer with a four-core CPU. Such an improvement in calculation speed becomes significant when dealing with larger models, more dimensions, and/or pore connectivity in 3D.
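The displacement sequence described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes circular tubes with uniform wettability, so the threshold pressure comes from the Young-Laplace relation Pc = 2σcos(θ)/r instead of the MS-P method for polygons, and conductance is taken as proportional to r^4 (Hagen-Poiseuille). The function name `drainage_curves` and the default values of `sigma` and `theta_deg` are hypothetical.

```python
import math

def drainage_curves(radii, sigma=0.03, theta_deg=0.0):
    """Primary drainage in a bundle of CIRCULAR tubes (simplifying assumption).

    Returns a list of (Pc, Sw, krw, kro) tuples, one per displacement.
    sigma is interfacial tension in N/m, radii are in metres.
    """
    cos_theta = math.cos(math.radians(theta_deg))
    # The largest tube has the lowest threshold pressure and is invaded first.
    tubes = sorted(radii, reverse=True)
    total_area = sum(math.pi * r ** 2 for r in tubes)
    total_cond = sum(r ** 4 for r in tubes)  # conductance is proportional to r^4

    oil_area = 0.0
    oil_cond = 0.0
    rows = []
    for r in tubes:
        pc = 2.0 * sigma * cos_theta / r  # Young-Laplace threshold pressure
        oil_area += math.pi * r ** 2      # this tube is now oil-filled
        oil_cond += r ** 4
        sw = 1.0 - oil_area / total_area  # water saturation of the bundle
        kro = oil_cond / total_cond       # oil relative permeability
        krw = 1.0 - kro                   # water relative permeability
        rows.append((pc, sw, krw, kro))
    return rows

if __name__ == "__main__":
    for pc, sw, krw, kro in drainage_curves([1e-6, 2e-6, 4e-6]):
        print(f"Pc={pc:.0f} Pa  Sw={sw:.3f}  krw={krw:.3f}  kro={kro:.3f}")
```

In circular tubes the wetting phase is displaced completely, so the final water saturation is zero. In the polygonal tubes of the study, water can remain in the corners, which is one reason the direct calculation is more expensive than this sketch suggests.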

Background: Two-component systems (TCS) are signalling complexes manifested by a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, often requiring costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information.

Results: We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross validation, to act as a gold standard for TCS partnership predictions. Combining individual predictions using MetaPred2CS improved performance when compared to the individual methods and in comparison with a current state-of-the-art meta-predictor.

Conclusion: We have developed MetaPred2CS, a support vector machine-based meta-predictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is a strategy of integrating individual predictors that improves the overall prediction accuracy, with the in-silico two-hybrid method contributing most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0741-7) contains supplementary material, which is available to authorized users.
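The k-fold cross-validation mentioned in the abstract above can be sketched as follows. This is an illustration only: each candidate protein pair is represented by six scores (one per individual method), the data are synthetic toy values rather than the gold standard dataset, and a simple mean-score threshold stands in for the support vector machine. All function names here are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(rows, labels):
    """Stand-in for SVM training: midpoint between the class mean scores."""
    pos = [sum(r) / len(r) for r, y in zip(rows, labels) if y == 1]
    neg = [sum(r) / len(r) for r, y in zip(rows, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict(threshold, row):
    """Predict 1 (interacting pair) if the mean of the six scores is high."""
    return 1 if sum(row) / len(row) > threshold else 0

def cross_validate(rows, labels, k=5, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit_threshold([rows[j] for j in train_idx],
                              [labels[j] for j in train_idx])
        correct = sum(predict(model, rows[j]) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

# Synthetic toy data (assumption): 20 interacting and 20 non-interacting pairs,
# each described by six method scores in [0, 1].
rng = random.Random(1)
rows = [[rng.uniform(0.7, 1.0) for _ in range(6)] for _ in range(20)]
rows += [[rng.uniform(0.0, 0.3) for _ in range(6)] for _ in range(20)]
labels = [1] * 20 + [0] * 20
accs = cross_validate(rows, labels, k=5)
print(accs)
```

The toy classes are cleanly separated by construction, so the per-fold accuracies say nothing about real TCS data. The point is only the mechanics: every pair is held out exactly once, and the model is always fitted on the remaining folds.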

Articles published on High-quality Training Datasets

Methods, New Software Tools, and Best Practices for Developing High-quality Training Data for Machine Learning-based Image Analysis in Biodiversity Research

Application of neural networks in multiphase flow through porous media: Predicting capillary pressure and relative permeability curves

A low-cost photorealistic CG dataset rendering pipeline for facial landmark localization

Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification

An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets.

The Optimal ANN Model for Predicting Bearing Capacity of Shallow Foundations trained on Scarce Data

HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold.

An Unsupervised Approach to Inferring the Localness of People Using Incomplete Geotemporal Online Check-In Data

Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system

Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor.

An unsupervised hierarchical clustering based heuristic algorithm for facilitated training of electricity consumption disaggregation systems

Neural network Jacobian analysis for high-resolution profiling of the atmosphere

A novel data selection method based on shadowed sets

Learning from imperfect data
