ABSTRACT Height map estimation from a single aerial image plays a crucial role in localization, mapping, and 3D object detection. Deep convolutional neural networks have been used to predict height information from single-view remote sensing images, but these methods rely on large volumes of training data and often overlook the geometric features present in orthographic images. To address these issues, this study proposes a gradient-based self-supervised learning network with a momentum contrastive loss that extracts geometric information from unlabeled images during pretraining. In addition, novel local implicit constraint layers are applied at multiple decoding stages of the proposed supervised network to refine high-resolution features for height estimation. A structure-aware loss is also introduced to improve the network's robustness to positional shifts and minor structural changes along boundary areas. Experimental evaluation on the ISPRS benchmark datasets shows that the proposed method outperforms the baseline networks, achieving minimum MAE and RMSE of 0.116 and 0.289 on the Vaihingen dataset and 0.077 and 0.481 on the Potsdam dataset, respectively. The proposed method also achieves an approximately threefold improvement in data efficiency on the Potsdam dataset and demonstrates domain generalization on the Enschede dataset. These results confirm the effectiveness of the proposed method for height map estimation from single-view remote sensing images.