Large Test Data Sets Research Articles

Safety incidents have always been a crucial risk in workplaces, especially industrial sites. In the last few decades, significant effort has been dedicated to incident control measures that reduce the rate of safety incidents. Despite these efforts, the rate of decline in serious injuries and fatalities (SIFs) has been considerably lower than the rate of decline for non-critical incidents. This observation has led to a change in the risk reduction paradigm for safety incidents. Under the new paradigm, more focus is placed on reducing the rate of critical/SIF incidents, as opposed to reducing the count of all incidents. One of the challenges in reducing the number of SIF incidents is properly identifying the risk before it materializes. Risk identification is difficult partly because companies usually focus reactively on incidents where a SIF did occur, while incidents that did not cause a SIF but had the potential to do so go unnoticed. Identifying these potentially significant incidents, referred to as potential serious injuries and fatalities (PSIF), would enable companies to identify critical risks and take steps to prevent them preemptively. However, flagging PSIF incidents requires experts to analyze every incident report individually, which demands significant investment that is often not affordable, especially for small and medium-sized companies. This study aims to address this problem through machine-learning-powered automation. We propose a novel approach based on binary classification for identifying incidents involving PSIF. This is the first work towards automatic risk identification from incident reports. Our approach combines a pre-trained transformer model with XGBoost. We utilize advanced natural language processing techniques to encode an incident record comprising heterogeneous fields into a vector representation that is fed to XGBoost for classification. Moreover, given the scarcity of manually labeled incident records available for training, we leverage weak labeling to augment the label coverage of the training data. We use the F2 metric for hyperparameter tuning with the Tree-structured Parzen Estimator to prioritize detecting PSIF records over avoiding the misclassification of non-PSIF records as PSIF. The proposed methods outperform several baselines from other studies on a significantly large test dataset.
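To illustrate why an F2 objective favors detection of PSIF records, the following is a minimal sketch of the standard F-beta score for binary labels. It is not the authors' code: the function name `fbeta_score` and the toy label vectors are assumptions made for illustration only. With beta = 2, a missed positive (false negative) is weighted four times as heavily as a false alarm (false positive).

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta score for binary labels (1 = PSIF, 0 = non-PSIF).

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # PSIF correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed PSIF records
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical toy data: four PSIF and four non-PSIF records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0]  # misses two PSIF records, no false alarms
eager = [1, 1, 1, 1, 1, 1, 1, 1]     # catches every PSIF record, four false alarms

for name, pred in [("cautious", cautious), ("eager", eager)]:
    f1 = fbeta_score(y_true, pred, beta=1.0)
    f2 = fbeta_score(y_true, pred, beta=2.0)
    print(f"{name}: F1={f1:.3f} F2={f2:.3f}")
```

On this toy data the two classifiers tie under F1, while F2 ranks the high-recall one higher, which is the trade-off the abstract describes. In a tuning loop, a score like this would serve as the objective that the hyperparameter search maximizes.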

Abstract: T cell inducing vaccines are key for the development of effective therapies against cancer and infectious diseases. Peptides, presented by Human Leukocyte Antigens (HLAs), are the targets of T cells, and their identification is therefore critical to the development of such vaccines. Here, we present the latest improvement in our EDGE™ (Epitope Discovery for GEnomes) platform to address this critical need. EDGE comprises AI models that can predict peptide presentation by HLA class I and class II. Although the models are trained primarily using immunopeptidomics data, EDGE scores are predictive of peptide-HLA immunogenicity. There are three class I presentation models in EDGE: an allele-specific model, a pan-specific model, and a model specific for infectious diseases. The allele-specific model is applicable to a large but pre-defined set of HLA alleles. On a large test dataset, the allele-specific model achieved an average precision (AP) of 63% (PPV40=79%) compared to the AP of a standard best-available public model of 21% (PPV40=28%). A Phase 1/2 clinical study of personalized cancer vaccines encoding neoantigens predicted from the allele-specific model demonstrated a ~50% molecular response rate (defined as >=30% reduction in circulating tumor DNA relative to baseline) with associated extended overall survival (vs non-responders) in metastatic, microsatellite stable colorectal cancer patients. We observed that >50% of the mutations were able to elicit T cell responses. The pan-specific class I model uses HLA sequences as an input feature during training and, therefore, is applicable to any HLA. On the same test dataset as above, it achieved an AP of 65% (PPV40=81%) and performed better on average for ~40 less-common HLA alleles. Prediction of viral peptide presentation by HLA class I is challenging due to the lack of immunopeptidomics data. The class I model for infectious diseases was specifically optimized to predict viral peptides and, therefore, performed better than available class I models on published HIV and Influenza A datasets. Prediction of peptide presentation by HLA class II is challenging due to the flexibility in how the longer peptides interact with open HLA grooves, as well as the scarcity of immunopeptidomics data compared to class I peptides. The class II model in EDGE, EDGE-II, uses the latest developments in protein large language models, a novel learned HLA allele-deconvolution strategy, and in-house immunopeptidomics data, resulting in improved prediction of peptide presentation by HLA class II and of immunogenicity driven by CD4+ T cells. On a benchmark validation dataset, EDGE-II achieved an AP of 71% compared to an AP of 62% for a leading published model. In summary, EDGE™ provides a comprehensive state-of-the-art platform for the development of vaccines that can induce both CD8+ and CD4+ T cell responses to provide durable benefit to patients.

Citation Format: Joshua Klein, Daniel Sprague, Monica Lane, Meghan Hart, Olivia Petrillo, Italo Faria do Valle, Matthew Davis, Andrew Ferguson, Andrew Allen, Karin Jooss, Ankur Dhanik. AI platform provides an EDGE and enables state-of-the-art identification of peptide-HLAs for the development of T cell inducing vaccines [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 904.

Articles published on Large Test Data Sets

Understanding overfitting in random forest for probability estimation: a visualization and simulation study

A Critical Review of Emerging Technologies for Flash Flood Prediction: Examining Artificial Intelligence, Machine Learning, Internet of Things, Cloud Computing, and Robotics Techniques

Automatic identification of incidents involving potential serious injuries and fatalities (PSIF)

Abstract 904: AI platform provides an EDGE and enables state-of-the-art identification of peptide-HLAs for the development of T cell inducing vaccines

An Enhanced Automated Identification of Brain Tumor Cells Using Image Segmentation

Multi-UAV Cooperative Trajectory Planning Based on the Modified Cheetah Optimization Algorithm.

Efficient and scalable DBSCAN framework for clustering continuous trajectories in road networks

Geometric Deep Learning for Molecular Crystal Structure Prediction.

Large-scale demonstration of machine learning for the detection of volcanic deformation in Sentinel-1 satellite imagery

Understanding the role of sequence stratigraphy and diagenesis on the temporal and spatial distribution of carbonate reservoir quality: A conceptual modeling approach

Optimization of hyperparameters of Gaussian process regression with the help of a low-order high-dimensional model representation: application to a potential energy surface

Predicting the Stability of Hierarchical Triple Systems with Convolutional Neural Networks

STORM DRAIN DETECTION AND LOCALISATION ON MOBILE LIDAR DATA USING A PRE-TRAINED RANDLA-NET SEMANTIC SEGMENTATION NETWORK

Deep Learning Approach for Radical Sound Valuation of Fetal Weight

DEVELOPMENT OF TEST CORPUS WITH LARGE VOCABULARY FOR TURKISH SPEECH RECOGNITION SYSTEM AND A NEW TEST PROCEDURE

A new approach to modeling cycles with summer and winter demand peaks as input variables for deep neural networks

Prediction of frictional braking noise based on brake dynamometer test and artificial intelligent algorithms

Individual variations in 'brain age' relate to early-life factors more than to longitudinal brain change.

Computational analysis of axially loaded thin-walled rectangular concrete-filled stainless steel tubular short columns incorporating local buckling effects

ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures.
