Mass Spectrometry Datasets Research Articles

Background: The use of artificial intelligence (AI) in the medical domain has attracted considerable research interest. Inference applications in the medical domain require energy-efficient AI models. In contrast to the data used in visual AI, data from medical laboratories usually comprise features with strong signals. Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy complex learning models. However, the energy efficiency of different AI models used for medical applications has not been studied.

Objective: The aim of this study was to explore and compare the energy efficiency of commonly used machine learning algorithms, namely logistic regression (LR), k-nearest neighbor, support vector machine, random forest (RF), and extreme gradient boosting (XGB), as well as four variants of neural network (NN) algorithms, when applied to clinical laboratory data sets.

Methods: We applied these algorithms to two distinct clinical laboratory data sets: a Staphylococcus aureus mass spectrometry data set for predicting methicillin resistance (3338 cases; 268 features) and a urinalysis data set for predicting Trichomonas vaginalis infection (839,164 cases; 9 features). We compared the nine inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. Time and power consumption were determined from performance counter data collected with Intel Power Gadget 3.5.

Results: The RF and XGB algorithms achieved the two highest AUROC values for both data sets (84.7% and 83.9%, respectively, for the mass spectrometry data set; 91.1% and 91.4%, respectively, for the urinalysis data set). The XGB and LR algorithms had the shortest inference times for both data sets (0.47 milliseconds each for the mass spectrometry data set; 0.39 and 0.47 milliseconds, respectively, for the urinalysis data set). Compared with the RF algorithm, the XGB and LR algorithms reduced inference time by 45% for the mass spectrometry data set and by 53%-60% for the urinalysis data set. In terms of energy efficiency, the XGB algorithm had the lowest power consumption for the mass spectrometry data set (9.42 watts), and the LR algorithm had the lowest power consumption for the urinalysis data set (9.98 watts). Compared with a five-hidden-layer NN, the XGB and LR algorithms consumed 16%-24% less power for the mass spectrometry data set and 9%-13% less for the urinalysis data set. In all experiments, the XGB algorithm exhibited the best performance in terms of accuracy, run time, and energy efficiency.

Conclusions: The XGB algorithm achieved balanced performance in terms of AUROC, run time, and energy efficiency on the two clinical laboratory data sets. Considering the energy constraints of real-world scenarios, the XGB algorithm is ideal for medical AI applications.
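
The abstract reports power (watts) and inference time separately; energy, the quantity that actually drains a power budget, is their product. The minimal Python sketch below makes that arithmetic explicit under two assumptions that are not stated in the study: that the reported wattage is the average power sustained over the reported inference time, and that the 45% time reduction was computed as (t_RF - t_XGB) / t_RF. Intel Power Gadget reports whole-processor power rather than per-process power, so attributing all of it to the model overstates the model's own share. The function names are illustrative only.

```python
"""Back-of-the-envelope energy arithmetic for the figures quoted above.

Assumption (not from the study): the reported power is the average draw
sustained for the whole reported inference time.
"""


def energy_millijoules(avg_power_watts: float, time_ms: float) -> float:
    """Energy = average power x time; 1 W sustained for 1 ms is 1 mJ."""
    return avg_power_watts * time_ms


def relative_reduction(reference: float, candidate: float) -> float:
    """Fractional reduction of `candidate` versus `reference` (0.45 means 45%)."""
    return (reference - candidate) / reference


def implied_reference(candidate: float, reduction: float) -> float:
    """Invert relative_reduction: the reference value implied by a reported reduction."""
    return candidate / (1.0 - reduction)


if __name__ == "__main__":
    # Mass spectrometry data set, XGB: 9.42 W and 0.47 ms as reported in the abstract.
    print(f"XGB energy: {energy_millijoules(9.42, 0.47):.2f} mJ")  # 4.43 mJ
    # Assumed definition of the 45% figure: the implied RF time is 0.47 / 0.55 ms.
    print(f"Implied RF time: {implied_reference(0.47, 0.45):.2f} ms")  # 0.85 ms
```

Because the published percentages are rounded, back-calculated values such as the implied RF time are approximate.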

Background: Vaccines have little chance of destroying heterogeneous tumor cells, since they rarely induce polyclonal T-cell responses against the tumor. The main challenge is the accurate identification of tumor targets that T cells can recognize. Currently, only 6–8% of neoepitopes selected from patients' tumor biopsies are confirmed as real T-cell targets [1,2]. To overcome this limitation, we developed a computational platform called the Personal Antigen Selection Calculator (PASCal), which identifies frequently presented immunogenic peptide sequences based on the HLA genetics and tumor profiles of thousands of real individuals [3]. Here we show the performance of PASCal in identifying both shared and personalized tumor targets in subjects with metastatic colorectal cancer (mCRC) and breast cancer.

Methods: The expression frequency of the tumor-specific antigens (TSAs) ranked in PASCal's database (based on 7,548 CRC specimens) was compared with RNA-sequencing data of CRC tumors obtained from TCGA. Using PASCal, 12 shared PEPIs (epitopes restricted to at least 3 HLA class I alleles of a subject from an in silico cohort) derived from 7 TSAs were selected as frequent targets (calculated probability: average 2.5 [95% CI 2.4–2.8] TSAs/patient). Spontaneous immune responses against each of the twelve 9-mer peptides were determined by ELISpot using PBMCs from 10 mCRC subjects who participated in the OBERTO-101 study [4]. PEPIs selected for a breast cancer subject based on her HLA genotype were also tested.

Results: Each of the 106 tumors analyzed expressed at least 13 (average 15) of the 20 top-ranked TSAs in PASCal's database, confirming their prevalence in CRC. Seven of 10 subjects had spontaneous CD8+ T-cell responses against at least one peptide selected with PASCal. Each peptide (12/12) was recognized by at least one patient. Patients' T cells reacted with an average of 3.6/12 (30%) peptides, confirming the expression of an average of 2.8 [95% CI 1.0–4.6] TSAs (n=10). After HLA matching, among the subjects for whom we could select at least 4 PEPIs (average 5) from the list of 12 peptides (n=6), an average of 2.5 (50%) peptides were positive. Of the 12 PEPIs selected with PASCal for a breast cancer subject, we detected spontaneous T-cell responses against 9, indicating that at least 75% of the selected peptides were present in the subject's tumor and were recognized by T cells.

Conclusions: The PASCal platform accommodates both tumor and patient heterogeneity and identifies non-mutated tumor targets that may trigger polyclonal cytotoxic T-cell responses. It is a rapid tool for designing both off-the-shelf and personalized cancer vaccines, eliminating the need for tumor biopsy.

References
1. Wells DK, van Buuren MM, Dang KK, et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell. 2020;183(3):818–34.e13.
2. Bulik-Sullivan B, Busby J, Palmer CD, et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat Biotechnol. 2018;37:55–63.
3. Somogyi E, Csiszovszki Z, Lorincz O, et al. 1181PD Personal antigen selection calculator (PASCal) for the design of personal cancer vaccines. Ann Oncol. 2019;30(Suppl 5):v480–v81.
4. Hubbard J, Cremolini C, Graham R, et al. P329 PolyPEPI1018 off-the shelf vaccine as add-on to maintenance therapy achieved durable treatment responses in patients with microsatellite-stable metastatic colorectal cancer patients (MSS mCRC). J Immunother Cancer. 2019;7(1):282.
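
The Methods define a PEPI as an epitope restricted to at least 3 of a subject's HLA class I alleles, and the Results treat a T-cell response to a peptide as evidence that its source TSA is expressed. Because several of the 12 peptides share a source antigen (12 PEPIs from 7 TSAs), that rule would explain why an average of 3.6 reactive peptides confirmed only 2.8 TSAs. The Python sketch below illustrates both rules under stated assumptions: the binding predictions, peptide IDs, and TSA labels are hypothetical; each distinct allele is counted once, so a homozygous locus contributes a single allele; and this is not PASCal's implementation, which the abstract does not describe.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical inputs: HLA class I alleles each candidate 9-mer is predicted
# to bind, and the tumor-specific antigen (TSA) each peptide derives from.
BINDING: Dict[str, Set[str]] = {
    "pep1": {"A*02:01", "B*07:02", "C*07:02"},
    "pep2": {"A*02:01"},
    "pep3": {"A*01:01", "B*08:01", "C*07:01", "A*02:01"},
}
SOURCE_TSA: Dict[str, str] = {"pep1": "TSA_X", "pep2": "TSA_X", "pep3": "TSA_Y"}


def select_pepis(
    genotype: Iterable[str], binding: Dict[str, Set[str]], min_alleles: int = 3
) -> List[str]:
    """Peptides predicted to bind at least `min_alleles` distinct alleles of this subject."""
    alleles = set(genotype)  # assumption: duplicate (homozygous) alleles count once
    return sorted(p for p, bound in binding.items() if len(bound & alleles) >= min_alleles)


def confirmed_tsas(positive_peptides: Iterable[str], source: Dict[str, str]) -> Set[str]:
    """A TSA counts as expressed if at least one of its peptides elicited a response."""
    return {source[p] for p in positive_peptides}


if __name__ == "__main__":
    # Hypothetical subject genotype: two alleles each at HLA-A, -B, and -C.
    subject = ["A*02:01", "A*01:01", "B*07:02", "B*08:01", "C*07:02", "C*07:01"]
    print(select_pepis(subject, BINDING))  # ['pep1', 'pep3']
    # Two reactive peptides from the same antigen confirm only one TSA.
    print(confirmed_tsas(["pep1", "pep2"], SOURCE_TSA))  # {'TSA_X'}
```

Raising `min_alleles` to 4 leaves only `pep3`, which shows how the restriction threshold narrows the candidate list for a given genotype.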

Articles published on Mass Spectrometry Datasets

Likelihood-based bacterial identification approach for bimicrobial mass spectrometry data

Bayesian model calibration for vacuum-ultraviolet photoionisation mass spectrometry

Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study.

Compound Dereplication and De Novo Characterization of Small Molecules by Mass Spectrometry

DbPepVar: A Novel Cancer Proteogenomics Database

A Peptide Encoded by a Long Non-Coding RNA DLX6-AS1 Facilitates Cell Proliferation, Migration, and Invasion by Activating the wnt/β-Catenin Signaling Pathway in Non-Small-Cell Lung Cancer Cell.

Mass Deconvolution of Top-Down Mass Spectrometry Datasets by FLASHDeconv.

INCA 2.0: A tool for integrated, dynamic modeling of NMR- and MS-based isotopomer measurements and rigorous metabolic flux analysis

Selection of Formal Baseline Correction Methods in Thermal Analysis

RTP: One Effective Platform to Probe Reactive Compound Transformation Products and Its Applications for a Reactive Plasticizer BADGE.

Probing the Metabolic Landscape of Plant Vascular Bundles by Infrared Fingerprint Analysis, Imaging and Mass Spectrometry.

Proteomic analysis of differential expression of lung proteins in response to highly pathogenic avian influenza virus infection in chickens

Abstract 14286: Crosstalk Between Methylglyoxal and Acetylation Modifications on Actin, Myosin, and Myosin Light Chain Proteins Affects Sarcomere Functionality

A Bi-directional Hierarchical Clustering (BHC) for Peak Matching of Large Mass Spectrometry Data Sets

65 Identification of frequently presented non-mutated tumor-specific immunogens for the development of both off-the-shelf and personalized vaccines without need for tumor biopsy

Revealing Contamination and Sequence of Overlapping Fingerprints by Unsupervised Treatment of a Hyperspectral Secondary Ion Mass Spectrometry Dataset.

Peptide Location Fingerprinting Reveals Tissue Region-Specific Differences in Protein Structures in an Ageing Human Organ.

Inter-laboratory mass spectrometry dataset based on passive sampling of drinking water for non-target analysis

Simulation Testbed for Evaluating Distributed Querying and Searching of Mass Spectrometry Big Data in a Network-based Infrastructure.

Proteomic Analysis of the Meniscus Cartilage in Osteoarthritis
