Multi-core Machines Research Articles

Background: Machine learning can augment clinical decision support, especially where complex diagnostic criteria are involved. To diagnose large granular lymphocyte (LGL) leukemia (LGLL), a pathologist must review the peripheral smear thoroughly, which is a tedious process. Supervised machine learning has been applied successfully in such settings: a model is trained on a very large image dataset, its weights are updated in a corrective phase, and it is then shown to predict accurately on an independent dataset. We sought to assess whether a supervised machine learning model can automate histopathological image analysis and provide a degree of confidence in supporting a pathologist's manual review.

Methods: We searched the literature published from 2016 to 2022 for high-definition peripheral smear images from confirmed cases. The images were downloaded, pre-processed with the Segment Anything Model (SAM), and labeled as training, testing, or validation data. Non-pathological controls, defined as peripheral smears confirmed negative for LGLL, were also obtained from the published literature and labeled manually. The neural network was written and trained in TensorFlow using the open-source Keras library, with parallel threading on a multicore GPU machine. The resulting model performs a binary classification task: does the presented image contain cells that resemble LGL cells (true or false)? Every “true” label returned by the classifier adds the image to a stack of probable positives that need pathology review. Feedback from each pathologist review further updates the model weights to increase the probability of a successful prediction in the future.

Results: We screened 57 publications and obtained 163 high-definition peripheral smear images with confirmed LGLL and 100 non-pathological smears. Two cohorts were created: a training cohort of 200 smears, comprising 100 LGLL smears and 100 non-pathological smears, and a validation cohort of 63 smears. To remain agnostic to image orientation, we created 7 views of each smear, and 1400 total images underwent feature extraction using SAM. The resulting data were labeled and used for model training; the process is depicted in Figure 1. During validation, our model correctly predicted LGL cells in 56/63 cases, with an 8% false-positive rate, 2% false negatives, and 2% unclassifiable. After updating the model weights and re-training with Adaptive Boosting, accuracy increased to 60/63 smears with a false-positive rate of 3%. Our trained classification model achieved an AUC of 0.8144.

Conclusion: A supervised machine learning model with Adaptive Boosting can flag peripheral smear images that have a high likelihood of containing LGL cells and, in turn, help a pathologist prioritize those smears for timely review. Our initial training data were limited, which hindered the performance of our model; with a sufficiently large training dataset, we expect accuracy to improve substantially. A more extensive training set and more robust hardware could turn our image-recognition model into a foundational model to which other clinical parameters can be added, so that the final diagnostic probability is based on multiple data streams. Our work takes a meaningful step in this direction, and future work at Karmanos Cancer Center will focus on extending our classifier to more sophisticated deep-learning algorithms. Our open-source platform will eventually support several niches that are currently served only by commercial applications.
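The AUC reported above summarizes how well the classifier's scores rank positive smears above negative ones. As a minimal sketch of how such a figure can be computed (not the authors' code), the function below uses the rank-sum identity: AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 ground-truth values (1 = positive smear).
    scores: sequence of classifier scores; higher means "more likely positive".
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # U statistic for the positive class, normalized by the number of pos/neg pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example (made-up scores): three of the four positive/negative pairs
# are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.81 indicates useful but imperfect discrimination.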

Background: Prediction models that support clinical decision-making are an integral part of medicine. Image recognition and diagnosis are essential in many diseases, especially malignant hematologic diseases, where histopathological images are reviewed manually and there is an opportunity to automate image analysis with diagnosis-supporting tools. Machine learning (ML) is a process in which an algorithm creates a predictive model by learning from training data and uncovering relationships between input variables (X) and output variables (Y). This requires a training dataset with labeled examples. Once learning has occurred, the resulting model can make predictions on new input that it has never seen. To train a machine learning model appropriately, all input should be annotated in a standard manner, which is very time-consuming. One way to simplify annotation is to use a schema: a set of rules that must be followed each time the relevant features of an image are described. For every training image in a dataset, the relevant components are labeled manually to fit the schema and reviewed by trained human examiners, making the process tedious, time-consuming, and difficult to standardize. We created a programmatic approach to annotation in which a machine learning algorithm detects significant features in acute myeloid leukemia (AML) images, annotates the images with features that fit a schema, and feeds them forward into training data. Moreover, we built this on an open-source platform that can be used widely in resource-limited settings across the globe.

Methods: A supervised machine learning approach with a convolutional neural network (CNN) was used for image processing, based on a dataset of 270 pathological images (confirmed AML) and 30 non-pathological images (no AML) from the public Munich AML Morphology Dataset. The pathological images were from peripheral blood smears of patients diagnosed with AML at Munich University Hospital between 2014 and 2017. The non-pathological controls were taken from patients without hematological malignancy. For initial training, all annotation features from the Munich AML dataset were included for both pathological and non-pathological images. The neural network was written and trained with open-source software: TensorFlow and the Keras library, with parallel threading on a multicore GPU machine. The resulting predictive model answers two questions: does a given image contain cells that resemble blasts, or does it contain cells that belong to a non-pathological blood smear?

Results: The neural network was trained on 300 images in total; for external validation, we used a further subset of 100 unlabeled images from the Munich AML Morphology Dataset. Our model predicted blast-like features accurately in 89% of the new images; 6% of the new images were unclassifiable, the false-positive rate was 3%, and the false-negative rate was 2%. After initial prediction, the neural network accurately annotated unlabeled images 86% of the time in pathological samples and 96% of the time in non-pathological samples. Our trained model achieved an AUC of 0.82.

Conclusion: A CNN trained on a modest dataset with supervised learning, enhanced with ensemble learning and K-fold cross-validation, can recognize features such as blast cells in histopathological images and label images with a high degree of accuracy. In a data-driven machine learning algorithm such as a neural network, classification performance increases significantly as more training images become available. A more extensive training dataset and more robust hardware are therefore necessary to generate a more sophisticated predictive model. CNN-enhancing methodologies can make model training feasible in a resource-limited setting. Future work will focus on using a more extensive pre-trained database to evaluate the performance of our network in a real-world setting. The annotation framework can be expanded to include disease-associated features for use in other domains, such as hematological education, patient resources, and patient education. This open-source platform can support several niches that are currently served only by expensive commercial applications.
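The Conclusion mentions K-fold cross-validation but gives neither the number of folds nor the splitting strategy. The sketch below shows one way such splits could be produced for a dataset with the 270/30 class counts from the Methods. It is an illustration under stated assumptions, not the authors' procedure: the choice of k = 5, the stratified (class-balanced) dealing, and the function name `stratified_k_fold` are all hypothetical.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs, keeping class ratios similar in every fold.

    Stratification is an assumption here; the abstract only says "K-fold".
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so each fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# 270 AML images (label 1) and 30 controls (label 0), as in the Methods.
labels = [1] * 270 + [0] * 30
splits = list(stratified_k_fold(labels, k=5))
for fold_no, (train_idx, val_idx) in enumerate(splits):
    n_controls = sum(1 for i in val_idx if labels[i] == 0)
    print(fold_no, len(train_idx), len(val_idx), n_controls)
```

With a 9:1 class imbalance, plain random folds could leave a validation fold with very few controls, which is why this sketch stratifies; a model would be trained on each `train_idx` and scored on the matching `val_idx`, and the scores averaged.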

Articles published on Multi-core Machines

Efficient Workflow Scheduling for Minimizing Data Transfers and Enhancing Resource Utilization in Cloud IaaS Platforms

Load forecasting based on multi-core learning Support Vector Machine (SVM)

Reproducibility Report for the Paper: "Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines"

A Conflict-Resilient Lock-Free Linearizable Calendar Queue

Efficiency of Various Tiling Strategies for the Zuker Algorithm Optimization

[Formula omitted]: A simplified and abstract multicore hardware model for large scale system software formal verification

Automated Peripheral Smear Recognition and Classification of LGL Leukemia Using Machine Learning

Exploring temporal community evolution: algorithmic approaches and parallel optimization for dynamic community detection

Time–Energy Correlation for Multithreaded Matrix Factorizations

Characterization of Timing-based Software Side-channel Attacks and Mitigations on Network-on-Chip Hardware

Adaptively parallel runtime verification based on distributed network for temporal properties

A Parallel Computing Approach to Gene Expression and Phenotype Correlation for Identifying Retinitis Pigmentosa Modifiers in Drosophila

NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers

Multi-modal feature selection with self-expression topological manifold for end-stage renal disease associated with mild cognitive impairment.

Design and Programming for Multicore machines: An Empirical study on time and effort required by programmer

A Distributed Network-Based Runtime Verification of Full Regular Temporal Properties

A Machine Learning Technique for Abstraction of Modules in Legacy System and Assigning them on Multicore Machines Using and Controlling p-threads

Predicting cardiac arrhythmia on ECG signal using an ensemble of optimal multicore support vector machines

Picasso: An Open-Source Machine Learning Schema for Annotating Images in Hematology

Pardinus: A Temporal Relational Model Finder
