Initial Dataset Research Articles

Lung cancer is the deadliest and second most common cancer in the United States, owing to the lack of symptoms that would enable early diagnosis. Pulmonary nodules are small abnormal regions that can be correlated with the occurrence of lung cancer. Early detection of these nodules is critical because it can significantly improve patient survival rates. Thoracic thin-slice computed tomography (CT) scanning has emerged as a widely used method for the diagnosis and prognosis of lung abnormalities. The standard clinical workflow for detecting pulmonary nodules relies on radiologists analyzing CT images to assess the risk factors of cancerous nodules. However, this approach can be error-prone because nodules form from various causes, such as pollutants and infections. Deep learning (DL) algorithms have recently demonstrated remarkable success in medical image classification and segmentation. As DL becomes an ever more important assistant to radiologists in nodule detection, it is imperative that the DL algorithm and the radiologist can understand each other's decisions. This study aims to develop a framework integrating explainable AI methods to achieve accurate pulmonary nodule detection. A robust and explainable detection (RXD) framework is proposed, focusing on reducing false positives in pulmonary nodule detection. Its implementation is based on an explanation supervision method, which uses radiologists' nodule contours as supervision signals to force the model to learn nodule morphologies, enabling improved learning on small datasets. In addition, two imputation methods are applied to the nodule region annotations to reduce the noise within human annotations and allow the model to have robust attributions that meet human expectations.
A total of 480, 265, and 265 CT image sets from the public Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset are used for training, validation, and testing, respectively. Using only 10, 30, 50, and 100 training samples in turn, our method consistently improves on the classification performance and explanation quality of the baseline in terms of Area Under the Curve (AUC) and Intersection over Union (IoU). In particular, our framework with a learnable imputation kernel improves IoU over the baseline by 24.0% to 80.0%. A pre-defined Gaussian imputation kernel achieves an even greater improvement over the baseline, from 38.4% to 118.8%. Compared to the baseline trained on 100 samples, our method shows a smaller drop in AUC when trained on fewer samples. A comprehensive comparison of interpretability shows that our method aligns better with expert opinions. A pulmonary nodule detection framework was demonstrated using public thoracic CT image datasets. The framework integrates the robust explanation supervision (RES) technique to ensure the performance of nodule classification and morphology. The method can reduce the workload of radiologists and enable them to focus on the diagnosis and prognosis of potentially cancerous pulmonary nodules at an early stage, improving outcomes for lung cancer patients.
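The abstract reports explanation quality as Intersection over Union (IoU) between the model's attributions and the radiologists' nodule annotations. As a minimal sketch of that metric only (not the authors' evaluation code), the example below binarizes a small attribution map and compares it with an annotation mask. The 0.5 threshold, the toy 3x4 grids, and the names `binarize` and `iou` are illustrative assumptions.

```python
def binarize(attribution_map, threshold=0.5):
    """Turn a float attribution map into a 0/1 mask (threshold is an assumption)."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in attribution_map]


def iou(pred_mask, true_mask):
    """Intersection over Union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Hypothetical radiologist annotation (1 = nodule pixel).
annotation = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Hypothetical model attribution scores in [0, 1].
attribution = [
    [0.1, 0.2, 0.9, 0.7],
    [0.0, 0.8, 0.6, 0.3],
    [0.1, 0.0, 0.2, 0.1],
]

score = iou(binarize(attribution), annotation)
print(f"IoU = {score:.2f}")
```

Here three pixels are shared and five are covered by either mask, so the IoU is 0.6. A higher value means the attribution overlaps more closely with the annotated nodule region.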


The steel industry is a typical process manufacturing industry, and the quality and cost of its products can be improved by operating equipment efficiently. This paper proposes an efficient diagnosis and monitoring method for the gearbox, which is a key piece of mechanical equipment in steel manufacturing. In particular, an equipment maintenance plan for stable operation is essential. Equipment monitoring and diagnosis to prevent unplanned plant shutdowns are therefore important for operating the equipment efficiently and economically. Most plant data collected on-site carry no precise information about equipment malfunctions, so it is difficult to apply supervised learning algorithms directly to the collected operational data. The purpose of this paper is to propose a pseudo-label method that enables supervised learning on equipment data without labels. Pseudo-normal (PN) and pseudo-abnormal (PA) vibration datasets are defined and labeled so that classification algorithms can be applied to unlabeled equipment data. To find an anomalous state in the equipment based on vibration data, the initial PN vibration dataset is compared with a PA vibration dataset collected over time, and the equipment is monitored for potential failure. A continuous wavelet transform (CWT) is applied to the collected vibration signals to obtain an image dataset, which is then fed into a convolutional neural network (an image classifier) to determine classification accuracy and detect equipment abnormalities. In Steps 1 to 4, abnormal signals were already detected in the dataset, and alarms and warnings were generated. The classification accuracy was over 0.95 at d=4, confirming quantitatively that the status of the equipment had changed significantly. In this way, a catastrophic failure can be avoided by performing a detailed equipment inspection in advance.
Lastly, a catastrophic failure occurred in Step 9, and the classification accuracy ranged from 0.95 to 1.0. By identifying the catastrophic failure promptly, it was possible to prevent secondary damage to connected equipment, such as the motors coupled to the gearbox. This case study shows that the proposed procedure gives good results in detecting operational abnormalities of key unit equipment. In the conclusion, further promising topics are discussed.
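The pseudo-labeling idea above can be illustrated with a small sketch: label an initial window set as pseudo-normal (PN) and a later set as pseudo-abnormal (PA), train a classifier to tell them apart, and read high held-out accuracy as evidence that the equipment state has changed. The sketch below is not the paper's pipeline. It swaps the CWT images and convolutional neural network for two hand-picked features (RMS and peak amplitude) and a nearest-centroid classifier, and it uses synthetic signals. All function names, frequencies, and amplitudes are hypothetical.

```python
import math
import random


def make_windows(n_windows, n_samples, fault_amp, rng):
    """Synthetic vibration windows: base tone + noise + optional fault tone."""
    windows = []
    for _ in range(n_windows):
        phase = rng.uniform(0, 2 * math.pi)
        windows.append([
            math.sin(0.2 * t + phase)          # normal running tone
            + fault_amp * math.sin(1.3 * t)    # extra component when degraded
            + rng.gauss(0, 0.1)                # sensor noise
            for t in range(n_samples)
        ])
    return windows


def features(window):
    """Stand-in for the CWT image: RMS and peak amplitude of one window."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    peak = max(abs(x) for x in window)
    return (rms, peak)


def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def pseudo_label_accuracy(pn_windows, pa_windows):
    """Train on the first half of each set, report accuracy on the second half."""
    pn = [features(w) for w in pn_windows]
    pa = [features(w) for w in pa_windows]
    half_pn, half_pa = len(pn) // 2, len(pa) // 2
    c_pn, c_pa = centroid(pn[:half_pn]), centroid(pa[:half_pa])
    held_out = [(f, 0) for f in pn[half_pn:]] + [(f, 1) for f in pa[half_pa:]]
    correct = 0
    for feat, label in held_out:
        pred = 0 if math.dist(feat, c_pn) <= math.dist(feat, c_pa) else 1
        correct += int(pred == label)
    return correct / len(held_out)


rng = random.Random(0)
pn = make_windows(40, 256, fault_amp=0.0, rng=rng)          # initial data
pa_healthy = make_windows(40, 256, fault_amp=0.0, rng=rng)  # later, unchanged
pa_faulty = make_windows(40, 256, fault_amp=0.8, rng=rng)   # later, degraded

acc_healthy = pseudo_label_accuracy(pn, pa_healthy)
acc_faulty = pseudo_label_accuracy(pn, pa_faulty)
print(f"PN vs unchanged PA: {acc_healthy:.2f}")
print(f"PN vs degraded PA:  {acc_faulty:.2f}")
```

When the later data come from the same condition as the PN set, the two pseudo-classes are indistinguishable and accuracy stays near chance. When a fault component appears, the classes separate and accuracy approaches 1.0, which mirrors the way the abstract uses accuracy above 0.95 as a sign of a changed equipment state.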



Articles published on Initial Dataset

The Effects of Genetic Distance and Genetic Diversity on Genomic Prediction Accuracy for Soybean Quantitative Disease Resistance to Phytophthora sojae

Systematic comparison of 3D Deep learning and classical machine learning explanations for Alzheimer’s Disease detection

Positive Mass Theorems for Spin Initial Data Sets With Arbitrary Ends and Dominant Energy Shields

Personalized Prediction of Parkinson's Disease Progression Based on Deep Gaussian Processes.

DATA SCIENCE: Data Visualization and Data Analytics in the Process of Data Mining

PelaSIG, a QGIS plugin for marine megafauna census: application to the aerial ACCOBAMS Survey Initiative dataset

Modeling genotype-protein interaction and correlation for Alzheimer's disease: a multi-omics imaging genetics study.

ESTSS—energy system time series suite: a declustered, application-independent, semi-artificial load profile benchmark set

An AI-based framework for earthquake relief demand forecasting: A case study in Türkiye

Multimodal diagnosis model of Alzheimer’s disease based on improved Transformer

Integrating Ensemble Weather Predictions in a Hydrologic-Hydraulic Modelling System for Fine-Resolution Flood Forecasting: The Case of Skala Bridge at Evrotas River, Greece

Machine learning assisted discovery of high-efficiency self-healing epoxy coating for corrosion protection

Machine Learning for Predicting Neurodevelopmental Disorders in Children

Ionic liquid-assisted preparation of Ag–Zn–In–S quaternary quantum dot thin films and luminescence performance optimized by machine learning

Robust explanation supervision for false positive reduction in pulmonary nodule detection.

Pengelolaan Daun Kering untuk Pupuk Kompos

Machine learning algorithms-based decision support model for diabetes

Anomaly detection in feature space for detecting changes in phytoplankton populations

An optimization framework for wind farm layout design using CFD-based Kriging model

Hot Strip Mill Gearbox Monitoring and Diagnosis Based on Convolutional Neural Networks Using the Pseudo-Labeling Method

