Heart Attack Risk Prediction via Stacked Ensemble Metamodeling: A Machine Learning Framework for Real-Time Clinical Decision Support
Cardiovascular diseases claim millions of lives each year, yet timely diagnosis remains a significant challenge due to high patient volumes and associated costs. Although various machine learning solutions have been proposed for this problem, most depend on careful data preprocessing and feature engineering workflows that are seldom documented in full, which hinders reproducibility. To address this issue, this paper presents a machine learning framework for online heart attack risk prediction. Our systematic methodology integrates a unified pipeline featuring advanced data preprocessing, optimized feature selection, and an exhaustive hyperparameter search using cross-validated grid evaluation. We employ a metamodel ensemble strategy, testing and combining six traditional supervised models with six stacking and voting ensemble models. The proposed system achieves accuracies ranging from 90.2% to 98.9% on three independent clinical datasets, outperforming current state-of-the-art methods. Additionally, it powers a deployable, lightweight web application for real-time decision support. By merging cutting-edge AI with clinical usability, this work offers a scalable solution for early intervention in cardiovascular care.
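The pipeline the abstract describes (preprocessing, base models combined through a stacking metamodel, and an exhaustive cross-validated grid search) can be sketched roughly as follows. The synthetic data, estimator choices, and parameter grid are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a stacked-ensemble metamodel tuned by grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a clinical tabular dataset (13 features,
# echoing common heart-disease benchmarks).
X, y = make_classification(n_samples=400, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners feed their out-of-fold predictions to a metamodel.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the metamodel
    cv=5,
)
pipe = Pipeline([("scale", StandardScaler()), ("stack", stack)])

# Exhaustive hyperparameter search via cross-validated grid evaluation.
grid = GridSearchCV(
    pipe,
    param_grid={
        "stack__rf__n_estimators": [50, 100],
        "stack__svc__C": [0.1, 1.0],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)
print(round(grid.score(X_te, y_te), 3))  # held-out accuracy
```

A deployable web front end would then wrap `grid.best_estimator_` behind a prediction endpoint; the hyperparameter grid here is deliberately tiny to keep the sketch fast.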
- Research Article
7
- 10.1136/bmjqs-2022-015713
- Nov 23, 2023
- BMJ Quality & Safety
Machine learning (ML) solutions are increasingly entering healthcare. They are complex, sociotechnical systems that include data inputs, ML models, technical infrastructure and human interactions. They have promise for improving care...
- Conference Article
2
- 10.1109/iri49571.2020.00029
- Aug 1, 2020
Machine learning is increasingly adopted in manufacturing use cases, e.g., for fault detection in a production line. Each new use case requires developing its own machine learning (ML) solution. A ML solution integrates different software components to read, process, and analyze all use case data, as well as to finally generate the output that domain experts need for their decision-making. The process to design a system specification for a ML solution is not straightforward. It entails two types of complexity: (1) the technical complexity of selecting combinations of ML algorithms and software components that suit a use case; (2) the organizational complexity of integrating different requirements from a multidisciplinary team of, e.g., domain experts, data scientists, and IT specialists. In this paper, we propose several adaptations to Axiomatic Design in order to design ML solution specifications that handle these complexities. We call this Axiomatic Design for Machine Learning (AD4ML). We apply AD4ML to specify a ML solution for a fault detection use case and discuss to what extent our approach conquers the above-mentioned complexities. We also discuss how AD4ML facilitates the agile design of ML solutions.
- Research Article
3
- 10.1007/s41060-023-00417-5
- Jul 17, 2023
- International Journal of Data Science and Analytics
The adoption of machine learning (ML) in organizations is characterized by the use of multiple ML software components. When building ML systems out of these software components, citizen data scientists face practical requirements which go beyond the known challenges of ML, e.g., data engineering or parameter optimization. They are expected to quickly identify ML system options that strike a suitable trade-off across multiple performance criteria. These options also need to be understandable for non-technical users. Addressing these practical requirements represents a problem for citizen data scientists with limited ML experience. This calls for a concept to help them identify suitable ML software combinations. Related work, e.g., AutoML systems, is not responsive enough or cannot balance different performance criteria. This paper explains how AssistML, a novel concept to recommend ML solutions, i.e., software systems with ML models, can be used as an alternative for predictive use cases. Our concept collects and preprocesses metadata of existing ML solutions to quickly identify the ML solutions that can be reused in a new use case. We implement AssistML and evaluate it with two exemplary use cases. Results show that AssistML can recommend ML solutions in line with users’ performance preferences in seconds. Compared to AutoML, AssistML offers citizen data scientists simpler, intuitively explained ML solutions in considerably less time. Moreover, these solutions perform similarly or even better than AutoML models.
- Research Article
2
- 10.1016/j.jpdc.2021.10.008
- Nov 4, 2021
- Journal of Parallel and Distributed Computing
Algorithms for addressing line-of-sight issues in mmWave WiFi networks using access point mobility
- Conference Article
5
- 10.1109/itc44170.2019.9000109
- Nov 1, 2019
A machine learning (ML) solution can be non-robust and when it is deployed, can make mistakes on the future unseen data. Consequently, deployment of a ML solution might demand continuous service from its ML developer. Using wafer image classification as an example, this paper presents the design of a ML solution where its deployment is facilitated by the continuous service from its ML expert.
- Book Chapter
22
- 10.1007/978-3-030-62144-5_6
- Jan 1, 2020
Although machine learning (ML) solutions are prevalent, in order for them to be truly ‘business-grade’ and reliable, their performance must be shown to be robust for many different data subsets, i.e., observations with similar feature values (which we call ‘slices’), which they are expected to encounter in deployment. However, ML solutions are often evaluated only on aggregate performance (e.g., overall accuracy) and not on the variability across slices. For example, a text classifier deployed on bank terms may have very high accuracy (e.g., 98% ± 2%) but might perform poorly for the data slice of terms that include short descriptions and originate from commercial accounts. Yet a business requirement may be for the classifier to perform well regardless of the text characteristics. In previous work [1] we demonstrated the effectiveness of using feature-based analysis to highlight such gaps in performance assessment. Here we demonstrate a novel technique, called IBM FreaAI, which automatically extracts explainable feature slices for which the ML solution’s performance is statistically significantly worse than the average. We demonstrate results of evaluating ML classifier models on seven open datasets.
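The slice-based evaluation that FreaAI automates can be illustrated with a minimal sketch: compare a classifier's aggregate accuracy against its accuracy on one explainable feature slice. The dataset, model, and slice definition below are assumptions for illustration, not FreaAI's actual extraction procedure.

```python
# Compare aggregate accuracy with accuracy on a single feature slice.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

pred = clf.predict(X_te)
overall = (pred == y_te).mean()  # aggregate accuracy

# One "slice": test rows where feature 0 falls below its median.
mask = X_te[:, 0] < np.median(X_te[:, 0])
slice_acc = (pred[mask] == y_te[mask]).mean()
print(round(float(overall), 3), round(float(slice_acc), 3))
```

A tool like FreaAI would search over many such slices and flag those whose accuracy is statistically significantly below the aggregate; this sketch only shows the per-slice comparison itself.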
- Research Article
8
- 10.1007/s10916-023-01928-1
- Jan 1, 2023
- Journal of Medical Systems
The self-proclaimed first publicly available dataset of Monkeypox skin images consists of medically irrelevant images extracted from Google and photography repositories through a process denominated web-scraping. Yet, this did not stop other researchers from employing it to build Machine Learning (ML) solutions aimed at computer-aided diagnosis of Monkeypox and other viral infections presenting skin lesions. Neither did it stop the reviewers or editors from publishing these subsequent works in peer-reviewed journals. Several of these works claimed extraordinary performance in the classification of Monkeypox, Chickenpox and Measles, employing ML and the aforementioned dataset. In this work, we analyse the initiator work that has catalysed the development of several ML solutions, and whose popularity is continuing to grow. Further, we provide a rebuttal experiment that showcases the risks of such methodologies, proving that the ML solutions do not necessarily obtain their performance from the features relevant to the diseases at issue.
- Conference Article
1
- 10.1109/dsaa53316.2021.9564168
- Oct 6, 2021
The adoption of machine learning (ML) in organizations is characterized by the use of multiple ML software components. Citizen data scientists face practical requirements when building ML systems, which go beyond the known challenges of ML, e.g., data engineering or parameter optimization. They are expected to quickly identify ML system options that strike a suitable trade-off across multiple performance criteria. These options also need to be understandable for non-technical users. Addressing these practical requirements represents a problem for citizen data scientists with limited ML experience. This calls for a method to help them identify suitable ML software combinations. Related work, e.g., AutoML systems, is not responsive enough or cannot balance different performance criteria. In this paper, we introduce AssistML, a novel concept to recommend ML solutions, i.e., software systems with ML models, for predictive use cases. AssistML uses metadata of existing ML solutions to quickly identify and explain options for a new use case. We implement the approach and evaluate it with two exemplary use cases. Results show that AssistML proposes ML solutions that are in line with users' performance preferences in seconds.
- Conference Article
4
- 10.1145/3511808.3557195
- Oct 17, 2022
Data analytics including machine learning (ML) is essential to extract insights from production data in modern industries. However, industrial ML is affected by: the low transparency of ML towards non-ML experts; poor and non-unified descriptions of ML practices for reviewing or comprehension; ad-hoc fashion of ML solutions tailored to specific applications, which affects their re-usability. To address these challenges, we propose the concept and a system of executable knowledge graph (KG), which represent KGs that rely on semantic technologies to formally encode ML knowledge and solutions. These KGs can be translated to executable scripts in a reusable and modularised fashion. The demo attendees will use our system to modify, integrate and create executable KGs via a graphic user interface, which offer a user-friendly way to understand, configure, reuse, and create data analytics pipelines.
- Conference Article
- 10.2118/217294-ms
- Nov 12, 2023
The objective of this study is to summarize a proven solution workflow that addresses the challenges of handling the high daily volume of well tests while incorporating information from operational activities, in particular potential delays and errors in validation that impact other dependent business processes. The proposed solution aims to reduce processing time, minimize human error, and enhance accuracy in well test analysis. With up-to-date and reliable well test data, engineers can improve engineering workflows and optimize production. The solution covers data consumption, data preparation, the machine learning (ML) solution, cooperation with dependent business processes, deployment, and the retraining strategy. The ML solution learns from historical well test data with accepted and rejected flags to build a rule-based deterministic ML model that automatically validates well tests and detects invalid ones with a probability. The solution consumes not only structured data but also textual data processed with natural language processing (NLP), such as well test comments provided by well testing engineers and operational activities in Daily Operational Reports (DORs). Data consumption, operational activities, and dependent workflow controls are customizable per project. The retraining strategy is based on the trend of model prediction accuracy and is defined during deployment. The solution triggers insights with confidence scores, suggesting acceptance, rejection, or review of new well tests. Early detection of possible rejections enables timely actions, including retesting if necessary. The solution was implemented and significantly reduces well test validation time from weeks to hours, enhancing the accuracy of production analysis and optimization. The data-driven approach offers flexibility and adaptability to meet operational needs, presenting a robust alternative to purely rule-based validation.
By integrating ML and NLP, the solution provides a comprehensive and efficient framework for well test validation, improving decision-making and ensuring compliance with the Standard Operating Procedure (SOP). This study introduces a novel approach to well test validation by leveraging ML and NLP. By considering both historical data and manual operational event inputs from engineers, the solution enhances the accuracy and efficiency of the validation process. It contributes to improved production performance analysis, diagnostics, and issue detection. The solution deployment can be customized and adapted to different data storage and availability, to automate the well test validation process in the oil and gas industry.
- Research Article
20
- 10.1016/j.jcsr.2020.106394
- Oct 12, 2020
- Journal of Constructional Steel Research
Strength prediction of steel CHS X-joints via leveraging finite element method and machine learning solutions
- Conference Article
22
- 10.1145/3338906.3340442
- Aug 12, 2019
Machine Learning (ML) based solutions are becoming increasingly popular and pervasive. When testing such solutions, there is a tendency to focus on improving the ML metrics such as the F1-score and accuracy at the expense of ensuring business value and correctness by covering business requirements. In this work, we adapt test planning methods of classical software to ML solutions. We use combinatorial modeling methodology to define the space of business requirements and map it to the ML solution data, and use the notion of data slices to identify the weaker areas of the ML solution and strengthen them. We apply our approach to three real-world case studies and demonstrate its value.
- Research Article
298
- 10.1109/comst.2021.3053118
- Jan 1, 2021
- IEEE Communications Surveys & Tutorials
The Internet of Underwater Things (IoUT) is an emerging communication ecosystem developed for connecting underwater objects in maritime and underwater environments. The IoUT technology is intricately linked with intelligent boats and ships, smart shores and oceans, automatic marine transportations, positioning and navigation, underwater exploration, disaster prediction and prevention, as well as with intelligent monitoring and security. The IoUT has an influence at various scales ranging from a small scientific observatory, to a midsized harbor, and to covering global oceanic trade. The network architecture of IoUT is intrinsically heterogeneous and should be sufficiently resilient to operate in harsh environments. This creates major challenges in terms of underwater communications, whilst relying on limited energy resources. Additionally, the volume, velocity, and variety of data produced by sensors, hydrophones, and cameras in IoUT is enormous, giving rise to the concept of Big Marine Data (BMD), which has its own processing challenges. Hence, conventional data processing techniques will falter, and bespoke Machine Learning (ML) solutions have to be employed for automatically learning the specific BMD behavior and features facilitating knowledge extraction and decision support. The motivation of this paper is to comprehensively survey the IoUT, BMD, and their synthesis. It also aims for exploring the nexus of BMD with ML. We set out from underwater data collection and then discuss the family of IoUT data communication techniques with an emphasis on the state-of-the-art research challenges. We then review the suite of ML solutions suitable for BMD handling and analytics. We treat the subject deductively from an educational perspective, critically appraising the material surveyed.
- Research Article
22
- 10.1109/ojvt.2021.3110134
- Jan 1, 2021
- IEEE Open Journal of Vehicular Technology
Artificial intelligence and data-driven networks will be an integral part of 6G systems. In this article, we comprehensively discuss the implementation challenges and the need for architectural changes in 5G radio access networks for integrating machine learning (ML) solutions. As an example use case, we investigate user equipment (UE) positioning assisted by deep learning (DL) in 5G and beyond networks. Compared to state-of-the-art positioning algorithms used in today's networks, radio signal fingerprinting and ML-assisted positioning require smaller additional feedback overhead, and the positioning estimates are made directly inside the radio access network (RAN), thereby assisting radio resource management. In this regard, we study ML-assisted positioning methods and evaluate their performance using system-level simulations for an outdoor scenario. The study is based on a ray-tracing tool, a 3GPP 5G NR compliant system-level simulator, and a DL framework to estimate the positioning accuracy of the UE. We evaluate and compare the performance of various DL models and show a mean positioning error in the range of 1-1.5 m for a 2-hidden-layer DL architecture with appropriate feature modeling. Building on our performance analysis, we discuss the pros and cons of various architectures for implementing ML solutions in future networks and draw conclusions on the most suitable architecture.
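The fingerprint-plus-regression idea this article evaluates can be illustrated with a small sketch: regress 2-D UE coordinates from received-signal features using a 2-hidden-layer network. The synthetic geometry, path-loss model, and layer sizes are assumptions, not the authors' simulator setup.

```python
# Hedged sketch of fingerprint-based positioning with a 2-hidden-layer net.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, size=(300, 2))            # true UE positions (m)
anchors = np.array([[0, 0], [100, 0], [0, 100], [100, 100]])  # base stations

# Fingerprint features: noisy log-distance path-loss values to each anchor.
d = np.linalg.norm(pos[:, None, :] - anchors[None], axis=2)   # (300, 4)
rss = -20 * np.log10(d + 1) + rng.normal(0, 0.5, d.shape)

# Two hidden layers, as in the architecture the article reports on.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(rss[:250], pos[:250])
err = np.linalg.norm(mlp.predict(rss[250:]) - pos[250:], axis=1)
print(round(float(err.mean()), 2))  # mean positioning error in metres
```

In the paper's setting the features come from a ray-traced 5G NR simulation rather than a toy path-loss model, so the error figures here are not comparable to the reported 1-1.5 m.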
- Research Article
7
- 10.1016/j.path.2022.08.001
- Nov 4, 2022
- Surgical Pathology Clinics
Applications of Digital and Computational Pathology and Artificial Intelligence in Genitourinary Pathology Diagnostics