Resource optimization with MPI process malleability for dynamic workloads in HPC clusters


Similar Papers
  • Research Article
  • Cited by: 2
  • 10.1016/j.future.2023.06.017
Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives
  • Jun 22, 2023
  • Future Generation Computer Systems
  • Ayesha Afzal + 3 more

  • Research Article
  • 10.1016/j.future.2025.107753
Energy–time modelling of distributed multi-population genetic algorithms with dynamic workload in HPC clusters
  • Jun 1, 2025
  • Future Generation Computer Systems
  • Juan José Escobar + 5 more

  • Research Article
  • 10.22452/mjcs.vol38no3.3
ENHANCING SMART FARMING WITH CONTAINERIZED DEEP LEARNING AND KUBERNETES: UTILIZING HIPPOPOTAMUS OPTIMIZED ATTENTION MODEL FOR PREDICTIVE AGRICULTURE
  • Dec 29, 2025
  • Malaysian Journal of Computer Science
  • Syed Humaid Hasan + 4 more

The integration of deep learning technologies into agriculture has the potential to revolutionize smart farming by enhancing efficiency, sustainability, and productivity. This study focuses on leveraging the Hippopotamus Optimized Attention Hierarchically Gated Recurrent Algorithm (HOA-HGRA) within a containerized environment to analyze and predict critical agricultural variables such as weather patterns, crop yield, and soil moisture. The proposed methodology involves containerizing deep learning models like HOA-HGRA and orchestrating them with Kubernetes on HPC clusters. This enables precise monitoring and management of crop growth, soil conditions, and livestock health, ensuring optimal resource utilization and enhanced productivity. Hyperparameter tuning and performance optimization are performed by applying Oppositional Hippopotamus optimization with an opposition-learning-based strategy. The overall performance of the AHGR-OH model is validated using the France-CGIAR BRIDGE, Smart Agriculture, Smart precision agriculture, Smart Farming Irrigation Systems, and IoT in Smart Farming Market Report datasets. Key metrics such as latency, precision, F1-score, recall, scalability, accuracy, MSE, and ROC are used to estimate the effectiveness of the AHGR-OH method. In comparison, the developed method achieves 2 s latency, 0.5 MSE, higher scalability, and precision, F1-score, accuracy, and recall of 98.5%, 97.9%, 97.4%, 99.1%, and 97.9%, respectively. This paper demonstrates the potential of the AHGR-OH algorithm to revolutionize smart farming practices.

  • Conference Article
  • Cited by: 22
  • 10.1145/2616498.2616532
Benefits of Cross Memory Attach for MPI libraries on HPC Clusters
  • Jul 13, 2014
  • Jerome Vienne

With the number of cores per node increasing in modern clusters, an efficient implementation of intra-node communication is critical for application performance. MPI libraries generally use shared-memory mechanisms for communication inside the node; unfortunately, this approach has some limitations for large messages. Linux kernel 3.2 introduced Cross Memory Attach (CMA), a mechanism to improve communication between MPI processes inside the same node. However, since this feature is not enabled by default in the MPI libraries that support it, it can be left disabled by HPC administrators, depriving users of its performance benefits. In this paper, we explain how to use CMA and present an evaluation using micro-benchmarks and the NAS Parallel Benchmarks (NPB), a set of applications commonly used to evaluate parallel systems. Our performance evaluation reveals that CMA outperforms shared memory for large messages. Micro-benchmark evaluations show that CMA can enhance performance by as much as a factor of four. With NPB, we see up to 24.75% improvement in total execution time for FT and up to 24.08% for IS.
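As a rough illustration of the CMA primitive this paper evaluates, the sketch below uses Linux's process_vm_readv(2) syscall (called via ctypes; the fork/pipe synchronization scaffolding is our own, not code from any MPI library) to copy a buffer out of another process's address space in a single step. Linux-only; reading one's own child is normally permitted by ptrace rules.

```python
import ctypes, os, signal, time

libc = ctypes.CDLL("libc.so.6", use_errno=True)

class iovec(ctypes.Structure):
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]

libc.process_vm_readv.argtypes = [ctypes.c_int,
                                  ctypes.POINTER(iovec), ctypes.c_ulong,
                                  ctypes.POINTER(iovec), ctypes.c_ulong,
                                  ctypes.c_ulong]
libc.process_vm_readv.restype = ctypes.c_ssize_t

buf = ctypes.create_string_buffer(64)   # same virtual address in the child after fork
r, w = os.pipe()
pid = os.fork()
if pid == 0:                            # child: fill its private copy of buf
    buf.value = b"hello from child"
    os.write(w, b"x")                   # signal the parent that buf is populated
    time.sleep(30)                      # keep the address space alive
    os._exit(0)

os.read(r, 1)                           # wait for the child's signal
local = ctypes.create_string_buffer(64)
liov = iovec(ctypes.addressof(local), 64)
riov = iovec(ctypes.addressof(buf), 64)  # child's copy lives at the same address
n = libc.process_vm_readv(pid, ctypes.byref(liov), 1, ctypes.byref(riov), 1, 0)
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
print(n, local.value)
```

CMA-enabled MPI libraries use this syscall pair (process_vm_readv/process_vm_writev) to move large intra-node messages with one copy, instead of the two copies a shared-memory staging buffer requires.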

  • Research Article
  • 10.71097/ijsat.v16.i1.1624
Reinforcement Learning and Genetic Algorithm-Based Approach for Load Balancing and Resource Optimization in Cloud Data Centers
  • Feb 2, 2025
  • International Journal on Science and Technology
  • Swapnil R Kadam + 2 more

This review explores the integration of Reinforcement Learning (RL) and Genetic Algorithms (GA) for load balancing and resource optimization in cloud data centers. The paper examines state-of-the-art approaches, their advantages, challenges, and potential hybrid methodologies combining RL's decision-making capabilities with GA's search optimization strengths. The survey aims to highlight how these techniques improve performance metrics like resource utilization, energy efficiency, and system reliability while addressing scalability and dynamic workload challenges.

  • Conference Article
  • Cited by: 2
  • 10.1109/hoti.2017.12
MPI Process and Network Device Affinitization for Optimal HPC Application Performance
  • Aug 1, 2017
  • Ravindra Babu Ganapathi + 2 more

High Performance Computing (HPC) applications are highly optimized to maximize the resources allocated to the job, such as compute, memory, and storage. Optimal performance for MPI applications requires the best possible affinity across all allocated resources. Setting process affinity to compute resources is typically well defined, i.e., MPI processes on a compute node have processor affinity set for a one-to-one mapping between MPI processes and physical processing cores, and several well-defined methods exist to efficiently map MPI processes to a compute node. With the growing complexity of HPC systems, platforms are designed with complex compute and I/O subsystems. The capacity of I/O devices attached to a node is expanded with PCIe switches, resulting in large numbers of PCIe endpoint devices. With so much heterogeneity in systems, application programmers are forced to think harder about affinitizing processes, as performance depends not only on compute but also on the NUMA placement of I/O devices. Mapping a process to processor cores and the closest I/O device(s) is not straightforward. While operating systems do a reasonable job of keeping a process physically located near its processor core(s) and memory, they lack the application developer's knowledge of process workflow and optimal I/O resource allocation when more than one I/O device is connected to the compute node. In this paper we look at ways to assuage these affinity problems by abstracting the device selection algorithm away from the MPI application layer. MPI continues to be the dominant programming model for HPC, so our focus in this paper is limited to providing a solution for MPI-based applications; the solution can be extended to other HPC programming models such as Partitioned Global Address Space (PGAS) or hybrid MPI+PGAS applications. We propose a solution that addresses NUMA effects at the MPI runtime level, independent of MPI applications.
Our experiments are conducted on a two-node system where each node consists of a two-socket Intel® Xeon® server attached with up to four Intel® Omni-Path fabric devices connected over PCIe. The performance benefit of affinitizing MPI processes with the best possible network device is evident from the results, where we observe up to 40% improvement in uni-directional bandwidth, 48% in bi-directional bandwidth, 32% in latency, and up to 40% in message rate.
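To make the affinitization idea concrete, here is a deliberately simplified device-selection sketch. The core-to-NUMA map and device list are hypothetical; a real runtime would discover them from the hardware topology (e.g., via hwloc) rather than hard-coding them.

```python
# Hypothetical topology: core id -> NUMA node, and (device, NUMA node) pairs.
CORE_TO_NUMA = {0: 0, 1: 0, 2: 1, 3: 1}
DEVICES = [("hfi1_0", 0), ("hfi1_1", 1)]

def pick_device(core: int) -> str:
    """Prefer a fabric device on the same NUMA node as the given core;
    fall back to any device. Round-robin by core id to spread load."""
    node = CORE_TO_NUMA[core]
    local = [d for d, n in DEVICES if n == node]
    pool = local or [d for d, _ in DEVICES]
    return pool[core % len(pool)]
```

With this toy topology, a rank pinned to core 0 gets `hfi1_0` and a rank on core 2 gets `hfi1_1`, avoiding cross-socket PCIe traffic.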

  • Research Article
  • 10.55730/1300-0632.3826
TARA: temperature aware online dynamic resource allocation scheme for energy optimization in cloud data centres
  • Mar 1, 2022
  • Turkish Journal of Electrical Engineering and Computer Sciences
  • Narayanamoorthi Thilagavathi + 3 more

Cloud data centres, which are characterized by dynamic workloads, may, if not optimized for energy consumption, lead to increased heat dissipation and eventually impact the environment adversely. Consequently, optimizing energy usage has become a hard requirement in today's cloud data centres, where the major part of energy consumption is attributed to computing and cooling systems. Motivated by this, the paper proposes an online algorithm for dynamic resource allocation, namely the temperature aware online dynamic resource allocation algorithm (TARA). TARA demonstrates a novel algorithm design that adapts dynamic resource allocation based on the temperature of a data centre using computational fluid dynamics (CFD), along with a new dynamic resource reclaim strategy for making efficient allocations that lead to efficient energy consumption in dynamic environments. The proposed algorithm provides optimal, energy-efficient resource allocation without being overwhelmed by online dynamic workloads, which in turn optimizes both computing and cooling energy consumption. We show through theoretical analysis the correctness, efficiency, and optimality bound $TARA(P) \leq 2\,OPT(P)$, relative to the optimal solution provided by an offline dynamic resource allocation algorithm ($OPT(P)$). We show through empirical analysis that the proposed method is efficient and saves 26% energy at 100% data centre utilization compared to batched reclaim. The performance analysis shows significant improvement in optimizing computing and cooling efficiency. TARA can be applied in multiple areas of on-demand dynamic resource allocation in cloud computing, such as resource allocation for virtual machine creation and migration, and virtual resource assignment for elastic cloud applications.

  • Research Article
  • 10.1080/17445760.2025.2605532
AROF: adaptive resource optimization framework for Kubernetes cluster using workload forecasting
  • Dec 23, 2025
  • International Journal of Parallel, Emergent and Distributed Systems
  • Ravi Patel + 1 more

Traditional Kubernetes autoscaling struggles with dynamic workloads, causing SLA violations and inefficiency. We propose AROF: an Adaptive Resource Optimization Framework integrating hybrid workload classification (clustering+tagging), multi-horizon LSTM forecasting, and cost-aware autoscaling with tunable cost-SLA trade-offs. AROF formulates VM provisioning as a constrained optimization problem with quadratic SLA penalties, enabling fine-grained resource management. Extensive evaluation using Alibaba Cloud 2022 traces demonstrates AROF reduces SLA violations by 81% and improves cost efficiency by 22.4% compared to standard Kubernetes autoscalers, while outperforming recent proactive baselines. The framework provides a scalable, interpretable solution for intelligent resource optimization in production Kubernetes environments.
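AROF's cost-aware provisioning with quadratic SLA penalties can be caricatured as a one-dimensional search. The sketch below (function names and parameters are ours for illustration, not from the paper) picks the VM count minimizing provisioning cost plus a quadratic penalty on forecast demand left unserved.

```python
def best_vm_count(forecast_demand, capacity_per_vm, vm_cost, sla_weight, max_vms=50):
    """Minimize cost(m) = vm_cost * m + sla_weight * shortfall(m)^2 over m VMs.

    The quadratic penalty makes small SLA shortfalls cheap but large ones
    prohibitively expensive, which is the tunable cost-SLA trade-off idea.
    """
    def objective(m):
        shortfall = max(0.0, forecast_demand - m * capacity_per_vm)
        return vm_cost * m + sla_weight * shortfall ** 2

    return min(range(1, max_vms + 1), key=objective)
```

For example, with a forecast demand of 1000 requests/s, 100 requests/s per VM, unit VM cost, and a modest penalty weight, the minimum lands exactly at the point where demand is fully covered.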

  • Research Article
  • Cited by: 9
  • 10.3233/idt-220222
Resource optimization using predictive virtual machine consolidation approach in cloud environment
  • May 15, 2023
  • Intelligent Decision Technologies
  • Vaneet Garg + 1 more

The proliferation of on-demand, usage-based IT services, as well as the diverse range of cloud users, has led to the establishment of energy-hungry cloud data centers. Cloud service providers are therefore striving to reduce energy consumption for cost-saving and environmental sustainability. In this direction, Virtual Machine (VM) consolidation is a widely used approach to optimize hardware resources, at the cost of performance degradation due to unnecessary migrations. Hence, the motivation of the proposed approach is to minimize energy consumption while maintaining the performance of cloud data centers, reducing overall cost and increasing the reliability of cloud service providers. To achieve this goal, a Predictive Virtual Machine Consolidation (PVMC) algorithm is proposed using the exponential smoothing moving average (ESMA) method. In the proposed algorithm, the ratio of deviation to utilization is calculated for VM selection and placement, migrating VMs with high CPU usage while restricting steady resource-consuming VMs from migration. The outcomes of the proposed algorithm are validated in computer-based simulation under a dynamic workload and a variable number of VMs (1–290). The experimental results show an improvement in the mean threshing index (40%, 45%) and instruction energy ratio (15%, 17%) over the existing policies. Hence, the proposed algorithm could be used in real-world data centers to reduce energy consumption while maintaining low service level agreement violations.
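The ESMA predictor at the heart of this kind of consolidation scheme is ordinary exponential smoothing. A minimal sketch (the threshold value, smoothing factor, and function names are illustrative, not taken from the paper):

```python
def esma_forecast(samples, alpha=0.3):
    """Exponentially smoothed moving average of a utilization series.

    Recent samples get weight alpha; history decays geometrically.
    """
    s = samples[0]
    for x in samples[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def should_migrate(samples, upper=0.8):
    """Flag a host as overloaded when smoothed CPU utilization exceeds
    the upper threshold, triggering VM migration away from it."""
    return esma_forecast(samples) > upper
```

Smoothing before comparing against the threshold is what suppresses the unnecessary migrations that raw, spiky utilization readings would trigger.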

  • Research Article
  • 10.4018/joeuc.382092
The Deep Learning-Based Security Assessment and Optimization Model for Enterprise Information Systems Under Digital Economy
  • Jun 20, 2025
  • Journal of Organizational and End User Computing
  • Jin Qiu + 2 more

With the increasing complexity of enterprise systems and the rise in cyber threats, managing security risks while optimizing resources has become a significant challenge. Traditional models often address security and resource management in isolation, making it difficult to adapt to evolving threats and dynamic workloads. This paper proposes the deep learning-based dynamic security assessment and optimization model, which integrates dynamic security assessment, anomaly detection, multi-modal data fusion, security investment optimization, and cloud resource optimization into a unified framework. By leveraging deep learning techniques such as convolutional neural networks for feature extraction and recurrent neural networks for temporal anomaly detection, alongside reinforcement learning for resource optimization, the deep learning-based dynamic security assessment and optimization model provides real-time risk evaluation and adapts resource allocation based on system needs.

  • Research Article
  • 10.52783/jisem.v10i24s.3889
Adaptive Resource Optimization in Containerized Environments Using Particle Swarm Optimization and Decision Tree Classification
  • Mar 24, 2025
  • Journal of Information Systems Engineering and Management
  • Manmitsinh Chandrasinh Zala

Containerization has emerged as a powerful technology for deploying and managing applications. However, efficient resource allocation in container-based cloud environments remains a significant challenge. This paper proposes a novel approach to adaptively optimize resource allocation using a combination of Particle Swarm Optimization (PSO) and Decision Tree Classification. PSO is employed to explore the solution space and identify optimal resource configurations. To enable predictive modelling of future resource needs, Decision Tree Classification is used to identify patterns in historical resource utilization. Through the integration of these two methods, the approach seeks to optimize performance and cost-efficiency in containerized environments by modifying resource allocation in response to dynamic workload fluctuations. Compared with existing PSO algorithms, the results show improved container resource allocation.
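A minimal PSO loop of the kind such approaches build on, applied to a toy continuous cost function (all constants are generic textbook defaults, not the paper's configuration):

```python
import random

def pso(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer minimizing a continuous cost f."""
    random.seed(0)  # deterministic for reproducibility
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy "resource cost": squared deviation from a zero-centered target.
best, cost = pso(lambda x: sum(v * v for v in x), dim=2)
```

In a container-allocation setting, the position vector would encode CPU/memory limits per container and `f` would score a candidate configuration's cost and performance.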

  • Research Article
  • 10.12732/ijam.v38i5.783
ARTIFICIAL INTELLIGENCE IN CLOUD-BASED INFORMATION SYSTEMS: A COMPREHENSIVE STUDY ON INTELLIGENT RESOURCE MANAGEMENT, SCALABILITY, AND SECURITY
  • Oct 2, 2025
  • International Journal of Applied Mathematics
  • Balaji Ganesh N

This paper presents a mathematically grounded framework for intelligent resource management and data security in cloud-based information systems using a Risk-Aware Two-Timescale Actor-Critic (RA-2TAC) algorithm. In contrast to classical schedulers, which struggle with non-stationary workload shifts and malicious disruptions, the proposed approach models joint resource allocation and intrusion-risk prevention as a constrained Markov decision process, optimizing resource usage, energy consumption, and network security simultaneously. The algorithm uses a two-timescale primal-dual policy-gradient approach: the actor adapts to dynamic workloads, while a slower dual variable enforces a probabilistic security budget. The authors prove almost-sure convergence of the learning process, linear per-step computational complexity in cluster size, and finite-time bounds on deviation from optimal resource use and on the probability of missed attacks. In a synthetic evaluation with Markov-modulated Poisson workloads and stochastic intrusion events, RA-2TAC lowers cumulative operating cost by approximately 30–45% relative to round-robin and greedy heuristics, while security penalties are consistently held below the desired target. By integrating AI-based scheduling with formal risk constraints, the study advances the state of the art in cloud resource frameworks, offering a scalable and sound basis for next-generation, mission-critical cloud architectures that demand high efficiency and resilience to cyber threats.

  • Research Article
  • Cited by: 18
  • 10.1002/cpe.7469
Experimental performance analysis of cloud resource allocation framework using spider monkey optimization algorithm
  • Nov 4, 2022
  • Concurrency and Computation: Practice and Experience
  • Mohit Kumar + 4 more

Summary: The demand for cloud services has increased exponentially in the last decade due to their plethora of offerings, and the cloud has become a significant platform for computing large and diverse applications over the internet. On the contrary, on-demand resource allocation to a variety of applications is a serious issue due to dynamic workload conditions and uncertainty in the cloud environment. Several existing state-of-the-art techniques often fail to allocate optimal resources to forthcoming demands, leading to an imbalanced workload over the cloud platform and degrading performance. This article introduces a secure and self-adaptive resource allocation framework that addresses these issues and allocates the most suitable resources to users' applications while ensuring deadline constraints. Further, the proposed framework is integrated with a metaheuristic algorithm, the enhanced spider monkey optimization algorithm, based on the intelligent foraging behavior of spider monkeys. The proposed algorithm finds optimal resources for the user's application using the fission-fusion approach and improves multiple influential parameters such as time, cost, degree of load balancing, energy consumption, and task rejection ratio. CloudSim-based experimental results verified that the proposed framework performs superior to state-of-the-art approaches like PSO, GSA, ABC, and IMMLB.

  • Research Article
  • 10.52783/pst.1664
Gaussian Based Convergence Factor with Squirrel Search Algorithm for Optimal Resource Allocation in Cloud Computing for Small Finance Organization
  • Mar 19, 2025
  • Power System Technology
  • Sidagouda Basagouda Patil

Cloud Computing (CC) has attracted high attention because it executes requests in accordance with user needs and provides quality services and fast task execution across Virtual Machines (VMs). However, resource allocation for different applications remains a problem due to dynamic workload conditions and uncertainty in the cloud network. This manuscript proposes the Gaussian based Convergence Factor (GCF) with the Squirrel Search Algorithm (SSA) for optimal resource allocation in a cloud environment for small finance organizations. The proposed GCF with SSA effectively balances the workload and allocates the most appropriate resources to users' applications while ensuring deadline constraints. The Tent chaotic map and the Gaussian based Convergence Factor are incorporated into the traditional SSA, improving its search ability and convergence rate. Selecting the correct instance types that align with the requirements of a finance organization involves deciding between options like on-demand, reserved, or spot instances. The performance of GCF with SSA is evaluated on metrics of execution time, makespan, energy consumption, and resource utilization. GCF with SSA reached lower energy of 0.505 J, lower execution time of 0.472 s, lower makespan of 0.723 s, and higher resource utilization of 51% for 100 tasks on 30 VMs, which is efficient compared to existing methods like Moth Search Adapted Sealion Optimization (MS-SLnO).

  • Conference Article
  • 10.31972/iceit2024.050
Adaptive Resource Scaling Algorithm for Serverless Computing Applications
  • Feb 18, 2025
  • Mohammed Ali Awla + 2 more

Serverless computing has transformed cloud-based and event-driven applications by introducing the Function-as-a-Service (FaaS) model. This model offers key benefits, including greater abstraction from underlying infrastructure, simplified management, flexible pay-as-you-go pricing, and automatic scaling and resource optimization. However, managing resources effectively in serverless environments remains challenging due to the inherent variability and unpredictability of workload demands. This paper introduces an Adaptive Resource Scaling Algorithm (ARSA) tailored for serverless applications. ARSA leverages the Auto-Regressive Integrated Moving Average (ARIMA) model to forecast workload demands. Using these predictions alongside a strategy focused on maintaining service quality, ARSA dynamically adjusts the number of container instances needed. The goal is to optimize resource usage while minimizing the occurrence of cold starts. We validated ARSA using a real-world dataset from Microsoft Azure Functions. Our evaluation compared ARSA against fixed instance settings (one, two, and three instances) and the standard Kubernetes Horizontal Pod Auto-scaler (HPA). The results demonstrate that ARSA outperforms these baseline methods by significantly reducing the number of cold starts, improving CPU utilization, decreasing memory costs, reducing the number of rejected requests, and enhancing response times. These improvements underscore ARSA's potential in efficiently managing dynamic workloads and enhancing the performance of serverless environments.
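The core scaling decision in a scheme like this reduces to "forecast demand, add headroom, divide by per-instance capacity". The sketch below substitutes a simple moving average for the paper's ARIMA forecaster; all names and the headroom factor are illustrative assumptions, not the paper's implementation.

```python
import math

def forecast_next(history, window=3):
    """Stand-in forecaster (moving average) in place of an ARIMA model."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def instances_needed(history, per_instance_rps, headroom=1.2):
    """Scale container instances to forecast demand plus headroom so that
    bursts are absorbed without cold starts; always keep one warm instance."""
    predicted = forecast_next(history) * headroom
    return max(1, math.ceil(predicted / per_instance_rps))
```

Keeping at least one warm instance and over-provisioning by the headroom factor is the proactive part: it trades a small amount of idle capacity for fewer cold starts and rejected requests.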
