Articles published on CPU Cycles
- Research Article
- 10.1038/s41598-026-48152-8
- Apr 13, 2026
- Scientific reports
- Mohammed H Alsharif + 7 more
As smart city infrastructures evolve, they generate massive volumes of spatiotemporal data from diverse sources such as surveillance cameras, wearable health monitors, drones, and environmental sensors. Efficient fusion of this data is crucial for applications that demand ultra-low latency (typically between 2 ms and 5 ms) to support time-sensitive operations. However, the heterogeneity of data sources, coupled with constraints on communication bandwidth and computational resources, poses a significant challenge to maintaining such stringent latency and energy efficiency standards. This study proposes a joint optimization framework for multi-source spatiotemporal data fusion that dynamically allocates bandwidth and CPU cycles to minimize a weighted objective of latency and energy consumption under realistic wireless channel conditions and strict resource constraints. The results demonstrate that the proposed method consistently delivers low latency and energy-efficient performance across all data sources. It outperforms traditional equal and delay-tolerant strategies by significantly reducing both latency and energy consumption. This efficiency, combined with the framework's robustness and scalability, makes it highly suitable for smart city applications.
- Research Article
- 10.66280/ijair.v1i1.8
- Mar 2, 2026
- International Journal of Artificial Intelligence Research
- Daniel R Whitman + 1 more
Resource allocation in constrained digital and cyber-physical systems (CPS) increasingly must satisfy two competing requirements: near-real-time performance under tight compute, network, and energy budgets, and transparent fairness guarantees across heterogeneous users, applications, and control loops. This paper develops a system-oriented optimization framework for fair and intelligent resource allocation that unifies (i) operational constraints typical of embedded and edge platforms (limited CPU cycles, shared wireless bandwidth, and energy caps), (ii) stability- and safety-relevant constraints arising from closed-loop CPS dynamics, and (iii) fairness criteria that are meaningful for both digital services (throughput/latency parity) and physical processes (risk- and constraint-violation parity). We cast the problem as a constrained stochastic program with time-coupled dynamics and propose a modular approach that combines a predictive layer for short-horizon demand/dynamics estimation with a primal–dual allocation layer enforcing feasibility and fairness via Lagrange multipliers. The method supports multiple fairness notions (max–min, proportional, and risk-sensitive fairness) and exposes their trade-offs with latency, energy, and control performance. Using a suite of representative case studies (edge inference serving, wireless scheduling for mixed-criticality traffic, and networked control with shared computation), we demonstrate that fairness constraints can be enforced with modest efficiency loss when the allocation mechanism is explicitly co-designed with system constraints. We also identify failure modes in which naive fairness regularization destabilizes control or amplifies queueing delay, motivating a set of practical design rules for deploying fairness-aware optimization in constrained CPS.
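As a rough illustration of the primal–dual idea mentioned in this abstract, the sketch below splits one shared, normalized CPU-cycle budget among users under proportional fairness. The weighted-log utility, the function name, the step size, and the example weights are all assumptions chosen for illustration; the paper's own formulation (a constrained stochastic program with time-coupled dynamics) is not reproduced here.

```python
def proportional_fair_allocation(weights, capacity=1.0, step=0.1, iters=2000):
    """Dual ascent for: maximize sum(w_i * log(x_i)) subject to sum(x_i) <= capacity.

    Hypothetical helper: capacity is a normalized budget (1.0 = all cycles).
    """
    lam = 1.0  # Lagrange multiplier, i.e. the "price" of the shared budget
    for _ in range(iters):
        # Primal step: each user maximizes w_i*log(x_i) - lam*x_i, so x_i = w_i / lam.
        alloc = [w / lam for w in weights]
        # Dual step: raise the price when demand exceeds the budget, lower it otherwise.
        lam = max(1e-9, lam + step * (sum(alloc) - capacity))
    return [w / lam for w in weights], lam


if __name__ == "__main__":
    shares, price = proportional_fair_allocation([1.0, 2.0, 3.0])
    print(shares, price)
```

For this utility the allocation converges to shares proportional to the weights, which makes the role of the multiplier easy to inspect; other fairness notions would need a different primal step.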
- Research Article
- 10.47839/ijc.24.4.4337
- Jan 1, 2026
- International Journal of Computing
- Uddalok Sen + 2 more
To propose an efficient scheduling algorithm in a large distributed heterogeneous environment like the cloud, the resource requirements (CPU cycles, memory) of jobs must be predicted prior to execution. An execution history can be maintained to store the execution profiles of all jobs executed earlier on the given set of resources. A feedback-guided job modelling scheme was proposed earlier [1] to detect similarity between a newly submitted job and previously executed jobs on that resource set. Based on the similarity, new jobs are categorized as an exact clone, a near-miss clone, or a miss-clone of the history jobs. However, in [2], it is shown that the actual resource consumption and the predicted resource requirement may differ to a great extent, especially for near-miss-clone and miss-clone jobs. Furthermore, efficient resource scheduling based on the similarity of new jobs has not been addressed in [2]. Some studies show that even if the resource requirements of jobs are predicted accurately, it is nearly impossible to predict the actual execution time on a given resource, and the actual execution time is only available after the completion of the job [3]. Ignoring such uncertainty at the time of scheduling may lead to unsuccessful completion of jobs, especially where resources are available for a limited period of time, as in the case of the cloud. In this work, we propose an efficient scheduling approach that selects a resource for a job based on two critical criteria. First, the selected resource is evaluated to ensure a faster completion time. Second, the availability of the resource until the completion of the assigned jobs is ensured. In addition, this work proposes optimization of these two criteria during the resource selection process. Finally, we compare the efficiency of our scheduling algorithm with some well-known job scheduling algorithms.
- Research Article
- 10.3390/cryptography10010003
- Dec 30, 2025
- Cryptography
- Xinyao Li + 1 more
Side-channel attacks leveraging microarchitectural components such as caches and translation lookaside buffers (TLBs) pose increasing risks to cryptographic and machine-learning workloads. This paper presents a comparative study of performance and side-channel leakage under two page-size configurations—standard 4 KB pages and 2 MB huge pages—using paired attacker–victim experiments instrumented with both Performance Monitoring Unit (PMU) counters and precise per-access timing using rdtscp(). The victim executes repeated, key-dependent memory accesses across eight cryptographic modes (AES, ChaCha20, RSA, and ECC variants) while the attacker records eight PMU features per access (cpu-cycles, instructions, cache-references, cache-misses, etc.) and precise rdtscp() timing. The resulting traces are analyzed using a multilayer perceptron classifier to quantify key-dependent leakage. Results show that the 2 MB huge-page configuration achieves a comparable key-classification accuracy (mean 0.79 vs. 0.77 for 4 KB) while reducing average CPU cycles by approximately 11%. Page-index identification remains near random chance (3.6–3.7% for PMU side-channels and 1.5% for timing side-channel), indicating no increase in measurable leakage at the page level. These findings suggest that huge-page mappings can improve runtime efficiency without amplifying observable side-channel vulnerabilities, offering a practical configuration for balancing performance and security in user-space cryptographic workloads.
- Research Article
- 10.1002/cpe.70433
- Nov 19, 2025
- Concurrency and Computation: Practice and Experience
- Matteo Federico + 2 more
Locking plays a crucial role since it ensures synchronized access by concurrent threads to shared resources, such as shared data structures managed in critical sections. Traditional sleep locks, based on blocking operating system services, adopt a reactive approach (e.g., upon lock release) to waking up waiting threads, which might introduce additional latency on the critical path. At the other extreme, non-blocking locks, like spinlocks, allow threads to wait while still using CPU cycles for checking and updating the lock variable, which wastes both cycles and energy. In this article, we present a new locking algorithm, called SSPA (Spin/Sleep Proactive-Awakening), and its implementation for Linux systems, which combines spin and sleep waiting phases via the introduction of an innovative proactive wake-up mechanism that exploits the SoftIRQ daemon of the Linux kernel. Our solution allows threads to be awakened from their sleep phases in time to be already CPU dispatched when the lock is actually released. This provides the opportunity to quickly access the critical section while at the same time enabling control over the actual amount of CPU cycles that are spent in spinning wait phases. As we show via experimental data, our solution allows exploring new trade-offs between responsiveness and CPU/energy efficiency in concurrent applications, making it an interesting alternative to solutions in the literature.
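To make the spin/sleep trade-off described in this abstract concrete, here is a minimal user-space sketch of a plain spin-then-sleep lock. It is not SSPA: the proactive wake-up relies on kernel support that this sketch does not model, so waiters here are still woken reactively on release. The class name, the `spin_limit` parameter, and the demo function are hypothetical.

```python
import threading


class SpinThenSleepLock:
    """Poll the lock a bounded number of times, then block in the OS."""

    def __init__(self, spin_limit=1000):
        self._lock = threading.Lock()
        self._spin_limit = spin_limit  # hypothetical knob: max polls before sleeping

    def acquire(self):
        # Spin phase: non-blocking attempts, which burn CPU cycles while waiting.
        for _ in range(self._spin_limit):
            if self._lock.acquire(blocking=False):
                return
        # Sleep phase: blocking acquire, the thread waits without spinning.
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False


def run_demo(num_threads=4, increments=500):
    """Increment a shared counter from several threads under the lock."""
    lock = SpinThenSleepLock()
    counter = {"value": 0}

    def worker():
        for _ in range(increments):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Raising `spin_limit` spends more cycles polling in exchange for a better chance of skipping the blocking path; that is the knob a hybrid lock exposes, and the abstract's contribution is about waking sleepers early enough that the blocking path costs less.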
- Research Article
- 10.3390/app152212049
- Nov 12, 2025
- Applied Sciences
- Gyupin Moon + 1 more
The memory demand of modern applications has been rapidly increasing with the continuous growth of data volume across industrial and academic domains. As a result, computing devices (i.e., IoT devices, smartphones, and tablets) often experience memory shortages that degrade system performance and quality of service by wasting CPU cycles and energy. Thus, most operating systems rely on the swap mechanism to mitigate the memory shortage situation in advance, even if the swap memory fragmentation problem occurs over time. In this paper, we analyze the fragmentation behavior of the swap memory space within storage devices over time and demonstrate that the latency of swap operations increases significantly under aged conditions. We also propose a new extension of the traditional swap mechanism, called VSwap, that mitigates the swap memory fragmentation problem in advance by introducing two core techniques, virtual migration and address remapping. In VSwap, virtual migration gathers valid swap pages scattered across multiple clusters into contiguous regions within the swap memory space, while address remapping updates the corresponding page table entries to preserve consistency after migration. For experiments, we enable VSwap on the traditional swap mechanism (i.e., kswapd) by implementing it with simple code modifications. To confirm the effectiveness of VSwap, we performed a comprehensive evaluation based on various workloads. Our evaluation results confirm that VSwap is more useful and highly valuable than the original swap mechanism. In particular, VSwap improves the overall performance up to 48.18% by harvesting available swap memory space in advance with negligible overhead; it performs close to the ideal performance.
- Research Article
- 10.11648/j.ijdsa.20251106.13
- Nov 12, 2025
- International Journal of Data Science and Analysis
- Balaji Subramaniam
In data science, massive amounts of data need to be processed efficiently as part of high-volume data processing. Because input data sets are often highly disordered, an appropriate algorithm must be embedded to arrange the data in the required order so that SQL queries can process it quickly. Processing billions or trillions of rows has become a common use case, and robust data management strategies are required to handle increasing data volume. The main drivers of data growth are IoT devices, ERP platforms, social media apps, e-commerce platforms, streaming data, and AI/ML, all of which create more data for data insights. A delay of a few milliseconds in sorting each input can make a difference of several minutes to hours when the system is processing larger data sets. Data sorting mechanisms are measured by their time complexity with respect to the input size, benchmarking the processing time and resources consumed on a specific system. Data sorting performance can be improved by reducing the number of intensive operations (number of CPU cycles) and the memory usage of each process when the data is sorted. "Rapid Data Sorting" provides much more efficiency to the program and thereby helps to improve the overall data processing speed. After extensive research and rigorous testing, the proposal below was formulated.
- Research Article
- 10.1145/3773032
- Nov 11, 2025
- ACM Transactions on Embedded Computing Systems
- Anuj Justus Rajappa + 7 more
Hyperdimensional Computing (HDC) is an emerging AI algorithm, touted to be an efficient, neuro-inspired and reliable alternative to neural networks for Edge AI. HDC utilizes hypervectors with several thousand elements; the number of elements in these hypervectors denotes the HDC dimension. This dimension can be optimized for improving the efficiency and reliability of HDC inference against errors such as bit-flips, which can be caused by environmental radiation-induced soft errors. We hypothesize that, by reducing the runtime chip area and execution time utilized by HDC inference through lowering dimensionality, both efficiency and reliability against soft error-induced bit-flips can be simultaneously improved while trading off a negligible amount of accuracy and error threshold. We tested our hypothesis by executing an HDC inference algorithm with two different dimension values, 10000 (10k) and 1024, on a commercially available, low-power, bare-metal ARM platform with a Cortex-M4 processor. We conducted the efficiency analysis by measuring the CPU cycles and energy required for executing the algorithm, and the reliability analysis using real-world atmospheric-like neutron radiation from the ChipIr facility in Oxfordshire, UK. Analyses revealed that, by lowering the HDC dimension from 10k to 1024, the reliability of HDC inference against soft error-induced bit-flips was 3.5 times better and efficiency improved by more than 16 times. This innovative observation contrasts the prevailing understanding in the community that increasing the HDC dimension always improves robustness or reliability. To the best of our knowledge, our work is the first to study the reliability of HDC inference using real-world radiation.
- Research Article
1
- 10.1145/3725284
- Jun 17, 2025
- Proceedings of the ACM on Management of Data
- Chen Ding + 7 more
The rapid increase in storage and network bandwidth incurs higher CPU consumption in modern data systems. This phenomenon is particularly evident for log-structured merge key-value stores (LSM-KVS), which rely on resource-intensive background operations to flush and compact disk data. While extensive research has been conducted to reduce the CPU overhead of background compaction, less attention has been paid to background flushing, which can also consume a significant amount of valuable CPU cycles and disrupt CPU caches, ultimately impacting overall performance. In this paper, we propose DFlush, a novel solution that uses DPUs to offload background flush operations to reduce their CPU cost. DPUs are an appealing choice for this goal due to their cost-effectiveness, ease of programming, and widespread deployment. However, their complex hardware architecture requires careful design of both the data and control planes. To fully harness the DPU's capabilities, DFlush decomposes a flush job into fine-grained steps, maps them to DPU hardware units, and accelerates them through pipeline, data, and channel parallelism, ensuring data-plane efficiency. It also introduces an adaptive control plane that dynamically schedules flush jobs from different LSM-KVS instances based on their priority, reducing write stall and tail latency. Our experiments on a real DPU platform with an industrial-grade LSM-KVS show that DFlush delivers higher throughput, significantly lower tail latency, and saves up to dozens of CPU cores per LSM-KVS server while reducing energy consumption.
- Research Article
1
- 10.1145/3730966
- Jun 8, 2025
- Proceedings of the ACM on Networking
- Nikita Tyunyayev + 3 more
With a budget of 300 CPU cycles per packet, Network Functions Virtualisation (NFV) scenarios require efficient packet exchange mechanisms between NICs and CPUs. However, even the latest kernel bypass techniques, such as DPDK, can eat up 20% of the budget. This paper investigates the strain on CPUs due to escalating network speeds and the ability to exchange more packet metadata due to NICs becoming more feature-fledged and "Smart". We advocate for NICs to adapt to software needs rather than rely on drivers as translators. We analyze different applications and compare multiple packet buffers and data organization models. We then develop an API for the datapath that will compile into different buffer management models according to the application's needs. Introducing ASNI, we explore the potential to tailor packet descriptors for different applications, offloading the driver's work from the datapath so that the application receives the packets precisely as it needs them. We advocate for a vision where only a minimal driver on the host is required, returning CPU cycles to applications instead. We propose a prototype implementation on NVIDIA BlueField-3 SoC. Evaluated in multiple NFV scenarios, it manages to serve 2.2x more traffic under the same loss ratio constraints than state-of-the-art solutions.
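The per-packet budget quoted in this abstract can be checked with simple arithmetic. The sketch below is only a worked example: the 300-cycle budget and the 20% overhead come from the abstract, while the 3 GHz clock, the 10 Mpps packet rate, and the function names are assumptions picked because they reproduce a 300-cycle budget.

```python
def cycles_per_packet(clock_hz, packets_per_second):
    """Cycles one core can spend on each packet at a given packet rate."""
    return clock_hz / packets_per_second


def split_budget(budget_cycles, overhead_fraction):
    """Return (cycles consumed by packet I/O overhead, cycles left for the application)."""
    overhead = budget_cycles * overhead_fraction
    return overhead, budget_cycles - overhead


if __name__ == "__main__":
    # Assumed figures: one 3 GHz core receiving 10 million packets per second.
    budget = cycles_per_packet(3.0e9, 10e6)
    overhead, remaining = split_budget(budget, 0.20)
    print(budget, overhead, remaining)
```

Under these assumptions a 20% overhead takes about 60 of the 300 cycles and leaves about 240 for the network function itself, which is why trimming driver work on the datapath matters at these rates.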
- Research Article
- 10.1080/17445760.2025.2508165
- May 23, 2025
- International Journal of Parallel, Emergent and Distributed Systems
- S W Al-Mhameed + 2 more
The Internet of Vehicles (IoV) rapidly develops, resulting in various computation-intensive and delay-sensitive applications. Issues of delay can be mitigated with the help of edge computing. Most studies concentrated on minimizing delays while maintaining a maximum level of task completion, either from the devices' or the requesters' perspective. This research focuses on fairness for both devices and requesters. We propose a fair resource allocation optimization model for both requesters and devices. In our model, requesters' tasks are completed relatively quickly in terms of the number of completed tasks, response time, and cost. Furthermore, by striking a balance between profits and the quantity of CPU cycles left, our suggested model ensures that devices are not overburdened. We aim to maximize the number of completed tasks while minimizing delays and preserving the fairness of requesters and devices. We perform detailed experiments on randomly generated data instances. The results in this paper show the model's effectiveness in achieving its objectives regarding various factors such as task execution time, response time, cost, and profit in IoV environments.
- Research Article
- 10.52783/jisem.v10i42s.7914
- May 3, 2025
- Journal of Information Systems Engineering and Management
- Sharadadevi Kaganurmath
The Post-Quantum Lightweight Key Sharing Protocol for Secure MQTT-Based IoT Networks (PQLKS-MQTT) addresses the critical need for quantum-resistant and resource-efficient security in IoT communications. As the proliferation of IoT devices continues, securing MQTT-based networks against evolving threats, including quantum attacks, becomes imperative. PQLKS-MQTT integrates the Kyber Key Encapsulation Mechanism for post-quantum key exchanges, along with BLAKE2s hashing and ChaCha20 encryption, to ensure robust security with minimal resource consumption. Implemented using the Cooja simulator with Contiki OS, Eclipse Mosquitto MQTT broker, and Open Quantum Safe (liboqs) library, the protocol demonstrates superior performance compared to state-of-the-art solutions. Experimental results show that PQLKS-MQTT achieves the lowest CPU energy consumption (0.0000021 mJ), fastest execution time (0.35 seconds), and minimal computational (260 CPU cycles) and communication overheads (55 bytes), with only a slight increase in average energy consumption (0.00145 mJ) due to post-quantum cryptographic operations. This balance between enhanced security and efficient resource utilization makes PQLKS-MQTT a suitable solution for resource-constrained IoT devices and large-scale deployments, offering a scalable, quantum-safe communication framework for future IoT ecosystems.
- Research Article
- 10.30574/ijsra.2025.15.1.0650
- Apr 30, 2025
- International Journal of Science and Research Archive
- Murali Natti
Modern database management systems, such as PostgreSQL, require meticulous attention to connection management in order to optimize the allocation and utilization of crucial system resources including CPU, memory, and disk I/O. Efficient connection management is not merely about opening or closing connections—it involves implementing advanced strategies that ensure resources are used judiciously and that system performance remains robust even under high-load conditions. This article delves into the various methodologies that can be employed to enhance query performance and overall responsiveness of the database. It explores how connection pooling can drastically reduce the overhead associated with establishing new connections by reusing a finite pool of pre-established connections, thus saving on CPU cycles and minimizing memory consumption. Furthermore, the article discusses the critical role of tuning CPU usage through parallel query execution and the careful management of worker processes, which together ensure that complex queries are processed swiftly without overburdening the system's processing cores. Additionally, the discussion extends to optimizing I/O operations by configuring parameters like shared_buffers and work_mem so that frequently accessed data remains in memory, reducing the need for slower disk-based operations. Fine-tuning these settings allows the system to manage I/O workloads more efficiently, ensuring that query execution does not suffer due to excessive disk activity. The article also emphasizes the importance of strategic memory management to prevent issues such as memory bloat, thereby maintaining a balance between available resources and workload demands. Through a comprehensive exploration of these strategies and configuration best practices, database administrators are provided with a robust framework to achieve improved performance and scalability. 
This proactive approach not only enhances the system’s stability under heavy workloads but also paves the way for future growth, ensuring that PostgreSQL continues to deliver high responsiveness and efficient resource utilization in diverse operational environments.
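The connection-pooling idea in this abstract (reusing a fixed set of pre-established connections instead of opening a new one per request) can be sketched as follows. This is a generic, driver-agnostic illustration: `ConnectionPool`, its `connect` factory argument, and the `connection()` helper are hypothetical names, not part of PostgreSQL or of any particular client library.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that hands out pre-established connections."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument factory that opens one connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the setup cost once, up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection for reuse


if __name__ == "__main__":
    opened = []

    def fake_connect():
        # Stand-in for a real driver call; records how many connections were opened.
        opened.append(object())
        return opened[-1]

    pool = ConnectionPool(fake_connect, size=2)
    for _ in range(10):
        with pool.connection() as conn:
            pass  # run a query with `conn` here
    print(len(opened))
```

Ten requests are served by two connections in the demo, so the cost of connection setup is paid twice rather than ten times. A production pool would also need health checks and handling for connections that fail mid-use, which this sketch omits.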
- Research Article
18
- 10.18196/jrc.v6i1.25351
- Feb 26, 2025
- Journal of Robotics and Control (JRC)
- Abdulnasser Abduljabbar Abbood + 5 more
Flying Ad Hoc Networks (FANETs) are indispensable in applications such as surveillance, disaster response missions, and military operations. Both security and communication efficiency must meet certain requirements. However, their effectiveness is hobbled by dynamic topologies, resource constraints, and cyber threats. Therefore, Post-Quantum Cryptography (PQC) is necessary. Classical algorithms and current PQC schemes for FANETs are discussed in this thesis, including cryptographic solutions that are lightweight enough for resource-constrained environments. The numerical results of the experiment show that while lattice-based cryptography involves minimal risk of breaches, its power consumption is 25% higher than that of other systems and its processing time is 30% slower. In contrast, multivariate polynomial cryptography performs better on resource metrics: it consumes only 10% more power and needs 15% more CPU cycles for processing. The introduction of PQC algorithms and architectures resulted in a 5–10% reduction in network throughput and increased latency to 20% in some scenarios. The results show that hybrid cryptographic systems, which combine classical with PQC techniques, have the potential to achieve both high efficiency and long-term security. Case studies have validated the feasibility of tailored quantum-safe algorithms in FANETs, which can offer considerable security benefits while withstanding rigorous scrutiny in terms of scalability and computational performance in dynamic, mission-critical operations.
- Research Article
- 10.1145/3709718
- Feb 10, 2025
- Proceedings of the ACM on Management of Data
- Kyungmin Lim + 4 more
Sequential data access for the rapid ingestion of large fact tables from storage is a pivotal yet resource-intensive operation in data warehouse systems, consuming substantial CPU cycles across various components of DBMSs and operating systems. Although bypassing these layers can eliminate access latency, concurrent access to the same table often results in redundant data fetching due to cache-bypassing data transfers. Thus, a new design for data access control is necessary to enhance rapid data ingestion in databases. To address this concern, we propose a novel DB-OS co-design that efficiently supports sequential data access at full device speed. Our approach, zicIO, liberates DBMSs from data access control by preparing required data just before DBMSs access it, while alleviating all known I/O latencies. The core of zicIO lies in its DB-OS co-design, which aims to (1) automate data access control and (2) relieve redundant data fetching through seamless collaboration between the DB and the OS. We implemented zicIO and integrated it with four databases to demonstrate its general applicability. The evaluation showed performance enhancements of up to 9.95x under TPC-H loads.
- Research Article
8
- 10.1109/jiot.2024.3476476
- Feb 1, 2025
- IEEE Internet of Things Journal
- Jiali Yang + 6 more
Vehicular edge computing (VEC) emerges as a promising paradigm for processing computing-intensive parallel vehicular tasks, where vehicular tasks can be offloaded to the edge nodes [e.g., roadside units (RSUs)] to seek less computing delay. Considering the impact of computation services on offloading efficiency, there are several works that jointly study the decision making of task offloading and service caching. However, the existing works fail to consider the time-varying service requests and ignore the time-slots correlation of the computation services. To bridge the gap, this work designs a service-aware parallel task offloading approach, which is the first work to jointly explore time-varying computation services and task offloading based on real-world vehicular trajectory data in VEC networks. Specifically, we first propose a computation service prediction algorithm using the real-world vehicular trajectory data. Guided by this, RSUs flexibly precache computation services. Then, we propose a learning-based parallel task offloading algorithm, which allows vehicles to make offloading decisions based on the history of the edge selections. Furthermore, we conduct simulations to validate the proposed algorithm. The results demonstrate that the proposed algorithm reduces task delay by 45%, 58%, and 55% compared to the algorithms without service-aware computation offloading under various CPU cycles, task numbers, and time slots.
- Research Article
- 10.1109/lca.2025.3549423
- Jan 1, 2025
- IEEE Computer Architecture Letters
- Amin Mamandipoor + 2 more
Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. To keep up with the ever-increasing network rates, many CPU cycles are spent on the networking tax. We make a key observation that network processing threads can be simultaneously executed on server CPUs with minimal interference with the application threads. However, utilizing simultaneous multithreading (SMT) to scale the number of network threads with the number of application threads suffers from (1) failing to provide strict tail latency requirements for latency-critical applications, and (2) reducing the number of available hardware threads for application processes, thus contributing to a high datacenter network tax. In this work, we design, implement, and evaluate a chip-multiprocessor (CMP) with specialized Simultaneous Data-delivery Threads (SDT) per physical core. The key insight is that with judicious partitioning at the architectural level, SDT can safely co-run with application processes with guaranteed performance isolation. Our evaluation results, using full-system simulation, show that a 20-core CMP enhanced with SDT reduces the area and power consumption of a baseline 40-core CMP by 47.5% and 66%, respectively, while reducing network throughput by less than 10%.
- Research Article
41
- 10.1109/tmc.2024.3455417
- Jan 1, 2025
- IEEE Transactions on Mobile Computing
- Yongkang Gong + 4 more
Space-air-ground (SAG) integrated heterogeneous networks can provide pervasive intelligence services for various ground users (GUs). The network can help cellular networks release network resources and alleviate congestion pressure. Moreover, one important application of the network is that digital twin (DT) can enable nearly-instant wireless connectivity and highly-reliable data mapping from physical systems to the digital world in a real-time fashion. The integration of SAG and DT (SAG-DT) reduces the gap between data analysis and physical status, which can further realize robust edge intelligence services. However, the random computation task arrival, time-varying channel gains, and the lack of mutual trust among GUs hinder better quality of service in the promising SAG-DT network. In this paper, we envision a SAG-DT integrated blockchain model to transfer the task data to the aerial network, and then perform the computation offloading, energy harvesting and privacy protection. Moreover, we propose a Lyapunov-aided multi-agent deep federated reinforcement learning (MADFRL) algorithm framework to optimize the CPU cycle frequency, the size of block, the number of DTs, and harvested energy to minimize the execution costs and privacy overhead. Extensive performance analyses indicate that the MADFRL algorithm framework can strengthen the data privacy via blockchain verification mechanism and approaches the optimal performance on the basis of lower computation complexity. Finally, simulation results corroborate that the proposed Lyapunov-aided MADFRL algorithm is superior to advanced benchmarks in terms of execution costs, task processing quantities and privacy overhead.
- Research Article
26
- 10.1109/jbhi.2024.3455803
- Nov 1, 2024
- IEEE journal of biomedical and health informatics
- Guoxin Wang + 4 more
Wearable Internet of Things (IoT) devices are gaining ground for continuous physiological data acquisition and health monitoring. These physiological signals can be used for security applications to achieve continuous authentication and user convenience due to passive data acquisition. This paper investigates an electrocardiogram (ECG) based biometric user authentication system using features derived from the Convolutional Neural Network (CNN) and self-supervised contrastive learning. Contrastive learning enables us to use large unlabeled datasets to train the model and establish its generalizability. We propose approaches enabling the CNN encoder to extract appropriate features that distinguish the user from other subjects. When evaluated using the PTB ECG database with 290 subjects, the proposed technique achieved an authentication accuracy of 99.15%. To test its generalizability, we applied the model to two new datasets, the MIT-BIH Arrhythmia Database and the ECG-ID Database, achieving over 98.5% accuracy without any modifications. Furthermore, we show that repeating the authentication step three times can increase accuracy to nearly 100% for both PTBDB and ECGIDDB. This paper also presents model optimizations for embedded device deployment, which makes the system more relevant to real-world scenarios. To deploy our model in IoT edge sensors, we optimized the model complexity by applying quantization and pruning. The optimized model achieves 98.67% accuracy on PTBDB, with 0.48% accuracy loss and 62.6% CPU cycles compared to the unoptimized model. An accuracy-vs-time-complexity tradeoff analysis is performed, and results are presented for different optimization levels.
- Research Article
9
- 10.3389/fcomp.2024.1465352
- Oct 21, 2024
- Frontiers in Computer Science
- Sean Choi + 5 more
Federated learning (FL) has emerged as a promising paradigm for secure distributed machine learning model training across multiple clients or devices, enabling model training without having to share data across the clients. However, recent studies revealed that FL could be vulnerable to data leakage and reconstruction attacks even if the data itself are never shared with another client. Thus, to resolve such vulnerability and improve the privacy of all clients, a class of techniques, called privacy-preserving FL, incorporates encryption techniques, such as homomorphic encryption (HE), to encrypt and fully protect model information from being exposed to other parties. A downside to this approach is that encryption schemes like HE are very compute-intensive, often causing inefficient and excessive use of client CPU resources that can be used for other uses. To alleviate this issue, this study introduces a novel approach by leveraging smart network interface cards (SmartNICs) to offload compute-intensive HE operations of privacy-preserving FL. By employing SmartNICs as hardware accelerators, we enable efficient computation of HE while saving CPU cycles and other server resources for more critical tasks. In addition, by offloading encryption from the host to another device, the details of encryption remain secure even if the host is compromised, ultimately improving the security of the entire FL system. Given such benefits, this paper presents an FL system named FedNIC that implements the above approach, with an in-depth description of the architecture, implementation, and performance evaluations. Our experimental results demonstrate a more secure FL system with no loss in model accuracy and up to 25% in reduced host CPU cycle, but with a roughly 46% increase in total training time, showing the feasibility and tradeoffs of utilizing SmartNICs as an encryption offload device in federated learning scenarios. 
Finally, we illustrate promising future study and potential optimizations for a more secure and privacy-preserving federated learning system.