Articles published on Data Compression
9690 Search results
- New
- Research Article
- 10.1016/j.rineng.2026.110243
- Jun 1, 2026
- Results in Engineering
- Tian Bai + 5 more
Thermal constraint assessment method for mission verification of sun-synchronous orbit satellites
- New
- Research Article
- 10.1038/s41377-026-02296-4
- May 18, 2026
- Light: Science & Applications
- Zhenming Yu + 16 more
Hyperspectral remote sensing images provide rich spatial and spectral information about the Earth’s surface, making them an essential tool for Earth observation. However, existing spaceborne hyperspectral payloads experience slow acquisition speeds and generate large data volumes, posing significant challenges for real-time applications. Moreover, the complex optical design and relatively high cost of traditional hyperspectral payloads hinder their broad-scale in-orbit deployment. In this work, we have proposed and completed the world’s first computational imaging-enabled compact spaceborne snapshot compressive hyperspectral payload, named BUPT-spectra01, which was successfully launched on November 11, 2024, at the Jiuquan Satellite Launch Center in China. We design a reflective coding structure, which enables BUPT-spectra01 to achieve high compactness (182 mm × 214 mm × 94 mm, 1.535 kg) and low cost. The payload operates in a sun-synchronous orbit at an altitude of 520 km, with a ground imaging swath width of 51 km by 64 km. Through a single exposure (1 ms), the payload enables 47-band hyperspectral imaging with a spectral resolution of 6.5 nm, achieving 47-times data compression simultaneously. To achieve high-accuracy hyperspectral information reconstruction, we design a novel spatial-spectral inference neural network (SSI-Net). Moreover, BUPT-spectra01 can image at a rate of 30 frames per second, which allows video-level hyperspectral observation. In-orbit experiments demonstrate that BUPT-spectra01 achieves accurate classification of ground cover based on hyperspectral features, showing promise in hyperspectral observation applications such as disaster management, environment monitoring, and resource exploration. This breakthrough significantly advances the application of computational imaging in aerospace observation, contributing to the progress of future satellite internet.
- Research Article
- 10.1126/sciadv.aec2736
- May 15, 2026
- Science Advances
- Davide Rattacaso + 4 more
As a cornerstone of automated reasoning, equational reasoning finds equivalences between symbolic expressions and fuels advances across scientific disciplines. Yet, its potential remains limited by the exponential growth of equivalent expressions with increasing problem size. We introduce quantum normal form reduction, a quantum computational framework designed to address this challenge. We construct an efficiently implementable quantum Hamiltonian whose ground state encodes all equivalent expressions in a quantum superposition. By preparing and manipulating these states, we tackle fundamental problems in equational reasoning, including verifying and counting equivalent expressions and identifying structural properties of equivalence classes. We demonstrate a quantum-inspired version of the algorithm using tensor networks to solve instances involving up to 10^28 equivalent expressions, far beyond the reach of classical graph exploration. This framework opens the path for quantum symbolic computation in areas from circuit design to data compression, computational group theory, linguistics, and macromolecular modeling, unlocking previously inaccessible problems.
- Research Article
- 10.1088/2632-2153/ae64a9
- May 12, 2026
- Machine Learning: Science and Technology
- Akshat Gupta + 2 more
The petabyte-scale data generated by High Energy Physics (HEP) experiments presents a significant storage challenge. We present the Bytewise Online Autoregressive (BOA) Constrictor, a new pseudo-streaming lossless neural compressor built upon the Mamba state space model. BOA achieves competitive compression ratios across diverse structured HEP datasets, matching or exceeding LZMA, ZSTD and ZLIB at maximum compression, among other tested algorithms. With a 2.21 MB model, BOA achieves an effective compression ratio (defined as the ratio of original to compressed file size, inclusive of model size) of 7.23× on ATLAS Open Data (HDF5) and 9.13× on simulated particle collision records (HepMC v3), outperforming the next-best traditional algorithm (6.79× and 5.33×, respectively on each dataset). BOA also demonstrates robust cross-file and cross-condition generalisation on CMS Open Data (NanoAOD format), where it obtains comparable or improved effective compression ratios (within 5%) with respect to the next-best traditional algorithm. Ablation studies show that transitioning to half-precision (FP16) weights reduces the model footprint without degrading predictive accuracy, and data-type analyses reveal BOA performs best on high-entropy float32 payloads. The model has also been tested on other kinds of scientific data, yielding 1.61× (vs. 1.14× for next-best algorithm) in computational fluid dynamics and up to 1.53× (vs. 1.27×) in cosmology (CAMELS) datasets. BOA is supported by a deterministic reference C++ implementation which ensures bit-exact reproducibility across different CUDA architectures. In this proof-of-principle implementation, BOA delivers a ∼3.5 to 45 MB/s compression and ∼1.5 to 25 MB/s decompression throughput that is not yet competitive with optimised algorithms such as ZSTD or LZMA, but still provides a first step towards data compression improvements for next-generation scientific data.
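The effective compression ratio defined in this abstract (original size over compressed size, with the model size counted as part of the compressed output) can be illustrated with a minimal Python sketch. The function name and the byte counts below are hypothetical and are not taken from the paper; they only show how including the model size lowers the reported ratio.

```python
def effective_compression_ratio(original_bytes, compressed_bytes, model_bytes=0):
    """Return original size divided by (compressed payload + model size).

    With model_bytes=0 this is the plain compression ratio.
    """
    total = compressed_bytes + model_bytes
    if total <= 0:
        raise ValueError("compressed size plus model size must be positive")
    return original_bytes / total


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
raw_ratio = effective_compression_ratio(1_000_000_000, 100_000_000)
eff_ratio = effective_compression_ratio(
    1_000_000_000, 100_000_000, model_bytes=25_000_000
)
print(raw_ratio, eff_ratio)
```

For a fixed model size, the penalty shrinks as the dataset grows, which is why a small model matters most when individual files are small.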
- Research Article
- 10.1007/s13755-026-00459-6
- May 3, 2026
- Health information science and systems
- Omar Avalos + 3 more
Cardiovascular diseases (CVD) are among the leading causes of mortality worldwide due to genetic predisposition and lifestyle factors. Proper diagnosis of cardiovascular diseases is crucial to provide early-stage treatments. Conventional diagnostic methods such as stress tests, electrocardiograms, and echocardiography provide valuable insights into rhythm abnormalities, structural anomalies, or other cardiovascular conditions. However, their reliability heavily depends on human expertise, and they may not always detect early-stage signs of disease. In recent years, Machine Learning (ML) models have emerged as alternative diagnosis tools, capable of identifying CVD with higher accuracy. ML enables automated and precise detection based on data relationships, capturing hidden, complex patterns that are not apparent through traditional diagnostics. Most ML approaches employ supervised learning, which requires labeled data that are not always available in medical records. Under such circumstances, unsupervised learning has been explored as a suitable alternative. In this paper, a hybrid unsupervised approach combines the neural network structure of Self-Organizing Maps (SOM) with the dimensionality reduction technique of Principal Component Analysis (PCA) to cluster CVD cases across different severity levels. Acting as a data compression mechanism, the combination of these methods maps complex, high-dimensional data into a lower-dimensional space without supervision. The proposed approach significantly improves the detection of hidden structures within large, high-dimensional medical cardiovascular datasets, providing insights into cardiovascular risk factors and improving the overall diagnostic process.
Experimental evaluation on the UCI Cleveland Heart Disease dataset shows that the proposed PCA-SOM model achieves a Silhouette score of 0.94 (train) and 0.79 (test), and a Davies-Bouldin index of 0.08 (train) and 0.16 (test), outperforming baseline clustering methods such as K-means, hierarchical clustering, Gaussian Mixture and Spectral clustering, highlighting its potential for supporting CVD detection.
- Research Article
- 10.3390/s26092839
- May 1, 2026
- Sensors (Basel, Switzerland)
- Amir Ijaz + 4 more
Energy consumption is a critical concern for Internet of Things (IoT) platforms lacking abundant resources, particularly for swarm robotic systems that rely on numerous devices operating collaboratively over extended periods. This study presents a comprehensive design strategy for improving processing and communication to enhance system efficiency and reduce energy consumption. We incorporate energy harvesting (photovoltaic and RF), dynamic power management, and energy-efficient communication protocols (e.g., duty cycle, power control, data compression) into two complementary platforms built for swarm robotics: MCU-based nodes (TI MSP430 with LoRa transceiver), which serve as the experimental prototype for validating energy-aware communication, compression, and scheduling mechanisms; edge platforms (Jetson Nano and TX2), which are used for high-level power profiling and system-level evaluation, particularly for computation intensive workloads and comparative analysis. Our technique involves analyzing the device’s energy usage and harvesting processes, developing efficient communication protocols, and validating the system through simulations and hardware prototypes. Experimental results under outdoor and indoor conditions show that the device maintains an energy neutrality ratio well above unity, even with limited ambient energy. Key findings include significant reductions in energy per bit transmitted and reliable long-term operation. These insights pave the way for deploying swarms of autonomous IoT-based robots with minimal maintenance and maximal longevity.
- Research Article
- 10.1016/j.net.2026.104169
- May 1, 2026
- Nuclear Engineering and Technology
- Yicheng Liao + 4 more
A high-ratio lossy data compression framework for beam diagnostic systems using ResNet AutoEncoder networks
- Research Article
- 10.1016/j.vlsi.2026.102663
- May 1, 2026
- Integration
- Yuanfa Ji + 4 more
A test data compression method based on sliding-window encoding and matching length reuse
- Research Article
- 10.1002/mp.70450
- May 1, 2026
- Medical physics
- Yuanshun Jiang + 8 more
The prediction of Epidermal Growth Factor Receptor (EGFR) mutation status in advanced lung adenocarcinoma is crucial for targeted therapy. Since EGFR mutations manifest as both macroscopic imaging features on CT and microscopic morphological changes in tissue, integrating these multiscale signals is essential for a comprehensive diagnostic assessment. However, current related research faces two key limitations: on one hand, unimodal deep learning models suffer from limited representational power; on the other hand, existing multimodal methods fail to address the inherent data structural discrepancies between continuous CT and discrete WSI, often losing critical fine-grained details due to forced data compression or shared semantic bottlenecks. To address the above limitations and improve the reliability of EGFR mutation status prediction, this study aims to propose a novel multimodal fusion framework (MFCA) that can effectively capture cross-modal semantic interactions and align imaging features across different scales. A novel multimodal fusion framework based on cross-attention (MFCA) is proposed, and its implementation steps are as follows: 1. First, a region-of-interest-guided approach is utilized to coarsely segment whole-slide histopathology images (WSI) into three constituent regions, namely cancerous, stromal, and other regions; 2. Then, a dual-branch encoder is employed to separately extract features from two types of imaging data: global features from Computed Tomography (CT) scans and region-specific features from the segmented WSI; 3. Critically, a bidirectional cross-attention module is introduced into the framework, which is designed to facilitate deep semantic interaction and alignment between the macroscopic context of CT imaging and the microscopic context of histopathology, thereby achieving highly efficient and discriminative feature fusion.
On the external validation set, our MFCA framework achieved robust performance, with Area Under the Curve (AUC) values of 0.758 (95% CI: 0.683-0.832) for cancerous regions, 0.805 (95% CI: 0.716-0.900) for stromal regions, and 0.760 (95% CI: 0.686-0.833) for other regions. The model's performance, particularly in the stromal component, was statistically superior to all baseline and competing models. The proposed MFCA framework predicts EGFR mutation status by innovatively integrating macroscopic CT imaging with region-specific microscopic WSI features. It serves as a valuable computational tool to support precision oncology for patients with advanced lung adenocarcinoma.
- Research Article
- 10.1051/0004-6361/202557858
- Apr 29, 2026
- Astronomy & Astrophysics
- M S Cagliari + 2 more
The advent of Stage IV galaxy redshift surveys such as DESI and Euclid marks the beginning of an era of precision cosmology, with one key objective being the detection of primordial non-Gaussianities (PNG), which are potential signatures of inflationary physics. In particular, constraining the amplitude of local-type PNG, parametrised by $f_{\rm NL}$, with $\sigma_{f_{\rm NL}} \sim 1$, would provide a critical test of single-versus-multi-field inflation scenarios. While current large-scale structure and cosmic microwave background analyses have achieved $\sigma_{f_{\rm NL}} \sim 5$–$9$, further improvements demand novel data compression strategies. We propose a hybrid estimator that hierarchically combines standard two-point and three-point statistics with a field-level neural summary, motivated by recent theoretical works that have indicated that such a combination is nearly optimal in effectively disentangling primordial from late-time non-Gaussianity. We employed PatchNet, a convolutional neural network that extracts small-scale information from sub-volumes (i.e. patches) of the halo number density field, while large-scale information is retained via the power spectrum and bispectrum. Using Quijote-PNG simulations, we evaluated the Fisher information of this combined estimator across various redshifts, halo mass cuts, and scale cuts. Our results demonstrate that the inclusion of patch-based field-level compression always enhances constraints on $f_{\rm NL}$, reaching gains of 30–45% at low $k_{\rm max}$ ($\sim 0.1\,h\,{\rm Mpc}^{-1}$) and of 15–25% when the standard summary statistics include $k$ modes comparable to those probed by the patches ($k_{\rm max} \sim 0.4\,h\,{\rm Mpc}^{-1}$). This shows that, even in this configuration, information from beyond the bispectrum can be captured. This approach offers a computationally efficient and scalable pathway to tightening the PNG constraints with forthcoming survey data.
- Research Article
- 10.3390/ijgi15040164
- Apr 11, 2026
- ISPRS International Journal of Geo-Information
- Shuo Zhang + 3 more
(1) Background: Curve data compression plays a critical role in efficient storage, transmission, and multi-scale visualization of vector spatial data, especially for complex geographic boundaries. Achieving high compression efficiency while preserving geometric fidelity remains a challenging task. (2) Methods: This study proposes a vector curve compression framework based on a convolutional autoencoder. Curve data are segmented and resampled to unify network input, after which coordinate-difference sequences are encoded into low-dimensional latent vectors through convolutional layers and reconstructed via a symmetric decoder. (3) Results: Experiments conducted on a global island boundary dataset demonstrate that the proposed method achieves effective data reduction with stable reconstruction accuracy. Specifically, compared with the classical Douglas–Peucker (DP) algorithm, Fourier series (FS) methods, and fully connected autoencoders (FCAs), the 1D CAE exhibits superior and more robust reconstruction performance, especially under high compression ratios. It achieves the lowest positional deviation (PD = 42.41) and the highest spatial fidelity (IoU = 0.9991, with a relative area error of only 0.0067%), while maintaining high computational efficiency (57.32 s). Sensitivity analyses reveal that a convolution kernel size of 1 × 7 and a segment length of 25 km yield the optimal trade-off between representational capacity and model stability. (4) Conclusions: The proposed method enables efficient vector curve compression and reliable coastline reconstruction, and is particularly suitable for small- and medium-scale cartographic applications up to a map scale of 1:250 K.
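The coordinate-difference representation mentioned in the Methods can be sketched in a few lines of Python. This is only an illustration of the general idea (store a start point plus successive offsets, which are typically small and more regular than absolute coordinates); the function names are hypothetical, and the paper's actual segmentation, resampling, and autoencoder stages are not reproduced here.

```python
def to_differences(points):
    """Split a polyline into its first point and successive (dx, dy) offsets."""
    if not points:
        return None, []
    diffs = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    return points[0], diffs


def from_differences(start, diffs):
    """Rebuild the polyline by accumulating offsets from the start point."""
    if start is None:
        return []
    points = [start]
    for dx, dy in diffs:
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


# Hypothetical integer coordinates for a short curve segment.
curve = [(100, 200), (103, 204), (105, 203), (110, 210)]
start, diffs = to_differences(curve)
restored = from_differences(start, diffs)
print(start, diffs)
```

With integer coordinates the round trip is exact; with floating-point coordinates, accumulated rounding error would need to be considered.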
- Research Article
- 10.1016/j.compbiolchem.2025.108769
- Apr 1, 2026
- Computational biology and chemistry
- Nisha A + 1 more
Opt Deep CSSAN: Optimized Deep Convolutional Spectral-Spatial Attention Network for hyperspectral image classification.
- Research Article
- 10.1002/adma.202521432
- Apr 1, 2026
- Advanced materials (Deerfield Beach, Fla.)
- Cheng Zhang + 6 more
Growing industrial, environmental, and healthcare needs are accelerating the development of next-generation infrared systems with high detectivity, multifunctional sensing, and on-device intelligence. While traditional devices (e.g., HgCdTe, quantum wells) continue to dominate in terms of performance, they face limitations in cooling requirements, cost, and functionality. Recently, considerable advances have been made in materials, structures, and detection systems. As the foundation of IR systems, photodetectors based on traditional materials with band alignment engineering and emerging materials (e.g., two-dimensional materials and quantum dots) show high photodetectivity, low dark current, and room-temperature operation. Meanwhile, on-chip microstructures (e.g., plasmons, metasurfaces, and 3D-assembled architectures) integration enables manipulation of coupling and propagation of electromagnetic fields, which enhances polarization and wavelength-dependent light absorption. These developments empower infrared devices with multidimensional photodetection capabilities and tunable spectral response. Furthermore, advanced technologies like in-sensor computing, miniaturized spectrometers, and on-chip digitization merge sensing, storage, and computing into a single chip. The integration enables monolithic infrared systems with more compact architectures while possessing adaptive perception, data compression, and real-time signal processing capabilities. Finally, a comparative analysis containing material engineering, microstructure design, and integrated architecture is presented to outline the challenges and opportunities toward compact, intelligent, multifunctional infrared detection platforms.
- Research Article
- 10.1016/j.compchemeng.2026.109550
- Apr 1, 2026
- Computers & Chemical Engineering
- Jie Zhu + 2 more
Data compression and model reduction based approach for kinetic parameter estimation with multiple spectra
- Addendum
- 10.1002/qute.70257
- Apr 1, 2026
- Advanced Quantum Technologies
Correction to “Accelerating Quantum Circuit Simulations With Data Compression”
- Research Article
- 10.13164/re.2026.0164
- Apr 1, 2026
- Radioengineering
- A Samofalov + 1 more
In our increasingly digital world, data compression remains an important research topic. One way to improve the compression ratio is to use data transform techniques to increase further compressibility of input data for specific compression algorithms. This paper introduces a novel ternary data categorization in order to evaluate the impact of data transform techniques on input data. The categorization is described and explained in detail. Then, three transforms used for testing are described: Burrows-Wheeler and Move-to-Front transforms (BWT and MTF), as well as shifting summation. These transforms are applied to the eight true color images from the Kodak and Kaggle image sets. The findings derived from the ternary graphs and statistical measures indicate that combining BWT and MTF transforms yields the best results for this test on the selected image data. The shifting summation approach opens up possibilities for further research, particularly in searching for data patterns. Keywords: Ternary categorization, ternary diagram, image compression, data transform, transform evaluation
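For readers unfamiliar with the two named transforms, the following Python sketch shows a naive Burrows-Wheeler transform followed by a Move-to-Front encoding on a byte string. It is a textbook-style illustration, not the authors' implementation: the rotation-sorting BWT here is quadratic in memory and only suitable for tiny inputs, and the function names are chosen for this example.

```python
def bwt(data):
    """Naive BWT: sort all rotations of data.

    Returns the last column and the row index of the original string,
    which is what an inverse transform would need.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def mtf_encode(data):
    """Move-to-Front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.pop(idx)
        alphabet.insert(0, b)
    return out


def mtf_decode(indices):
    """Invert mtf_encode by replaying the same alphabet updates."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in indices:
        b = alphabet.pop(idx)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)


transformed, row = bwt(b"banana")
codes = mtf_encode(transformed)
print(transformed, row, codes)
```

The BWT groups identical bytes into runs, and MTF then turns those runs into runs of zeros, which is why the combination tends to help a downstream entropy coder.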
- Research Article
- 10.1002/ett.70420
- Apr 1, 2026
- Transactions on Emerging Telecommunications Technologies
- Sabeetha Saraswathi Sugumaran + 3 more
Data deduplication is a crucial data compression technique that eliminates duplicate copies of repetitive data, reducing bandwidth usage and storage space at cloud service providers (CSP). Data deduplication in cloud computing has gained considerable attention in large‐scale storage systems, where the main issue is security. Because data deduplication reduces the confidentiality of sensitive data, a Deep Learning (DL) model is used for data security. In this research, Squeeze Fused Belief Network (SFBN)‐DeepkeyGen is proposed for secure data storage in the cloud environment. Initially, the user uploads the file to the cloud server, which then checks for duplicates. Here, the secret key is generated by SFBN‐DeepkeyGen and then the tag is generated. If the tag is not available, then the file is encrypted using the Advanced Encryption Standard (AES) algorithm. If the tag is available, then the Proof of Ownership (PoW) is checked to perform data deduplication. Finally, the experimental results revealed that SFBN‐DeepkeyGen achieved a minimal encryption time of 0.187 s, a minimal decryption time of 0.197 s, and a maximal throughput of 0.817 Mbps.
- Research Article
- 10.1109/tit.2024.3516505
- Apr 1, 2026
- IEEE Transactions on Information Theory
- Zahra Baghali Khanian + 1 more
We consider a rate-distortion version of the quantum state redistribution task, where the error of the decoded state is judged via an additive distortion measure; it thus constitutes a quantum generalisation of the classical Wyner-Ziv problem. The quantum source is described by a tripartite pure state shared between Alice (A, encoder), Bob (B, decoder) and a reference (R). Both Alice and Bob are required to output a system (Ã and B̃, respectively), and the distortion measure is encoded in an observable on ÃB̃R. It includes as special cases most quantum rate-distortion problems considered in the past, and in particular quantum data compression with the fidelity measured per copy; furthermore, it generalises the well-known state merging and quantum state redistribution tasks for a pure state source, with per-copy fidelity, and a variant recently considered by us, where the source is an ensemble of pure states [ZBK & AW, Proc. ISIT 2020, pp. 1858-1863 and ZBK, PhD thesis, UAB 2020, arXiv:2012.14143]. We derive a single-letter formula for the rate-distortion function of compression schemes assisted by free entanglement. A peculiarity of the formula is that in general it requires optimisation over an unbounded auxiliary register, so the rate-distortion function is not readily computable from our result, and there is a continuity issue at zero distortion.
However, we show how to overcome these difficulties in certain situations.
- Research Article
- 10.1109/tpds.2026.3658568
- Apr 1, 2026
- IEEE Transactions on Parallel and Distributed Systems
- Lin Gan + 13 more
Leveraging the latest Sunway supercomputer, we developed a fully optimized earthquake simulation model that accurately captures topographic effects for realistic seismic analysis. Optimizing for the SW26010Pro architecture with DMA/RMA communication mechanisms, data compression schemes, and vectorization, we achieved a speedup exceeding 160×. Our pipeline-based computation and communication overlapping scheme, combined with performance prediction models further minimized computational costs. These optimizations enabled the largest-scale curvilinear grid finite-difference method (CGFDM) earthquake simulations to date, covering 197 trillion grid points and achieving 86.7 PFLOPS on 39 million cores with a weak scaling efficiency of 97.9%. These advancements enabled the successful simulation of the 2008 Wenchuan earthquake, providing high-resolution seismic insights and robust assessments for regional hazard mitigation and disaster preparedness.
- Research Article
- 10.1016/j.asr.2026.02.042
- Apr 1, 2026
- Advances in Space Research
- P Salgado Sánchez + 2 more
Towards the real-time downlink of high-resolution images via low-rate telemetry using POD-based algorithms