Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing
Parser-based log compressors have been widely explored in recent years because the explosive growth of log volumes makes the compression performance of general-purpose compressors unsatisfactory. These parser-based compressors preprocess logs by grouping them based on the parsing result and then feed the preprocessed files into a general-purpose compressor. However, parser-based compressors have their limitations. First, the goals of parsing and compression are misaligned, so the inherent characteristics of logs are not fully utilized. In addition, the performance of parser-based compressors depends on the sample logs and is thus unstable. Moreover, parser-based compressors often incur a long processing time. To address these limitations, we propose Denum, a simple, general log compressor with a high compression ratio and speed. The core insight is that a majority of the tokens in logs are numeric tokens (i.e., pure numbers, tokens with only numbers and special characters, and numeric variables), and effective compression of them is critical for log compression. Specifically, Denum contains a Numeric Token Parsing module, which extracts all numeric tokens and applies tailored processing methods (e.g., storing the differences of incremental numbers like timestamps), and a String Processing module, which processes the remaining log content without numbers. The processed files of the two modules are then fed as input to a general-purpose compressor, which outputs the final compression results. Denum has been evaluated on 16 log datasets, where it achieves an 8.7%–434.7% higher average compression ratio and a 2.6×–37.7× faster average compression speed (i.e., 26.2 MB/s) compared to the baselines. Moreover, integrating Denum's Numeric
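The core idea in this abstract (separate the numeric tokens, delta-encode incremental ones such as timestamps, then hand both streams to a general-purpose compressor) can be illustrated with a small sketch. This is not Denum's implementation; the regex, placeholder byte, and example log lines are assumptions made purely for illustration.

```python
import re
import zlib

NUM = re.compile(r"\d+")

def preprocess(lines):
    """Split each line into a number-free template and its numeric tokens;
    delta-encode the first numeric column (assumed here to be a timestamp)."""
    templates, columns = [], []
    for line in lines:
        nums = [int(m) for m in NUM.findall(line)]
        columns.append(nums)
        templates.append(NUM.sub("\x00", line))  # placeholder marks each number
    prev = 0
    for nums in columns:
        if nums:
            # store the difference to the previous timestamp; small deltas
            # compress far better than the raw, large values
            nums[0], prev = nums[0] - prev, nums[0]
    return templates, columns

logs = [
    "1700000001 GET /index.html 200 512",
    "1700000003 GET /logo.png 200 2048",
    "1700000007 POST /login 302 128",
]
templates, columns = preprocess(logs)

# Feed the number-free strings and the numeric stream to a general compressor.
string_blob = "\n".join(templates).encode()
numeric_blob = "\n".join(",".join(map(str, c)) for c in columns).encode()
compressed = zlib.compress(string_blob) + zlib.compress(numeric_blob)
print(len(compressed), "bytes after numeric-token preprocessing")
```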
- Research Article
76
- 10.1121/1.408579
- Mar 1, 1994
- The Journal of the Acoustical Society of America
This paper addresses the effects of nonlinear log compression on the amplitude of backscattered signals. The changes in the statistical characteristics of the signals were examined and correlated with the compression parameters. An analytical formulation was developed for the probability density function (PDF) of log-compressed amplitude signals. To obtain the theoretical PDF, a Rayleigh-distributed signal is subjected to a scaled log compression of the form n1 ln(x) + n2. Such a transformation is common in medical ultrasound image formation, as it allows independent control over the dynamic range and gain of the displayed image. The resulting Fisher–Tippett PDF and its statistical parameters are derived and compared with the empirically measured statistics of ultrasound images of scattering phantoms. The comparisons reveal a great degree of similarity between the theoretical PDF and the histogram of the image, even though goodness-of-fit tests indicate a statistical mismatch from the theoretical model due to factors such as nonideal log compression transfer, noise, envelope smoothing, etc.
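The claim in this abstract can be traced in a single change-of-variables step. The sketch below assumes a Rayleigh envelope with scale parameter σ (notation not taken from the paper) and the compression law y = n1 ln x + n2 named above.

```latex
% Rayleigh envelope and scaled log compression:
f_X(x) = \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}, \quad x \ge 0,
\qquad
Y = n_1 \ln X + n_2 \;\Rightarrow\; X = e^{(Y - n_2)/n_1}.

% Change of variables, with dx/dy = x / n_1:
f_Y(y) = f_X\!\big(e^{(y-n_2)/n_1}\big)\,\frac{e^{(y-n_2)/n_1}}{n_1}
       = \frac{1}{n_1\sigma^2}\,
         \exp\!\Big(\tfrac{2(y-n_2)}{n_1}\Big)\,
         \exp\!\Big(-\tfrac{1}{2\sigma^2}\,e^{\,2(y-n_2)/n_1}\Big),
```

which has the doubly exponential (Fisher–Tippett) form referred to in the abstract.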
- Conference Article
19
- 10.1109/ultsym.1993.339648
- Jan 1, 1993
This paper addresses the effects of nonlinear log compression on the amplitude of backscattered signals. The changes in the statistical characteristics of the signals were examined and correlated with the compression parameters. We develop an analytical formulation for the probability density function (PDF) of log-compressed amplitude signals. To obtain the theoretical PDF, a Rayleigh-distributed signal is subjected to a scaled log compression of the form n1 ln(x) + n2. Such a transformation is common in medical ultrasound image formation, as it allows independent control over the dynamic range and gain of the displayed image. The resulting Fisher–Tippett PDF and its statistical parameters are derived and compared with the empirically measured statistics of ultrasound images of scattering phantoms. The comparisons reveal a great degree of similarity between the theoretical PDF and the histogram of the image, even though goodness-of-fit tests indicate a statistical mismatch from the theoretical model due to factors such as non-ideal log compression transfer, noise, envelope smoothing, etc.
- Conference Article
6
- 10.1109/iecbes.2014.7047584
- Dec 1, 2014
The echo signals received by the transducer in ultrasound imaging have a high dynamic range of 12 bits and hence cannot be displayed on the Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) monitors of an ultrasound machine. Log compression is used to compress the data to 8 bits. Since log compression is nonlinear, it is very difficult to trace the original characteristics of the signal. In this paper, various global and local compression techniques were studied as replacements for log compression, so that the dynamic range of the image can be retrieved if the physician has a better monitor for display, while also ensuring minimum error in the retrieved signal and thus minimum error in the retrieved image. From the results, it is observed that the Structural SIMilarity (SSIM) of the wavelet-compressed image is 1.02 times, and that of the gamma-compressed image 2.24 times, that of the log-compressed image. Gamma-based compression can be preferred to log- and wavelet-based compression, as it gives a good-quality image compared with the other compression techniques, but it cannot be used to retrieve the statistical properties when expanded, and these statistical properties help physicians perform better analysis. Wavelet-based compression serves this purpose and hence is best suited for an Internet of Things (IoT)-enabled ultrasound system for remote diagnosis in the cloud.
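To make the dynamic-range problem concrete, the sketch below maps hypothetical 12-bit envelope samples to 8 bits with a textbook log curve and, for comparison, a gamma curve. It only illustrates the kind of global mappings the paper compares; the gamma exponent and sample values are assumptions, not the authors' settings.

```python
import numpy as np

# Hypothetical 12-bit envelope samples (0..4095) for an 8-bit display (0..255).
envelope = np.random.randint(0, 4096, size=10)

# Global log compression and an alternative gamma mapping (gamma = 0.4 assumed).
log_img = np.round(255 * np.log1p(envelope) / np.log1p(4095)).astype(np.uint8)
gamma_img = np.round(255 * (envelope / 4095.0) ** 0.4).astype(np.uint8)

print(envelope)
print(log_img)
print(gamma_img)
```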
- Research Article
54
- 10.1109/tuffc.2007.463
- Sep 1, 2007
- IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control
This paper proposes a novel design of envelope detectors capable of supporting a small-animal cardiac imaging system requiring a temporal resolution of more than 150 frames per second. The proposed envelope detector adopts quadrature demodulation and the lookup table (LUT) method to compute the magnitude of the complex baseband components of received echo signals. Because directly using the LUT method for a square-root function is not feasible due to the large memory size required, this paper presents a new LUT strategy that dramatically reduces its size by using a binary logarithmic number system (BLNS). Due to the nature of the BLNS, the proposed design does not require an individual LOG-compression functional block. In an implementation using a field-programmable gate array (FPGA), a total of 166.56 Kbytes of memory was used for computing the magnitude of 16-bit in-phase and quadrature components, instead of 4 Gbytes in the case of the direct use of the LUT method. The experimental results show that the proposed envelope detector is capable of generating LOG-compressed envelope data at every clock cycle after a 32-clock-cycle latency, and its maximum error is less than 0.5 (i.e., within the rounding error) compared with the arithmetic results of the square-root function and LOG compression.
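The identity that lets a log-domain design drop a separate square-root/log-compression block can be checked numerically: the log-compressed magnitude of an I/Q pair equals half the log of I² + Q², so the square root turns into a halving in the log domain. The snippet below only demonstrates that identity with made-up I/Q values; it does not reproduce the paper's BLNS lookup tables or FPGA architecture.

```python
import numpy as np

# Assumed 16-bit in-phase and quadrature samples, for illustration only.
i = np.array([1200, -3400, 250], dtype=np.int64)
q = np.array([-800, 1500, 4000], dtype=np.int64)

# Conventional path: magnitude first, then log compression (in dB).
mag = np.sqrt(i.astype(float) ** 2 + q.astype(float) ** 2)
db_direct = 20.0 * np.log10(mag)

# Square-root-free path: 20*log10(sqrt(I^2 + Q^2)) == 10*log10(I^2 + Q^2),
# which is what makes a log-domain (e.g. binary-log) LUT approach attractive.
db_logdomain = 10.0 * np.log10(i ** 2 + q ** 2)

print(np.allclose(db_direct, db_logdomain))  # True
```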
- Conference Article
7
- 10.1117/12.844388
- Mar 4, 2010
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
The appearance of ultrasound images is characterized by speckle, shadows, signal dropout, and low contrast, which makes them difficult to process and leads to a very poor signal-to-noise ratio. Therefore, for the main imaging applications, a denoising step is necessary to successfully apply medical imaging algorithms to such images. However, due to speckle statistics, denoising and enhancing edges in these images without inducing additional blurring is a truly challenging problem on which the usual filters often fail. To deal with such problems, a large number of papers work on B-mode images, considering the noise to be purely multiplicative. Making such an assertion can be misleading because of internal pre-processing, such as log compression, performed in the ultrasound device. To address these questions, we designed a novel filtering method based on the 1D radiofrequency (RF) signal. Indeed, since B-mode images are initially composed of 1D signals and since the log compression applied by ultrasound devices modifies the noise statistics, we decided to filter the 1D RF signal envelope directly, before log compression and image reconstruction, in order to preserve as much information as possible. A bi-orthogonal wavelet transform is applied to the log transform of each signal, and an adaptive 1D split-and-merge-like algorithm is used to denoise the wavelet coefficients. Experiments were carried out on synthetic data sets simulated with the Field II simulator, and the results show that our filter outperforms classical speckle-filtering methods such as the Lee, non-local means, and SRAD filters.
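A heavily simplified version of the pipeline described above (log-transform the 1D envelope so the speckle becomes additive, take a bi-orthogonal wavelet transform, shrink the coefficients, reconstruct) can be sketched with the PyWavelets library. The soft threshold used here is only a stand-in for the paper's adaptive split-and-merge rule, and the signal values are synthetic.

```python
import numpy as np
import pywt

# Synthetic 1D RF envelope line with multiplicative Rayleigh speckle (assumed data).
rng = np.random.default_rng(0)
envelope = np.abs(np.sin(np.linspace(0, 8 * np.pi, 1024))) + 0.1
noisy = envelope * rng.rayleigh(scale=1.0, size=envelope.size)

# Log-transform so the multiplicative speckle becomes additive noise.
log_sig = np.log(noisy + 1e-6)

# Bi-orthogonal wavelet decomposition and a simple soft threshold
# (a placeholder for the adaptive split-and-merge denoising in the paper).
coeffs = pywt.wavedec(log_sig, "bior3.5", level=4)
thr = 0.5 * np.median(np.abs(coeffs[-1]))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = np.exp(pywt.waverec(coeffs, "bior3.5"))[: noisy.size]

print(noisy[:5])
print(denoised[:5])
```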
- Research Article
58
- 10.1121/1.414999
- Jun 1, 1996
- The Journal of the Acoustical Society of America
Log compression of A-lines to produce B-scan images in clinical ultrasound imaging systems is a standard procedure for controlling the dynamic range of the images. The statistics of such compressed images in terms of the underlying scatterer statistics have not been derived. Here, the statistics are analyzed for partially formed speckle using a general K-distribution model of the envelope statistics to derive the density function of the log-compressed envelope. This density function is used to elucidate the relation between the moments of the compressed envelope, the compression parameters, and the statistics of the scatterers. The analysis shows that the mean of the log-compressed envelope is an increasing function of both the backscattered energy and the effective scatterer density. The variance of the log-compressed envelope is a decreasing function of the effective scatterer density and is independent of the backscattered energy.
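The two trends stated at the end of this abstract can be checked with a small Monte Carlo sketch. It generates a K-distributed envelope as Rayleigh speckle modulated by the square root of a Gamma texture (a standard construction, assumed here rather than taken from the paper) and log-compresses it; the printed means rise with energy and scatterer density while the variances track only the density.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_compressed_stats(energy, alpha, n=200_000):
    """Simulate a K-distributed envelope (Rayleigh speckle times the square
    root of a Gamma texture with shape alpha) and log-compress it."""
    texture = rng.gamma(shape=alpha, scale=energy / alpha, size=n)
    speckle = rng.rayleigh(scale=1.0, size=n)
    env = np.sqrt(texture) * speckle
    compressed = 20.0 * np.log10(env)  # plain log compression, no offset
    return compressed.mean(), compressed.var()

for energy in (1.0, 4.0):
    for alpha in (1.0, 10.0):
        m, v = log_compressed_stats(energy, alpha)
        print(f"energy={energy:<4} alpha={alpha:<4} mean={m:6.2f} var={v:6.2f}")
# Mean increases with energy and alpha; variance decreases with alpha
# and is essentially unchanged by energy, matching the abstract's statements.
```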
- Research Article
30
- 10.1007/s10664-020-09822-x
- Aug 12, 2020
- Empirical Software Engineering
Large-scale software systems and cloud services continue to produce large amounts of log data. Such log data is usually preserved for a long time (e.g., for auditing purposes). General compressors, like the LZ77 compressor used in gzip, are usually used in practice to compress log data to reduce the cost of long-term storage. However, such general compressors do not consider the unique nature of log data. In this paper, we study the performance of general compressors on compressing log data relative to their performance on compressing natural language data. We used 12 widely used general compressors to compress nine log files that were collected based on a survey of prior literature on text compression, log compression, and log analysis. We observe that log data is more repetitive than natural language data, and that log data can be compressed and decompressed faster with higher compression ratios. Besides, the compressor with the highest compression ratio for natural language data is rarely the one for log data. Nevertheless, the compressors with the highest compression ratio for log data are rarely adopted in practice by current logging libraries and log management tools. We also observe that the peak compression and decompression speeds of general compressors on log data are often achieved with a small data size, while such a size may not be used by log management tools. Finally, we observe that the optimal compression performance (measured by a combined compression performance score) of log data usually requires the compression level to be configured higher than the default level. Our findings call for careful consideration in choosing general compressors and their associated compression levels for log data in practice. In addition, our findings shed light on opportunities for future research on compressors that better suit the characteristics of log data.
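The kind of measurement behind these observations is easy to reproduce on a small scale with the compressors in the Python standard library. The snippet below is only a sketch of the methodology (compression ratio and speed at several levels), not the study's actual harness, and "sample.log" is a placeholder path.

```python
import bz2
import lzma
import time
import zlib

# Placeholder path; substitute any log file of interest.
with open("sample.log", "rb") as f:
    data = f.read()

compressors = {
    "zlib": lambda level: zlib.compress(data, level),
    "bz2": lambda level: bz2.compress(data, compresslevel=level),
    "lzma": lambda level: lzma.compress(data, preset=level),
}

for name, compress in compressors.items():
    for level in (1, 6, 9):  # fast, mid, maximum
        start = time.perf_counter()
        out = compress(level)
        elapsed = time.perf_counter() - start
        ratio = len(data) / len(out)
        speed = len(data) / elapsed / 1e6
        print(f"{name:5} level={level} ratio={ratio:6.2f} speed={speed:7.1f} MB/s")
```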
- Research Article
34
- 10.1109/tuffc.2020.2976809
- Nov 29, 2020
- IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
In recent years, deep learning (DL) has been successfully applied to the analysis and processing of ultrasound images. To date, most of this research has focused on segmentation and view recognition. This article benchmarks different convolutional neural network algorithms for motion estimation in ultrasound imaging. We evaluated and compared several networks derived from FlowNet2, one of the most efficient architectures in computer vision. The networks were tested with and without transfer learning, and the best configuration was compared against the particle imaging velocimetry method, a popular state-of-the-art block-matching algorithm. Rotations are known to be difficult to track from ultrasound images due to a significant speckle decorrelation. We thus focused on the images of rotating disks, which could be tracked through speckle features only. Our database consisted of synthetic and in vitro B-mode images after log compression and covered a large range of rotational speeds. One of the FlowNet2 subnetworks, FlowNet2SD, produced competitive results with a motion field error smaller than 1 pixel on real data after transfer learning based on the simulated data. These errors remain small for a large velocity range without the need for hyperparameter tuning, which indicates the high potential and adaptability of DL solutions to motion estimation in ultrasound imaging.
- Research Article
7
- 10.5555/2819419.2819433
- May 16, 2015
When monitoring complex applications in cloud systems, a difficult problem for operators is receiving false-positive alarms. This becomes worse when the system is sporadically changed and upgraded due to the emerging continuous-deployment practice. Other legitimate but sporadic maintenance operations, such as log compression, garbage collection, and data reconstruction in distributed systems, can also trigger false alarms. Consequently, traditional baseline-based anomaly detection and monitoring is less effective. A common but dangerous practice is to turn off monitoring during sporadic operations such as upgrades and maintenance. In this paper, we report on the use of the process context information of sporadic operations to suppress false-positive alarms. We use the context information both directly and in machine learning. Our experimental evaluation shows that 1) using process context directly improves the alarm precision up to 0.226 (a 36.1% improvement), and 2) using process-context-trained machine learning models improves the precision up to 0.421 (an 84.7% improvement).
- Research Article
2
- 10.3906/elk-1901-231
- Sep 18, 2019
- TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
The feature extraction process is a fundamental part of speech processing. Mel frequency cepstral coefficients (MFCCs) are the most commonly used feature type in the speech/speaker recognition literature. However, the MFCC framework may face numerical issues or dynamic-range problems, which decrease its performance. A practical solution to these problems is adding a constant to the filter-bank magnitudes before log compression, thus violating the scale-invariance property. In this work, a magnitude normalization and a multiplication constant are introduced to make the MFCCs scale-invariant and to avoid dynamic-range expansion of nonspeech frames. Speaker verification experiments are conducted to show the effectiveness of the proposed scheme.
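The scale-invariance point in this abstract can be written out in one line. With a generic filter-bank magnitude m_k and an input gain a (symbols assumed here, not taken from the paper):

```latex
% Plain log step vs. the floored (constant-added) log step:
\ln(a\,m_k) = \ln a + \ln m_k
\qquad\text{vs.}\qquad
\ln(a\,m_k + c) \neq \ln a + \ln(m_k + c).
```

In the first form a global gain shifts every channel by the same constant, which the DCT folds into the 0th cepstral coefficient only; the floored form does not, which is the violation of scale invariance that the proposed normalization is designed to remove.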
- Research Article
8
- 10.1177/875647938900500502
- Sep 1, 1989
- Journal of Diagnostic Medical Sonography
An overview of signal-processing techniques used in the acquisition, display, manipulation, and analysis of echo scan data is presented. The principles of time-gain compensation, selective enhancement, log compression, fill-in interpolation, edge enhancement, image updating, write zoom, gray scale mapping, black/white inversion, freeze frame, frame averaging, read zoom, thresholding, contrast enhancement, filtering, and region-of-interest definition are reviewed. The concept of gray scale mapping, which dictates how the stored scan data are displayed, is explained with many illustrative examples.
- Conference Article
5
- 10.1109/isocc.2014.7087602
- Nov 1, 2014
In this paper, we propose a hardware architecture for high-speed transaction logging in a forex trading system. In the forex trading market, the trading volume of currencies grows larger every year. In order to provide real-time processing of large volumes and a high-availability service, we focused on the two types of workload where the bottleneck occurs and conducted a workload analysis. The bottleneck between the application server and the internal hard disk is caused by the overhead of storing the transaction logs, due to the bandwidth limitation of the hard disk. Our key idea is to suppress the overhead of transaction logging by compressing the transaction logs. The implementation results demonstrate the feasibility of our proposal for increasing the bandwidth through log compression.
- Research Article
11
- 10.1016/s0894-7317(97)80030-2
- Jan 1, 1997
- Journal of the American Society of Echocardiography
Contrast echocardiography: Influence of ultrasonic machine settings, mixing conditions, and pressurization on pixel intensity and microsphere size of Albunex solutions in vitro
- Conference Article
- 10.1117/12.2506609
- Feb 27, 2019
- Photons Plus Ultrasound: Imaging and Sensing 2019
We had developed a real-time clinical photoacoustic (PA) and ultrasound (US) imaging system by combining a programmable ultrasound machine and a wavelength-tunable laser. The system was able to acquire real-time images of biological tissue, but the user had to restart the image acquisition software to modify parameters for optimizing the images. We have recently updated the system to adjust imaging parameters in real time by implementing real-time parameter-control software that is compatible with the programmable platform of the US machine. To adjust the parameters of both PA and US images, we also implemented custom functional blocks including beamforming, frequency demodulation, log compression, decimation, scan conversion, and image display. To acquire real-time images, we performed all the calculations using parallel processing with a graphics processing unit in the US machine. The updated system has great potential to be widely applied to a variety of clinical and preclinical applications because it allows real-time optimization of imaging parameters as well as visualization of the images in real time.
- Conference Article
- 10.1109/ultsym.1995.495811
- Nov 7, 1995
The LOG compression technique is commonly used in medical ultrasound imaging to show the weak scattering signals on the same scale as the strong specular reflections. This transformation changes the statistical nature of the envelope-detected signal from a Rayleigh to a doubly exponential distribution. The authors have theoretically found the relationship between the mean values of the pre- and post-compressed data. A computer simulation verified the theoretical prediction, and experimental data support the results. The results of the study imply that for tests which use relative signal levels, such as SNR, either the pre- or the post-compressed data can be used.
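As a point of reference for the mean relationship mentioned above, one standard way to write it, assuming a Rayleigh envelope with scale σ and a compression law Y = a ln X + b (the specific constants in the paper may differ), is:

```latex
% For X Rayleigh with scale \sigma: X^2 = 2\sigma^2 E with E \sim \mathrm{Exp}(1),
% and E[\ln E] = -\gamma (Euler--Mascheroni constant), so
E[\ln X] = \ln\sigma + \tfrac{1}{2}\,(\ln 2 - \gamma),
\qquad
E[Y] = a\!\left(\ln\sigma + \tfrac{\ln 2 - \gamma}{2}\right) + b .
```

Since the pre-compression mean of a Rayleigh envelope is σ√(π/2), both means are simple functions of the same σ, consistent with the statement that either the pre- or the post-compressed data can be used for relative-level tests such as SNR.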