Binary-Decomposed Vision Transformer: Compressing and Accelerating Vision Transformer by Binary Decomposition
Vision Transformers (ViTs) have emerged as versatile and high-performance models for various tasks such as image classification, object detection, and semantic segmentation. However, the ViT-L model, which demonstrates high accuracy, has a large number of parameters (307M), leading to increased computational requirements. To deploy ViTs on embedded devices and similar platforms, it is crucial to compress the model size and accelerate the inference process. In this paper, we propose the Binary-decomposed Vision Transformer (BdViT), a method for model compression and accelerated inference for ViT models. BdViT combines weight binarization based on vector decomposition with quantization of multiplication and addition operations, and does not require retraining model parameters. Through evaluation experiments on image recognition datasets, we demonstrate that BdViT significantly reduces the number of parameters while mitigating performance degradation.
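The abstract does not spell out the decomposition itself. As a hedged sketch of the general idea (a greedy multi-basis binarization; the function name and residual rule are assumptions, not necessarily BdViT's exact scheme), real-valued weights can be approximated by a sum of scaled {-1, +1} bases:

```python
def binary_decompose(w, num_bases=4):
    """Greedy sketch: approximate a weight vector w by sum_k c_k * B_k,
    with each B_k a {-1, +1} vector and c_k a scalar scale.
    At every step, B = sign(residual) and c = mean(|residual|) is the
    least-squares optimal scale for that sign pattern."""
    residual = list(w)
    bases, scales = [], []
    for _ in range(num_bases):
        B = [1.0 if r >= 0 else -1.0 for r in residual]
        c = sum(abs(r) for r in residual) / len(residual)
        bases.append(B)
        scales.append(c)
        residual = [r - c * b for r, b in zip(residual, B)]
    return bases, scales

# Usage: reconstruct and inspect the approximation error
w = [0.7, -1.2, 0.05, 2.3, -0.4]
bases, scales = binary_decompose(w)
approx = [sum(c * B[i] for c, B in zip(scales, bases)) for i in range(len(w))]
```

Each added basis strictly reduces the squared residual (by n times the squared scale), so accuracy can be traded against storage: one bit per weight per basis, plus one scale per basis.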
- Conference Article
- 3
- 10.1109/dcc.2002.1000002
- Apr 2, 2002
Summary form only given. We address the problem of entropy coding of integers i ∈ ℤ with a probability distribution defined as the two-sided geometric distribution (TSGD), which arises mainly in tasks of image and video compression. An efficient method based on binary tree decomposition of the source alphabet, combined with binary arithmetic coding, was proposed for coding of DC and AC coefficients of the DCT in the JPEG image compression standard. Binary decomposition allows for efficient coding of sources with large alphabets and skewed distribution. We propose two binary decompositions for coding of sources with the TSGD.
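The paper's TSGD-matched trees are not reproduced here, but the underlying binarization step can be sketched (the Exp-Golomb magnitude code below is a stand-in assumption): each integer becomes a short path of binary decisions (zero flag, sign, then magnitude bits), each of which could then be driven through a binary arithmetic coder.

```python
def binarize(i):
    """Decompose an integer into binary decisions: a nonzero flag,
    a sign bit, then an Exp-Golomb code of the magnitude
    (unary-style prefix of zeros, then the binary value bits)."""
    if i == 0:
        return [0]
    bits = [1, 0 if i > 0 else 1]       # nonzero flag, then sign
    code = bin(abs(i))[2:]              # Exp-Golomb of |i| - 1 encodes |i|
    return bits + [0] * (len(code) - 1) + [int(b) for b in code]

def debinarize(bits):
    """Inverse mapping, consuming the decision sequence in order."""
    it = iter(bits)
    if next(it) == 0:
        return 0
    sign = -1 if next(it) else 1
    zeros = 0
    b = next(it)
    while b == 0:                       # count the unary prefix
        zeros += 1
        b = next(it)
    value = 1
    for _ in range(zeros):              # read the remaining value bits
        value = (value << 1) | next(it)
    return sign * value
```

Because every decision is binary, a single adaptive binary arithmetic coder with per-node contexts can code the whole alphabet, which is the efficiency argument the abstract makes for large, skewed alphabets.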
- Research Article
- 10.58915/ijact.v4.2024.1498
- Nov 19, 2024
- International Journal of Advanced Communication Technology (IJACT)
This study addresses the critical challenge of embedding additional information into medical images, focusing on the trade-off between watermark capacity and visual quality. The importance of this challenge lies in maintaining the diagnostic value of medical images while securely embedding auxiliary data such as patient identifiers or copyright information. The study conducts a comparative analysis of two watermarking schemes: binary decomposition and Fibonacci decomposition. Both decompositions were applied by utilizing modified binary watermarks and leveraging domain properties of the host medical images to minimize disruption during the embedding process. The evaluation was performed on a dataset of brain magnetic resonance imaging (MRI) images. The watermark capacity was varied to assess its impact on visual quality, which was quantified using Peak Signal-to-Noise Ratio (PSNR). The results demonstrated that the Fibonacci decomposition method achieved a higher watermark capacity of up to 3.5 bpp while maintaining high visual quality, with an average PSNR value of 76.5 dB. These results indicate that the Fibonacci decomposition approach offers significant advantages in balancing high capacity against minimal image distortion, making it a promising solution for medical image watermarking applications.
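The embedding schemes themselves are not reproduced here, but the Fibonacci (Zeckendorf) decomposition underlying the higher-capacity method can be sketched (function name assumed): each pixel value is written as a sum of non-consecutive Fibonacci numbers, yielding more "planes" to embed into than the binary expansion.

```python
def zeckendorf(n):
    """Zeckendorf decomposition: write n >= 0 as a sum of non-consecutive
    Fibonacci numbers. Digits are returned from the largest Fibonacci
    weight down, greedily subtracting each weight that fits."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs[:-1]):
        if f <= n:
            digits.append(1)
            n -= f
        else:
            digits.append(0)
    return digits
```

A pixel value up to 255 takes 12 Fibonacci digits (weights 1 through 233) versus 8 binary bits, which is the extra embedding room such schemes exploit.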
- Book Chapter
- 10.1007/3-540-17179-7_13
- Jan 1, 1986
It is well known that the notions of normal forms and acyclicity capture many practical desirable properties for database schemes. The basic schema design problem is to develop design methodologies that strive toward these ideals. The usual approach is to first normalize the database scheme as far as possible. If the resulting scheme is cyclic, then one tries to transform it into an acyclic scheme. In this paper, we argue in favor of carrying out these two phases of design concurrently. In order to do this efficiently, we need to be able to incrementally analyze the acyclicity status of a database scheme as it is being designed. To this end, we propose the formalism of "binary decompositions". Using this, we characterize design sequences that exactly generate θ-acyclic schemes, for θ = α,β. We then show how our results can be put to use in database design. Finally, we also show that our formalism above can be effectively used as a proof tool in dependency theory. We demonstrate its power by showing that it leads to a significant simplification of the proofs of some previous results connecting sets of multivalued dependencies and acyclic join dependencies.
- Conference Article
- 88
- 10.24963/ijcai.2018/398
- Jul 1, 2018
The task of partial label (PL) learning is to learn a multi-class classifier from training examples, each associated with a set of candidate labels among which only one corresponds to the ground-truth label. It is well known that for inducing a multi-class predictive model, the most straightforward solution is binary decomposition, which works by either the one-vs-rest or the one-vs-one strategy. Nonetheless, because the ground-truth label for each PL training example is concealed in its candidate label set and thus not accessible to the learning algorithm, binary decomposition cannot be directly applied under the partial label learning scenario. In this paper, a novel approach is proposed for solving the partial label learning problem by adapting the popular one-vs-one decomposition strategy. Specifically, one binary classifier is derived for each pair of class labels, where PL training examples with distinct relevancy to the label pair are used to generate the corresponding binary training set. After that, one binary classifier is further derived for each class label by stacking over the predictions of the existing binary classifiers to improve generalization. Experimental studies on both artificial and real-world PL data sets clearly validate the effectiveness of the proposed binary decomposition approach w.r.t. state-of-the-art partial label learning techniques.
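The pairwise relevancy rule described in the abstract can be sketched as follows (the stacking stage is omitted and all names are illustrative): for each label pair, only examples whose candidate set contains exactly one of the two labels contribute, labeled with the contained one.

```python
from itertools import combinations

def build_pairwise_sets(X, candidate_sets, labels):
    """One-vs-one decomposition under partial labels: for the pair (p, q),
    keep only examples whose candidate set contains exactly one of p, q,
    since for those the relevant side of the pair is unambiguous."""
    pair_sets = {}
    for p, q in combinations(labels, 2):
        xs, ys = [], []
        for x, S in zip(X, candidate_sets):
            has_p, has_q = p in S, q in S
            if has_p != has_q:          # exactly one of the pair is a candidate
                xs.append(x)
                ys.append(p if has_p else q)
        pair_sets[(p, q)] = (xs, ys)
    return pair_sets
```

Examples with both or neither of the pair in their candidate set are simply excluded from that binary problem, which is what makes the decomposition applicable despite the hidden ground truth.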
- Research Article
- 10.1016/j.ultramic.2025.114207
- Jul 1, 2025
- Ultramicroscopy
Towards continuous time-dependent tomography: implementation and evaluation of continuous acquisition schemes in electron tomography.
- Conference Article
- 1
- 10.1117/12.386801
- May 24, 2000
- Proceedings of SPIE, the International Society for Optical Engineering
Optical pattern recognition can be improved using powerful filters or by defining new correlations. The morphological correlation is a robust detection method that minimizes the mean absolute error between two patterns. It is a nonlinear correlation, defined as the average, over all amplitudes, of the linear correlation between thresholded versions of the input scene and the reference object for every gray level. This nonlinear correlation can be implemented optically using a joint transform correlator and provides higher performance and higher discrimination abilities in comparison with other linear correlation methods. We define different morphological correlations using different binary decompositions. These correlations allow efficient pattern recognition with higher discrimination ability than other common linear image detection techniques. Experimental results are presented.
- Research Article
- 19
- 10.1109/19.893262
- Jan 1, 2000
- IEEE Transactions on Instrumentation and Measurement
Maximum time interval error (MTIE) is historically one of the main time-domain quantities for the specification of clock stability requirements in telecommunications standards. Nevertheless, plain computation of the MTIE standard estimator proves cumbersome in most cases of practical interest, due to its heavy computational weight. In this paper, MTIE is first introduced according to its standard definition. Then, a fast algorithm based on binary decomposition to compute the MTIE standard estimator is described. The computational weight of the binary decomposition algorithm is compared to that of the plain calculation of the estimator, showing that the number of operations needed is reduced to a term proportional to N log₂ N instead of N². A heavy computational saving is therefore achieved, making MTIE evaluation feasible even on long sequences of time error samples.
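The binary decomposition recursion can be illustrated concretely (restricted here to window lengths that are powers of two; the function name and interface are assumptions, not the authors' code): the running max and min over a window of 2^k samples are combined from two overlapping windows of 2^(k-1) samples, so all window lengths together cost O(N log N) instead of O(N²).

```python
def mtie_dyadic(x):
    """MTIE for window lengths n = 2, 4, 8, ... of a time-error sequence x.
    Running max/min arrays for window 2^k are built from the arrays for
    window 2^(k-1) shifted by half a window; MTIE(n) is the worst
    peak-to-peak excursion (max - min) over all window positions."""
    M = list(x)                         # window max, window length 1
    m = list(x)                         # window min, window length 1
    mtie = {}
    half = 1
    while 2 * half <= len(x):
        n = 2 * half
        M = [max(M[i], M[i + half]) for i in range(len(M) - half)]
        m = [min(m[i], m[i + half]) for i in range(len(m) - half)]
        mtie[n] = max(hi - lo for hi, lo in zip(M, m))
        half = n
    return mtie
```

Each doubling step is a single linear pass, so a sequence of N samples yields MTIE at log₂ N window lengths in about N log₂ N comparisons.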
- Research Article
- 12
- 10.1109/tcomm.2004.833189
- Sep 1, 2004
- IEEE Transactions on Communications
This paper presents a two-stage turbo-coding scheme for Reed-Solomon (RS) codes through binary decomposition and self-concatenation. In this scheme, the binary image of an RS code over GF(2^m) is first decomposed into a set of binary component codes with relatively small trellis complexities. Then the RS code is formatted as a self-concatenated code with itself as the outer code and the binary component codes as the inner codes in a turbo-coding arrangement. In decoding, the inner codes are decoded with turbo decoding and the outer code is decoded with either an algebraic decoding algorithm or a reliability-based decoding algorithm. The outer and inner decoders interact during each decoding iteration. For RS codes of lengths up to 255, the proposed two-stage coding scheme is practically implementable and provides a significant coding gain over conventional algebraic and reliability-based decoding algorithms.
- Conference Article
- 3
- 10.1109/isit.2001.935887
- Jun 24, 2001
It has long been a challenge to coding theorists to devise an effective and practical soft-decision decoding algorithm for Reed-Solomon (RS) codes. Many attempts have been made and several MLD algorithms have been devised. Unfortunately, these algorithms can only be applied to very short codes, or to codes with a very small number of parity symbols. Several algebraic soft-decision algorithms based on reliability measures of received symbols have also been proposed. However, these algorithms either provide very small improvement over pure algebraic decoding algorithms across the practical range of SNR, or their decoding complexity grows exponentially with the minimum distance of the code. This paper presents a two-stage scheme for turbo decoding RS codes through binary decomposition and self-concatenation. This decoding scheme achieves an impressive error performance with a significant reduction in decoding complexity compared to previously proposed MLD algorithms, and can be applied to decode reasonably long RS codes.
- Conference Article
- 10.2118/93464-ms
- Mar 12, 2005
Three-term seismic inversion volumes can define key reservoir properties such as porosity and water saturation that together define the hydrocarbon pore volume. The problem is that the vertical resolution of the elastic data is limited to, at best, approximately 10 m. That means the inversion volumes actually represent elastic properties of a net-to-gross (N/G) with unknown porosity and saturation in the net. The binary decomposition is described by a model that shows how the elastic data can be decomposed into the properties of the reservoir rock, typically sand, and properties of the non-reservoir rock, typically shale. Given a production scenario, the recoverable hydrocarbon volumes can be estimated directly from the seismic traces through the process of inversion, creation of support data, binary decomposition, hydrocarbon pore volume estimation, and seismic trace volume sampling.
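As a heavily simplified sketch of the two-component idea (assuming a linear mixing model, which may well differ from the paper's rock-physics model; the function name and example values are illustrative only), an effective elastic property can be decomposed into sand and shale end members, with the net-to-gross recovered by inversion:

```python
def net_to_gross(e_measured, e_sand, e_shale):
    """Invert a linear two-component (binary) mixing model:
    e_measured = ntg * e_sand + (1 - ntg) * e_shale,
    solving for the net-to-gross fraction ntg."""
    return (e_measured - e_shale) / (e_sand - e_shale)

# Example: a measured impedance of 2.6 between shale (2.0) and sand (3.0)
ntg = net_to_gross(2.6, 3.0, 2.0)
```

The abstract's point is that, because vertical resolution is limited to roughly 10 m, inversion volumes deliver only such effective properties, and the decomposition step is what turns them into reservoir-rock fractions.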
- Conference Article
- 4
- 10.1109/icc.1999.765539
- Jun 6, 1999
The maximum time interval error (MTIE) is historically one of the main time-domain quantities for the specification of clock stability requirements in telecommunications standards. Nevertheless, plain computation of the MTIE standard estimator proves cumbersome in most cases of practical interest, due to its heavy computational weight. In this paper, MTIE is first introduced according to its standard definition. Then, a fast algorithm based on binary decomposition to compute the MTIE standard estimator is described. The computational weight of the binary decomposition algorithm is compared to that of the plain calculation of the estimator, showing that the number of operations needed is reduced to a term proportional to N log₂ N instead of N². A heavy computational saving is therefore achieved, making MTIE evaluation feasible even on long sequences of time error (TE) samples. The proposed algorithm is finally applied to TE sequences generated by simulation of all types of power-law noise, in order to check its effectiveness and correctness.
- Research Article
- 7
- 10.3390/en15217983
- Oct 27, 2022
- Energies
This article proposes a formal method of designing robotic systems focusing on communication between components, as well as standardization of the messages between those components. The objective is to design a robotic system controller in a systematic way, focusing on communication at an abstract agent level. Communication thus organized, together with its properly defined specification, facilitates the system's further development. The method uses a standard message structure, based on IEEE FIPA standards, for communication within robotic systems composed of agents. A communication-focused, top-down design of robotic systems based on binary decomposition is proposed and used to design a companion robot working in a kitchen environment. The implemented robotic system is verified against the specified requirements, and the characteristics of the designed communication are evaluated. The obtained results show that the proposed method of designing robotic systems is formally correct, facilitates the implementation of agents, and separates the specification of the system from its implementation. The proposed formal notation aids understanding of how the system operates and organizes the design process, putting the communication between system components at the forefront. The resulting system specification facilitates implementation, and the tools for experimental evaluation of its characteristics confirm that the system fulfills the requirements and that the communication between its components is correct.
- Conference Article
- 10.1109/cscwd57460.2023.10152714
- May 24, 2023
Federated learning is a machine learning paradigm in which many clients collaboratively train a machine learning model while ensuring the nondisclosure of local data sets. Existing federated learning methods conduct optimization over the same model structure, which ensures the convenience of parameter updates. However, sharing the same structure among clients and the server may pose risks of privacy leakage, as parameters from one model can fit into others. In this paper, we propose a heterogeneous federated learning method to preserve privacy. Each client utilizes neural architecture search to determine a distinct model via local data and updates the server model via a federated learning framework with knowledge distillation. In addition, we develop a privacy-preserving binary low-rank matrix decomposition method (Blow), i.e., decomposing the output matrix into two low-rank binary matrices, to further ensure the secrecy of distilled information. A simple but efficient alternating optimization method is proposed to address a key subproblem arising from the binary low-rank matrix decomposition, which falls into the category of NP-hard bipartite Boolean quadratic programming. Based on extensive experiments on the image classification task, we show that our algorithm provides satisfactory accuracy and outperforms baseline algorithms in both privacy protection and communication efficiency.
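The alternating optimization over binary factors can be sketched at toy scale (all names are assumptions; Blow's actual solver for the NP-hard subproblem is more elaborate, while this sketch simply enumerates all 2^r binary candidates per row, which is exact per update but viable only for small rank r):

```python
from itertools import product

def binary_lowrank(Y, r=2, iters=10):
    """Alternating sketch for Y ≈ U V with binary factors U (n x r)
    and V (r x m). Each row of U (resp. column of V) is updated to the
    squared-error-optimal binary vector with the other factor fixed."""
    n, m = len(Y), len(Y[0])
    cands = list(product((0, 1), repeat=r))
    U = [list(cands[i % len(cands)]) for i in range(n)]   # arbitrary init
    V = [[1] * m for _ in range(r)]

    def err_row(u, yrow):
        return sum((sum(u[k] * V[k][j] for k in range(r)) - yrow[j]) ** 2
                   for j in range(m))

    for _ in range(iters):
        for i in range(n):                      # update rows of U, V fixed
            U[i] = list(min(cands, key=lambda u: err_row(u, Y[i])))
        for j in range(m):                      # update columns of V, U fixed
            best = min(cands, key=lambda v: sum(
                (sum(U[i][k] * v[k] for k in range(r)) - Y[i][j]) ** 2
                for i in range(n)))
            for k in range(r):
                V[k][j] = best[k]
    return U, V
```

Each update can only lower the squared error, so the scheme converges to a local optimum; for realistic ranks the per-row subproblem needs a dedicated solver, which is the part the paper contributes.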
- Conference Article
- 10.1145/3458380.3458431
- Feb 26, 2021
Existing signal-processing algorithms suffer from incomplete separation and an inability to adaptively decompose signals. The VMD algorithm can decompose a composite signal, but the number of IMFs must be set manually before decomposition. This paper proposes a binary variational mode decomposition (BVMD) algorithm based on similarity matching, which combines tree decomposition with similarity measurement to separate the other modes mixed into the current mode signal, finally obtaining the pure sub-signal through superposition. Noise components are then removed according to the mutual information between each sub-signal and the original signal. Simulation analysis shows that BVMD performs well at denoising and signal separation.
- Research Article
- 50
- 10.1117/1.2712464
- Jan 1, 2007
- Journal of Electronic Imaging
We address the representation of binary images using mathematical morphology. One of the main image representations in binary mathematical morphology is the shape decomposition representation, useful for image compression, pattern recognition, and image interpolation. The binary morphological shape decomposition (MSD) representation can be developed and generalized; with these generalizations, the binary MSD's role as an efficient image decomposition tool is extended. Initially, the MSD representation is based on only “one-parameter” families of elements. A new branch is added by introducing a multistructuring-element MSD based on the decomposition of images into “multiparameter” families of elements. The MSD representation contains redundant points. Examples are presented and illustrated by computer simulations.