Mitigating Vulnerabilities in Closed Source Software

Abstract

Many techniques have been proposed to harden programs with protection mechanisms that defend against vulnerability exploits. Unfortunately, the vast majority of them cannot be applied to closed-source software because they require access to program source code. This paper presents our work on automatically hardening binary code with security workarounds, a protection mechanism that prevents vulnerabilities from being triggered by disabling vulnerable code. Because it works solely with binary code, our approach is applicable to closed-source software. To automatically synthesize security workarounds, we develop binary program analysis techniques that identify existing error-handling code in binary code, synthesize security workarounds in the form of binary code, and instrument security workarounds into binary programs. We designed and implemented a prototype of our approach for Windows and Linux binary programs. Our evaluation shows that our approach can apply security workarounds to an average of 69.3% of program code, and that the security workarounds successfully prevent exploits from triggering real-world vulnerabilities.
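
The core idea of the abstract above can be sketched as follows. This is a minimal toy (not the paper's actual tool): a security workaround rewrites the entry of a vulnerable function so control transfers to error-handling code that already exists in the binary. The tiny instruction format and function names are hypothetical.

```python
def apply_workaround(program, vulnerable_fn, error_handler):
    """Return a copy of `program` where `vulnerable_fn` is disabled: its body
    is replaced by a jump into the existing error-handling path."""
    patched = dict(program)
    patched[vulnerable_fn] = [("jmp", error_handler)]
    return patched

def run(program, fn):
    """Tiny interpreter: follow jumps and return the value of the first ret."""
    for op, arg in program[fn]:
        if op == "jmp":
            return run(program, arg)
        if op == "ret":
            return arg

program = {
    "parse_input": [("ret", "parsed")],      # imagine a heap overflow in here
    "handle_error": [("ret", "EINVAL")],     # error handling already in the binary
}
patched = apply_workaround(program, "parse_input", "handle_error")
```

After patching, any call into the vulnerable function takes the existing error path, so the vulnerable code can no longer be triggered.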

Similar Papers
  • Research Article
  • Citations: 16
  • 10.1186/s42400-021-00088-4
Bin2vec: learning representations of binary executable programs for security tasks
  • Jul 1, 2021
  • Cybersecurity
  • Shushan Arakelyan + 4 more

Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks – functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).
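
The graph-convolution step the abstract mentions can be sketched in a few lines. This is an illustrative simplification in the spirit of Bin2vec, not its actual model: one GCN layer propagates node features over the program graph, and mean-pooling produces a fixed-size program embedding.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: normalized propagation + ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    prop = d_inv_sqrt @ a_hat @ d_inv_sqrt                 # symmetric normalization
    return np.maximum(prop @ feats @ weight, 0.0)          # ReLU activation

def program_embedding(adj, feats, weight):
    # Mean-pool node representations into one fixed-size program vector.
    return gcn_layer(adj, feats, weight).mean(axis=0)
```

A learned model would stack several such layers and train `weight` end-to-end for the downstream task.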

  • Conference Article
  • Citations: 259
  • 10.1145/2911451.2911502
Discrete Collaborative Filtering
  • Jul 7, 2016
  • Hanwang Zhang + 5 more

We address the efficiency problem of Collaborative Filtering (CF) by hashing users and items as latent vectors in the form of binary codes, so that user-item affinity can be efficiently calculated in a Hamming space. However, existing hashing methods for CF employ binary code learning procedures, most of which suffer from the challenging discrete constraints. Hence, those methods generally adopt a two-stage learning scheme composed of relaxed optimization that discards the discrete constraints, followed by binary quantization. We argue that such a scheme results in a large quantization loss, which especially compromises the performance of large-scale CF that resorts to longer binary codes. In this paper, we propose a principled CF hashing framework called Discrete Collaborative Filtering (DCF), which directly tackles the challenging discrete optimization that should have been treated adequately in hashing. The formulation of DCF has two advantages: 1) a Hamming-similarity-induced loss that preserves the intrinsic user-item similarity, and 2) balanced and uncorrelated code constraints that yield compact yet informative binary codes. We devise a computationally efficient algorithm for DCF with a rigorous convergence proof. Through extensive experiments on several real-world benchmarks, we show that DCF consistently outperforms state-of-the-art CF hashing techniques; e.g., using only 8 bits, DCF is even significantly better than other methods using 128 bits.
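
Why binary codes make affinity computation cheap can be shown directly (an assumed encoding for illustration, not DCF's training algorithm): with r-bit codes, the ±1 inner product equals r minus twice the Hamming distance, so affinity reduces to an XOR plus a popcount.

```python
def hamming_distance(a, b):
    """Number of differing bits between two integer-packed binary codes."""
    return bin(a ^ b).count("1")

def affinity(user_code, item_code, nbits):
    # Matching bits minus differing bits == nbits - 2 * Hamming distance,
    # i.e. the inner product of the corresponding ±1 vectors.
    return nbits - 2 * hamming_distance(user_code, item_code)
```

This is why longer codes only help if quantization loss stays small: the whole similarity computation lives in Hamming space.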

  • Conference Article
  • Citations: 82
  • 10.1184/r1/6469466.v1
TIE: Principled Reverse Engineering of Types in Binary Programs
  • Jun 29, 2018
  • Jong-Hyup Lee + 2 more

A recurring problem in security is reverse engineering binary code to recover high-level language data abstractions and types. High-level programming languages have data abstractions such as buffers, structures, and local variables that all help programmers and program analyses reason about programs in a scalable manner. During compilation, these abstractions are removed as code is translated down to operations on registers and one globally addressed memory region. Reverse engineering consists of “undoing” the compilation to recover high-level information so that programmers, security professionals, and analyses can all more easily reason about the binary code. In this paper we develop novel techniques for reverse engineering data type abstractions from binary programs. At the heart of our approach is a novel type reconstruction system based upon binary code analysis. Our techniques and system can be applied as part of both static or dynamic analysis, thus are extensible to a large number of security settings. Our results on 87 programs show that TIE is both more accurate and more precise at recovering high-level types than existing mechanisms.
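
The "recover types from how values are used" idea can be sketched with a toy refinement over a two-point lattice (the hint names and the lattice are ours, not TIE's actual type system, which is far richer).

```python
def refine(current, observed):
    """Meet-like refinement: start at 'top', specialize on evidence."""
    if current == "top":          # no information yet
        return observed
    return current if current == observed else "conflict"

def infer_types(uses):
    """`uses` maps a register/variable to usage hints, e.g. 'ptr' when it is
    dereferenced and 'int' when it feeds integer arithmetic."""
    types = {}
    for var, hints in uses.items():
        t = "top"
        for h in hints:
            t = refine(t, h)
        types[var] = t
    return types
```

A real system resolves 'conflict' with subtyping and range information rather than giving up.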

  • Conference Article
  • Citations: 38
  • 10.1145/2714576.2714639
Automated Identification of Cryptographic Primitives in Binary Code with Data Flow Graph Isomorphism
  • Apr 14, 2015
  • Pierre Lestringant + 2 more

Software uses cryptographic algorithms to secure its communications and to protect its internal data. However, the algorithm choice, its implementation design, and the generation methods of its input parameters may have dramatic consequences on the security of the data it was initially supposed to protect. Therefore, to assess the security of a binary program involving cryptography, analysts need to check that none of these points will cause a system vulnerability. This implies, as a first step, precisely identifying and locating the cryptographic code in the binary program. Since binary analysis is a difficult and cumbersome task, it is interesting to devise a method to automatically retrieve cryptographic primitives and their parameters. In this paper, we present a novel approach to automatically identify symmetric cryptographic algorithms and their parameters inside binary code. Our approach is static and based on DFG isomorphism. To cope with binary code produced from different source codes and by different compilers and options, the DFG is normalized using code rewrite mechanisms. Our approach differs from previous works, which either use statistical criteria leading to imprecise results or rely on heavy dynamic instrumentation. To validate our approach, we present experimental results on a set of synthetic samples including several cryptographic algorithms, binary code of well-known cryptographic libraries, and reference source implementations compiled using different compilers and options.
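
The normalize-then-match idea can be sketched compactly (our simplification, not the paper's rewrite system): canonicalize each data flow graph so that semantically equal expressions produced by different compilers compare equal, e.g. by sorting operands of commutative operations.

```python
def normalize(node, graph):
    """`graph` maps node id -> (op, [child ids]); inputs have no children.
    Returns a canonical nested tuple for the subgraph rooted at `node`."""
    op, args = graph[node]
    norm = [normalize(a, graph) for a in args]
    if op in ("add", "xor", "and", "or"):   # commutative ops: sort operands
        norm = sorted(norm)
    return (op, tuple(norm))

def same_primitive(g1, root1, g2, root2):
    """Two DFGs match when their normalized canonical forms are equal."""
    return normalize(root1, g1) == normalize(root2, g2)
```

Real normalization needs many more rewrite rules (constant folding, operation splitting, etc.), but the matching principle is the same.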

  • Conference Article
  • Citations: 4
  • 10.1109/iccbb.2018.8756383
Bintaint: A Static Taint Analysis Method for Binary Vulnerability Mining
  • Nov 1, 2018
  • Zenan Feng + 3 more

Vulnerabilities in cyberspace are of increasing concern to all parties. More and more software is deployed in the form of binary code in practical applications. Therefore, research on vulnerability mining techniques for binary code is attracting growing attention from researchers. To address the problem of binary vulnerability mining, this paper focuses on taint analysis and proposes a method called Bintaint, which performs static taint analysis and generates a taint control flow graph (TCFG). In addition, we implement a system based on Bintaint to reduce the path explosion in the traditional vulnerability analysis process. Combined with our bidirectional search technique and path-guidance algorithm, which assist symbolic execution in generating test cases, the high computational overhead caused by complex constraints can be reduced. Finally, we evaluate our method on x86 programs and six core programs from embedded devices of different architectures; the results show that our method detects all known vulnerabilities without false negatives and effectively mitigates the problems of path explosion and computational overhead.
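
The taint-propagation core of any such analysis can be sketched over a toy three-address IR (ours, not Bintaint's): a destination becomes tainted when any source operand is tainted, and loses taint when overwritten with clean data.

```python
def propagate_taint(instructions, sources):
    """Forward static taint propagation over (dst, op, srcs) instructions.
    `sources` names the initially tainted inputs."""
    tainted = set(sources)
    for dst, _op, srcs in instructions:
        if any(s in tainted for s in srcs):
            tainted.add(dst)        # taint flows into the destination
        else:
            tainted.discard(dst)    # overwritten with untainted data
    return tainted
```

A full tool would run this over the CFG to a fixed point and record which tainted values reach dangerous sinks.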

  • Research Article
  • 10.1007/s12083-020-00980-9
High utility itemset mining using path encoding and constrained subset generation
  • Aug 22, 2020
  • Peer-to-Peer Networking and Applications
  • Vamsinath Javangula + 2 more

In this paper, a two-phase approach for high utility itemset mining is proposed. In the first phase, potential high utility itemsets are generated using potential high utility maximal supersets. The transaction-weighted utility measure is used to ascertain the potential high utility itemsets. The maximal supersets are obtained from high utility paths ending in the items in the transaction database. The supersets are constructed without using any tree structures. The prefix information of an item in a transaction is stored in the form of binary codes. Thus, the prefix information of a path in a transaction is encoded as a binary code and stored in the node containing the item information. The potential high utility itemsets are generated from the maximal supersets using a modified set enumeration tree. The high utility itemsets are then obtained from the set enumeration tree by calculating the actual utility through a scan of the transaction database. The experiments highlight the superior performance of the system compared to other similar systems in the literature.
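
The binary prefix encoding the abstract describes can be sketched as a bitmask (our illustration; the paper's node layout may differ): instead of storing a tree path, the items preceding an item in a transaction are packed into one integer.

```python
def encode_prefix(transaction, item, item_index):
    """Encode the items preceding `item` in `transaction` as a single
    bitmask; `item_index` assigns each item a bit position."""
    code = 0
    for it in transaction:
        if it == item:
            break
        code |= 1 << item_index[it]   # set the bit of each preceding item
    return code
```

Prefix comparisons and merges then become cheap bitwise operations rather than tree traversals.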

  • Research Article
  • Citations: 4
  • 10.1109/jphot.2013.2280517
Photonic Microwave Frequency Measurement With High-Coding-Efficiency Digital Outputs and Large Measurement Range
  • Oct 1, 2013
  • IEEE Photonics Journal
  • Bing Lu + 4 more

A photonic approach to estimating microwave frequency with high-coding-efficiency digital outputs and large measurement range is proposed and experimentally demonstrated. In the proposed approach, an optical filter array that consists of N filters is designed, wherein N - 1 optical phase-shifted filters have an identical free spectral range (FSR) but a phase increment of π/(N - 1) in the transmission responses and one filter has a doubled FSR. The filters are then employed to process the single optical sideband generated by applying a microwave signal to a carrier-suppressed single-sideband (CS-SSB) modulation module, to perform the frequency-to-amplitude conversion and the analog-to-digital conversion simultaneously. After power detection and decision operation, an N-bit digital result in the form of binary code is obtained for microwave frequency measurement within the range of 2 × FSR. A proof-of-concept experiment is performed to verify the proposed approach. A 5-bit binary code with effective number of bits of four is generated to indicate the microwave frequency in the range from 10 to 40 GHz.

  • Conference Article
  • Citations: 61
  • 10.14722/ndss.2015.23185
No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantics-Preserving Transformations
  • Jan 1, 2015
  • Khaled Yakdan + 3 more

Decompilation is important for many security applications; it facilitates the tedious task of manual malware reverse engineering and enables the use of source-based security tools on binary code. This includes tools to find vulnerabilities, discover bugs, and perform taint tracking. Recovering high-level control constructs is essential for decompilation in order to produce structured code that is suitable for human analysts and source-based program analysis techniques. State-of-the-art decompilers rely on structural analysis, a pattern-matching approach over the control flow graph, to recover control constructs from binary code. Whenever no match is found, they generate goto statements and thus produce unstructured decompiled output. Those statements are problematic because they make decompiled code harder to understand and less suitable for program analysis. In this paper, we present DREAM, the first decompiler to offer a goto-free output. DREAM uses a novel pattern-independent control-flow structuring algorithm that can recover all control constructs in binary programs and produce structured decompiled code without any goto statement. We also present semantics-preserving transformations that can transform unstructured control flow graphs into structured graphs. We demonstrate the correctness of our algorithms and show that we outperform both the leading industry and academic decompilers: Hex-Rays and Phoenix. We use the GNU coreutils suite of utilities as a benchmark. Apart from reducing the number of goto statements to zero, DREAM also produced more compact code (fewer lines of code) for 72.7% of decompiled functions compared to Hex-Rays and 98.8% compared to Phoenix. We also present a comparison of Hex-Rays and DREAM when decompiling three samples from the Cridex, ZeusP2P, and SpyEye malware families.
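
The structural-analysis pattern matching that the abstract contrasts DREAM against can be sketched in miniature (our toy CFG encoding): recognize an if-then-else "diamond" in a CFG of successor lists; when no such pattern matches, structural-analysis decompilers fall back to emitting gotos.

```python
def match_if_then_else(cfg, node):
    """Pattern-match a diamond: node -> (then, else) -> common join node.
    `cfg` maps each node to its list of successors."""
    succs = cfg.get(node, [])
    if len(succs) != 2:
        return None
    then_n, else_n = succs
    # Both branches must fall through to the same single join node.
    if cfg.get(then_n) == cfg.get(else_n) and len(cfg.get(then_n, [])) == 1:
        return ("if-else", node, then_n, else_n, cfg[then_n][0])
    return None
```

DREAM's contribution is precisely that it does not depend on a fixed catalog of such patterns.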

  • Book Chapter
  • Citations: 20
  • 10.1007/978-3-030-22038-9_14
TypeMiner: Recovering Types in Binary Programs Using Machine Learning
  • Jan 1, 2019
  • Alwin Maier + 3 more

Closed-source software is a major hurdle for assessing the security of computer systems. In absence of source code, it is particularly difficult to locate vulnerabilities and malicious functionality, as crucial information is removed by the compilation process. Most notably, binary programs usually lack type information, which complicates spotting vulnerabilities such as integer flaws or type confusions dramatically. Moreover, data types are often essential for gaining a deeper understanding of the program logic. In this paper we present TypeMiner, a static method for recovering types in binary programs. We build on the assumption that types leave characteristic traits in compiled code that can be automatically identified using machine learning starting at usage locations determined by an analyst. We evaluate the performance of our method with 14 real world software projects written in C and show that it is able to correctly recover the data types in 76%–93% of the cases.
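
The "types leave characteristic traits" assumption can be illustrated with a toy nearest-centroid classifier over hand-made usage features (feature choice and classifier are ours for illustration; TypeMiner's actual model differs).

```python
import numpy as np

def train_centroids(samples):
    """`samples` is a list of (feature_vector, type_label) pairs; compute one
    mean feature vector per type."""
    cents = {}
    for lbl in {l for _, l in samples}:
        pts = [np.asarray(f) for f, l in samples if l == lbl]
        cents[lbl] = np.mean(pts, axis=0)
    return cents

def predict_type(cents, feat):
    """Predict the type whose centroid is closest to the usage features."""
    feat = np.asarray(feat)
    return min(cents, key=lambda l: np.linalg.norm(cents[l] - feat))
```

In a real pipeline the features would be extracted automatically from instructions around the analyst-chosen usage locations.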

  • Research Article
  • Citations: 10
  • 10.3844/ajeassp.2009.317.323
Low-Cost Encoding Device for Optical Code Division Multiple Access System
  • Feb 1, 2009
  • American Journal of Engineering and Applied Sciences
  • Mohammad Syuhaimi Ab-Rahman + 3 more

Problem statement: Fiber Bragg Gratings (FBGs) are commonly used to develop coded spectrums, but they consist of expensive elements and are highly sensitive to environmental changes, which increases capital and operational expenditures (CAPEX and OPEX). Approach: This study presented the development of a low-cost 16-port encoding device for Optical Code Division Multiple Access (OCDMA) systems based on Arrayed Waveguide Grating (AWG) devices and optical switches. The encoding device is a new technology used to transmit coded data in optical communication systems by means of an AWG and optical switches. It provides high security for data transmission because all data are transmitted in binary code form. The output signals from the AWG were coded with a binary code assigned to an optical switch before the signal was modulated onto the carrier and transmitted to the receiver. The 16-port encoding device used 16 Double Pole Double Throw (DPDT) toggle switches to switch the polarity of the voltage source between +5 V and -5 V for the 16 optical switches. When +5 V was applied, the optical switch gave code '1', and vice versa. Results: We found that the insertion loss, crosstalk, uniformity, and Optical Signal-to-Noise Ratio (OSNR) of the developed prototype were <12 dB, 9.77 dB, <1.63 dB, and ≥20 dB, respectively. Conclusion: We successfully developed the AWG-based OCDMA encoding device prototype and characterized it using linearity testing and continuous-signal testing. The developed prototype is expected to be applied in optical communication systems on Passive Optical Networks (PONs).

  • Conference Article
  • Citations: 49
  • 10.1109/cvpr.2011.5995391
Fast and high-performance template matching method
  • Jun 1, 2011
  • Alexander Sibiryakov

This paper proposes a new template matching method that is robust to outliers and fast enough for real-time operation. The template and image are densely transformed into binary code form by projecting and quantizing histograms of oriented gradients. The binary codes are matched by a generic method of robust similarity applicable to additive match measures, such as Lp- and Hamming distances. The robust similarity map is computed efficiently via a proposed Inverted Location Index structure that stores pixel locations indexed by their values. The method is experimentally justified on large image patch datasets. Challenging applications, such as intra-category object detection, object tracking, and multimodal image matching are demonstrated.
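
The matching stage can be sketched in one dimension (our simplification of the general idea, not the paper's indexed algorithm): per-position binary codes are compared by Hamming distance at every template offset, and the minimum of the resulting map is the best match position.

```python
def hamming_map(image_codes, template_codes):
    """Slide the template's binary codes over the image's and record the
    total Hamming distance at every offset (1-D for brevity)."""
    n, m = len(image_codes), len(template_codes)
    return [sum(bin(image_codes[off + i] ^ template_codes[i]).count("1")
                for i in range(m))
            for off in range(n - m + 1)]
```

The paper's Inverted Location Index avoids this brute-force sweep, but the distance being computed is the same.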

  • Conference Article
  • Citations: 137
  • 10.1145/2664243.2664269
Leveraging semantic signatures for bug search in binary programs
  • Dec 8, 2014
  • Jannik Pewny + 4 more

Software vulnerabilities still constitute a high security risk and there is an ongoing race to patch known bugs. However, especially in closed-source software, there is no straightforward way (in contrast to source code analysis) to find buggy code parts, even if the bug was publicly disclosed.

  • Research Article
  • Citations: 12
  • 10.1109/tc.1981.6312175
P-functions: A new tool for the analysis and synthesis of binary programs
  • Feb 1, 1981
  • IEEE Transactions on Computers
  • Andre Thayse

Considers the realization of switching functions by programs composed of certain conditional transfers (binary programs). Methods exist for optimizing binary trees, i.e. binary programs without reconvergent instructions. This paper studies methods for optimizing binary simple programs (programs with possible reconvergent instructions, but where a variable may be tested only once during a computation) and binary programs. The hardware implementations of these programs involve either multiplexers or demultiplexers and OR-gates.
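
A binary program in this sense can be sketched as conditional transfers: each node tests one variable and jumps to a low or high successor, with leaves 0 and 1 as outputs. (The node layout below is ours for illustration; the paper's P-function calculus is more general.)

```python
def eval_binary_program(program, node, assignment):
    """Run a binary program: follow conditional transfers until a 0/1 leaf."""
    while node not in (0, 1):
        var, low, high = program[node]
        node = high if assignment[var] else low
    return node

# Binary program computing x AND y as a simple binary tree:
AND_PROGRAM = {
    "n0": ("x", 0, "n1"),   # if x == 0, output 0; otherwise test y
    "n1": ("y", 0, 1),
}
```

Reconvergent instructions (nodes with multiple predecessors) turn such trees into the more compact programs the paper optimizes.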

  • Conference Article
  • Citations: 1
  • 10.1145/3355378.3355383
Efficient and Precise Dynamic Construction of Control Flow Graphs
  • Sep 23, 2019
  • Andrei Rimsa + 2 more

The extraction of high-level information from binary code is an important problem in programming languages, whose solution supports the detection of malware in binary code and the construction of dynamic program slices. The Control Flow Graph (CFG) is one of the instruments used to represent the structure of binary programs. Most solutions to reconstruct CFGs from binary programs rely on purely static techniques, based either on data-flow analyses or on type inference. In contrast, in this work we use a purely dynamic approach for this purpose. Our technique can be used alone, or in combination with static analysis tools. We demonstrate that it is possible to verify completeness in several real-world programs. We also show how to combine our technique with DynInst, the current state-of-the-art static CFG reconstructor. By providing DynInst with extra information, we improve its capacity to deal with indirect jumps. Our dynamic CFG reconstructor has been implemented on top of Valgrind. When applied to cBench, this implementation is able to completely cover 36% of all the functions available in that suite. It adds an average overhead of 43x onto the execution of the original programs. Although substantial, this overhead is almost four times lower than the overhead of DCFG, a tool distributed by Intel and built on top of PinPlay.
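
The dynamic approach can be sketched with a toy instruction set (ours, not the paper's instrumentation): execute the program and record every control-flow edge actually taken; targets of conditional or indirect jumps that static analysis might miss are revealed by the run itself.

```python
def dynamic_cfg(program, state, start=0):
    """Execute a tiny program (list of instruction tuples) and return the set
    of observed control-flow edges (src_pc, dst_pc)."""
    edges, pc = set(), start
    while program[pc][0] != "halt":
        op = program[pc]
        if op[0] == "jmp":
            nxt = op[1]
        elif op[0] == "cjmp":                      # taken only if the flag is set
            nxt = op[1] if state[op[2]] else pc + 1
        else:                                      # fall-through instruction
            nxt = pc + 1
        edges.add((pc, nxt))
        pc = nxt
    return edges
```

The trade-off is exactly the one the abstract reports: each run only covers the paths it exercises, so completeness depends on the inputs used.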

  • Research Article
  • Citations: 2
  • 10.1007/s13369-021-05630-7
Multi-Level Cross-Architecture Binary Code Similarity Metric
  • Apr 16, 2021
  • Arabian Journal for Science and Engineering
  • Meng Qiao + 6 more

Cross-architecture binary code similarity metrics are a fundamental technique in many machine learning-based binary program analysis methods. Some recent studies utilize graph embedding methods to generate binary code embeddings and regard the Euclidean distance between two embeddings as a similarity measure. However, these studies rely on manual features and do not make full use of binary code structure information, which causes a loss of binary code information. To solve the above problems, we propose a multi-level neural network model to generate binary code embeddings that include CFG (control flow graph) structure information and basic block information. We can then measure cross-architecture similarity through the Euclidean distance between binary code embeddings. We conduct a series of experiments to compare the similarity of cross-architecture binary code, and the results demonstrate that our model overcomes the limitations described above and shows superiority over state-of-the-art methods.
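
The comparison step the abstract describes is simple once embeddings exist (the embedding values and names below are invented for illustration): similarity is the Euclidean distance between the two vectors, smaller meaning more similar.

```python
import math

def euclidean_distance(emb_a, emb_b):
    """Distance between two binary code embeddings; smaller = more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))

def most_similar(query, candidates):
    """Return the name of the candidate embedding closest to `query`."""
    return min(candidates, key=lambda n: euclidean_distance(query, candidates[n]))
```

All of the modeling effort goes into making embeddings of the same function, compiled for different architectures, land close together under this metric.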
