Hardware-Assisted Circumvention of Self-Hashing Software Tamper Resistance

Abstract

Self-hashing has been proposed as a technique for verifying software integrity. Appealing aspects of this approach to software tamper resistance include the promise of being able to verify the integrity of software independent of the external support environment, as well as the ability to integrate code protection mechanisms automatically. In this paper, we show that the rich functionality of most modern general-purpose processors (including UltraSparc, x86, PowerPC, AMD64, Alpha, and ARM) facilitates an automated, generic attack which defeats such self-hashing. We present a general description of the attack strategy and multiple attack implementations that exploit different processor features. Each of these implementations is generic in that it can defeat self-hashing employed by any user-space program on a single platform. Together, these implementations defeat self-hashing on most modern general-purpose processors. The generality and efficiency of our attack suggest that self-hashing is not a viable strategy for high-security tamper resistance on modern computer systems.

Similar Papers
  • Dissertation
  • Cite Count: 2
  • 10.22215/etd/2005-08278
A generic attack on hashing-based software tamper resistance
  • Oct 4, 2018
  • Glenn Wurster

Self-hashing forms of software tamper resistance have been considered efficient in protecting the integrity of an application. Hashing allows a running application to quickly determine whether the program code has been modified and respond accordingly. Self-hashing relies on being able to accurately read the code of an application in memory. In this thesis, we demonstrate that hash code contained within the program being verified is vulnerable to attack. By using the modern processor’s ability to separate code and data, self-hashing tamper resistance can be circumvented. We describe several possible implementations of an attack in this thesis. We have implemented one form of attack. All implementations are generic (i.e., they only need to be implemented once to work on a wide range of applications) and fast. Understanding the work detailed in this thesis will help future tamper resistance algorithms withstand our attack.
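
The idea in this abstract, that a self-hash reads code as data while the processor executes code through a separate path, can be illustrated with a small toy model. The sketch below is not the authors' implementation: the `SplitMemory` class, its two views, and the sample bytes are hypothetical names and values chosen only for illustration, and a real attack operates at the processor and operating-system level rather than in Python.

```python
import hashlib


class SplitMemory:
    """Toy model (hypothetical): one code region with two views.

    data_view  -- bytes returned to ordinary data reads (what a self-hash reads)
    fetch_view -- bytes returned to instruction fetch (what actually executes)
    """

    def __init__(self, code: bytes) -> None:
        self.data_view = bytes(code)
        self.fetch_view = bytes(code)

    def patch_executed_code(self, offset: int, new_bytes: bytes) -> None:
        # The attacker changes only the bytes served to instruction fetch.
        buf = bytearray(self.fetch_view)
        buf[offset:offset + len(new_bytes)] = new_bytes
        self.fetch_view = bytes(buf)


def self_hash_ok(mem: SplitMemory, expected_digest: str) -> bool:
    # The integrity check hashes the code through the data-read path.
    return hashlib.sha256(mem.data_view).hexdigest() == expected_digest


# Arbitrary sample bytes standing in for a program's code section.
original = b"\x55\x89\xe5\x74\x05\xc3"
expected = hashlib.sha256(original).hexdigest()

mem = SplitMemory(original)
mem.patch_executed_code(3, b"\xeb")  # tamper with the executed copy only

check_passes = self_hash_ok(mem, expected)   # hash still sees original bytes
code_changed = mem.fetch_view != original    # but different code would run
```

In this model the check keeps passing because it never observes the bytes that are executed, which is the gap the abstract describes.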

  • Book Chapter
  • Cite Count: 5
  • 10.1007/978-3-642-15257-3_17
A Metric-Based Scheme for Evaluating Tamper Resistant Software Systems
  • Jan 1, 2010
  • Gideon Myles + 1 more

With the increased use of software tamper resistance techniques to protect software against undesired attacks comes an increased need to understand the strength of these techniques. Currently the understanding is rather general. In this paper we propose a new software tamper resistance evaluation technique. Our main contribution is to identify a set of issues that a tamper resistant system must deal with and show why these issues must be dealt with in order to secure a software system. Using the identified issues as criteria, we can measure the actual protection capability of a TRS implementation and provide guidance on potential improvements to the implementation. We can also enable developers to compare the protection strength of differently implemented tamper resistance systems. While the set of criteria we identified in this paper is by no means complete, our framework allows easy extension by adding new criteria in the future.
Keywords: Software Tamper Resistance, Evaluation, Metrics

  • Research Article
  • 10.17586/2226-1494-2015-15-3-463-469
Centralized MAC protocol for hierarchical caching processors
  • May 15, 2015
  • Scientific and Technical Journal of Information Technologies, Mechanics and Optics
  • A.A Antonov + 7 more

The subject of research is the architecture of modern processors with hierarchical organization of the cache subsystem. The feasibility of implementing wireless connectivity between cores and Level 3 cache has been analyzed. In order to specify the requirements for the wireless communication channel, communication in modern general-purpose processors has been explored using the example of Intel Core i7 (Haswell). An interaction model of cache subsystem components has been developed, and on its basis the interaction characteristics between them are evaluated. Analysis of the model shows that the latency of cache line transmission via the proposed channel is about 0.26 nanoseconds, which correlates well with the latency of L1 cache (about 2 nanoseconds). Also, the wireless channel satisfies the distance requirements, making data transmission possible over up to 3 centimeters, as well as the power requirements, with consumption of 1 uW. The result of the research is the developed medium access protocol for wireless connectivity between computational cores and Level 3 cache. To account for the required simplicity of implementation and efficiency of operation, it is proposed to use a single frequency range for all radio interfaces and a time division multiple access scheme with prescribed fields for addressing and data. The paper deals with the protocol data unit structure, which is used for communication between units. The possibility of creating a shared time counter is used for synchronization between units. Time division duplex with possibly dynamic non-equal time shares is used to organize uplink and downlink communications. The time division mechanism allows the system to adapt to load irregularities between the cores by allocating different numbers of time slots to each core.

  • Research Article
  • Cite Count: 2
  • 10.1155/2010/291805
The Design and Implementation of Tamper Resistance for Mobile Game Service
  • Jan 1, 2010
  • Hang Bae Chang + 2 more

The number of attacks and infringements targeting vulnerabilities of game services has been increasing constantly, in step with the dramatic growth and expanding impact of the game industry. However, no follow-up research exists on differentiated technology for preventing the reverse function of the game service. Therefore, in this study, we examined the current status of infringement against online game services currently provided in the market and designed proper technical measures ('Software Tamper Resistance') against manipulation of the game service, which is its most vulnerable part. In detail, we encrypted an executable file and decrypted it in real time at run time. After that we implemented anti-debugging, anti-disassembly, and anti-dump technology.

  • Conference Article
  • Cite Count: 31
  • 10.1109/pact.2015.32
Throttling Automatic Vectorization: When Less is More
  • Oct 1, 2015
  • Vasileios Porpodas + 1 more

SIMD vectors are widely adopted in modern general purpose processors as they can boost performance and energy efficiency for certain applications. Compiler-based automatic vectorization is one approach for generating code that makes efficient use of the SIMD units, and has the benefit of avoiding hand development and platform-specific optimizations. The Superword-Level Parallelism (SLP) vectorization algorithm is the most well-known implementation of automatic vectorization when starting from straight-line scalar code, and is implemented in several major compilers. The existing SLP algorithm greedily packs scalar instructions into vectors starting from stores and traversing the data dependence graph upwards until it reaches loads or non-vectorizable instructions. Choosing whether to vectorize is a one-off decision for the whole graph that has been generated. This, however, is sub-optimal because the graph may contain code that is harmful to vectorization due to the need to move data from scalar registers into vectors. The decision does not consider the potential benefits of throttling the graph by removing this harmful code. In this work we propose a solution to overcome this limitation by introducing Throttled SLP (TSLP), a novel vectorization algorithm that finds the optimal graph to vectorize, forcing vectorization to stop earlier whenever this is beneficial. Our experiments show that TSLP improves performance across a number of kernels extracted from widely-used benchmark suites, decreasing execution time compared to SLP by 9% on average and up to 14% in the best case.

  • Conference Article
  • Cite Count: 26
  • 10.1145/1314276.1314291
Mechanism for software tamper resistance
  • Oct 29, 2007
  • W Michiels + 1 more

In software protection we typically have to deal with the white-box attack model. In this model an attacker is assumed to have full access to the software and full control over its execution. The goal of white-box cryptography is to implement cryptographic algorithms in software such that it is hard for an attacker to extract the key by a white-box attack. Chow et al. present white-box implementations for AES and DES. Based on their ideas, white-box implementations can be derived for other block ciphers as well. In the white-box implementations the key of the underlying block cipher is expanded from several bytes to a collection of lookup tables with a total size in the order of hundreds of kilobytes. In this paper we present a technique that uses a white-box implementation to make software tamper resistant. The technique interprets the binary of software code as lookup tables, which are next incorporated into the collection of lookup tables of a white-box implementation. This makes the code tamper resistant as the dual interpretation implies that a change in the code results in an unintentional change in the white-box implementation. We also indicate in the paper that it is difficult for an attacker to make modifications to the white-box implementation such that its original operation is restored.

  • Conference Article
  • Cite Count: 1
  • 10.1142/9781860948534_0019
TAMPER RESISTANCE VIA POISSON DISTRIBUTION
  • Jul 1, 2007
  • D Seetha Mahalaxmi + 1 more

Watermarking is a technique in which a secret message is embedded in a cover message. There are two types of watermarking: static watermarking and dynamic watermarking. Static watermarks are stored in the application executable itself. Dynamic watermarks are stored in a program's execution state, rather than in the code itself. This makes them easier to tamperproof against obfuscating transformations. Techniques such as watermarking and fingerprinting have been developed to discourage piracy [1]; however, if no protective measures are taken, an attacker may be able to remove and/or destroy watermarks and fingerprints with relative ease once they have been identified. For this reason, methods such as source code obfuscation [3], code encryption [6] and self-verifying code have been developed to help achieve some measure of tamper resistance. One application of obfuscating address computations is to hide from an attacker the true target of some control transfer. In Software Tamper Resistance: Obstructing static analysis of programs [2], the authors focused their efforts at the source code level. At the assembly level, however, even such modifications are translated into unambiguous control transfers such as direct and conditional jumps. A method of indirection [5,6] replaces direct control transfers such as call and jump instructions with calls to specialized functions that are responsible for directing control to the intended targets in some stealthy manner. The intent is that the successors of each transformed basic block will be difficult to discover. The specialized functions, which they have termed as

  • Book Chapter
  • Cite Count: 27
  • 10.1007/978-3-642-16435-4_3
A Secure and Robust Approach to Software Tamper Resistance
  • Jan 1, 2010
  • Sudeep Ghosh + 2 more

Software tamper-resistance mechanisms have increasingly assumed significance as a technique to prevent unintended uses of software. Closely related to anti-tampering techniques are obfuscation techniques, which make code difficult to understand or analyze and, therefore, challenging to modify meaningfully. This paper describes a secure and robust approach to software tamper resistance and obfuscation using process-level virtualization. The proposed techniques involve novel uses of software checksumming guards and encryption to protect an application. In particular, a virtual machine (VM) is assembled with the application at software build time such that the application cannot run without the VM. The VM provides just-in-time decryption of the program and dynamism for the application’s code. The application’s code is used to protect the VM to ensure a level of circular protection. Finally, to prevent the attacker from obtaining an analyzable snapshot of the code, the VM periodically discards all decrypted code. We describe a prototype implementation of these techniques and evaluate the run-time performance of applications using our system. We also discuss how our system provides stronger protection against tampering attacks than previously described tamper-resistance approaches.

  • Conference Article
  • Cite Count: 1
  • 10.1109/aero50100.2021.9438345
High Throughput Multi-Threaded Software Defined Convolutional Interleaver
  • Mar 6, 2021
  • Mark Kubiak + 1 more

A convolutional interleaver is a data reordering operation used to distribute bursts of errors and improve the performance of forward error correction algorithms in the presence of dropout events. This type of interleaver is used in several commercial and military standards as it offers the same performance as a traditional rectangular interleaver but with half the memory requirement. This paper presents a novel algorithm and architecture for convolutional interleaving implemented on modern general-purpose processors (CPUs). The naive implementation of a convolutional interleaver does not map well to modern multi-processor CPUs due to its non-sequential memory access pattern and inherently serial processing sequence. Non-sequential memory access is very inefficient in general-purpose processors as the high-speed, low-latency cache memory in modern CPU architectures assumes some level of data locality. The sparse memory access pattern of a naive convolutional interleaver implementation has almost no data locality, causing the CPU to frequently stall as it waits on data from the high-latency external memory. In addition, these memory bottlenecks make attempts to multithread the software implementation pointless, as modern general-purpose processor cores share memory resources. The key breakthroughs described in the paper address both these challenges – the algorithm is modified to interleave buffers of bits in a ‘cache friendly’ order rather than the input order, leading to significantly higher single-core performance. Resolving the memory access bottleneck also enables an efficient parallelization approach that can take advantage of multiple cores with shared memory, pushing performance even higher. Whereas previous convolutional interleaver software implementations ran into memory bottlenecks around 0.5 GSps, the multithreaded version of the presented algorithm can be consistently benchmarked at above 8 GSps on a mid-range server CPU.
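
For readers unfamiliar with the operation, a naive convolutional interleaver can be sketched as a bank of delay lines visited in round-robin order. This is the textbook delay-line form and only an assumption about what the abstract calls the naive implementation; it is not the cache-friendly algorithm the paper presents. The function names and the `branches`, `depth`, and `fill` parameters are hypothetical.

```python
from collections import deque


def _run_delay_lines(symbols, delays, fill):
    # One FIFO per branch, pre-filled with `fill` to its delay length.
    lines = [deque([fill] * d) for d in delays]
    out = []
    for n, s in enumerate(symbols):
        line = lines[n % len(lines)]  # branches are visited round-robin
        line.append(s)
        out.append(line.popleft())    # oldest symbol in this branch leaves
    return out


def convolutional_interleave(symbols, branches, depth, fill=0):
    # Branch i delays its symbols by i * depth visits to that branch.
    delays = [i * depth for i in range(branches)]
    return _run_delay_lines(symbols, delays, fill)


def convolutional_deinterleave(symbols, branches, depth, fill=0):
    # Complementary delays so every symbol sees the same total delay.
    delays = [(branches - 1 - i) * depth for i in range(branches)]
    return _run_delay_lines(symbols, delays, fill)


data = list(range(1, 13))
interleaved = convolutional_interleave(data, branches=3, depth=1)
restored = convolutional_deinterleave(interleaved, branches=3, depth=1)
# interleaved → [1, 0, 0, 4, 2, 0, 7, 5, 3, 10, 8, 6]
# restored    → [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Consecutive input symbols land in different delay lines, which is what spreads an error burst apart and also what produces the scattered, serial access pattern the abstract describes. After de-interleaving, the stream reappears delayed by `(branches - 1) * depth * branches` symbols.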

  • Research Article
  • Cite Count: 2
  • 10.1016/j.is.2019.101449
Bitmap filter: Speeding up exact set similarity joins with bitwise operations
  • Oct 11, 2019
  • Information Systems
  • Edans F.O Sandes + 2 more


  • Conference Article
  • Cite Count: 111
  • 10.1109/sp.2005.2
A Generic Attack on Checksumming-Based Software Tamper Resistance
  • May 8, 2005
  • G Wurster + 2 more

Self-checking software tamper resistance mechanisms employing checksums, including advanced systems as recently proposed by Chang and Atallah (2002) and Horne et al. (2002) have been promoted as an alternative to other software integrity verification techniques. Appealing aspects include the promise of being able to verify the integrity of software independent of the external support environment, as well as the ability to automatically integrate checksumming code during program compilation or linking. In this paper we show that the rich functionality of many modern processors, including UltraSparc and x86-compatible processors, facilitates automated attacks which defeat such checksumming by self-checking programs.

  • Conference Article
  • Cite Count: 3
  • 10.1145/1314257.1314262
A technique for self-certifying tamper resistant software
  • Oct 29, 2007
  • Hongxia Jin + 1 more

Until recently the use of software tamper resistance was rather limited. However, as the music and movie industries have increased their reliance on content protection systems, the importance placed on and the use of tamper resistance has also increased. Unfortunately, the nature of tamper resistance can make it difficult for developers to determine if a protection mechanism is actually robust and which attacks it can protect against. To address this issue we have designed a tool for self-certifying the strength of a tamper resistance implementation that is based on a hybrid attack-defense graph. This approach to tamper resistance evaluation is advantageous in that it enables certification without leaking confidential implementation details and it assists developers in designing more robust implementations.

  • Conference Article
  • Cite Count: 16
  • 10.1117/12.525581
Implementation of H.264 encoder on general-purpose processors with hyper-threading technology
  • Jan 7, 2004
  • Eric Q Li + 1 more

H.264 is the emerging video coding standard, which aims at compressing high-quality video content at low bit-rates. While its new encoding and decoding processes are similar to many previous standards, the new standard includes a number of new features and thus requires much more computation than most existing standards do. The complexity of the H.264 standard poses many challenges to implementing the encoder/decoder in real-time via software on personal computers. Even after a 2~3x performance improvement with media instructions on modern general-purpose processors and another 2~4x improvement from algorithmic optimization, the H.264 encoder is still too complicated to be implemented in real-time on a single processor. Based on a detailed analysis of the possibilities for parallelism in the H.264 encoder, we propose an efficient multithreading implementation of the H.264 video encoder. In order to guarantee enough concurrency of the whole system, an elaborate macroblock and inter-frame parallel scheduling scheme is presented. In addition, our macroblock-based multithreading scheme achieves almost no video quality losses in contrast to other parallelization schemes. Our results show that the multithreaded encoder can obtain another 3.96x speed-up on a four-processor system or 4.6x speed-up on a four-processor system with Hyper-Threading Technology. The techniques demonstrated in this work can be applied not only to H.264, but also to other video/image coding/decoding applications on personal computers.

  • Research Article
  • Cite Count: 6
  • 10.1080/2151237x.2009.10129289
Integer Ray Tracing
  • Jan 1, 2009
  • Journal of Graphics, GPU, and Game Tools
  • Jared Heinly + 4 more

Despite nearly universal support for the IEEE 754 floating-point standard on modern general-purpose processors, a wide variety of more specialized processors do not provide hardware floating-point units and rely instead on integer-only pipelines. Ray tracing on these platforms thus requires an integer rendering process. Toward this end, we clarify the details of an existing fixed-point ray/triangle intersection method, provide an annotated implementation of that method in C++, introduce two refinements that lead to greater flexibility and improved accuracy, and highlight the issues necessary to implement common material models in an integer-only context. Finally, we provide the source code for a template-based integer/floating-point ray tracer to serve as a testbed for additional experimentation with integer ray tracing methods.
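
As background for the integer-only arithmetic this abstract refers to, the following sketch shows a fixed-point dot product, a basic building block of intersection tests. It is a generic illustration, not the paper's ray/triangle method: the Q16.16 format (`FRAC_BITS = 16`) and all function names are assumptions made for this example.

```python
FRAC_BITS = 16            # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    # Scale and round to the nearest representable fixed-point value.
    return int(round(x * ONE))


def to_float(a: int) -> float:
    return a / ONE


def fixed_mul(a: int, b: int) -> int:
    # The raw product carries 2 * FRAC_BITS fractional bits; shift back.
    # Python's >> on negative ints rounds toward negative infinity.
    return (a * b) >> FRAC_BITS


def fixed_dot(u, v) -> int:
    # Dot product using only integer multiplies, shifts, and adds.
    return sum(fixed_mul(a, b) for a, b in zip(u, v))


u = [to_fixed(x) for x in (0.5, 0.25, -1.0)]
v = [to_fixed(x) for x in (2.0, 4.0, 0.5)]
result = fixed_dot(u, v)  # 0.5*2.0 + 0.25*4.0 - 1.0*0.5 = 1.5
```

Python integers do not overflow, so this sketch hides the word-size and precision limits that a real integer-only pipeline has to manage.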

  • Research Article
  • Cite Count: 3
  • 10.1109/35.256883
Tools for real-time signal-processing research
  • Nov 1, 1993
  • IEEE Communications Magazine
  • J.H Snyder + 3 more

The use of digital signal processing (DSP) devices for real-time communication applications is discussed. The authors comment on distinguishing aspects of DSP architecture, describing not so much individual processors as those features common to DSPs and distinct from modern general-purpose processors. They describe three DSP32xx-based machines that support DSP algorithm implementation: SURF-board, HoBo, and DSP3. They also describe rtpi, a source-code debugger for workstations and for the AT&T DSP32C signal-processor integrated circuit, and dspx, a collection of subroutines and host programs that provides an execution environment for DSPs akin to the UNIX environment. These tools facilitate the transfer of algorithms from mainframes or workstations to DSP hardware. Included are case studies of two real-time implementations: the low-delay CELP (LD-CELP) speech coder and the decoder side of the perceptual audio coder (PAC), an algorithm that compresses CD-quality audio into a 128-kb/s stream without perceptible distortion.
