Version-Wide Software Birthmark via Machine Learning

Chih-Ko Chung, Pi-Chung Wang

doi:10.1109/access.2021.3103186

Abstract

Identifying the credibility of executable files is critical for the security of an operating system. Modern operating systems rely on code signing, which uses a default-valid trust model, to identify the publishers of executable files. Malware can pass the software validation of operating systems and security software by using counterfeit code-signing certificates. Although counterfeit certificates can be revoked by certificate authorities (CAs), previous research showed that the revocation delay can be as long as 5.6 months. In this paper, we attempt to identify the credibility of software with multiple-version executable files without relying on public key infrastructure (PKI). A new-version executable file is usually developed incrementally from previous versions, so the features shared among different versions can be extracted to identify the software. Accordingly, we present a software-birthmark scheme that generates a cross-version software birthmark for executable files of the same software. The proposed software birthmark is a binary-classification model, trained by a machine learning algorithm on imported and exported function names extracted from different-version executable files. To evaluate the performance of version-wide software birthmarks, our experiments include 138 versions of Windows kernel32.dll and 545 versions of firefox.exe. We also use multiple machine learning algorithms for performance comparisons. The results show that the proposed software birthmark can effectively identify the derivations of these executable files. The proposed software birthmark can be used by operating systems or security software to evaluate the credibility of executable files with suspicious certificates.

Highlights

Code signing [1] is a process of digitally signing an executable file to confirm the software publisher.
We describe the procedures for extracting feature strings from the IAT and EAT, as well as for transforming these feature strings into local feature vectors for machine learning algorithms.
Unlike previous software-birthmark algorithms, which were designed for detecting software theft and piracy, our scheme generates one birthmark from the different-version executable files of a program by using machine learning algorithms.
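
The transformation of feature strings into feature vectors mentioned in the highlights can be illustrated with a minimal sketch. It assumes that the imported and exported function names have already been read from the IAT and EAT by some PE parser, and it uses a simple 0/1 presence encoding over a sorted vocabulary. The encoding, the `imp:`/`exp:` prefixes, and the helper names `build_vocabulary` and `to_feature_vector` are assumptions made for illustration, not the procedure defined in the paper.

```python
from typing import Dict, Iterable, List


def build_vocabulary(samples: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Map every distinct feature string to a fixed column index."""
    names = sorted({name for sample in samples for name in sample})
    return {name: idx for idx, name in enumerate(names)}


def to_feature_vector(names: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Encode one file as a 0/1 vector; names outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for name in names:
        idx = vocab.get(name)
        if idx is not None:
            vector[idx] = 1
    return vector


# Hypothetical feature strings for two versions of the same program.
# "imp:" marks an imported (IAT) name, "exp:" an exported (EAT) name.
version_a = ["imp:CreateFileW", "imp:ReadFile", "exp:GetHandleVerifier"]
version_b = ["imp:CreateFileW", "imp:WriteFile", "exp:GetHandleVerifier"]

vocab = build_vocabulary([version_a, version_b])
vec_a = to_feature_vector(version_a, vocab)
vec_b = to_feature_vector(version_b, vocab)
print(vec_a, vec_b)
```

Two versions that share most of their imports and exports produce vectors that differ in only a few positions, which is the property a cross-version classifier can exploit.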

Summary

INTRODUCTION

Code signing [1] is a process of digitally signing an executable file to confirm the software publisher. The proposed scheme, the version-wide software birthmark (VWSB), can identify the credibility of an executable file without relying on PKI. We present the first scheme for generating cross-version software birthmarks by using machine learning algorithms. The proposed software birthmark is a binary-classification model for identifying whether an executable file is a different-version PE file of the same program. We develop a procedure for extracting commonly available features from PE files. These features are input into machine learning algorithms for training to generate a binary-classification model. The experimental results show that the models generated by several machine learning algorithms can identify cross-version PE files with high accuracy.
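
To make the idea of a birthmark as a trained binary-classification model concrete, the following is a minimal sketch. The summary does not name the machine learning algorithms the authors compare, so a plain perceptron is used here purely as a stand-in. The four-column feature vectors, the labels, and the function names `train_perceptron` and `is_same_software` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Sequence[int]],
    labels: Sequence[bool],
    epochs: int = 100,
) -> Tuple[List[int], int]:
    """Train a perceptron; a True label means 'a version of the target program'."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for x, same in zip(samples, labels):
            y = 1 if same else -1
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def is_same_software(x: Sequence[int], weights: Sequence[int], bias: int) -> bool:
    """Apply the trained model (the 'birthmark') to one feature vector."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


# Hypothetical 0/1 feature vectors: two versions of the target program
# (positive class) and two unrelated executables (negative class).
train_x = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
train_y = [True, True, False, False]

weights, bias = train_perceptron(train_x, train_y)
print(is_same_software([1, 0, 1, 0], weights, bias))  # unseen candidate version
print(is_same_software([0, 0, 0, 1], weights, bias))  # unseen unrelated file
```

In this framing the trained weights play the role of the birthmark: one model per program, queried with the feature vector of any executable that claims to be a version of that program.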

CODE SIGNING

BINARY CODE SIMILARITY

IMPLEMENTATION OF VWSB

CALCULATION OF FILE CHARACTERISTICS

CLASSIFICATION BASED ON VWSB

COMPARISONS WITH CODE SIGNING

EXPERIMENTS

STATISTICS OF EXTRACTED FEATURES

PERFORMANCE OF VWSB

Findings

CONCLUSIONS

Journal: IEEE Access
Publication Date: Jan 1, 2021
License type: CC BY 4.0

Similar Papers

Static analysis of anomalies and security vulnerabilities in executable files
Jay-Evan J Tevis ... John A Hamilton
10 Mar 2006

A Static Birthmark of Windows Binary Executables Based on Strings
Yesol Kim ... Younsik Jeong
01 Jul 2013

Data hiding in windows executable files
16 Mar 2010

Measuring similarity of windows applications using static and dynamic birthmarks
Dongjin Kim ... Yongman Han
18 Mar 2013
