Detecting command injection vulnerabilities in Linux-based embedded firmware with LLM-based taint analysis of library functions

Junjian Ye, Xincheng Fei, Xavier De Carné De Carnavalet, Lianying Zhao, Lifa Wu, Mengyuan Zhang

doi:10.1016/j.cose.2024.103971

Abstract

With the proliferation of IoT devices, embedded firmware security has attracted growing attention. Command injection (CI) is one of the most common types of vulnerabilities in Linux-based embedded firmware. It arises when user input propagates to command-execution functions without strict sanitization, and it can be detected by static taint analysis. Unfortunately, single-binary taint analysis tools cannot find vulnerabilities caused by custom dynamically linked library functions (DLLFs) that are implemented in external library files, while multi-binary analysis tools are time-consuming. In this paper, we present SLFHunter, an approach that leverages a large language model (LLM) to analyze sensitive custom DLLFs separately and imports the results into single-binary taint analysis tools to overcome this challenge. Our approach applies filtering rules to identify sensitive DLLFs that call common sink functions, then analyzes them with LLMs to find sink library functions (SLFs), in which input parameters can be passed to executed command strings. Finally, SLFs are marked as new sinks so that existing tools can discover the CI vulnerabilities they cause. We implemented SLFHunter as a ChatGPT-based module for EmTaint and evaluated it on a dataset of 100 Linux-based embedded firmware samples from 13 vendors. The results show that our prompts can guide ChatGPT 4.0 to identify SLFs with 95% accuracy after being improved with a trick we dubbed “double-check”. SLFHunter helps EmTaint find 42 additional CI vulnerabilities with an average time cost increase of 89 s on our dataset, which demonstrates the effectiveness and efficiency of our approach.
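The three stages described in the abstract (filter sensitive DLLFs, classify them as SLFs, register SLFs as new sinks) can be illustrated with a minimal Python sketch. It is not the SLFHunter implementation: the sink list, the function names, the call-graph data, and the `is_slf` callback are all assumptions made for illustration, and the callback is a stub standing in for the LLM query rather than a real analysis.

```python
from typing import Callable, Dict, List, Set

# Assumed set of common command-execution sinks; the paper's actual list may differ.
COMMON_SINKS: Set[str] = {
    "system", "popen", "execl", "execlp", "execle", "execv", "execvp", "execve",
}


def filter_sensitive_dllfs(callees: Dict[str, Set[str]]) -> List[str]:
    """Return library functions that directly call at least one common sink."""
    return sorted(name for name, called in callees.items() if called & COMMON_SINKS)


def find_slfs(
    callees: Dict[str, Set[str]],
    decompiled: Dict[str, str],
    is_slf: Callable[[str, str], bool],
) -> Set[str]:
    """Keep the sensitive DLLFs that the classifier judges to be SLFs.

    `is_slf(name, code)` is a hypothetical hook where an LLM query would go.
    """
    sensitive = filter_sensitive_dllfs(callees)
    return {name for name in sensitive if is_slf(name, decompiled[name])}


def extend_sinks(existing: Set[str], slfs: Set[str]) -> Set[str]:
    """Sink set handed to a single-binary taint analysis after adding the SLFs."""
    return set(existing) | slfs


# Hypothetical library contents, invented for this example.
callees = {
    "run_cmd": {"vsnprintf", "system"},  # formats its arguments into a command
    "get_uptime": {"popen"},             # runs a constant command only
    "str_trim": {"strlen"},              # calls no sink at all
}
decompiled = {
    "run_cmd": "int run_cmd(const char *fmt, ...) { /* builds buf */ system(buf); }",
    "get_uptime": 'int get_uptime(void) { popen("uptime", "r"); }',
    "str_trim": "char *str_trim(char *s) { /* no command execution */ }",
}

# Stub verdicts standing in for the LLM's answers.
stub_verdicts = {"run_cmd": True, "get_uptime": False}


def stub_is_slf(name: str, code: str) -> bool:
    return stub_verdicts.get(name, False)


sensitive = filter_sensitive_dllfs(callees)
slfs = find_slfs(callees, decompiled, stub_is_slf)
sinks = extend_sinks(COMMON_SINKS, slfs)
```

In this toy data, `get_uptime` passes the filter because it calls `popen`, but it is not an SLF because no input parameter reaches the command string. Only `run_cmd` is added to the sink set.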

Full Text