Formal Language Theory for Practical Security - Extended Abstract -

Andreas Jakoby,Jannis Leuther,Stefan Lucks

doi:10.1109/spw53761.2021.00023

Abstract

When binary data are sent from one party to another one, the encoding of the data can be described as a “data serialisation” language (DaSeL). Many DaSeLs employ the “length-prefix” pattern for strings, containers and other data items of variable length. This consists of an encoding of the item’s length, followed by an encoding of the item itself without closing brackets or “end” symbols. The receiver must determine the final byte from the length read before. Lengthprefix languages are not context-free. Thus, the plethora of tools and methods to specify, analyse, and parse context-free languages appears to be useless for length-prefix languages. This seems to explain why improper specffications of length-prefix languages and buggy hand-written parsers are so often a root cause for security issues and exploits, as, e.g., in the case of the famous Heartbleed bug. One might even be tempted to consider the use of length-prefix languages a security hazard. But this consideration would be wrong. We present a transformation of words from “calc-context-free” languages (a superset of context-free and length-prefix languages) into words from proper context-free languages. The transformation actually allows to use tools from context-free languages to deal with length-prefix languages. Our transformation runs on a Turing machine with logarithmic space. This implies the theoretical result of calc-context-free languages being in the complexity class log$C\mathcal{F}\mathcal{L}$. Similarly, deterministic calc-context-free languages are in log$\mathcal{D}\mathcal{C}\mathcal{F}\mathcal{L}$. To run in linear time, one needs to enhance the Turing machine by a stack to store additional data.

Full Text