State machine inference for security protocols
This webpage tries to give a overview of experiments with using state machine inference (aka active automata learning) to analyse implementations of standard security protocols. If you know of experiments that we are missing in this overview, please send us an email!
Background
Implementing a security protocol usually involves implementing a finite state machine (aka a Deterministic Finite Automata or DFA) to correctly handle different protocol flows. If the happy flow is not implemented correctly then you quickly notice, as the implementation simply will not work, but logical flaws in handling non-standard protocol flows can be harder to spot. Such flaws can introduce security vulnerabilities, as has for instance been the case for TLS (see here and here).
State machine inference is a technique to automatically infer the state machine from an implementation by means of black-box testing. This means we obtain formal models for free! Inspecting the state machines obtained - either manually or using a model-checker - can reveal flaws in implementations. It may also reveal differences between implementations of the same protocol that allow fingerprinting of particular implementations. Such differences are also an indication that the specifications may be unclear or ambiguous.
Often protocol state machines are not well-specified or left largely implicit in specifications. Paying explicit attention to these state machines can be seen as adhering to the principles of LangSec (language-theoretic security), as discussed in more detail here. Active state machine learning can also be regarded as a form of fuzzing, where you do not fuzz individual messages, but rather sequences of messages.
Further reading
The idea of state machine learning goes back to the work by Angluin from the 1980s. A standard software library for inferring state machines which we have used extensively in our own work is LearnLib. One of the first tools to apply the idea of state machine learning to security protocols was Prospex, which specifically targetted botnet protocols, with the aim to automatically reverse engineer these. Overviews of automated tools for protocol reverse engineering, not just using active state machine learning but also other techniques, is given in this survey paper or this one.
Overview of protocols analysed
Below an overview of research into the use of using active learning to analyse implementations of standardised security protocols. (So we exclude e.g. botnet protocols in this overview.)
This wiki provides additional information about some of the experiments listed below. This wiki also includes information n some case studies with protocols that are not security-related, such as
TCP,
MQTT, and
SIP.
Network security protocols
- TLS
- Joeri de Ruiter, A Tale of the OpenSSL State Machine: a Large-scale Black-box Analysis, NordSec 2016, LNCS volume 10014, pp. 169-184, Springer, 2016.
[Models, Scripts]
- Joeri de Ruiter and Erik Poll, Protocol state fuzzing of TLS implementations, USENIX Security, USENIX, 2015.
[Code, Models]
- SSH
- Paul Fiterau-Brostean, Frits Vaandrager, Erik Poll, Joeri de Ruiter, Toon Lenaerts and Patrick Verleg, Model Learning and Model Checking of SSH Implementations, Paul Fiterau-Brostean, Frits Vaandrager, Erik Poll, Joeri de Ruiter, Toon Lenaerts and Patrick Verleg, SPIN 2017, p. 142-151, ACM, 2017.
[Models]
- Toon Lenaerts, Improving-protocol state fuzzing of SSH, Bachelor thesis, Radboud University, 2017.
- Patrick Verleg, Inferring SSH state machines using protocol state fuzzing, Master thesis, Radboud University Nijmegen, 2016.
- OpenVPN
- IPSEC
Smartcards and banking tokens
- EMV (EuroPay-Mastercard-Visa)
- EMV-based banking tokens
- e-passport
Industrial Control Systems