Crib Dragging and Markov Chain Models for Cryptanalysis of One-Time Pads
24 Jun 2018
Properties of XOR
The “exclusive OR” (XOR) logical operation is an important building block in modern cryptography. By definition, XOR is true only when its two logical inputs differ, i.e. one is true while the other is false: $0 \oplus 0 = 0$, $0 \oplus 1 = 1$, $1 \oplus 0 = 1$, $1 \oplus 1 = 0$.
XOR can be easily extended to work on whole strings by applying it to each bit separately.
Mathematically, $(\{0,1\}, \oplus)$ is the abelian group of addition $\mod 2$, while $(\{0,1\}, \wedge, \oplus)$ is the field $\mathbb{F}_2$.
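These properties are easy to sanity-check in Python, where `^` is bitwise XOR on integers:

```python
# Quick sanity check of the XOR group properties (illustrative values).
a, b, c = 0b1011, 0b0110, 0b1100

assert a ^ b == b ^ a                # commutative
assert (a ^ b) ^ c == a ^ (b ^ c)    # associative
assert a ^ 0 == a                    # 0 is the identity element
assert a ^ a == 0                    # every element is its own inverse
```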
One-Time Pad
XOR can mix two bit-streams in a reversible way, which leads us to the most fundamental of ciphers: the One-Time Pad (OTP). Simply XOR the plaintext $P$ with key $K$ to get ciphertext $C$, and reverse this by XOR-ing the ciphertext $C$ with the key $K$ again.
Encryption: $C = P \oplus K$

Decryption: $C \oplus K = (P \oplus K) \oplus K = P \oplus (K \oplus K) = P \oplus 0 = P$
Note that the key:
- must be of the same length as the plaintext,
- must be a truly random string of bits,
- must be transferred to the other party in a secure way,
- cannot be reused to encrypt multiple messages.
One-Time Pad has a number of interesting properties. First of all, it provides perfect secrecy, which means that the ciphertext alone gives no information about the plaintext. In fact, any ciphertext can be transformed into any plaintext of the same length with a carefully manufactured key.
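A short sketch of this last point (variable names and messages here are illustrative): given any ciphertext, one can manufacture a key that “decrypts” it to an arbitrary plaintext of the same length.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"ATTACK AT DAWN"
key = os.urandom(len(plaintext))          # truly random, same length
ciphertext = xor_bytes(plaintext, key)

# Pick ANY fake plaintext of the same length...
fake = b"MEET AT SUNSET"
# ...and manufacture the key that "explains" the ciphertext as that plaintext:
fake_key = xor_bytes(ciphertext, fake)
assert xor_bytes(ciphertext, fake_key) == fake
```

This is exactly why the ciphertext alone carries no information: every plaintext of the right length is equally consistent with it.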
Historically, One-Time Pad was used for securing diplomatic messages through codebooks containing randomly generated keys. There is also some speculation that Number Stations are used to transmit OTP-encrypted messages to spies across the globe.
Crib Dragging
In theory, attacks on One-Time Pad are impossible. In practice, however, incorrect use of the cipher can completely undermine its security.
Notice that if the encryption key is reused, the XOR of the ciphertexts equals the XOR of the plaintexts, because the key cancels out:

$C_1 \oplus C_2 = (P_1 \oplus K) \oplus (P_2 \oplus K) = P_1 \oplus P_2$
Assuming that the plaintext is a natural language message written in English, we can guess some words that typically appear in a sentence, e.g. “ the “ (note the surrounding spaces). These guesses are called cribs. Let $C^\star = C_1 \oplus C_2$; from the above we have $C^\star = P_1 \oplus P_2$. If we guess the right position of a crib in either plaintext, XOR-ing the crib with the corresponding part of $C^\star$ reveals the contents of the other plaintext at that position. Of course, we do not know where the crib is, so we try all possible locations and look for places where we get natural-language-looking results. This process is called Crib Dragging and can be somewhat automated using Python.
We will need to define some helper functions:
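The original helpers are not reproduced here; a minimal version (function names are my own) might look like:

```python
def xor_strings(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter)."""
    return bytes(x ^ y for x, y in zip(a, b))

def is_printable(s: bytes) -> bool:
    """Crude heuristic: does this fragment consist of printable ASCII?"""
    return all(32 <= c < 127 for c in s)
```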
And some sample plaintexts:
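The original sample data is likewise not shown; any two equal-length English messages encrypted under one reused key will do (these examples are mine):

```python
import os

P1 = b"We will attack at dawn from the north"
P2 = b"The weather today is sunny and bright"

# The fatal mistake: one random key reused for both messages.
K = os.urandom(len(P1))
C1 = bytes(x ^ y for x, y in zip(P1, K))
C2 = bytes(x ^ y for x, y in zip(P2, K))
```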
Now crack it:
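A minimal, self-contained crib-dragging sketch (the plaintexts and crib are illustrative): drag the crib across every offset of $C^\star$ and print the offsets where the result looks like printable text.

```python
import os

# Two plaintexts encrypted under one reused key (illustrative sample data).
P1 = b"We will attack at dawn from the north"
P2 = b"The weather today is sunny and bright"
K = os.urandom(len(P1))
C1 = bytes(x ^ y for x, y in zip(P1, K))
C2 = bytes(x ^ y for x, y in zip(P2, K))

# C* = C1 XOR C2 = P1 XOR P2 -- the key cancels out.
C_star = bytes(x ^ y for x, y in zip(C1, C2))

crib = b" the "
for i in range(len(C_star) - len(crib) + 1):
    # If the crib sits at offset i in one plaintext, this fragment
    # is what the OTHER plaintext says at that offset.
    fragment = bytes(x ^ y for x, y in zip(crib, C_star[i:i + len(crib)]))
    if all(32 <= c < 127 for c in fragment):   # looks like text?
        print(f"offset {i:2d}: {fragment!r}")
```

Printable hits at wrong offsets are common; the cryptanalyst keeps the ones that extend into sensible words and iterates with longer cribs.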
A Language Model Attack
A key limitation of Crib Dragging is that the cryptanalyst has to guess the cribs and then look for partial words in decrypted portions of the plaintext. Natural Language Processing can help with this attack as shown in the paper “A Natural Language Approach to Automated Cryptanalysis of Two-Time Pads”.
Instead of guessing cribs, the authors build an n-gram Markov Chain language model to estimate the probability of strings. Markov Chain models can generate random sentences and are often used to create natural-looking names or easy-to-remember passwords. Train the language model on a corpus similar to the probable plaintexts, e.g. HTML pages, plain-text English documents, MS Word files etc. Most tutorials on Natural Language Processing with Markov Chain models describe simple bi-grams. However, in many real-life applications it is better to consider chains of higher order. Note that for large $n$, the counts of individual n-grams in any realistic corpus quickly drop to zero. The solution is Witten-Bell backoff smoothing, which estimates probabilities for n-grams that do not appear in the corpus.
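As an illustration (not the paper’s implementation), a bare-bones character trigram model with maximum-likelihood counts and no smoothing might look like this; a real model would add the Witten-Bell backoff for unseen n-grams:

```python
import math
from collections import defaultdict

def train_ngram(corpus: str, n: int = 3):
    """Count character transitions: how often each char follows each (n-1)-gram."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(corpus) - n + 1):
        ctx, ch = corpus[i:i + n - 1], corpus[i + n - 1]
        counts[ctx][ch] += 1
    return counts

def log_prob(counts, text: str, n: int = 3) -> float:
    """Log-probability of a string under the model. Unseen n-grams get
    probability 0 here -- this is exactly where backoff smoothing is needed."""
    total = 0.0
    for i in range(len(text) - n + 1):
        ctx, ch = text[i:i + n - 1], text[i + n - 1]
        seen = sum(counts[ctx].values())
        if seen == 0 or counts[ctx][ch] == 0:
            return float("-inf")
        total += math.log(counts[ctx][ch] / seen)
    return total
```

With a realistic corpus, `log_prob` scores candidate plaintext fragments: English-looking strings get finite scores, implausible ones fall off quickly.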
A Markov model can be interpreted as a graph with each vertex representing a possible n-gram and each edge labelled with a transition letter and its probability. To generate a sentence from the model, run a simple random walk on the graph. To break the reused One-Time Pad, the model needs to account for both plaintexts in parallel. Assuming that the two plaintexts are independent, combine two identical Markov Chain language models into a joint cross-product model. Now each vertex in the graph holds two parallel n-grams and each edge is labelled with two transition letters and the product of their probabilities. Add an index to each vertex to track the current position in the ciphertext.
To significantly reduce the size of the search space, keep only the vertices and edges that are consistent with the ciphertexts’ XOR, i.e. those whose two n-grams XOR to the corresponding bytes of $C^\star$. Finally, replace each edge’s probability with its negative logarithm, so that the most probable decryption becomes the shortest path, and find it with Dijkstra’s (or the Viterbi) algorithm.
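The consistency pruning can be sketched as follows: at a given ciphertext position, a pair of plaintext characters $(a, b)$ survives only if $a \oplus b$ equals the corresponding byte of $C^\star$ (a minimal sketch; the function name and the printable charset are my own assumptions):

```python
import string

# Characters we consider plausible plaintext -- an assumption of this sketch.
CHARSET = string.ascii_letters + string.digits + string.punctuation + " "

def consistent_pairs(cstar_byte: int):
    """All plausible character pairs (a, b) with ord(a) ^ ord(b) == cstar_byte."""
    return [(a, chr(ord(a) ^ cstar_byte))
            for a in CHARSET
            if chr(ord(a) ^ cstar_byte) in CHARSET]
```

Only these pairs (and the n-gram vertices built from them) survive in the joint model, which is what collapses the cross-product graph to a tractable size before the shortest-path search.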