Using HCTR2 with LUKS – klog reflexiones klondikeñas

As you might know, since the release of Linux kernel version 6.0 it has been possible to use HCTR2 through the kernel's cryptographic API. This is an interesting addition because, like Adiantum, HCTR2 makes a change anywhere in a sector affect the whole ciphertext of that sector, while still being able to use the AES hardware acceleration present in most modern processors. In this article I will talk more about why this is interesting and how to use it when encrypting hard drives.

As you have probably figured out, I am only publishing this post because I could not find this information anywhere and had to figure things out myself. It did not help that cryptsetup's benchmark command has a bug that prevents using “capi” cipher specifications. Anyway, before discussing why HCTR2 is an interesting way of encrypting hard drives, let's start with how to use it.

First, though, the short answer: use the cipher specification aes-hctr2-plain64, for example with a command like the following:
# cryptsetup luksFormat path-to-dev --cipher aes-hctr2-plain64 --key-size 128 --sector-size 4096 --label root --subsystem yourcomputer --type luks2 --pbkdf argon2id --hash sha512 --pbkdf-force-iterations 4 --pbkdf-memory 4194304 --pbkdf-parallel 4 --use-urandom --keyslot-cipher aes-xts-plain64 --keyslot-key-size 512
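Before running a command like this, it can be useful to check whether the running kernel knows about HCTR2 at all. The Python sketch below assumes that dm-crypt translates aes-hctr2-plain64 into the crypto API name hctr2(aes), and that /proc/crypto keeps its usual layout of “key : value” lines with a blank line between algorithms. The helper names parse_proc_crypto and has_algorithm are made up for this example. Since /proc/crypto only lists algorithms that are currently registered, a missing entry may simply mean the module has not been loaded yet rather than that support is absent.

```python
import os

HCTR2_NAME = "hctr2(aes)"  # assumed crypto API name behind aes-hctr2-plain64


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto text into one dict per algorithm entry."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates entries.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def has_algorithm(entries: list[dict[str, str]], name: str) -> bool:
    """True if some entry has the given generic name, e.g. 'hctr2(aes)'."""
    return any(entry.get("name") == name for entry in entries)


if __name__ == "__main__" and os.path.exists("/proc/crypto"):
    with open("/proc/crypto") as f:
        entries = parse_proc_crypto(f.read())
    if has_algorithm(entries, HCTR2_NAME):
        for entry in entries:
            if entry.get("name") == HCTR2_NAME:
                print("found:", entry.get("driver"), "priority", entry.get("priority"))
    else:
        print(HCTR2_NAME, "not listed (the hctr2 module may not be loaded yet)")
```

The script only reads /proc/crypto; it never loads modules or touches any device.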

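Please note that, unlike XTS, HCTR2 does not need a key twice as long as the underlying AES key: XTS splits its key into two AES keys, so AES-128-XTS takes --key-size 256, whereas HCTR2 uses a single AES key, so --key-size 128 above means AES-128.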
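The key-size arithmetic can be spelled out in a few lines. This is a minimal sketch; luks_key_size_bits is a hypothetical helper, and it only covers the two common AES key sizes. XTS needs two AES keys (one for the data and one for the tweak), while HCTR2 uses a single one.

```python
COMMON_AES_KEY_BITS = (128, 256)


def luks_key_size_bits(mode: str, aes_bits: int) -> int:
    """Value to pass to cryptsetup --key-size for a given mode and AES size."""
    if aes_bits not in COMMON_AES_KEY_BITS:
        raise ValueError(f"unsupported AES key size in this sketch: {aes_bits}")
    if mode == "xts":
        return 2 * aes_bits  # data key + tweak key
    if mode == "hctr2":
        return aes_bits  # one AES key; the hash key is derived from it
    raise ValueError(f"unknown mode: {mode}")


for mode in ("xts", "hctr2"):
    for bits in COMMON_AES_KEY_BITS:
        print(f"AES-{bits}-{mode.upper()}: --key-size {luks_key_size_bits(mode, bits)}")
```

This matches the command above: --key-size 128 gives AES-128 HCTR2 for the data, and --keyslot-key-size 512 gives AES-256 XTS for the keyslots.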

But what exactly is HCTR2? Well, HCTR2 is a variable-length “tweakable super-pseudorandom permutation”. This babble is just cryptographer lingo for the following: HCTR2 is a black box that takes as input a block of some length, a key, and an extra value called the tweak, which adjusts the cipher and, unlike the key, can be changed “cheaply”; it produces as output a block of the same length as the input. It is considered secure because an attacker cannot distinguish our black box from one that maps every input to a randomly chosen output not already used by another input (that is, a random permutation), even if the attacker can query both the black box and its inverse. In other words, HCTR2 takes a whole block of data as input and outputs a whole block in which every bit of the output depends on every bit of the input: changing a single input bit changes, on average, about half of the output bits.
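HCTR2 itself needs AES and POLYVAL, which the Python standard library does not provide, so this sketch swaps in a different, well-known way to build a wide-block strong pseudorandom permutation: a 4-round Feistel network (the Luby-Rackoff construction) whose round function is HMAC-SHA256 keyed by the key and fed the tweak. It is NOT HCTR2 and far too slow for real use; all function names are made up. It only illustrates the two properties described above: the permutation can be inverted, and flipping one input bit changes about half of the output bits.

```python
import hashlib
import hmac
import os


def prf(key: bytes, tweak: bytes, rnd: int, data: bytes, out_len: int) -> bytes:
    """Keyed PRF: HMAC-SHA256 run in counter mode to produce out_len bytes."""
    out = bytearray()
    counter = 0
    while len(out) < out_len:
        msg = bytes([rnd]) + counter.to_bytes(4, "little") + tweak + data
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:out_len])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_wide_encrypt(key: bytes, tweak: bytes, block: bytes) -> bytes:
    """4-round Luby-Rackoff Feistel over the whole (even-length) block."""
    assert len(block) % 2 == 0
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for rnd in range(4):
        left, right = right, xor(left, prf(key, tweak, rnd, right, half))
    return left + right


def toy_wide_decrypt(key: bytes, tweak: bytes, block: bytes) -> bytes:
    """Inverse permutation: undo the rounds in reverse order."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for rnd in reversed(range(4)):
        left, right = xor(right, prf(key, tweak, rnd, left, half)), left
    return left + right


def bits_changed(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


key = os.urandom(32)
tweak = (1234).to_bytes(16, "little")  # e.g. the sector number
sector = os.urandom(4096)
flipped = bytearray(sector)
flipped[100] ^= 0x01  # flip a single bit

c1 = toy_wide_encrypt(key, tweak, sector)
c2 = toy_wide_encrypt(key, tweak, bytes(flipped))
assert toy_wide_decrypt(key, tweak, c1) == sector
ratio = bits_changed(c1, c2) / (8 * len(sector))
print(f"{ratio:.3f} of the output bits changed")  # close to 0.5
```

Four rounds are used because that is the number Luby and Rackoff showed is needed for the result to remain indistinguishable from a random permutation even when the attacker can also query the inverse.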

This makes HCTR2 VERY interesting for full-disk encryption. Being tweakable (as XTS is), it can encrypt each sector differently, using the sector number as the tweak, without having to compute a new key for every sector (which would be costly with AES, since every new key needs a new key expansion). Also, because HCTR2 works over the whole sector, an attacker comparing two snapshots of the disk can only see that a sector has changed, not where inside the sector the change happened. With XTS, which encrypts each 16-byte block of a sector independently, the attacker can see exactly which 16-byte blocks changed. Hiding this helps avoid accidentally disclosing information such as the filesystem structure.
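To see the difference in what leaks, this sketch encrypts a 4096-byte sector before and after a one-bit change and lists which 16-byte pieces of the ciphertext differ. It is a toy: a 4-round Feistel network keyed with HMAC-SHA256 stands in for AES, narrow_encrypt only imitates the layout of XTS (each 16-byte block encrypted on its own, tweaked by sector number and position) rather than its actual math, and wide_encrypt imitates a wide-block mode like HCTR2 by treating the whole sector as one permutation. All function names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # AES block size in bytes


def feistel_encrypt(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """Toy 4-round Feistel permutation; a stand-in for a real cipher."""
    half = len(data) // 2
    left, right = data[:half], data[half:]
    for rnd in range(4):
        stream = b""
        counter = 0
        while len(stream) < half:  # stretch HMAC output to the half length
            msg = bytes([rnd]) + counter.to_bytes(4, "little") + tweak + right
            stream += hmac.new(key, msg, hashlib.sha256).digest()
            counter += 1
        left, right = right, bytes(a ^ b for a, b in zip(left, stream))
    return left + right


def narrow_encrypt(key: bytes, sector_no: int, sector: bytes) -> bytes:
    """XTS-like layout: every 16-byte block is encrypted independently."""
    out = b""
    for i in range(0, len(sector), BLOCK):
        tweak = sector_no.to_bytes(16, "little") + (i // BLOCK).to_bytes(8, "little")
        out += feistel_encrypt(key, tweak, sector[i:i + BLOCK])
    return out


def wide_encrypt(key: bytes, sector_no: int, sector: bytes) -> bytes:
    """HCTR2-like layout: the whole sector is one tweakable permutation."""
    return feistel_encrypt(key, sector_no.to_bytes(16, "little"), sector)


def changed_blocks(a: bytes, b: bytes) -> list[int]:
    return [i // BLOCK for i in range(0, len(a), BLOCK) if a[i:i + BLOCK] != b[i:i + BLOCK]]


key = os.urandom(32)
old = os.urandom(4096)
edited = bytearray(old)
edited[100] ^= 0x01  # one-bit change inside 16-byte block number 6
new = bytes(edited)

narrow = changed_blocks(narrow_encrypt(key, 7, old), narrow_encrypt(key, 7, new))
wide = changed_blocks(wide_encrypt(key, 7, old), wide_encrypt(key, 7, new))
print("narrow:", narrow)
print("wide:", len(wide), "of", len(old) // BLOCK, "blocks changed")
```

With the narrow layout only block 6 (the one holding byte 100) changes, so the position of the edit is visible; with the wide layout all 256 blocks change.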

How does HCTR2 manage such a feat? The construction is “relatively” simple. HCTR stands for Hash and CTR (counter) mode, and that is exactly what HCTR2 does. First you split the block into a small block with the same length as the block cipher's block (128 bits for AES) and a larger block with everything else. You then hash the tweak and the larger block using a key-dependent hash key. You XOR this hash with the small block and encrypt the result. Next you XOR the value you just encrypted with its ciphertext and another key-dependent value; the result is the initial counter value for the counter mode used to encrypt the larger block. Finally you hash the tweak and the larger ciphertext with the same key-dependent hash as before, and XOR this hash with the ciphertext of the small block. Then you just concatenate both ciphertexts. And voilà: the small block has been influenced by every bit of the larger block and vice versa. Decryption works the same way, in reverse.
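The same steps written as equations may be easier to follow. The symbols are chosen here for illustration: P is the plaintext sector, T the tweak, E_K the block cipher (AES), H_h the POLYVAL-based hash of the tweak and a message keyed with h, and h and L the two key-dependent values (as I understand the HCTR2 specification, the encryptions of the constant blocks 0 and 1). XCTR_K(S) is the counter-mode keystream E_K(S ⊕ 1) ∥ E_K(S ⊕ 2) ∥ …, with the counter encoded little-endian, truncated to the length needed. Encryption is on the left and decryption on the right; note that decryption only needs the inverse cipher for the small block.

```latex
% Encryption of P under key K and tweak T
\begin{aligned}
M \,\|\, N &= P, \quad |M| = 128 \text{ bits} \\
MM &= M \oplus H_h(T, N) \\
UU &= E_K(MM) \\
S &= MM \oplus UU \oplus L \\
V &= N \oplus \mathrm{XCTR}_K(S)_{[0,\,|N|)} \\
U &= UU \oplus H_h(T, V) \\
C &= U \,\|\, V
\end{aligned}
\qquad
% Decryption of C = U || V
\begin{aligned}
UU &= U \oplus H_h(T, V) \\
MM &= E_K^{-1}(UU) \\
S &= MM \oplus UU \oplus L \\
N &= V \oplus \mathrm{XCTR}_K(S)_{[0,\,|V|)} \\
M &= MM \oplus H_h(T, N) \\
P &= M \,\|\, N
\end{aligned}
```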

The construction is designed so that a change in any bit of the plaintext or the ciphertext can influence every bit of its counterpart. For example, if we change a bit of the larger ciphertext, its hash changes; this changes the value passed to the block cipher's decryption, and therefore both the resulting small plaintext and the seed used for the counter mode, which affects every bit of the larger plaintext. Similarly, changing a bit of the small ciphertext changes the decrypted small block and thus the counter-mode seed, again affecting every bit of the larger plaintext. Changes to the plaintext propagate the same way: a change in the larger plaintext changes its hash, which changes the small block before encryption and in turn the counter seed, while a change in the small plaintext directly affects the counter seed.
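The data flow of the previous two paragraphs can be followed in code. This is a structural sketch only, NOT an interoperable HCTR2 implementation: since the Python standard library has neither AES nor POLYVAL, a toy 128-bit Feistel cipher (toy_E/toy_D) stands in for AES and a truncated HMAC-SHA256 (toy_H) stands in for the POLYVAL-based hash. The derived values h and L follow my reading of the specification (encryptions of the constants 0 and 1), the counter mode is XCTR-style (seed XOR a little-endian counter starting at 1), and the tweak mimics plain64 (the sector number, zero-padded to 32 bytes). All function names are hypothetical.

```python
import hashlib
import hmac
import os

BS = 16  # block size in bytes, as for AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]


def toy_E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a 4-round Feistel permutation of one 16-byte block."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def toy_D(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_E."""
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def toy_H(hkey: bytes, tweak: bytes, msg: bytes) -> bytes:
    """Stand-in for the POLYVAL-based hash of (tweak, message)."""
    data = len(tweak).to_bytes(8, "little") + tweak + msg
    return hmac.new(hkey, data, hashlib.sha256).digest()[:BS]


def xctr(key: bytes, seed: bytes, length: int) -> bytes:
    """XCTR-style keystream: E(seed XOR i) for i = 1, 2, ... (little-endian)."""
    out = b""
    i = 1
    while len(out) < length:
        out += toy_E(key, _xor(seed, i.to_bytes(BS, "little")))
        i += 1
    return out[:length]


def _derive(key: bytes) -> tuple[bytes, bytes]:
    """Key-dependent values: hash key h = E(0) and L = E(1)."""
    return toy_E(key, bytes(BS)), toy_E(key, (1).to_bytes(BS, "little"))


def hctr2_like_encrypt(key: bytes, tweak: bytes, plaintext: bytes) -> bytes:
    h, L = _derive(key)
    M, N = plaintext[:BS], plaintext[BS:]  # small block, larger block
    MM = _xor(M, toy_H(h, tweak, N))       # hash larger block into the small one
    UU = toy_E(key, MM)                    # encrypt the small block
    S = _xor(_xor(MM, UU), L)              # counter-mode seed
    V = _xor(N, xctr(key, S, len(N)))      # encrypt the larger block
    U = _xor(UU, toy_H(h, tweak, V))       # hash the larger ciphertext back in
    return U + V


def hctr2_like_decrypt(key: bytes, tweak: bytes, ciphertext: bytes) -> bytes:
    h, L = _derive(key)
    U, V = ciphertext[:BS], ciphertext[BS:]
    UU = _xor(U, toy_H(h, tweak, V))
    MM = toy_D(key, UU)                    # only place the inverse cipher is used
    S = _xor(_xor(MM, UU), L)
    N = _xor(V, xctr(key, S, len(V)))
    M = _xor(MM, toy_H(h, tweak, N))
    return M + N


key = os.urandom(32)
tweak = (42).to_bytes(32, "little")  # plain64-like: sector number, zero-padded
sector = os.urandom(4096)
ct = hctr2_like_encrypt(key, tweak, sector)
assert hctr2_like_decrypt(key, tweak, ct) == sector

# Flip one bit in the larger part of the ciphertext and decrypt.
tampered = bytearray(ct)
tampered[2000] ^= 0x01
garbled = hctr2_like_decrypt(key, tweak, bytes(tampered))
ratio = sum(bin(a ^ b).count("1") for a, b in zip(sector, garbled)) / (8 * len(sector))
print(f"{ratio:.3f} of the plaintext bits changed")  # close to 0.5
```

The last lines flip a single bit in the larger part of the ciphertext: because the hash of V changes, so do the decrypted small block and the counter seed, and roughly half of all plaintext bits come out different.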

There are a few issues with HCTR2, though. The most obvious one is that, although HCTR2 calls the block cipher roughly as many times as there are blocks in the input, it also has to hash the larger part twice: the plaintext before encryption and the ciphertext after it. Although this hash is also hardware accelerated on many processors (using carry-less multiplication instructions), it still requires going through the whole block twice. Another minor issue is that the tweak is twice the block size (32 bytes, instead of a single 16-byte block as is more common), which makes the cryptsetup benchmark command fail (and its results are misleading anyway, as I will explain later). Finally, as with the software-focused Adiantum mode, support for HCTR2 is still not enabled on many distributions.
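A back-of-the-envelope count makes the first issue concrete. This sketch (per_sector_cost is a hypothetical helper) ignores the tweak and length blocks that the real HCTR2 hash also absorbs and the one-time derivation of the hash key, and counts one GF(2^128) multiplication per 16-byte block hashed, as in the straightforward formulation of POLYVAL.

```python
def per_sector_cost(sector_bytes: int, block: int = 16) -> dict[str, int]:
    """Rough per-sector work for XTS versus HCTR2 (see assumptions above)."""
    blocks = sector_bytes // block
    rest = blocks - 1  # the larger part: everything after the first block
    return {
        # XTS: encrypt the tweak once, then one AES call per data block.
        "xts_aes_calls": 1 + blocks,
        # HCTR2: one AES call on the small block, one per XCTR keystream block.
        "hctr2_aes_calls": 1 + rest,
        # HCTR2: hash the larger part twice (plaintext, then ciphertext),
        # one multiplication per 16-byte block each time.
        "hctr2_hash_blocks": 2 * rest,
    }


for size in (512, 4096):
    print(size, per_sector_cost(size))
```

For a 4096-byte sector this gives 257 AES calls for XTS against 256 AES calls plus 510 hash multiplications for HCTR2, which is where the extra cost comes from.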

So how does its performance compare to XTS? On an eighth-generation Core i5 processor, cryptsetup reports HCTR2 throughput at about two thirds of that of XTS. Unfortunately, the cryptsetup benchmark is broken: it just asks the kernel to encrypt a buffer of around 1MiB, which is then processed in chunks of 64KiB. This is VERY problematic because my level one data cache is only 32KiB, so a whole 64KiB chunk cannot fit in it and will likely end up in the significantly slower level two cache. As a result, each data access can have two to three times the latency, because each of the two times a chunk is processed, the accesses evict the rest of that chunk from the cache.

If we instead measure encryption on a ramdisk, the performance of HCTR2 is quite similar to that of XTS for writes, even across different write sizes. For reads there is still a gap of around 20% compared to XTS, and this gap grows significantly with smaller block sizes.
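A write measurement of this kind can be scripted. The sketch below is hypothetical and not the setup used for the numbers above: it times buffered writes of a given size to a path (for example a dm-crypt mapping on top of a ramdisk created with the brd module) and includes an fsync so that the measured time covers the writeback, which is when dm-crypt encrypts the data. Reads are harder to measure this way because of the page cache, so they are left out. The function name write_throughput and the device path in the comment are placeholders.

```python
import os
import sys
import time


def write_throughput(path: str, block_size: int, total_bytes: int) -> float:
    """Return MiB/s for writing total_bytes to path in block_size chunks.

    The file or device at path must already exist; it is overwritten
    from offset 0.
    """
    buf = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        start = time.perf_counter()
        while written < total_bytes:
            written += os.write(fd, buf)
        os.fsync(fd)  # wait until the data (and its encryption) reach the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written / elapsed / 2**20


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bench.py /dev/mapper/ramtest
    # WARNING: this overwrites the target.
    target = sys.argv[1]
    for size in (512, 4096, 65536):
        mib_s = write_throughput(target, size, 256 * 2**20)
        print(f"{size:>6} B writes: {mib_s:8.1f} MiB/s")
```

Running it once against an aes-xts-plain64 mapping and once against an aes-hctr2-plain64 mapping on the same ramdisk gives comparable numbers without the cache effects of the cryptsetup benchmark.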

So, as we have seen, HCTR2 is slightly slower than XTS, but that is the price you pay for the extra security it provides. There is still work to be done to support it in Linux distributions, and it is unlikely that we will see it used in self-encrypting disks any time soon. Nevertheless, if you are concerned about attackers who can inspect the drive at different points in time, or who could learn something from where changes happen on the disk (such as the filesystem structure), HCTR2 is significantly more secure than XTS and should be considered.