A few days ago, leo-stone reverse-engineered the Petya ransomware and found that it was possible to use genetic algorithms to find out the key. He even published a nice decrypting tool in go. I took his globe and decided to prove why this is mostly a specific issue of how Petya reduced Salsa-20 to use only 16 bit words.

In this blogpost I’ll do a cryptanalysis of the Petya encryption algorithm (as published by leo-stone) and reduce the key entropy so that a single known plaintext is enough to break the algorithm. I’ll also explain how to use these results to make a faster and more efficient decrypting tool. And finally I’ll explain why genetic algorithms as used by leo-stone worked.

The first we should check is how is the 16 character string mapped into a decryption key.

First we can check the part of the code showing the decryption key:

`func genesToKey(genes string) string {`

key := “”

for _, v := range genes {

key += string([]rune{v, ‘x’})

}

return key

}

This basically shows us that keys are the genes separated by the x character. Thus, the real key is actually 8 characters. The valid set of characters is show by this line of code:

`geneSet := "123456789abcdefghijkmnopqrstuvwxABCDEFGHJKLMNPQRSTUVWX" // you decide the set of valid genes`

That is exactly 54 characters, reducing the key to 54⁸ possibilities.

Now, we need to see how a key is mapped into the key elements in the matrix, this is done by this code:

`m.Key0 = map_alpha[b[0]]`

m.Key2 = map_alpha[b[1]]

m.Key4 = map_alpha[b[2]]

m.Key6 = map_alpha[b[3]]

m.Key8 = map_alpha[b[4]]

m.Key10 = map_alpha[b[5]]

m.Key12 = map_alpha[b[6]]

m.Key14 = map_alpha[b[7]]

Where the map_alpha vector is defined as follows:

`s := []byte("123456789abcdefghijkmnopqrstuvwxABCDEFGHJKLMNPQRSTUVWX")`

r := make([]uint16, len(s))

for i, v := range s {r[i] = uint16(v<<1)< map_alpha[v] = r[i]

}

So basically each byte in the key gets expanded as follows (from the input letter): the last 7 bits of the value are followed by the 9 bit result of adding the letter to the ascii value z: (or 122). Given how ascii works, the above operation will never overflow so we can obtain the original letter from the 16 bit value by removing 122 to the least significant 8 bits.

Petya’s encryption is mainly based on a 16-bit version of Salsa-20. This version applies all the steps using 16-bit words instead of 32-bit (and thus also constants) except the word rotations which are 32 bits. Since Salsa-20 is a stream cipher we can get the final state of the cipher’s matrix by using a known plain text. Apparently, Petya will xor the input with 0x37 and then with the output of the Salsa-20 cipher. Since some of the encrypted values are known to be 0, we can just xor with 0x37 to get back the result of the salsa-20 like cipher used.

One of the things that looked quite odd was the use of 32 bit shifts. This was probably caused by the fact that 18 is an invalid value for a shift made on a 16 bit integer. Because of this, I decided to analyze how the different shifts affected the different elements of the matrix.

If you check carefully you’ll see that the 16 bit sum of the two elements gets zero-padded to 32 bits before the rotation. Afterwards, the result is xored into another matrix element. The first issue this shows up is that some of the bits will be xored with a 0 and thus left equal. Indeed, the rotations can be replaced with simple shifts because of this. For example this code:

`u := uint32(me[0] + me[12])`

me[4] ^= uint16(u<>(32-7))

u = uint32(me[4] + me[0])

me[8] ^= uint16(u<>(32-9))

u = uint32(me[8] + me[4])

me[12] ^= uint16(u<>(32-13))

u = uint32(me[12] + me[8])

me[0] ^= uint16(u<>(32-18))

Would become instead:

`me[4] ^= uint16((me[0] + me[12])<<7)`

me[8] ^= uint16((me[4] + me[0])<<9)

me[12] ^= uint16((me[8] + me[4])<<13) me[0] ^= uint16((me[12] + me[8])>>14)

From here, it is quite easy to see that at least 5 bits of each of the matrix elements will be untouched across all the shuffle calls, bits 6 to 2. Thus reducing the real entropy of the keys to 88 bits (still a lot but we aren’t still done).

We can now draw the perturbation matrix for the column and row shuffling steps, this will result in the following (negative numbers represent the last n bits being affected).

Column shuffle bit perturbation table:

-2 | 13 | 9 | 7 |

7 | -2 | 13 | 9 |

9 | 7 | -2 | 13 |

13 | 9 | 7 | -2 |

Row shuffle bit perturbation table:

-2 | 7 | 9 | 13 |

13 | -2 | 7 | 9 |

9 | 13 | -2 | 7 |

7 | 9 | 13 | -2 |

If we combine both of them then we’ll get the following bit perturbation table:

-2 | 7 | 9 | 7 |

7 | -2 | 7 | 9 |

9 | 7 | -2 | 7 |

7 | 9 | 7 | -2 |

For simplicity, the same table as above with the meaning of each element:

Constant 0: -2 |
Key 0: 7 |
Key 2: 9 |
Key 4: 7 |

Key 6: 7 |
Constant 2: -2 |
Nonce 0: 7 |
Nonce 2: 9 |

Counter 0: 9 |
Counter 2: 7 |
Constant 4: -2 |
Key 8: 7 |

Key 10: 7 |
Key 12: 9 |
Key 4: 7 |
Constant 6: -2 |

Sadly, the matrix points which have the smallest perturbation (the first 2 bits) are constants, but there are parts of the key with 9 and 7 bits unperturbed, as a result of this, the 128 bit key used gets reduced to only 72 bits.

This is missing the fact that the original block is finally added to the shuffled block to prevent reversibility. Since when adding a bit to itself the result is 0 and the carry is the bit it self, we have to drop one bit from each word and take into account that all unperturbed bits will be shifted one position to the left. Here is the new table:

Constant 0: -3 |
Key 0: 6 |
Key 2: 8 |
Key 4: 6 |

Key 6: 6 |
Constant 2: -3 |
Nonce 0: 6 |
Nonce 2: 8 |

Counter 0: 8 |
Counter 2: 6 |
Constant 4: -3 |
Key 8: 6 |

Key 10: 6 |
Key 12: 8 |
Key 4: 6 |
Constant 6: -3 |

This results in 80 bits instead of 72.

80 bits though, is still strong enough to deter most brute force attempts. But we can combine the knowledge we have about how keys are expanded with this. Sadly, for most of the elements of the key we only have the least significative 6 bits untouched and then blended with our cipher text so if we just blindly do this we’ll still have to check 12 bits (4096 possibilities) to find the solution, but we can be a bit smart if we check our input sets for those last 8 bits.

Instead if we see the last 6-bit pattern for each of the valid inputs, we’ll find a collision between 1 and q, 2 and r, 3 and s, 4 and t, 5 and u 6 and v, 7 and w and 8 and x. This means that depending on the characters in the key we may have to test up to 64 possible keys to find the right one which is still feasible. To find out the correct key, we just need to match the last 6 bits after shifting the word one position to the left (8 for some parts of the key) with the letter or 2 letters matching the patter and then test the combinations.

From this point it is quite easy to see that the reason why the genetic algorithm worked is because for some parts of the modified Salsa-20 the result is independent from the other digits of the key. Independent also, in a way in which those characters are almost enough to infer the key back.

It should be noted, that had Petya used 16 bit rotations (with a different rotation constant instead of 18 like 14), most of these cryptanalysis (and the genetic algorithms used by leo-stone) would have failed leaving us with a bruteforce search of the 54⁸ possible keys.

As a closing note, I’d like to thank leo-stone his amazing work reverse engineering Petya without which this post would never have been written.

EDIT: This article missed the fact that a finall addition is made between the original key and the shuffled result. This has now been fixed.

I don’t understand anything. Could you add some info into the article to make the article understandable to people other than professional cryptologists? Thank you.

I sadly, can’t find a way to make the article much clearer than it is, sorry. You are expected to understand how Salsa20 works and what confusion and difusion are and it would be hard to make this much more reachable than it already is.

Maybe the sec-t paper can be of help though: http://klondike.es/charlas/petya/paper.pdf