Parameter optimization of time-memory trade-off cryptanalysis in RainbowCrack

by Zhu Shuanglei <shuanglei@hotmail.com>
http://www.antsight.com/zsl/rainbowcrack/


1. Introduction

RainbowCrack is a general propose implementation of Philippe Oechslin's faster time-memory trade-off technique. In the tutorial and some other document of the tool, four ready to work configurations are introduced. However, you may be puzzled by the parameters(m, t and l) if you want to build tables for your custom charset. If these parameters are casually selected, the performance will be really poor.
In this article, we present some useful information which is helpful if you want to determine your parameters of a rainbow table set, which can be used to attack the ciphertexts of a selected key space of a hash algorithm .

The RainbowCrack tool is mainly based on the paper Making a Faster Cryptanalytic Time-Memory Trade-Off. We will not talk about the basic concepts here, so please read the paper and make clear the following concepts before you continue:
1. What is a rainbow chain
2. Why the rainbow chains merge in a rainbow table

2. Parameters in time-memory trade-off cryptanalysis

There are a lot of parameters in time-memory trade-off cryptanalysis:
Catalog Parameter Explanation
Const values: they will not change speed of hash computation time-memory trade-off cryptanalysis need a lot of computing power and a lot of disk access
speed of the harddisk
N the key space
Input parameters: parameters that you can adjust t rainbow chain length
m rainbow chain count of a rainbow table
l rainbow table count
Output parameters: time and space complexity depending on the "input parameters" disk usage disk space required to store the generated tables
success probability the probability you can find the plaintext of the ciphertext
mean cryptanalysis time mean time used to compute the hashes when cracking a ciphertext
max cryptanalysis time max time used to compute the hashes when cracking a ciphertext
mean disk access time mean time used to load the files when cracking a ciphertext
max disk access time max time used to load the files when cracking a ciphertext
table precomputation time time needed to compute the rainbow table set

To determine the parameter "speed of hash computation":
Run the program rtgen like this:
C:\rainbowcrack>rtgen lm alpha 1 7 0 -bench
lm hash speed: aaaaaa / s
lm step speed: bbbbbb / s
C:\rainbowcrack>rtgen md5 alpha 1 7 0 -bench
md5 hash speed: cccccc / s
md5 step speed: dddddd / s
Here "bbbbb" is the speed of lm hash, "dddddd" is the speed of md5 hash

To determine the parameter "speed of the harddisk":
If the program rcrack can read 610 MB of data in 19 seconds, the speed is considered as "19 / 610"

To determine the parameter "N":
If the charset is "alpha"(26 characters) and the plaintext length range is 1 to 7, N = 26^1 + 26^2 + 26^3 + 26^4 + 26^5 + 26^6 + 26^7 = 8353082582
If the charset is "byte"(256 characters) and the plaintext length range is 1 to 4, N = 256 + 256^2 + 256^3 + 256^4 = 4311810304

Parameter t, m and l are the only parameters you can adjust. If you change any of them, the output parameters will change. Of course we would like to select some good input parameters which will produce better output parameters.
Some parameters are considered more important than the others. These "important" parameters are in bold font in the above table.

3. Computation of the output parameters

In this section, we will talk about how the output parameters are determined by the input parameters. The discussion is not complete, the prototype of the matlab scripts used to compute the output parameters are given in appendix A. However, you can also take the source of these scripts as another reference.

3.1 Disk usage

The format of a rainbow table generated by rainbowcrack is given in appendix B. Each rainbow chain will take 16 bytes, so the formula is:
disk_usage = m * 16 * l

3.2 success probability

3.3 mean cryptanalysis time

We need "t*t/2" times rainbow chain walk(hash computation and a reduce function) to finish the search of a rainbow table. Thus the time to search a rainbow table is "t*t/2 / step_speed".
Each rainbow table has its success probability. If we can't find the plaintext in a rainbow table, we will continue with the next. The formula is:
mean_cryptanalysis_time = rate_of_each_table * 1 * t*t/2 / step_speed
                                         + (1 - rate_of_each_table) ^ 1 * rate_of_each_table * 2 * t*t/2 / step_speed
                                         + (1 - rate_of_each_table) ^ 2 * rate_of_each_table * 3 * t*t/2 / step_speed
                                         ...
                                         + (1 - rate_of_each_table) ^ (table_count - 1) * rate_of_each_table * table_count * t*t/2 / step_speed
                                         + (1 - rate_of_each_table) ^ table_count * table_count * t*t/2 / step_speed

4. Parameter optimization of time-memory trade-off cryptanalysis

We use the matlab script performance_chart.m to select our configurations. We pass the script with our desired success probability, disk usage range and the rainbow chain count range.


Fig: Performance chart of key space 8353082582
Matlab: performance_chart(8353082582, 0.999, 250, 2000, 3, 8, 354000, 19 / 610)

We use different colors to represent the performance curves if we use different rainbow table count.

There are three charts in the figure:
chart 1: The minimal t we can use if we want to reach the desired success probability 
chart 2: mean cryptanalysis time if we use the t selected in char 1
chart 3: mean cryptanalysis time plus mean disk access time if we use the t selected in char 1

The cryptanslysis time will decrease when disk space increase(m*l increase). However, we will need longer disk access time if use larger tables. Look at the chart 3 to find out how "mean_cryptanalysis_time + mean_disk_access_time" change when disk space change.

You can select any point in these curves as your configuration of the rainbow table set. Of course some configurations are better than the others.

Fig: Performance chart of key space 8353082582(zoomed to where we select the configuration #0)
Matlab: performance_chart(8353082582, 0.999, 250, 2000, 3, 8, 354000, 19 / 610)

Now we zoomed the figure to where we select the configuration #0.

When a configuration is selected, you can determine the parameter t and l directly from the figure. m*l can be determined by: 
m *  l = disk_usage / 16

If you want to crack one ciphertext at a time, the total time is: mean_cryptanalysis_time + mean_disk_access_time
If you want to crack n ciphertexts at a time, the total time is:   mean_cryptanalysis_time * n + max_disk_access_time (when n >> 1)
                                                                  the mean time is:  (mean_cryptanalysis_time * n + max_disk_access_time) / n = mean_cryptanalysis_time  (when n >> 1)
You should also take the curves in chart 2 into consideration if you always have dozens of ciphertexts to crack. In this situation, the disk access time will take very little percent in cracking progress.


Appendix A - Matlab scripts

basic scripts description
function ret = calc_disk_usage(m, table_count) calculate disk usage
function ret = calc_success_probability(N, t, m) calculate success probability
function ret = calc_mean_cryptanalysis_time(N, t, m, table_count, step_speed) calculate mean cryptanalysis time
function ret = calc_max_cryptanalysis_time(t, table_count, step_speed) calculate max cryptanalysis time
function ret = calc_mean_disk_access_time(N, t, m, table_count, disk_speed) calculate mean disk access time
function ret = calc_max_disk_access_time(m, table_count, disk_speed) calculate max disk access time
function ret = calc_table_precomputation_time(t, m, table_count, step_speed) calculate table precomputation time
function calc_configuration_parameters(N, t, m, table_count, step_speed, disk_speed) wrapper of the above scripts
 
parameter optimization scripts description
function performance_chart(N, target_success_probability, ...
                                          disk_usage_min, disk_usage_max, ...
                                          table_count_min, table_count_max, ...
                                          step_speed, disk_speed)
you can find your custom configuration of a selected charset with this script
 
demonstration scripts description
function demo_chain_merge(N, t, m) demonstrate the merge of rainbow chains
function demo_advantage_of_multiple_table(N, t, disk_usage_min, disk_usage_max) demonstrate how the success probability change if we create multiple rainbow tables


Appendix B - Format of a rainbow table in RainbowCrack

A rainbow table is a raw array of rainbow chains. Only the starting point and the end point are stored in a rainbow chain. The data struct is defined as:
struct RainbowChain
{
uint64 nIndexS;
uint64 nIndexE;
};
Each rainbow chain takes 16 bytes.
For example:

[user@linux90 rainbowcrack]$ ./rtgen lm alpha 1 7 0 100 4 test
hash routine: lm
hash length: 8
plain charset: ABCDEFGHIJKLMNOPQRSTUVWXYZ
plain charset in hex: 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 
plain length range: 1 - 7
plain charset name: alpha
plain space total: 8353082582
rainbow table index: 0
reduce offset: 0

generating...
4 of 4 rainbow chains generated (0 m 0 s)
[user@linux90 rainbowcrack]$ hexdump lm_alpha#1-7_0_100x4_test.rt 
0000000 ba40 a519 0001 0000 57bd 9f74 0001 0000
0000010 7af6 42b8 0000 0000 3dad 910f 0000 0000
0000020 728b 499c 0001 0000 c840 cb54 0000 0000
0000030 feab b2b4 0000 0000 77ac c144 0000 0000
0000040
[user@linux90 rainbowcrack]$ ./rtsort lm_alpha#1-7_0_100x4_test.rt 
available physical memory: 15585280 bytes
loading rainbow table...
sorting rainbow table...
writing sorted rainbow table...
[user@linux90 rainbowcrack]$ hexdump lm_alpha#1-7_0_100x4_test.rt 
0000000 7af6 42b8 0000 0000 3dad 910f 0000 0000
0000010 feab b2b4 0000 0000 77ac c144 0000 0000
0000020 728b 499c 0001 0000 c840 cb54 0000 0000
0000030 ba40 a519 0001 0000 57bd 9f74 0001 0000
0000040

We generated a simple rainbow table with 4 rainbow chains and the size of the generated file is 64 bytes.

The rtsort program is very simple too. It sort the end points and write the rainbow chains back to the file.

Interestingly, you can dump the content of a rainbow chain like this:

[user@linux90 rainbowcrack]$ ./rtdump lm_alpha#1-7_0_100x4_test.rt 0
#0 0000000042b87af6 CPENHYM|4350454e48594d dc755c2de44e8620
#1 0000000138480a7a PXXXZOM|505858585a4f4d acdd711d4d251cae
#2 0000000017f34221 AGUGSNB|41475547534e42 62c9104f81dd89ac
#3 000000002703a3d0 BCBICDU|42434249434455 7103a8b83dd78800
#4 00000001a14058f0 VQDTIFQ|56514454494651 0a965e6115101515
......
......
#95 00000001abbf9d13 WEZDJYB|57455a444a5942 500da7c384c8a4b6
#96 0000000161daa38d SEQFGUP|53455146475550 3d25c6e9d112ed31
#97 00000000bb89fb89 JDUEUNR|4a445545554e52 12586e9d6b663bff
#98 0000000012216749 YOPLXZ |594f504c585a   e3d94db4cd0f94e5
#99 00000000910f3dad

The starting point and end point are same in the output of hexdump and rtdump.

Create date: 2004/1/1