Faster time-memory trade-off technique - basics V.1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a simplified explanation about Philippe Oechslin http://lasecwww.epfl.ch/philippe.shtml faster time-memory trade-off technique http://lasecwww.epfl.ch/php_code/publications/search.php?ref=Oech03 implementations: http://lasecwww.epfl.ch/~oechslin/projects/ophcrack/ http://www.antsight.com/zsl/rainbowcrack/ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Introducion ~~~~~~~~~~~~ This theory is about generating tables (so called rainbow tables) which enable to look up hashes (digests of known algorithms lan manager, md5 etc) for a corresponding plain text. digest algorithms are one way which means they cause loss of data and are not reversable. those algorithms are often used on passwords so they wouldnt be seen when there is a security breach in database for example. Holding all possible character variantation and their corresponding digest requires enormous ammount of memory (hard disk space), and is considered not possible at this time. This paper is for people like me, who are not native english speakers and/or dont understand original paper. From this point i assume you have basic knowlege about what digests, hexademical and binary are, where and why they are used and so on. Lets generate ~~~~~~~~~~~~~ first we generate a example chain from table, we start with setting possible characters that might occour in plaintext (charset), we choose "abcde" for our example. then we set how long a plaintext can be , we set 1 - 4 (it can be 1 or 2 or 3 or 4 characters long) now we calculate how many possible variantations there are with settings we chose. formula is simple, we count how many characters there are in our charset, currently - we have abcde - we have 5 characters. to calculate how many possible variantations there in one possible length. we put character count into power of length. 5^1 this is 5. now we do this for every possible length and add results together so total number of variantations is: 5^1 + 5^2 + 5^3 + 5^4 we get 5 + 25 + 125 + 625 = 780. i use LM digests for examle, and i use LM() as represntation where a text between parenthes is turned into digest. we start generating a table by taking first plaintext/password from all possible variantations, it is number 1 in our example (if range would have been 2-4 - first one would have been 25). So dont confuse first with number 1. nr 1 is a nr 2 is b nr 3 is c nr 4 is d nr 5 is e nr 6 is aa nr 7 is ab and so on.. generating rainbowtable is done by makeing a fixed ammount of calculations (chain length) from one plaintext place in keyspace to another plaintext in keyspace - note that next calculation needs to be related to previous one and thats why we need to use reduction function that takes digest from previous plaintext and turnes it into next one in keyspace using additional number (reduction index) - it looks like this ,note that this is one (bad in practice - easy for illustration) version of reduction function, this could be implemented other ways as long as basic principle stays same: first plaintext can be used as chain start LM(a) which gets us digest: 7584248B8D2C9F9E now we take this digest and turn it into binary (base 2) format 111010110000100001001001000101110001101001011001001111110011110 now we take reduction index.. which is position in chain, currently 0 and we XOR the digest with it. it means we left null pad same length binary format from 0. that is 000000000000000000000000000000000000000000000000000000000000000 and compare it with digests binary format and put 1 to corresponding place where two digits on the same place are different and 0 where two numbers are equal but in this case with all 0-s as second argument.. output wont change. ---------------------------------------------------------------------- small demo of xor decimal 18 in base 2 10010 decimal 3 in base 2 00011 output 10001 in base 10 decimal is 17 so 18 xor 3 is 17. ---------------------------------------------------------------------- after xor-ing we get new decimal number and then we start makeing next plaintext from it by calculating its modulus (remainder of division) with charset length and building next plaintext by taking the n-th character from charset, where n is result from previous calculation and then we reduce number by actually dividing it with charset length (which is 5) and losing remainder (numbers after comma), not rounding. since modulus is 0 with equal division, we must count characters from 0 also so its: 0 = a, 1 = b, 2 = c, 3 = d, 4 = e 8467933381150941086 % 5 = 1 , first character in next plaintext is 1-st from charset which is b 8467933381150941086 / 5 = 1693586676230188217 (everything after comma is taken off) 1693586676230188217 % 5 = 2 , second character in next plaintext is 2-st from charset which is c 1693586676230188217 / 5 = 338717335246037643 338717335246037643 % 5 = 3 , third character which is d 338717335246037643 / 5 = 67743467049207528 67743467049207528 % 5 = 3 , fourth character which is d now if our plaintexts can have less or more characters, we have to calculate its length also. i decided to use use last number from plaintext calculation 67743467049207528 and calculate modulus with keyspace (780) and do -1 because of modulus and then decide with help of previous calculations how long plaintext is: 67743467049207528 % 780 = 228. 228 - 1 = 227 227 > 5 227 > (5 + 25) 227 > (5 + 25 + 125) 227 < (5 + 25 + 125 + 625) is number of possible keys with length 4 new plaintexts length is 4. so we get substring length of 4 from "bcdd" which is bcdd itself. now we increment reduction index by one, 0 + 1 = 1 (position in chain) and do the same procedure to string "bcdd" as we did to "a" before. LM(bcdd) -> 480B3416EB74922E 480B3416EB74922E XOR 1 = 5191300268518838831 5191300268518838831 % 5 = 1 5191300268518838831 / 5 = 1038260053703767766 1038260053703767766 % 5 = 1 1038260053703767766 / 5 = 207652010740753553 207652010740753553 % 5 = 3 207652010740753553 / 5 = 41530402148150710 41530402148150710 % 5 = 0 41530402148150710 % 780 = 730 730 - 1 = 729, (5 + 25 + 125 + 625) > 729 > (5 + 25 + 125) this time we got string bbda now we increment reduction index (position in chain) 1 + 1 = 2 now we continue this procedure until we have made fixed number of calculations (chain length). lets say we make chain of length 8 (hey.. i am doing this with calculator here! :) in our example. so. we continue operation until we have made the 7-th operation - lets pretend position in chain is compared to needed chain length after the increment. so after some calculations we should get following "related" plaintext and hash pairs. a 7584248B8D2C9F9E r 0 bcdd 480B3416EB74922E r 1 bbda 6341555ACD0C65FF r 2 7152091531716617725 0 1430418306343323545 0 57216732253732941 4 11443346450746588 1 580 aaeb 591C25E845ED3549 r 3 6421048848259298634 4 1284209769651859726 1 256841953930371945 0 51368390786074389 4 457 ebae D25BD8A18281C62E r 4 3031589431807282696 2 606317886361456539 1 121263577272291307 4 607 ecbe 887D537720C76BA9 r 5 1967021786472637013 3 393404357294527402 3 78680871458905480 2 15736174291781096 0 296 ddca F479A7D435ABBED1 r 6 3523259189462351812 3 704651837892470362 2 140930367578494072 2 28186073515698814 2 274 dccc BD4AA4FC656DB9DA r 7 2727979165077074425 0 545595833015414885 0 109119166603082977 0 21823833320616595 2 775 aaac so our chain start will look like a or 000a and end will look like aaac where 0 is representation for filling with 0x00, ascii 0, (\0 escape in C), also known as nul character. to save memory (disk space) only 000a and aaac is saved on disk. to create a table we create many chains and store their start and end in table, plaintext what starts the next chain could be randomly selected. here raises a issue - what will happen when 2 hashes point to same plaintext after reduction function? they merge and create false alarms. this is a issue that is partially solved by adding additional parameters - indexes (numbers to xor with) to reduction function for different tables or every fixed ammount of chains. Lets sort ~~~~~~~~~ All chains are sorted by ends (in our example i used ASCII numeric values), this means if we have chains(not from our example): 0aaa|00rr ffaa|eeii qqyy|00wq 000b|00ac qios|retr 0aaa|00rr 0 (0) + 0 (0) + r (114) + r (114) = 228 ffaa|eeii e (101 ) + e (101 ) + i (105 ) + i (105 ) = 412 qqyy|00wq 0 (0) + 0 (0) + w (119 ) + q (113 ) + = 232 000b|00ac 0 (0) + 0 (0) + a (97 ) + c (99 ) + = 196 qios|retr r (114 ) + e (101 ) + t (116 ) + r (114 ) + = 445 then this would be sort to 000b|00ac 0aaa|00rr qqyy|00wq ffaa|eeii qios|retr Lets search ~~~~~~~~~~~ now we start searching chain/table for a corresponding plaintext. i made a table with chain we generated with additional non-relevant chains above, to make it look like a table. so table we're going to search is: 000a|aaac 000b|00ac 0aaa|00rr qqyy|00wq 000a|aaac ffaa|eeii qios|retr lets say we get 887D537720C76BA9 as hash to search for. now we generate all possible chain ends for this hash. this is done by applying all possible reduction functions to input hash and then those ends are searched from table and the chain is regenerated to see what plaintext was needed to make this digest. in our example chains length was 8 so we take our input hash and reduct it first with 7 (8 - 1 position in chain). 9835108932363185070 0 1967021786472637014 4 393404357294527402 2 78680871458905480 0 700 aeca first possible end chain with this hash is thus aeca, and is looked up if it exists in table. lookup is done by comparing the most middle key in table with the one we are looking for, and depending on result - which is greater we take next middle from lower or higher part of table. this looks like this: a (97 ) + e (101 ) + c (99 ) + a (97 ) + = 394 we take most middle chain end (4-th out of 7) 00wq 0 (0) + 0 (0) + w (119 ) + q (113 ) + = 232 232 < 394 - move higher now we move forward by taking middle between 4 - 7 , that is 6-th chain eeii (101 ) + e (101 ) + i (105 ) + i (105 ) = 412 412 > 394 - move lower so we continue like this while we got desired results or nothing left to search. in our case we would take the most middle chain between numbers 4 and 6. we get 5 and that is not the chain we want. I will refer to this operation later as look up. it doesn't so we generate second possible key - we reduct hash with 6 and then hash we got from plaintext of previous reduction with 7. 887D537720C76BA9 XOR 6 9835108932363185071 1 393404357294527402 4 78680871458905480 2 15736174291781096 0 296 beca LM(beca) 059D23C283FA56EB 059D23C283FA56EB XOR 7 404518859878061804 4 80903771975612360 0 16180754395122472 2 3236150879024494 2 574 eacc eacc is second possible key to hash we are looking for. lookup shows that there isnt such a key in our table. we make third key 887D537720C76BA9 XOR 5 9835108932363185068 3 393404357294527402 3 78680871458905480 2 15736174291781096 0 296 ddca LM(ddca) F479A7D435ABBED1 F479A7D435ABBED1 XOR 6 17616295947311759063 3 3523259189462351812 2 704651837892470362 2 140930367578494072 2 592 dccc LM(dccc) BD4AA4FC656DB9DA BD4AA4FC656DB9DA XOR 7 2727979165077074425 0 545595833015414885 0 109119166603082977 0 21823833320616595 2 775 aaac now after this lookup we find this end from table and read this chains start point 000a. now we regenerate chain up to 8-3 and check if result of plaintext is same to our input hash. if they match (currently they do), then we have found it. End ~~~~~~~~~~~~~~~~~~~~~~~~ now you should have basic understanding what theory is about, you are recommended to read original paper and references mentioned above, they are your best guides - especially when you want to implement something, then you should not follow the guidelines provided here, these are only to explain. Author of this paper doesnt have anything to do with creators of original theory or implementors mentioned above. This paper is written for http://www.plain-text.info by Heintz for educational purposes only. You may distribute this paper as long it is not changed. To add or change information on paper contact heintz@hotmail.com and your info with credits are included.