
I find the discussion surrounding the XKCD strip alarming for the superstition it reveals about password generation. The particular theme I am alarmed by is that people seem to think that if a password looks alien, or was difficult for them to come up with, it will be hard for a machine to guess.

Look, we're working with big numbers here. You need to do the math.

In this thread alone, I've seen suggestions to use a common dictionary word translated into another language, or written in l33tsp34k with some permutations. From a probabilistic perspective, these are still dictionary words, even though they look like gibberish. The same is true of the common method of typing a word with one's fingers displaced on the keyboard.

Conversely, I see a lot of argument that these XKCD passphrases would be easy to guess because they are made up of dictionary words. This misunderstands the math behind the situation. Even if an attacker knows that your password was generated via this method, and even if they know the word list you used, the password is still hard to guess. The difficulty grows exponentially with each word in the phrase, and that's pretty fast.
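To put numbers on that growth, here is a minimal sketch. It assumes every word is drawn uniformly and independently from a list whose size the attacker knows; the function name and the 2048-word list size are illustrative assumptions, not something taken from the strip or the script.

```python
import math

def passphrase_bits(num_words: int, wordlist_size: int) -> float:
    """Entropy in bits of a passphrase made of uniformly random words."""
    return num_words * math.log2(wordlist_size)

# Each extra word multiplies the number of possibilities by the list size.
for n in range(1, 7):
    possibilities = 2048 ** n
    print(f"{n} words: {possibilities:.3e} possibilities, "
          f"{passphrase_bits(n, 2048):.0f} bits")
```

With a 2048-word list each word contributes exactly 11 bits, so the search space doubles eleven times over for every word added.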

The key with passwords is not to create something that looks random -- something that if you showed it to another human being, they'd have a hard time deciphering. It's to create something that is random; literally a result of a throw of the dice for every new password.

Human beings are really bad at creating randomness. There's a demonstration done in an early statistics class in which the professor divides the class into two groups. He tells one to toss a coin a hundred times and record the sequence of heads and tails, while the others are to write down a sequence they think is random using their imagination. The papers are completed and mixed and then -- magically! -- he is able to sort them into the two types, easily and with high accuracy.

The lesson is this: even when you think you're being random, you probably aren't. You're probably using the same tricks everyone else is, and making the same mistakes.

I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.
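The script under discussion isn't reproduced in this thread, so the following is only a stand-in sketch of the idea, not that script. It assumes Python's standard `secrets` module as the randomness source; `make_passphrase` and `demo_words` are hypothetical names, and the eight-word list is a placeholder that is far too small for real use.

```python
import secrets

def make_passphrase(words: list[str], num_words: int = 4, sep: str = " ") -> str:
    """Join num_words entries chosen uniformly at random via the OS CSPRNG."""
    if len(set(words)) != len(words):
        # Duplicates would make some words more likely than others.
        raise ValueError("word list contains duplicates")
    return sep.join(secrets.choice(words) for _ in range(num_words))

# Placeholder list for demonstration only; a real list needs ~2000 words.
demo_words = ["valve", "tangle", "hastens", "accept",
              "spider", "coin", "dice", "script"]
print(make_passphrase(demo_words))
```

The point of the design is that no human picks anything: the only inputs are the word list and the random source.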



This should be higher up. It's scary to see people — intelligent people, I'm sure — saying things like "And that goes even higher when you add punctuation!"

No, it doesn't. All of the reasonable punctuation you could add to a sentence adds only a few bits of entropy at best. It also makes the sentence harder to remember— was there a comma or not? Adding unreasonable punctuation or symbols is even worse— you get slightly more entropy at the cost of a password that is way harder to remember.

The crucial point here is that four words, separated by spaces and selected at random from only the 2000 most common English words, already make a very long random string, EVEN IF your attacker knows that your password is four random English words from the 2000 most common, separated by spaces. If that's not random enough, each common English word you add adds about 11 bits, and is only marginally harder for most English speakers to remember. Conversely, choosing "random" extra characters to add in makes it slightly longer, very slightly more random, and way, way harder to remember.


It's certainly a "very long random string" without context but as people have pointed out above, it's actually not a very good password if people adopted this pattern widely (and you said the attacker knows this).

2000^4 = 16,000,000,000,000 = 1.6E13 possible passwords, which is about 71^7.1, where 71 is the size of the set [A-Z] + [a-z] + [0-9] + [!@#$%^&()]. So, your four words from the 2000 word list are equal to a 7ish character password that looks like "Av#12GH". I'm not sure if you meant that seven characters was "very long" but I wouldn't say it is. Still a very strong password but maybe not as random as it appears to be when the pattern is known.
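The arithmetic above can be checked in a few lines. This is just a sketch; the variable names are mine, and the alphabet is the one listed in the comment (upper case, lower case, digits, and the nine symbols `!@#$%^&()`).

```python
import math

# Alphabet from the comment: A-Z, a-z, 0-9, and the nine listed symbols.
alphabet = 26 + 26 + 10 + len("!@#$%^&()")
# Four words, each drawn from a 2000-word list.
space = 2000 ** 4

# Length L of a random string over `alphabet` such that alphabet**L == space.
equivalent_length = math.log(space) / math.log(alphabet)
print(alphabet, space, round(equivalent_length, 2))
```

The equivalent length comes out a little over seven characters, which agrees with the "7.1ish" figure.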


My point was that adding a character to something like "valve tangle hastens accept" is like adding a couple bits to something like "Av#12GH", and yet people feel like it's accomplishing something valuable. They feel that way because "Av#12GH" looks random, but "valve tangle hastens accept" doesn't.

Obviously the literal length of the string is not strictly relevant, and it was probably inarticulate of me to include that.


Knowledge of the pattern has nothing to do with it. That 2048^4 figure is what I mean when I say such a password is strong, and such a figure presumes the attacker knows what system I am using.

Recall that since the passphrase is randomly generated, 1 in 2048^4 is the true probability of guessing it on any single try--all the elements of the set are live possibilities. To compete on equal footing, a seven character password must also be randomly generated.

A password is not necessarily strong simply because it spans a large character set. "Sp1d3r!", for example, may as well be a dictionary word. Raw length (as in "spiderspiderspider") is not necessarily helpful either. Randomness is what you need.


And yet the point is moot because no one is going to use a password like "Av#12GH".


I have a password file here with several hundred passwords just like that (actually, they're all 12 chars with upper/lower case, digits, and "special chars", as chosen and stored by 1Password...)

Joe Public is unlikely to use passwords like that, but I'm 100% sure I'm not the only hackernews reader who does.


This is not entirely correct: we occasionally include non-alphanumeric characters (punctuation) in our passwords because it increases the solution space for a brute force attack.

While this doesn't really improve any individual password, the fact that we occasionally include non-alphanumeric characters increases the possible password set from 62^(password length) to something more like 90^(password length).

Similarly, dictionary attacks are more efficient than brute force attacks because we're talking maybe 200,000^(words in password), if we allow for some common word permutations, versus 90^(password length).


You seem to believe that you are saying something different from the comment you are replying to, but actually you seem to simply not understand it.


Are you kidding? I'm pointing out the historical reasons why we add punctuation chars to our passwords, as it directly impacts the solution space a brute force attack needs to cover.


No, I'm not kidding. And the comment you were replying to already explained that.


I resent your accusation that I use gibberish for my passwords; I actually use perfectly well formed executable code in perl.


Had you used brainfuck, I would have downvoted you.


> I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.

Definitely agree with you here.

I've been using the "few random words" method for some time for passwords I need to remember (and random 20-character mixes of alpha/numeric/symbol for the others, which I have stored in a KeePass db). I know I'm not all that random in my choice of words, so if someone managed to see one or two of my passphrases, it would be quite easy to create a script that could brute-force the other couple quickly.

I shall have to use a script like this (or throw together my own for paranoia's sake) next time I change one of my passphrases.


May I ask how you found out about that study?

It sounds very interesting. I've got to try it sometime :).


You mean the statistics demonstration? I'm sure I've seen it in several places.

I know of two tricks for detecting the students. The first is to look for six or seven heads or tails in a row. Over a hundred tosses, a coin will probably do that, but humans "being random" won't. The other is to look at the page as a sequence of "HHH" and "TT" strings and estimate how many there are. A coin, of course, changes from heads to tails 50% of the time, but a human does it more like 70% of the time.

I'm sure there are other characteristics, too, but those two are sufficient to throw out most human attempts at a glance. It's actually kind of obvious, when you see the two side by side.

Me "being random" with the numpad: 10110101001010010101011010100101011010100101010110101

Computer-generated random: 1101110100000000011111110111011000010001010110011111

See? Here's a few more. Try it.

1000100111011001010000001001010110111000011011101011

1101010001010010101110100101000101010111101001001000

1101111010110111100010110001000100001001111001001110

1100000010000010101001000001101001101011111100111001
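The two tricks described above (longest run, and how often the sequence switches) can be sketched in a few lines. The function names are mine, and the two sample strings are the "me being random" and "computer-generated" lines from this comment.

```python
from itertools import groupby

def longest_run(bits: str) -> int:
    """Length of the longest stretch of identical consecutive symbols."""
    return max(len(list(group)) for _, group in groupby(bits))

def switch_rate(bits: str) -> float:
    """Fraction of adjacent positions where the symbol changes."""
    switches = sum(a != b for a, b in zip(bits, bits[1:]))
    return switches / (len(bits) - 1)

human = "10110101001010010101011010100101011010100101010110101"
computer = "1101110100000000011111110111011000010001010110011111"

for label, sample in (("human", human), ("computer", computer)):
    print(label, longest_run(sample), round(switch_rate(sample), 2))
```

A short longest run together with a switch rate well above 0.5 is the signature of someone "being random". Both functions assume a non-empty input of at least two symbols.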


Cool! Makes sense, too— it feels unrandom to sit there hitting one key a bunch. How do you know when to stop?

So here's what I got: figure I have a bias to switch keys. That means that 01 and 10 are more common than 11 and 00. So what I need to do is group 01 with 00, and 10 with 11. What I do is generate twice as many bits as I need, treat the string as a sequence of two-bit pairs, and reduce each one to its first bit. That looks like this: 01011110000010111000010011111011111111000000111010110001111000111100111100001100011010011101

Gets you past the litmus test, but looks like it goes too far the other way? Hard for me to tell, actually.
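For anyone who wants to try the pair trick on their own typing, a minimal sketch (the function name is mine):

```python
def first_bit_of_pairs(bits: str) -> str:
    """Split the string into two-bit pairs and keep the first bit of each.

    A trailing unpaired bit, if any, is dropped.
    """
    usable = len(bits) - len(bits) % 2
    return "".join(bits[i] for i in range(0, usable, 2))

print(first_bit_of_pairs("01110110"))  # pairs 01 11 01 10 -> "0101"
```

On the "too far the other way" impression: under the simplifying assumption that each keystroke flips independently with probability 0.7, two bits that are two positions apart agree with probability 0.7^2 + 0.3^2 = 0.58, so the reduced string would switch only about 42% of the time, slightly stickier than a fair coin.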

Another thing would be something like a sequential xor of each bit in triples (i.e. 010 -> (0 ^ 1) ^ 0 = 1), which segments triples across probability like so:

     -       +           +       -
    000 001 010 011 100 101 110 111
      0   1   1   0   1   0   0   1
   
You can do that quickly by counting the 1s— 1 or 3 is 1, 0 or 2 is 0. That looks like this: 00011011011011000000111111001011000101010000100100110011011010

I don't know if it adds much more (apparent) entropy, though.
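The triple XOR is the same as taking the parity of the number of 1s in each triple, which makes for a short sketch (the function name is mine):

```python
def parity_of_triples(bits: str) -> str:
    """XOR the three bits of each triple, i.e. the parity of its 1s.

    Trailing bits that do not fill a triple are dropped.
    """
    usable = len(bits) - len(bits) % 3
    return "".join(
        str(bits[i:i + 3].count("1") % 2) for i in range(0, usable, 3)
    )

# All eight triples in order, matching the table above.
print(parity_of_triples("000001010011100101110111"))  # -> "01101001"
```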

Data:

    011101101010100100010100100010111001010100110000101011101101110
     0 1 0 1 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 0 1
      0  0  0  1  1  0  1  1  0  1  1  0  1  1  0  0  0  0  0  0  1

    111010111010100101000101011110100011011011010001111011110101001
     1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0
      1  1  1  1  1  0  0  1  0  1  1  0  0  0  1  0  1  0  1  0  0

    101110100101101010110101010001101101010010110111010110101001110
     1 1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 1 1 1 0 1
      0  0  1  0  0  1  0  0  1  1  0  0  1  1  0  1  1  0  0  1  0



