I find the discussion surrounding the XKCD strip alarming for the superstition i...

Cushman · on Aug 11, 2011

This should be higher up. It's scary to see people — intelligent people, I'm sure — saying things like "And that goes even higher when you add punctuation!"

No, it doesn't. All of the reasonable punctuation you could add to a sentence adds only a few bits of entropy at best. It also makes the sentence harder to remember— was there a comma or not? Adding unreasonable punctuation or symbols is even worse— you get slightly more entropy at the cost of a password that is way harder to remember.

The crucial point here is that four words, selected at random from only the 2000 most common English words and separated by spaces, already make a very long random string, EVEN IF your attacker knows that your password is four random words from the 2000 most common, separated by spaces. If that's not random enough, each common English word you add adds about 11 bits and is only marginally harder for most English speakers to remember. Conversely, choosing "random" extra characters to add in makes it slightly longer, very slightly more random, and way, way harder to remember.

a3camero · on Aug 11, 2011

It's certainly a "very long random string" without context, but as people have pointed out above, it wouldn't actually be a very good password if people adopted this pattern widely (and you said the attacker knows this).

2000^4 = 16000000000000 possible passwords = 1.6E13 = ([A-Z] + [a-z] + [0-9] + [!@#$%^&()])^7.1ish. So, your four words from the 2000 word list are equal to a 7ish character password that looks like "Av#12GH". I'm not sure if you meant that seven characters was "very long" but I wouldn't say it is. Still a very strong password but maybe not as random as it appears to be when the pattern is known.
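The arithmetic in this comment can be checked with a short sketch. It assumes the symbol class is exactly the nine characters listed (`!@#$%^&()`), which gives a 71-character alphabet; the variable names are illustrative only.

```python
import math

WORDS = 2000                   # word-list size used in the comment
ALPHABET = 26 + 26 + 10 + 9    # A-Z, a-z, 0-9, plus the nine symbols !@#$%^&()

passphrases = WORDS ** 4                          # number of four-word passphrases
bits = math.log2(passphrases)                     # same quantity expressed in bits
equiv_chars = math.log(passphrases, ALPHABET)     # length of an equally strong random string

print(passphrases)             # total candidates
print(round(bits, 1))          # entropy in bits
print(round(equiv_chars, 1))   # equivalent random-character length
```

The comparison only holds if the seven-character password is itself chosen uniformly at random from that alphabet.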

Cushman · on Aug 11, 2011

My point was that adding a character to something like "valve tangle hastens accept" is like adding a couple bits to something like "Av#12GH", and yet people feel like it's accomplishing something valuable. They feel that way because "Av#12GH" looks random, but "valve tangle hastens accept" doesn't.

Obviously the literal length of the string is not strictly relevant, and it was probably inarticulate of me to include that.

Dove · on Aug 11, 2011

Knowledge of the pattern has nothing to do with it. That 2048^4 figure is what I mean when I say such a password is strong, and such a figure presumes the attacker knows what system I am using.

Recall that since the passphrase is randomly generated, 1 in 2048^4 is the true probability of guessing it: all the elements of the set are live possibilities. To compete on equal footing, a seven character password must also be randomly generated.

A password is not necessarily strong simply because it spans a large character set. "Sp1d3r!", for example, may as well be a dictionary word. Raw length (as in "spiderspiderspider") is not necessarily helpful either. Randomness is what you need.

j_baker · on Aug 12, 2011

And yet the point is moot because no one is going to use a password like "Av#12GH".

bigiain · on Aug 12, 2011

I have a password file here with several hundred passwords just like that (actually, they're all 12 chars with upper/lower case, digits, and "special chars", as chosen and stored by 1Password...)

Joe Public is unlikely to use passwords like that, but I'm 100% sure I'm not the only hackernews reader who does.

keithnoizu · on Aug 11, 2011

This is not entirely correct. We occasionally include non-alphanumeric characters (punctuation) in our passwords because it increases the solution space for a brute-force attack.

While this doesn't really improve any individual password, the fact that we occasionally include non-alphanumeric characters increases the possible password set from 62^(password length) to something more like 90^(password length).

Similarly, dictionary attacks are more efficient than brute-force attacks because we're talking maybe 200,000^(words in password), if we allow for some common word permutations, versus 90^(password length).
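A rough sketch of that comparison, using the example passphrase quoted earlier in the thread. The 90-symbol alphabet and 200,000-entry dictionary are the figures from this comment; the function name is made up for illustration.

```python
import math

def search_space_bits(choices: int, length: int) -> float:
    """Bits needed to enumerate choices**length candidates."""
    return length * math.log2(choices)

phrase = "valve tangle hastens accept"  # example passphrase quoted earlier in the thread

# Character-level brute force over an assumed 90-symbol alphabet.
brute_force = search_space_bits(90, len(phrase))
# Word-level dictionary attack over an assumed 200,000-entry word list.
dictionary = search_space_bits(200_000, len(phrase.split()))

print(f"brute force: {brute_force:.0f} bits, dictionary: {dictionary:.0f} bits")
```

The word-level search is far smaller than the character-level one for the same string, which is why the strength estimate has to assume the attacker knows the scheme.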

kragen · on Aug 12, 2011

You seem to believe that you are saying something different from the comment you are replying to, but actually you seem to simply not understand it.

keithnoizu · on Aug 14, 2011

Are you kidding? I'm pointing out the historical reasons why we add punctuation chars to our passwords, as it directly impacts the solution space a brute-force attack needs to cover.

kragen · on Aug 15, 2011

No, I'm not kidding. And the comment you were replying to already explained that.

jcr · on Aug 11, 2011

I resent your accusation that I use gibberish for my passwords; I actually use perfectly well formed executable code in perl.

nakkiel · on Aug 12, 2011

Had you used brainfuck, I would have downvoted you.

dspillett · on Aug 12, 2011

> I would trust passwords that come out of a script like this to be far more secure than passwords anyone (myself included) made up, no matter how random they're trying to be.

Definitely agree with you here.

I've been using the "few random words" method for passwords I need to remember for some time (and random 20-character mixes of alpha/numeric/symbol for the others, which I have stored in a KeePass db). I know I'm not all that random in my choice of words, so if someone managed to see one or two of my passphrases, it would be quite easy to create a script that could brute-force the other couple quickly.

I shall have to use a script like this (or throw together my own for paranoia's sake) next time I change one of my passphrases.
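For illustration, a minimal sketch of what such a script might look like. This is not the script referred to above; `WORD_LIST` is a tiny hypothetical stand-in for a real list of about 2000 common words, and `make_passphrase` is an invented name. It uses Python's `secrets` module so that the word choice comes from the operating system's random source rather than from a person.

```python
import secrets

# Hypothetical stand-in; a real list would hold roughly 2000 common words.
WORD_LIST = ["valve", "tangle", "hastens", "accept",
             "river", "candle", "orbit", "mustard"]

def make_passphrase(words, count=4, sep=" "):
    """Join `count` words drawn uniformly and independently from `words`."""
    return sep.join(secrets.choice(words) for _ in range(count))

phrase = make_passphrase(WORD_LIST)
print(phrase)
```

With only eight words this toy list gives 3 bits per word, so the list size matters far more than the code.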

Ideka · on Aug 11, 2011

May I ask how you found out about that study?

It sounds very interesting. I've got to try it sometime :).

Dove · on Aug 11, 2011

You mean the statistics demonstration? I'm sure I've seen it in several places.

I know of two tricks for detecting the students. The first is to look for six or seven heads or tails in a row. Over a hundred tosses, a coin will probably do that, but humans "being random" won't. The other is to look at the page as a sequence of runs like "HHH" and "TT" and estimate how many there are. A coin, of course, switches between heads and tails 50% of the time, but a human does it more like 70% of the time.

I'm sure there are other characteristics, too, but those two are sufficient to throw out most human attempts at a glance. It's actually kind of obvious, when you see the two side by side.

Me "being random" with the numpad: 10110101001010010101011010100101011010100101010110101

Computer-generated random: 1101110100000000011111110111011000010001010110011111

See? Here's a few more. Try it.

1000100111011001010000001001010110111000011011101011

1101010001010010101110100101000101010111101001001000

1101111010110111100010110001000100001001111001001110

1100000010000010101001000001101001101011111100111001
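Both tricks are easy to sketch in code. The helper names below are invented for illustration, and the two sample strings are the "numpad" and "computer-generated" sequences quoted above; inputs are assumed to be at least two characters long.

```python
from itertools import groupby

def longest_run(bits: str) -> int:
    """Length of the longest stretch of identical consecutive symbols."""
    return max(len(list(group)) for _, group in groupby(bits))

def switch_rate(bits: str) -> float:
    """Fraction of adjacent positions where the symbol changes."""
    switches = sum(a != b for a, b in zip(bits, bits[1:]))
    return switches / (len(bits) - 1)

human = "10110101001010010101011010100101011010100101010110101"
machine = "1101110100000000011111110111011000010001010110011111"

for name, seq in (("human", human), ("machine", machine)):
    print(name, longest_run(seq), round(switch_rate(seq), 2))
```

Under the two tricks described above, a longest run well short of six or a switch rate near 0.7 or higher would flag a sequence as probably human-made.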

Cushman · on Aug 12, 2011

Cool! Makes sense, too— it feels unrandom to sit there hitting one key a bunch. How do you know when to stop?

So here's what I got: figure I have a bias to switch keys. That means that 01 and 10 are more common than 11 and 00. So what I need to do is group 01 with 00, and 10 with 11. What I do is generate twice as many bits as I need, treat the string as a sequence of two-bit pairs, and reduce each one to its first bit. That looks like this: 01011110000010111000010011111011111111000000111010110001111000111100111100001100011010011101

Gets you past the litmus test, but looks like it goes too far the other way? Hard for me to tell, actually.
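A small sketch of the pair reduction described here, with an invented function name. Keeping the first bit of each pair is in effect the same as keeping every other bit, and a trailing unpaired bit is dropped.

```python
def first_of_pairs(bits: str) -> str:
    """Split into two-bit pairs and keep only the first bit of each pair."""
    usable = len(bits) - len(bits) % 2   # ignore a trailing unpaired bit
    return bits[0:usable:2]

print(first_of_pairs("01110110"))  # pairs are 01, 11, 01, 10
```

This is not a standard debiasing method such as von Neumann extraction; it only removes the dependence between immediately adjacent keystrokes.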

Another thing would be something like a sequential xor of each bit in triples (i.e. 010 -> (0 ^ 1) ^ 0 = 1), which segments triples across probability like so:

     -       +           +       -
    000 001 010 011 100 101 110 111
      0   1   1   0   1   0   0   1

You can do that quickly by counting the 1s— 1 or 3 is 1, 0 or 2 is 0. That looks like this: 00011011011011000000111111001011000101010000100100110011011010

I don't know if it adds much more (apparent) entropy, though.
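The triple XOR is the parity of each group of three bits, which can be sketched as below. The function name is invented for illustration, and leftover bits that do not fill a triple are dropped.

```python
def parity_of_triples(bits: str) -> str:
    """XOR the bits in each group of three: 1 if the group has an odd number of 1s."""
    usable = len(bits) - len(bits) % 3   # ignore leftover bits that do not fill a triple
    return "".join(
        str(bits[i:i + 3].count("1") % 2)
        for i in range(0, usable, 3)
    )

# The eight triples 000 through 111, in order, reproduce the table above.
print(parity_of_triples("000001010011100101110111"))
```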

Data:

    011101101010100100010100100010111001010100110000101011101101110
     0 1 0 1 1 1 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 1 1 1 1 1 0 1
      0  0  0  1  1  0  1  1  0  1  1  0  1  1  0  0  0  0  0  0  1

    111010111010100101000101011110100011011011010001111011110101001
     1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0
      1  1  1  1  1  0  0  1  0  1  1  0  0  0  1  0  1  0  1  0  0

    101110100101101010110101010001101101010010110111010110101001110
     1 1 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 1 1 1 0 1
      0  0  1  0  0  1  0  0  1  1  0  0  1  1  0  1  1  0  0  1  0