
Be aware that adding to the length simply by taking more of the lyrics adds very little entropy. If you're trying "Oh say can you see" then it doesn't take a lot of extra bits also to try "Oh say can you see by the dawn's early light what so proudly we hailed at the twilight's last gleaming".

Similarly, extended passages of text -- even if they don't come from a restricted corpus like that of song lyrics -- have less entropy than you'd think. A smaller number of independent random words is likely to be a better tradeoff.



I can see your point in that the Kolmogorov complexity of two lines in a song isn't much larger than that of one line. Similarly, 30 digits of pi and 300 digits of pi have very little difference in Kolmogorov complexity.

What I don't know is if state-of-the-art password guessers are great at recognizing larger patterns in the entire canon of human knowledge. I.e. is there a "common phrases" attack that's analogous to a "dictionary attack"?


Google released the world's largest corpus and did us a favor by analyzing it for n-grams. For example, they found that the phrase "serve as the initial" was over 100 times more common than the phrase "serve as the insurance". [1] For $150 you can buy the 24GB data set yourself, so it's a fair assumption that makers of password crackers could reliably guess common phrases first. [2]

[1] http://googleresearch.blogspot.com/2006/08/all-our-n-gram-ar...
[2] http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=...


If these types of passwords become popular, brute-force crackers will build dictionaries of well-known phrases.


That may be true, but we still end up better off. The compute time for the password cracker has gone up quite a bit, making it a more expensive endeavor (they've got to build dictionaries both for well-known phrases and for passwords with fuzzing). It doesn't solve the problem, but it's a step in the right direction (away from fuzzing of dictionary words, which is clearly bad for human memory and good for password crackers' efficiency).

However, when using randomly chosen dictionary words to build phrases (not well known), the entropy shoots well above the level of being reasonable to crack in a lifetime.


Granted, knowing correct parts of a password from known sources (pi, War and Peace, song lyrics, etc.) drastically reduces the number of possible solutions. But how would an attacker figure out the first part of such a password? Timing attacks come to mind: http://en.wikipedia.org/wiki/Timing_attack What other possibilities did I miss?

EDIT: I get that having a long streak of my pass in a dictionary would reduce overall security but it's still unclear how a partial match in the dictionary would be detected.


But there's a long tail of song lyrics. If you pick something obscure, the odds of the attacker even having heard of it become very small (particularly if the attacker is from a different culture than your own). Pick something arty and incomprehensible, and the odds against someone else accidentally stringing those words together in some other context become astronomical.

For instance, I'd wager no cracker has ever heard the song containing the line "We barter images on the matrix". And that's one of the more intelligible lines from the song in question (from a 1978 album by the little-known prog-rock group Happy The Man). Pull it up on Google and you'll see what I mean.

If you don't know the song, of course, lines from it will be about as hard to remember as randomly chosen words. But if you do know it, you have a good mnemonic.


This gets into the whole "security through obscurity" thing. Ideally, you should use a password-generation system such that even if the attacker knows your password-generation system (e.g. lines from songs), it would still be infeasible to guess your actual password.

That's why the 4-random-words technique is good. According to XKCD, the 4-random-words technique generates about 17 trillion passwords, all equally likely.

But even with a long tail, song-lyric passwords rely on obscurity. I imagine there are far fewer than 17 trillion songs to choose from. And if the attacker knew some information about you (say from looking at your Facebook profile or your search history), I'm sure they could drastically narrow the search space.


The answer is obviously to write your own song or poem and not tell anyone about it. A passpoem, perhaps in the style of Lewis Carroll.


there might not be 17 trillion songs, but you aren't limited to the first 4 words of the song. there might be 100-300 words per song and you can pick your starting word anywhere you like.


But it falls into the same boat as any dictionary attack. Most people with a passphrase are probably going to use one from a song. 90% of them are going to use one of the top 1,000 songs, 90% of them are going to start at the beginning of a line. If we say there are ~20 unique lines in the average song, and most people won't use more than ten successive words even if it bridges a line, that's 1000 * 20 * 10 = a keyspace of 200,000. Trivial.

What this means is even if you decide you're going to be really secure and pick, say, the 30,000th most popular song, assume all songs have 200 unique lines (to account for sensible starting points in the middle of lines), and use up to 20 words from it, you're in a keyspace of only 30,000 * 200 * 20 = 120 million, which even if it takes 1ms to hash will be cracked in a day or so.
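
The arithmetic in the two paragraphs above can be checked with a short sketch. The function names and the 1,000 hashes per second rate (the 1ms-per-hash figure) are just illustrative; the song, line, and length counts are the rough guesses from this comment, not measured data.

```python
# Back-of-the-envelope keyspace arithmetic for lyric-based passphrases.
# Every input below is a rough estimate taken from the comment, not real data.

def lyric_keyspace(songs: int, starts_per_song: int, lengths: int) -> int:
    """Number of candidate phrases: song x starting point x phrase length."""
    return songs * starts_per_song * lengths

def hours_to_exhaust(keyspace: int, hashes_per_second: int) -> float:
    """Worst-case time to try every candidate at a fixed hash rate."""
    return keyspace / hashes_per_second / 3600

popular = lyric_keyspace(songs=1_000, starts_per_song=20, lengths=10)
obscure = lyric_keyspace(songs=30_000, starts_per_song=200, lengths=20)

print(popular)   # candidates if the song is in the top 1,000
print(obscure)   # candidates for the "really secure" obscure-song case
print(hours_to_exhaust(obscure, hashes_per_second=1_000))
```

At 1ms per hash the larger keyspace is exhausted in roughly 33 hours in the worst case, and about half that on average.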

By contrast, four random English words chosen from the 2,048 most common have a keyspace of 2,048^4, which is ~1.76e13, or about 17,600,000,000,000.
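
For comparison, here is a minimal sketch of the random-words scheme. The word list is a placeholder (a real one would hold the 2,048 actual words), and the helper names are made up for illustration. The important assumption is that each word is drawn uniformly and independently, which is why it uses Python's `secrets` module rather than a human picking words.

```python
import math
import secrets

def keyspace(wordlist_size: int, n_words: int) -> int:
    """Number of equally likely passphrases."""
    return wordlist_size ** n_words

def entropy_bits(wordlist_size: int, n_words: int) -> float:
    """Entropy in bits when every word is chosen uniformly at random."""
    return n_words * math.log2(wordlist_size)

def random_passphrase(wordlist: list[str], n_words: int = 4) -> str:
    """Draw n_words independently with a cryptographically secure RNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

# Placeholder list standing in for the 2,048 most common English words.
WORDS = [f"word{i}" for i in range(2048)]

print(keyspace(len(WORDS), 4))      # 2048 ** 4
print(entropy_bits(len(WORDS), 4))  # 4 words * 11 bits each
print(random_passphrase(WORDS))
```

The point of the design is that the attacker can know the word list and the method and still face the full keyspace.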

Choosing a clever, unusual line from the middle of a very uncommon song is the passphrase-land version of choosing a rare English dictionary word and replacing the vowels with numbers. If your hash gets compromised, it might as well be "password".


There's an easy way to defeat this:

   smellz like T33N SPIRIT!
Trivial to memorize. Unlikely to brute force.

I use phrases like that for the few locations where password managers don't reach (i.e. the password manager master password).


>.<

How is this an improvement? I now have to remember a song lyric, and some set of random manipulations of that song lyric. I've used that trick for passwords before, and it was a hassle. But that doesn't even matter: unless you're choosing the manipulations randomly (which is a contradiction in terms), you're falling right back into the exact damn trap the comic was about!

You've added ! at the end, replaced s with z, capitalized some words, and replaced vowels with numbers. These are already standard manipulations in a dictionary attack. And it's causing you to ignore the fact that you've chosen what is probably among the top 10 song lyrics used. "p4ssw0rd!" is "password" as far as a dictionary attacker is concerned. Calling this trivial to brute force is demeaning to the word "trivial". Your attacker wouldn't even laugh at you, because there'd be dozens of other hashes in the file just like yours.

It's been said over and over in these comments: the appearance of randomness is not randomness. Humans are horrible at making things random, as you've just demonstrated. Stop trying to make it look weird, and actually do the math.
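
To put a number on the "standard manipulations" point above, here is a toy sketch of rule-based mangling. The four rules are illustrative assumptions, not any particular cracker's rule set; real tools ship far larger ones. Each on/off rule at most doubles the number of candidates per dictionary entry, so it adds at most one bit.

```python
from itertools import product

# Toy mangling rules (illustrative only, not a real tool's rule set).
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})

RULES = [
    lambda s: s.translate(LEET),    # vowels -> digits
    lambda s: s.capitalize(),       # capitalize the first letter
    lambda s: s + "!",              # append "!"
    lambda s: s.replace("s", "z"),  # s -> z
]

def variants(base: str) -> set[str]:
    """Every candidate produced by switching each rule on or off."""
    out = set()
    for mask in product([False, True], repeat=len(RULES)):
        candidate = base
        for enabled, rule in zip(mask, RULES):
            if enabled:
                candidate = rule(candidate)
        out.add(candidate)
    return out

candidates = variants("password")
print(len(candidates))            # at most 2 ** len(RULES)
print("p4ssw0rd!" in candidates)
```

Four rules turn one dictionary entry into at most 16 guesses, which is why a mangled common phrase buys so little.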


It's fairly easy for me to remember those manipulations. But you're right insofar as this would probably be both safer and easier to remember:

   Smells like teen spirit, and I like that plenty mucho!
I'm too lazy to do the math on it, perhaps you can help out?

Edit: It's a little annoying to collect these downvotes from people who either haven't done the math themselves or are too lazy to explain their advanced attack methods.

In my naive opinion my string above is at least equivalent to a 12 character password from a set of "Mixed upper and lower case alphabet plus numbers and common symbols.".

I count each word (10) and both symbols (,!) as a character here.

According to [1], an 8-char password of that type would take 83½ days to crack in a Class-F attack ("supercomputer"). I'm purely guessing that those additional 4 "chars" should put it well into the multi-year range, provided my other assumptions are not too far off and the number of English words is quite a bit larger than the number of ASCII characters/symbols.

Any of the downvoters care to debunk that with real math?

I'd be honestly curious about a worst-case analysis that assumes the fragment "Smells like teen spirit" does appear in the attacker's dictionary.

[1] http://www.lockdown.co.uk/?pg=combi


Yeah, that's what I was getting at. Something like that is pretty much immune to naive brute force, even if we count "Smells like teen spirit" as a word. My guess would be that if it does get cracked, it would be by searching [lyric]+", and"+[some kind of Markov attack], but I honestly have no idea how one would work out the entropy in that model. It depends a lot on how the search is carried out, I think.

I guess we'll find out when passphrases become common :)


What happens when your obscure song makes the soundtrack for a hit movie next summer?


given how prone people are to mis-hearing song lyrics, the corpus isn't the full text of all published song lyrics as you suggest.



