
The point of the article is that as long as you control the domain, you have no excuse for your links breaking. Going out of business (and therefore being unable to pay for your domain) is specifically called out as a valid reason, but there's no reason you'd lose control of your domain after being acquired, even if you decided to redirect your old links to newer information. Shiny new software is also specifically called out as a bad reason to break links, since backwards-compatible redirects are trivial. And if you're capable of permanently losing all your data through accidental deletion or a compromised server, you have much bigger problems.

All that said, you're fundamentally right -- sometimes information stops being available because it's out of date, and keeping it available would be confusing (if a product is no longer available, it would be strange to maintain a page describing it for years afterwards). Archiving through the Wayback Machine is a very helpful stopgap, but expecting them to continuously archive every version of the entire Internet for all time won't scale.

What's needed is a distributed, decentralized system, ideally at the protocol level. Imagine if a GET request by default gave you the "current" version of a page, but you could send an extra header that said "give me this page, as it appeared at date-time X". This would remove the confusion caused by the existence of a page being conflated with that page being current[1], and allow sites to maintain clean navigational and data structures by flagging outdated pages as "expired" instead of completely deleting them. When a server got a request for a page that used to exist but no longer does, it could respond with a new 4xx-series status code, "No longer current", indicating the document is not available for the given date-time but is available for an earlier one.
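A rough sketch of what those semantics could look like, with hypothetical names throughout: the `as_of` parameter stands in for the proposed request header, and 498 is an invented 4xx code for the "No longer current" response.

```python
from bisect import bisect_right
from datetime import datetime

def resolve(versions, expired=None, as_of=None):
    """Hypothetical versioned-GET resolution.

    versions: list of (timestamp, body), sorted by timestamp.
    expired:  optional timestamp after which the page is flagged
              "expired" rather than deleted.
    as_of:    None models a plain GET ("give me the current page");
              a datetime models the proposed "as it appeared at X" header.
    Returns (status, body_or_None).
    """
    when = as_of if as_of is not None else datetime.max  # plain GET = "now"
    if expired is not None and when >= expired:
        return (498, None)  # invented "No longer current" status
    i = bisect_right([t for t, _ in versions], when)
    if i == 0:
        return (404, None)  # page did not exist yet at that time
    return (200, versions[i - 1][1])  # newest version at or before `when`
```

A plain GET returns the newest version; a request dated before the "expired" flag still resolves, so old links keep working without cluttering current navigation.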

[1] I frequently get people sending me ANGRY emails about flippant, immature blog posts I wrote 10+ years ago[2]. They assume that because it's still on my website, I still stand by those statements, when in fact I'm just reluctant to delete information.

[2] The posts still get traffic, because links to them made 10+ years ago still work, despite rewriting my CMS 3 times.



> What's needed is a distributed, decentralized system, ideally at the protocol level. Imagine if a GET request by default gave you the "current" version of a page, but you could send an extra header that said "give me this page, as it appeared at date-time X".

Sounds like Freenet USKs (see https://freenetproject.org/understand.html, search for USK (and boo on them for not having any anchors on that page)).


> [1] I frequently get people sending me ANGRY emails about flippant, immature blog posts I wrote 10+ years ago[2]. They assume that because it's still on my website, I still stand by those statements, when in fact I'm just reluctant to delete information.

Couldn't you implement a clumsy manual version of the 'expired' marking you're proposing by having your server precede each page with "The following has not been modified since …, and should not be regarded as current" whenever the page is older than some threshold?
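That manual version could be as small as a filter in front of the response body. A sketch, assuming a simple HTML page and an arbitrary five-year threshold (the function name and banner markup are made up for illustration):

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=5 * 365)  # arbitrary threshold

def with_staleness_banner(html, last_modified, now=None):
    """Prepend a not-current notice to pages older than STALE_AFTER."""
    now = now or datetime.now()
    if now - last_modified > STALE_AFTER:
        banner = ('<p class="stale-notice">The following has not been '
                  f'modified since {last_modified:%Y-%m-%d} and should '
                  'not be regarded as current.</p>\n')
        return banner + html
    return html
```

Recent pages pass through untouched, so the banner only appears once a post crosses the threshold.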


> an extra header that said "give me this page, as it appeared at date-time X"

It's your lucky day: http://www.mementoweb.org/ (Memento, standardized as RFC 7089, adds exactly such an Accept-Datetime request header to HTTP).



