In particular, it gets into the heart of the matter: What does the user want to happen when they click the Refresh button?
It does seem worthwhile to try to change the default behavior of the Refresh button to mean "refresh the page" instead of "fix the page" (what it currently does), which would make this "immutable" proposal unnecessary, AFAICT.
IIRC this is exactly what the reload button used to do. You had to hold down what I believe was the Control key while pressing the reload button to do a "force refresh". Now it would seem it's the default behaviour. That, or maybe a normal refresh does the revalidation checks (which return 304), while a Control-refresh does a full download of all resources?
Generally speaking, browsers behave the same as they always have (though they've acquired additional nuance since the HTTP/1.0 and earlier days). IE had a cache-control bug for many years that made it impossible to force a reload in some circumstances, but it was fixed in IE 6.
The change is on the server side, not the browser. Modern single page applications do all kinds of janky things, and a lot of them break caching, either explicitly (with cache-control headers) or accidentally (with uncacheable URLs).
As far as I know, every major browser has standards compliant cache-control implementations, and all have some way to force a full reload.
Source: I worked on cache-control browser compatibility in Squid many years ago. The browsers took a while, but did get it right eventually.
In at least recent versions of Firefox and Chrome, a reload includes `Cache-Control: max-age=0` with the request. During a forced reload (e.g. Shift-Reload), both the legacy `Pragma: no-cache` (HTTP/1.0) and the more modern `Cache-Control: no-cache` (HTTP/1.1) headers are sent.
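For illustration, here's a minimal sketch (Python standard library only, with a made-up asset URL) of roughly what those two kinds of reload put on the wire; the exact header set varies by browser and version:

```python
import urllib.request

URL = "https://example.com/app.css"  # hypothetical asset URL, purely illustrative

# Roughly what a normal reload asks for: revalidate with the origin.
normal = urllib.request.Request(URL, headers={"Cache-Control": "max-age=0"})

# Roughly what a forced reload (Shift-Reload) asks for: bypass caches entirely.
forced = urllib.request.Request(URL, headers={
    "Cache-Control": "no-cache",  # HTTP/1.1
    "Pragma": "no-cache",         # legacy HTTP/1.0 equivalent
})

print(normal.header_items())
print(forced.header_items())
```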
Judging by the comments here, there seems to be some confusion.
This is exactly like long-lived cache settings today. Right now browsers send a request on basic reloads and get back a 304 from the server which states that nothing has changed. All this setting does is let the server tell the browser to skip that check/roundtrip instead of wasting the time/bandwidth on confirming with a 304 after the initial load.
The browser is still completely in control here and can do a full reload or just reload all the time if it wants to. Web scrapers and other HTTP clients are unaffected.
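For the server side, here's a minimal sketch of what opting a hashed asset into this behaviour might look like (Python's standard-library `http.server`, with a made-up `/assets/` path convention; not any real site's configuration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class AssetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical convention: anything under /assets/ is content-hashed
        # and its bytes never change for a given URL.
        if self.path.startswith("/assets/"):
            body = b"/* contents of the hashed asset */"
            self.send_response(200)
            self.send_header("Content-Type", "text/css")
            # Long max-age plus the proposed immutable extension: the browser
            # may skip conditional revalidation (the 304 round trip) on reload.
            self.send_header("Cache-Control", "public, max-age=31536000, immutable")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), AssetHandler).serve_forever()
```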
We use webpack and all filenames are just SHA hashes of the contents in the production builds. There is no need for the browser to ever ask anything about that file again (unless it's purged from the cache...).
This has been the standard way of serving assets in Rails for the last 5+ years. I don't think it was invented there, as if you are using a CDN it's basically required.
Invalidating edge caches takes time and/or is expensive (e.g. CloudFront), so adding the content hash to the file name is a good trick to ensure users always get the correct version of the asset.
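A rough sketch of that naming scheme, assuming a SHA-256 content hash and a made-up helper name (this is not webpack's or Sprockets' actual code):

```python
import hashlib
from pathlib import Path

def hashed_name(path: str) -> str:
    """Return e.g. 'app.3f5a9c1b2d4e.css' for 'app.css' (illustrative only)."""
    p = Path(path)
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:12]
    return f"{p.stem}.{digest}{p.suffix}"

# Any change to the file's bytes changes the digest, so the URL changes too,
# and the old URL can safely be cached forever (or marked immutable).
```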
I didn't mean to imply it was invented by webpack or anything (and I used the scare quotes around "modern" because I'm not sure how long it's been widely used).
I was just pointing out how and with what I use it.
It really works beautifully. One of our applications coming up needs to work offline, and we found that because of the naming we were using, appcache was almost completely work-free to implement.
If the appcache.manifest changes, it rechecks all files (or in my case, would only pointlessly re-check those which haven't changed, and download new ones), and the appcache.manifest will change the second a single byte anywhere in the program changes.
What happens if a random WordPress blog's front page (/) is compromised and has malware injected, setting the immutable keyword? Cloudflare and Let's Encrypt mean most sites will be HTTPS sooner rather than later, so the HTTPS part will be "taken care of". (At least that's better than nothing; imagine the power granted to captive wifi portals if not!)
I think it would be bad practice to use this keyword on endpoint URLs that are advertised in search engines or API documentation.
You would want to use it for resources to which base pages and manifests point, such as JS, CSS, JPG, PNG, etc.
The browser could enforce that, sort of. It could ignore immutable cache status on the object that is actually in the browser location bar and revalidate it with If-Modified-Since, but it could allow it on referenced objects.
The idea is that referenced objects can simply stop being referenced, and a fresh object is referenced.
I think the point is that it would be renamed /parallax-plugin2.js and HTML would be set to reference that instead? That is why immutable cache shouldn't work for the page in the address bar.
The concern here is that even after recovery from the compromise, the site could then never use the name "parallax-plugin.js" again, because a browser might have the cached malware under that name instead of the correct version.
On top of that, you as a developer would have to understand what's happening, and that it's happening at all. Might not be easy, as we have a habit of clearing our caches all too often :)
There's max-age support, the ability to preload pins in the browser, and certificate transparency to work around this; see section 4.5: https://tools.ietf.org/html/rfc7469#page-21
As to this original point, it would be best if this didn't apply to the address bar URL / main document request. But it's a good point, worth considering. Perhaps the UA should set a timer, and two or three refreshes in a row would be equivalent to the prior refresh behaviour.
Or simply the domain is resold, but old visitors still see a page from a year back. Immutable is useful, but the max-age should be limited to a few hours, which is an acceptable timeframe for internet disruptions (e.g. DNS).
With these, you fetch data by its hash. You provide a primary URL where the item is known to exist, but the browser is free to fetch the data from any proxy (or local cache) with the same hash.
This could replace package mirroring, git clones, parts of bittorrent, CDNs and more.
It does assume that you use a hash with enough bits that collisions are extremely unlikely, and also that your hash is cryptographically strong (else a rogue proxy can inject data).
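A sketch of that verification step, assuming a SHA-256 content address and an untrusted mirror URL (both the function name and the URL here are made up for illustration):

```python
import hashlib
import urllib.request

def fetch_by_hash(url: str, expected_sha256: str) -> bytes:
    """Fetch from any mirror/proxy/cache and accept the bytes only if they
    match the content address we already trust (illustrative sketch)."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("mirror returned data that does not match its hash")
    return data
```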
Exactly! IPFS already has a web proxy to their content-addressed network (e.g. https://ipfs.io/ipfs/QmXZnH2WVmFoiE7tRJQk9QstLGhSKpVyEQ4Rywx... ). And hopefully browsers will learn to speak the protocol natively, so then there's no need for an HTTP proxy at all.
"Immutable" with "max age" is an oxymoron. If it expires it isn't immutable. Use another word.
What you need is an absolute date and time in the cache header which says "we promise this page does not change before this date and time". This could be treated as a "lease" and automatically extended in some configurable intervals. For instance, if it is 30 days, then the file is good for 30 days since its modification time stamp. When that time passes, this is renewed automatically: it is now good for 60 days since its modification time stamp. Basically, it is always good for N*30 days since its modification time stamp, where N is the smallest N required for that time to be in the future.
When the webmaster publishes a new version of the file, he or she knows precisely when browsers that have cached the previous version will start picking up the new one. Changes can be co-ordinated with the expiry time to minimize the refresh lag: the time between when the earliest new client sees the new page and when the last old client stops seeing the old one. If we know that a page expires for everyone on June 1, 2016 at noon, we can update that page in the morning on June 1. By afternoon, everyone sees the new one.
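A sketch of the "lease" arithmetic described above, assuming a 30-day interval measured from the file's modification time (the function name is just for illustration):

```python
from datetime import datetime, timedelta

def lease_expiry(modified: datetime, now: datetime, interval_days: int = 30) -> datetime:
    """Smallest multiple of the interval past the modification time that is
    still in the future: good for N*30 days, renewed automatically."""
    interval = timedelta(days=interval_days)
    n = 1
    while modified + n * interval <= now:
        n += 1
    return modified + n * interval

# Example: a file modified 2016-04-20, checked on 2016-05-25, is "good"
# until 2016-06-19 (N = 2).
print(lease_expiry(datetime(2016, 4, 20), datetime(2016, 5, 25)))
```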
Yes I think it would still be a good idea to require expiry date for "immutable" content, just as a safety net if something is misconfigured somewhere etc - then when you fix a bug, you will know the precise time at which it will be gone for everyone (hopefully the expiry date was not set to 10 years).
However, I wonder what the typical cache lifetime of resources on the current web is. IIRC someone on HN posted about a week ago that, according to their study, it's rather short - stuff is evicted from cache quite rapidly if not used. So fast that getting a cache hit for jQuery from a CDN is actually quite unlikely.
> I've learned to press Enter on the URL line
> instead of reload for exactly this reason.
Yes, this is what I do too.
The browser can do one of three things:
1. Serve the file from cache. This is what happens when you put the cursor in the address bar and hit Enter. Well, at least for ancillary files. It will likely still ask whether the main HTML document has changed. But it will load CSS and JavaScript files from its local cache, provided the webmaster properly set the HTTP headers, like Expires, to tell the browser that it can cache the files.
2. Ask the server whether the file has changed. This is what happens when you click the Reload button. This is the area of dispute. The article is saying it would probably be better if the browser acted just like it does when you put the cursor in the address bar and hit Enter. Instead the browser seems to check not only the main document file but also every single CSS, JavaScript, image, whatever, file. It doesn't redownload them all, but it sends an If-Modified-Since header, to ask whether they have changed, and then requests the whole file only for ones that the server says have changed. The payload back and forth is usually just a few hundred bytes for files that have not changed. But the network requests take a noticeable slice of time, because it's one request per file.
3. Ask the server for the whole file, regardless. This is usually when you hit Shift and Reload.
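To make case 2 concrete, here's a minimal sketch of a conditional request (Python standard library, made-up URL; a real browser would send the Last-Modified value it cached rather than the current time):

```python
import urllib.error
import urllib.request
from email.utils import formatdate

URL = "https://example.com/app.js"  # hypothetical asset URL

# A conditional GET: if the server answers 304 Not Modified, the cached copy
# is reused; only a 200 response carries a new body.
req = urllib.request.Request(URL, headers={
    "If-Modified-Since": formatdate(usegmt=True),  # stand-in for the cached Last-Modified
})
try:
    with urllib.request.urlopen(req) as resp:
        print("changed, got", resp.status, "with", len(resp.read()), "bytes")
except urllib.error.HTTPError as err:
    if err.code == 304:
        print("not modified, serve from cache")
    else:
        raise
```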
I apologize if someone already mentioned this, but there's a way to eliminate the penalty to the user without changing HTTP at all. Browsers can simply check all the non-expired resources after everything else. Now the latency is the same, but we still do the checks, just after everything else, while we're already rendering the page. Only if one of those resources actually did change do we re-render the page.
The immutable solution is cleaner, and doesn't load the server as much, but it's not backward compatible and requires the people who run the server to know what they're doing. Maybe the two solutions could be combined?
The biggest potential drawback I see is that maybe most resources, including the HTML, don't expire, so every page will be rendered and then re-rendered, giving little benefit and making the rendering choppier. Some of that could maybe be mitigated by starting the rendering in the background and not displaying it until a certain percentage of requests return, or by special-casing the "page" itself as opposed to page resources.
What you're suggesting won't work. The problem is that a lot of pages require their resources to be loaded in a specific order. The C in CSS stands for cascading, and means that rules that are loaded later override earlier ones (if selector specificities match). The same goes for JS, since later scripts might depend on the frameworks or libraries loaded earlier, unless the script has the async attribute. And then there is the content loaded by the CSS and JS themselves, which in most modern web apps makes up the majority of the content.
Nonsense. You have the CSS and JavaScript files. You just aren't certain that they're the most recent version. So you go ahead and render the page, using either the version you have or, failing that, one requested from the server, still doing everything in order. Then you validate in the background your assumption that the stale versions you used are still the most current. If your assumption is right (and mostly it will be), nothing happens. If it's wrong, you re-render the page, again all in order.
You have a point. What about CSS? At first I think it would just render funny, but some javascript actually interacts with the CSS, e.g. jQuery selectors based on style classes or something. So it has the same problem really.
Images would be safe.
Or an alternate plan, start rendering, but do not run the javascript until the resource checks for css+js files return. It would slow things down some more, but not as much as waiting for everything.
Is there a danger here of getting a corrupt resource, and then no matter how many times you mash reload it never gets fixed? What do we have to stop this? I don't think CSS files have a SHA checksum header by default, do they?
Correcting possible corruption (e.g. shift reload in Firefox) never uses conditional revalidation and still makes sense to do with immutable objects if you're concerned they are corrupted.
And subresource integrity would ensure that the value in the URL matches the hash of the content. Files named by their hash can be treated as immutable with confidence.
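A sketch of computing the kind of integrity value subresource integrity checks against (the helper name and the asset path in the comment are made up; the format is "sha384-" plus the base64 of the raw digest):

```python
import base64
import hashlib

def sri_value(data: bytes) -> str:
    """Compute a subresource-integrity string like 'sha384-...' for an asset."""
    digest = hashlib.sha384(data).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")

# The page would then reference the asset with something like
#   <script src="/js/app.3f5a9c1b.js" integrity="sha384-..." crossorigin="anonymous">
# so a corrupted or tampered copy is rejected no matter where it was cached.
print(sri_value(b"console.log('hello');"))
```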
How do I do shift reload on my smartphone browser? And how do I even know what shift reload is (most people won't) or that the site is corrupted (it still shows something - how do I know that it isn't the most recent stuff)?
And for my general understanding of this proposal: even if the current domain owner might guarantee that the content never changes, the domain can switch to another site which might use the same paths but of course wants to put different content there. Is this somehow covered?
> How do I do shift reload on my smartphone browser?
For some reason, I have the intuition that it's by tapping the address bar (to get text focus) then go/enter/return. But I have no idea if that actually does the equivalent of shift reload!
You can just empty your cache completely - not ideal but still easily done on mobile.
Your 2nd question is confusing - are you asking what happens if you have the same exact path but from another domain? Then it depends on what that server responds with. This is just a HTTP response header, nothing more.
Yeah, I could clean the cache. But most people won't know how - and what a cache even is.
The second question was about my expectation that the new server won't even get queried if the immutable caching policy from the old server prevented this. And so it doesn't have a chance to signal that its content changed.
The potential for a domain to transfer ownership and still use the same paths, yet have different content seems incredibly unlikely. Like, it feels like you were trying to come up with potential issues for the sake of finding a way to say "see, this won't work!" :P
The biggest reason to use this is for versioned resources. Things that will never change. Say I create a minified JavaScript file. Its MD5 hash is 123456789abcdef...., and so in the output file, the filename is "foo.123456789abcdef.js". If the file changes, the hash changes. If I request the version of the file with "123456789abcdef", I should get that one. Ignoring the unlikely potential for hash collisions, everything in this scenario is working as intended. There is no conceivable reason to ever want to change the content while keeping the same hash.
Now, let's say that file, somehow, gets corrupted AND cached in your browser. I can't say I've ever seen something like this happen, but I suppose it's possible? I'd be very interested to hear if something like this is possible, to be honest. It seems like between TCP retransmissions and Content-Length, you would need some sort of subtle corruption that flips a bit and isn't corrected?
EDIT: As Klathmon points out, Subresource Integrity is probably a better solution to "corrupted file in cache" scenarios. As it stands, if a file was corrupted on disk, let's say, but the ETag and/or the Last-Modified values were accurate, the origin would only ever respond back saying "nope, no changes! you seem to have the latest copy" and you'd still be stuck with the corrupted file. Only a hard reload/cache clear solves that.
I don't want to come up with potential issues just for the sake of preventing this. But it's my job as an engineer to think about all potential issues and to avoid them as far as possible. And I'm not directly involved in this topic here or in the web at large, but I just read this, wondered whether it is fully thought through or not, and therefore asked.
Of course domain changes are unlikely. But nevertheless they are possible in our system and we have to cope with them. I just googled subresource integrity and it doesn't seem like an appropriate solution for this scenario. It would mean a new domain owner would need to generate those for ALL his links, just to be sure that the previous site didn't mess anything up. That means, first, extra work, and second, you wouldn't even know for how long you need this (until all previous users have visited the new site).
There would even be possibilities for major annoyances, if a previous site owner put that feature on things like index.html before the ownership change, just to keep visitors from seeing the new page for as long as possible.
I mean, you could say the same thing about HSTS and key pinning. Domain changes hands, but "oops", HSTS was set and the old keys were pinned.
Is that actually a problem? No, it's not. Similarly, as the owner of a new domain, why would I want the old content? The only reason I can think of is that I bought a company outright, or something. In that situation, if I don't want to change the content, everything still works. If I want to change it, and they did something stupid, like unversioned paths using this proposed flag, then yeah, I'm in a weird spot. That seems like the most trivial and unlikely of scenarios, though. It requires such a complex chain of events to occur.
I think it's safe to say that malicious usage of the flag is entirely out of scope when considering the validity of it, again, because it requires a contrived situation.
Of course. But no one sane would put 'Cache-Control: immutable' on `index.html`. It's to be used on `/js/lib/jquery-1.7.1.min.js` or `/js/mystuff-<sha1here>.js` or `/photos/mnbvcxzasdfghjklqqwertyuio1234567890.jpg`
Nobody sane would, intentionally. But I'd bet the house on it happening by accident quite a bit.
At a technical level, I like this idea. When used well, it makes sense to allow. It's hard to fault it without bringing in human error, politics, or economics.
At a practical level...
I can't wait to see what happens when a bug allows Facebook to serve this header on all pages, even for a few minutes. The most Facebook-dependent folks around, those checking their phones every five minutes, will be stuck in a perpetual time freeze, unable to move forward ;).
I also can't wait to see what happens when a government tries to ban a cache-control: immutable page.
Or even what happens when, someday, Google is selling its assets and someone else gets "google.com". (Someday it'll happen - Google won't exist for all eternity.)
If you can construct an attack out of it, relying on people doing sane things is dangerous... (I'm not sure this is interesting enough as an attack vector, but "but nobody would do that" is a bad answer a lot of the time)
> The potential for a domain to transfer ownership and still use the same paths, yet have different content seems incredibly unlikely.
More generally, many protocols (including basic email verification) break horribly when the assumption that domains last forever gets broken. Ideally, domains shouldn't expire, ever.
EDIT: reading the firefox bug for this test implementation, I'm not sure if it is intended to be applied to pages, or only to sub-resources (early posts mention the distinction, later ones don't)
I'm pretty sure that in the past, at least, Firefox would skip the request even on a refresh for resources that had the appropriate Cache-Control headers and which did not have any of the various conditional-GET-related headers, e.g. ETag, Date, etc. Did this change?
> Facebook, like many sites, uses versioned URLs - these URLs are never updated to have different content and instead the site changes the subresource URL itself when the content changes. This is a common design pattern...