The Security Impact of HTTP Caching Headers
Earlier this week, an update for MediaWiki fixed a bug in how it used caching headers [2]. The headers allowed authenticated content to be cached, which may lead to sessions being shared between users behind the same proxy server. I think this is a good reason to talk a bit about caching in web applications and why it is important for security.
First of all: If your application is HTTPS only, much of this may not apply to you. A proxy cannot inspect HTTPS traffic, and a browser's own cache is not shared between users. However, HTTPS-inspecting proxies are available and common in some corporate environments, so this *may* apply to them, even though I hope they do not cache HTTPS content.
It is the goal of properly configured caching headers to avoid having personalized information stored in proxies. The server needs to include appropriate headers to indicate if the response may be cached.
Caching Related Response Headers
Cache-Control
This is probably the most important header when it comes to security, and it accepts a number of options. Most importantly, the page can be marked as "private" or "public". A proxy will not cache a page if it is marked as "private". Other options are sometimes used inappropriately. For example, the "no-cache" option only tells the proxy to verify with the server, each time the page is requested, that the stored page is still valid; the proxy may still store the page. A better option to add is "no-store", which prevents both the request and the response from being stored by the cache.
The "no-transform" option may be important for mobile users. Some mobile providers compress or alter content, in particular images, to save bandwidth when re-transmitting it over cellular networks. This could break digital signatures in some cases, and "no-transform" will prevent that. (Again, this doesn't matter for SSL; it only matters if you rely on transmitted digital signatures, for example to verify an image.)
The "max-age" option indicates how long, in seconds, a response can be cached. Setting it to "0" marks the response as stale right away, so a cache should not reuse it without checking with the server first.
A "safe" Cache-Control header would be:
Cache-Control: private, no-cache, no-store, max-age=0
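To make the difference between these options concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives and asks whether a shared (proxy) cache would be allowed to store the response. The function names are made up for this illustration, and the parser is deliberately naive: it assumes no directive argument contains a comma.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Simplification: splits on every comma, so quoted arguments that
    contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(value):
    """Return True if a shared (proxy) cache may store the response.

    Only "private" and "no-store" forbid storage; "no-cache" and
    "max-age=0" still allow the response to be stored.
    """
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives


safe = "private, no-cache, no-store, max-age=0"
print(shared_cache_may_store(safe))        # False
print(shared_cache_may_store("no-cache"))  # True: stored, but revalidated
```

Note how "no-cache" on its own still lets the proxy keep a copy, which is why it should not be the only directive protecting personalized pages.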
Expires
Modern browsers tend to rely less on the Expires header. However, it is best to stay consistent. An expiration time in the past, or just the value "0", will work to prevent caching.
ETag
The ETag will not prevent caching, but will indicate if content changed. The ETag can be understood as a serial number that provides a more granular identification of stale content. In some cases the ETag is derived from information like file inode numbers that some administrators don't like to share. A nice way to come up with an ETag would be to just send a random number, or not to send it at all. I am not aware of a way to randomize the ETag.
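If the application itself sets the ETag (rather than the web server), the two ideas above can be sketched in a few lines of Python. This is only an illustration under that assumption; the helper names are hypothetical and not part of any web server's configuration.

```python
import hashlib
import secrets


def random_etag():
    """Return a quoted random ETag that reveals nothing about the file.

    A fresh random value never matches a previously issued one, so
    conditional requests will always be answered with the full response.
    """
    return '"' + secrets.token_hex(16) + '"'


def content_etag(body):
    """Return a quoted ETag derived from the response body (bytes) only.

    Unlike an inode-based value, this leaks no file system details and
    still changes whenever the content changes.
    """
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


print(random_etag())
print(content_etag(b"hello world"))
```

The random variant matches the suggestion in the text; the content-hash variant is an alternative for pages where validation is still wanted.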
Pragma
This is an older header, and has been replaced by the "Cache-Control" header. "Pragma: no-cache" is equivalent to "Cache-Control: no-cache".
Vary
The "Vary" header tells a cache which request header fields, in addition to the URL, decide whether a stored response may be reused. A cache indexes all stored responses based on the content of the request. The request consists not just of the URL requested, but also of other headers, for example the User-Agent field. If you deliver different content depending on the user agent, "Vary: User-Agent" tells the proxy to keep a separate copy for each User-Agent value; if you deliver the same content to every user agent, leaving the field out of Vary tells the proxy that you don't care about it. For our discussion, this doesn't really matter because we never want the request or response to be cached, so it is best to have no Vary header.
In summary, a safe set of HTTP response headers may look like:
Cache-Control: private, no-cache, no-store, max-age=0, no-transform
Pragma: no-cache
Expires: 0
The "Cache-Control" header is probably overdone in this example, but should cover various implementations.
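One way to apply these headers consistently is to add them in a single place instead of in every page handler. The following is a minimal sketch as a WSGI middleware in Python; the names `SAFE_HEADERS`, `no_cache_middleware` and `demo_app` are invented for this example, and it assumes every response of the wrapped application should be uncacheable.

```python
SAFE_HEADERS = [
    ("Cache-Control", "private, no-cache, no-store, max-age=0, no-transform"),
    ("Pragma", "no-cache"),
    ("Expires", "0"),
]


def no_cache_middleware(app):
    """Wrap a WSGI app so every response carries the headers above."""
    overridden = {name.lower() for name, _ in SAFE_HEADERS}

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Drop whatever caching headers the app set, then add ours,
            # so there is exactly one value per header.
            kept = [(k, v) for k, v in headers if k.lower() not in overridden]
            return start_response(status, kept + SAFE_HEADERS, exc_info)

        return app(environ, patched_start)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical application that wrongly marks its page as public."""
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Cache-Control", "public")])
    return [b"hello"]


application = no_cache_middleware(demo_app)
```

Replacing the application's own values, rather than appending a second Cache-Control header, avoids sending conflicting directives that different proxies may resolve differently.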
A nice tool to test this is ratproxy, which will identify inconsistent cache headers [3]. For example, ratproxy will alert you if a "Set-Cookie" header is sent with a cacheable response.
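The Set-Cookie check is easy to approximate in your own tests. The Python sketch below is not ratproxy's actual logic, just a simplified stand-in: it treats a response as cacheable by a proxy unless Cache-Control contains "private" or "no-store", and it treats a missing Cache-Control header as cacheable. The function names are hypothetical.

```python
def directive_names(cache_control):
    """Return the lower-cased directive names of a Cache-Control value."""
    names = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def cookie_on_cacheable_response(headers):
    """Return True if a response sets a cookie but a proxy may store it.

    headers is a dict mapping response header names to values.
    Simplification: only "private" and "no-store" count as protection.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "set-cookie" not in lowered:
        return False
    names = directive_names(lowered.get("cache-control", ""))
    return not (names & {"private", "no-store"})


risky = {"Set-Cookie": "sid=abc123", "Cache-Control": "public, max-age=60"}
print(cookie_on_cacheable_response(risky))  # True
```

Running a check like this against the responses of a login or account page is a quick way to catch the kind of bug described at the top of this post.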
Anything I missed? Any other suggestions for proper cache control?
References:
[1] http://www.ietf.org/rfc/rfc2616.txt
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=53032
[3] https://code.google.com/p/ratproxy/
------
Johannes B. Ullrich, Ph.D.
SANS Technology Institute
Comments
Right now everything with cache control has been "best effort", as there is no one magic configuration to convince all client browsers to behave properly.
Cache-Control: must-revalidate, no-cache, no-store, private
Anonymous
Nov 15th 2013
..."Pragma: no-cache" is equivalent to "Cache-Control: no-cache"...
AFAIK, the RFC defines Pragma only as a request header. It has no meaning as a response header.
For Cache-Control we use no-cache, no-store. According to the RFC this combination makes all other options (private, max-age, must-revalidate) superfluous. (There is a known glitch in older IE where the meaning of no-store is overextended to "not allowed to download", so sites need to be careful with that option.) I would be interested to hear other people's experience.
Anonymous
Nov 15th 2013