Caching POST requests in Varnish
Introduction
When a client (web browser) wants a regular resource (e.g., an HTML document or an image) from a web server, it will make a GET request to the server. In a default installation, Varnish will cache such requests, and this will lessen the load on the conventional web server.
When a user logs in to a site, or provides some unique or private information, this will be done through a POST request. By default, Varnish will not cache POST requests, but pass them directly to the backend server, unmodified. This is typically a good idea, but sometimes it makes sense to also cache POST requests.
When two different users use POST toward a web server, we only want them to receive the same reply if they supplied the same request body. The solution is to make the request body a part of the hash, and let the normal caching logic happen. The result is that only clients who supply the same body will receive the same reply.
This tutorial describes the steps needed to cache POST requests with Varnish.
Prerequisites
Before you can start this tutorial, you should have:
- A working installation of Varnish, with a working backend
- Upgraded to at least Varnish Cache 4.1.7 or Varnish Enterprise 4.1.7r1
- At least one URL that should be cached
The above should all be in a safe testing environment. Only after the last step - testing - should you put this in production.
Step 1 - Prepare the VCL for caching POST requests
In this tutorial we use two VMODs, std and bodyaccess. The first one, the standard VMOD, is included with all Varnish versions.
The second VMOD, bodyaccess, is only included by default in Varnish Enterprise. If you are running Varnish Cache, you need to compile the VMOD from source. This VMOD is a part of varnish-modules, and it should be relatively straightforward to compile these against any current version of Varnish Cache.
You also need to import the VMODs in your existing VCL. You may already be using std, but you should make sure you import both VMODs at the top of your VCL:
import std;
import bodyaccess;
Step 2 - Update vcl_recv to cache POST requests
Add the following to the very top of sub vcl_recv:
unset req.http.X-Body-Len;
It is good practice to delete all headers that we use internally before doing anything else. This makes sure that clients cannot change the caching behavior of our VCL by sending non-standard headers with the request.
Next, pick the POST requests that you want cached, and prepare for hashing the request body:
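if (req.method == "POST" && req.url ~ "search.html$") {
    std.log("Will cache POST for: " + req.http.host + req.url);
    # Reject bodies whose declared size is above the limit.
    if (std.integer(req.http.content-length, 0) > 500000) {
        return (synth(413, "The request body size exceeds the limit"));
    }
    # Buffer up to 500KB of request body so that it can be hashed.
    if (!std.cache_req_body(500KB)) {
        return (hash);
    }
    # Internal marker header, used later in vcl_hash and vcl_backend_fetch.
    set req.http.X-Body-Len = bodyaccess.len_req_body();
    return (hash);
}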
The test req.url ~ "search.html$" is just an example, and probably not the URLs that you want to cache. You need to replace it with a test that selects exactly the POST requests you want to cache.
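As an illustration of a stricter test, the sketch below anchors the pattern at the start of the URL, escapes the dot, and allows an optional query string. The path /search.html is only carried over from the example above and is an assumption about your site; the rest of the block (size check, body caching, return) stays as shown in the block above.

```vcl
sub vcl_recv {
    # Assumption: only /search.html should have its POST requests cached.
    # "^" anchors the start of the URL, "\." matches a literal dot, and
    # "(\?.*)?$" permits an optional query string.
    if (req.method == "POST" && req.url ~ "^/search\.html(\?.*)?$") {
        std.log("Will cache POST for: " + req.http.host + req.url);
        # ... size check, std.cache_req_body and return (hash) as above ...
    }
}
```

Without the anchor and the escaped dot, the example pattern also matches URLs such as /other/xsearch-html, because an unescaped dot matches any character.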
The statement std.cache_req_body(500KB); will store up to 500 kilobytes of request body. This will allow us to hash the body (and also to restart the request if that is needed). Varnish will close the connection if the body is too big.
Step 3 - Change the hashing function
The following code will add the request body to the hash:
sub vcl_hash {
    # To cache POST and PUT requests
    if (req.http.X-Body-Len) {
        bodyaccess.hash_req_body();
    } else {
        hash_data("");
    }
}
The effect of this code is that requests with different request bodies will get different hash values. Conversely, if this step is omitted, Varnish ignores the request body when looking up the cache but still sends it to the backend, so a client could be served a cached reply that was generated for a different request body. This is information leakage, which is a risk from a security perspective.
Note that we do not have a return(lookup), so the built-in VCL will make sure that both the host and the URL are part of the hash, as normal.
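For reference, the built-in vcl_hash in Varnish 4.x looks roughly like the following. This is reproduced from memory of builtin.vcl, so check the copy shipped with your installation. Because our own vcl_hash does not return, Varnish continues into this code after it.

```vcl
# Built-in vcl_hash (Varnish 4.x), which runs after our own vcl_hash
# when that subroutine does not return.
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

The resulting hash therefore covers the request body (when X-Body-Len is set), the URL, and the host.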
Step 4 - Make sure the backend gets a POST request
The default behavior of Varnish is to pass POST requests to the backend. When we override this in vcl_recv, Varnish will still change the request method to GET before calling sub vcl_backend_fetch. We need to undo this, as follows:
sub vcl_backend_fetch {
    if (bereq.http.X-Body-Len) {
        set bereq.method = "POST";
    }
}
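Put together, Steps 1 to 4 could be assembled into a single VCL file along these lines. This is a sketch, not a drop-in configuration: the backend address 127.0.0.1:8080 is a placeholder assumption, and the URL test is still the example from Step 2.

```vcl
vcl 4.0;

import std;
import bodyaccess;

# Assumption: placeholder backend; replace with your own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Never trust a client-supplied copy of our internal marker header.
    unset req.http.X-Body-Len;

    # Example selector from Step 2; replace with your own test.
    if (req.method == "POST" && req.url ~ "search.html$") {
        std.log("Will cache POST for: " + req.http.host + req.url);
        if (std.integer(req.http.content-length, 0) > 500000) {
            return (synth(413, "The request body size exceeds the limit"));
        }
        if (!std.cache_req_body(500KB)) {
            return (hash);
        }
        set req.http.X-Body-Len = bodyaccess.len_req_body();
        return (hash);
    }
}

sub vcl_hash {
    # Add the request body to the hash for the selected POST requests.
    if (req.http.X-Body-Len) {
        bodyaccess.hash_req_body();
    } else {
        hash_data("");
    }
}

sub vcl_backend_fetch {
    # Restore the method that Varnish changed to GET for the fetch.
    if (bereq.http.X-Body-Len) {
        set bereq.method = "POST";
    }
}
```

In an existing VCL you would merge these fragments into the subroutines you already have rather than replace the file.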
Step 5 - Test
First of all, you must verify that the logic in sub vcl_recv works as intended. This is best done in varnishtest, but this is outside the scope of this tutorial.
If you are unfamiliar with varnishtest, you can use the tool varnishlog to test your VCL. You should see log lines with the string Will cache POST for: on the requests that you want to cache, and not on any other requests.
Here we give an example of how to use curl and varnishlog to check that you get hit or miss when you should. The following example curl commands show how you can make POST requests with different bodies:
$ curl -d "FOO" -v https://example.com/search.html
$ curl -d "BAR" -v https://example.com/search.html
$ curl -d "FOO" -v https://example.com/search.html
$ curl -d "BAR" -v https://example.com/search.html
While doing the curl commands, you should run varnishlog to see if things work as expected. It might be a good idea to only show fields that are interesting, like this:
$ varnishlog -i VCL_call,ReqMethod,BereqMethod,ReqURL,BereqURL,VCL_Log
* << BeReq >> 3
- BereqMethod POST
- BereqURL /search.html
- BereqMethod GET
- VCL_call BACKEND_FETCH
- BereqMethod POST
- VCL_call BACKEND_RESPONSE
* << Request >> 2
- ReqMethod POST
- ReqURL /search.html
- VCL_call RECV
- VCL_Log Will cache POST request: example.com/search.html
- VCL_call HASH
- VCL_call MISS
- VCL_call DELIVER
* << Session >> 1
* << BeReq >> 6
- BereqMethod POST
- BereqURL /search.html
- BereqMethod GET
- VCL_call BACKEND_FETCH
- BereqMethod POST
- VCL_call BACKEND_RESPONSE
* << Request >> 5
- ReqMethod POST
- ReqURL /search.html
- VCL_call RECV
- VCL_Log Will cache POST request: example.com/search.html
- VCL_call HASH
- VCL_call MISS
- VCL_call DELIVER
* << Session >> 4
* << Request >> 32770
- ReqMethod POST
- ReqURL /search.html
- VCL_call RECV
- VCL_Log Will cache POST request: example.com/search.html
- VCL_call HASH
- VCL_call HIT
- VCL_call DELIVER
* << Session >> 32769
* << Request >> 32772
- ReqMethod POST
- ReqURL /search.html
- VCL_call RECV
- VCL_Log Will cache POST request: example.com/search.html
- VCL_call HASH
- VCL_call HIT
- VCL_call DELIVER
* << Session >> 32771
As you can see, the first two requests were misses and each resulted in a backend fetch, while the last two were hits and no backend fetch was needed.
Furthermore, you should decrease the parameter of std.cache_req_body and supply big bodies to your URL to see if you get the expected behavior.
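One way to run this test is to temporarily lower both limits inside the if block from Step 2, as sketched below. The values 1000 and 1KB are arbitrary choices for testing, not recommendations.

```vcl
# Testing only: temporary low limits inside the POST block in vcl_recv.
# Assumption: 1000 bytes / 1KB are arbitrary test values.
if (std.integer(req.http.content-length, 0) > 1000) {
    return (synth(413, "The request body size exceeds the limit"));
}
if (!std.cache_req_body(1KB)) {
    return (hash);
}
```

With these values, a POST that declares a Content-Length above 1000 should receive the synthetic 413 response, while smaller bodies should still be cached as before. Remember to restore the original limits afterwards.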
Finally, when you are finished testing, you probably want to remove the logging we introduced in Step 2.
Conclusion
Caching POST requests requires very few lines of code, but some care is necessary to get it right. The key is to get vcl_hash to also use the request body in the hashing function.
Read about hashing in VCL if you want to know more about the subject.