Caching POST requests in Varnish

Tags: vcl

Introduction

When a client (web browser) wants a regular resource (e.g., an HTML document or an image) from a web server, it will make a GET request to the server. In a default installation, Varnish will cache such requests, and this will lessen the load on the conventional web server.

When a user logs in to a site, or provides some unique or private information, this will be done through a POST request. By default, Varnish will not cache POST requests, but pass them directly to the backend server, unmodified. This is typically a good idea, but sometimes it makes sense to also cache POST requests.

When two different users send POST requests to a web server, we only want them to receive the same reply if they supplied the same request body. The solution is to make the request body part of the hash, and let the normal caching logic do the rest. The result is that only clients who supply the same body will receive the same reply.

This tutorial describes the steps needed to cache POST requests with Varnish.

Prerequisites

Before you can start this tutorial, you should have:

  • A working installation of Varnish, with a working backend
  • Upgraded to at least Varnish Cache 4.1.7 or Varnish Enterprise 4.1.7r1
  • At least one URL that should be cached

The above should all be in a safe testing environment. Only after the last step - testing - should you put this in production.
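
If you are unsure which Varnish version you are running, you can check it on the command line before you start:

$ varnishd -V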

Step 1 - Prepare the VCL for caching POST requests

In this tutorial we use two VMODs, std and bodyaccess. The first one, the standard vmod, is included with all Varnish versions.

The second VMOD, bodyaccess, is only included by default in Varnish Enterprise. If you are running Varnish Cache, you need to compile the VMOD from source. This VMOD is a part of varnish-modules, and it should be relatively straightforward to compile these against any current version of Varnish Cache.
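
If you do need to build it from source, the build roughly follows the usual autotools flow sketched below. Treat this as a sketch only: the exact steps and prerequisites (for example the Varnish development headers and autotools) may differ between versions, so consult the varnish-modules README for your release.

$ git clone https://github.com/varnish/varnish-modules.git
$ cd varnish-modules
$ ./bootstrap
$ ./configure
$ make
$ sudo make install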

You also need to import the VMODs in your existing VCL. You may already be importing std, but make sure that both imports are present at the top of your VCL:

import std;
import bodyaccess;

Step 2 - Update vcl_recv to cache POST requests

Add the following to the very top of sub vcl_recv:

unset req.http.X-Body-Len;

This is good practice: always delete headers that will be used internally, so that clients cannot change the caching behavior of your VCL by sending non-standard headers with the request.

Next, pick the POST requests that you want cached, and prepare for hashing the request body:

if (req.method == "POST" && req.url ~ "search.html$") {
    std.log("Will cache POST for: " + req.http.host + req.url);

    # Refuse request bodies that are declared too large to cache.
    if (std.integer(req.http.content-length, 0) > 500000) {
        return (synth(413, "The request body size exceeds the limit"));
    }

    # Buffer up to 500 kB of the request body so that it can be hashed.
    if (!std.cache_req_body(500KB)) {
        return (hash);
    }
    set req.http.X-Body-Len = bodyaccess.len_req_body();
    return (hash);
}

The test req.url ~ "search.html$" is just an example, and probably not the URL pattern you want to cache. You need to replace it with a test that selects exactly the POST requests you want to cache.

The statement std.cache_req_body(500KB) will buffer up to 500 kilobytes of request body. This allows us to hash the body (and also to restart the request if needed). If the body is bigger than that, Varnish will close the connection; the Content-Length check above exists so that clients who declare an oversized body get a proper 413 response instead.

Step 3 - Change the hashing function

The following code will add the request body to the hash:

sub vcl_hash {
    # To cache POST and PUT requests
    if (req.http.X-Body-Len) {
        bodyaccess.hash_req_body();
    } else {
        hash_data("");
    }
}

The effect of this code is that requests with different request bodies get different hash values. If this step were omitted, the request body would be ignored by Varnish's cache lookup (but still sent to the backend), so one user could be served a reply that was cached for a different request body. That would cause information leakage, which is a risk from a security perspective.

Note that we do not have a return(lookup), so the built-in VCL will make sure that both the host and the URL are part of the hash, as normal.
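
For reference, the built-in vcl_hash looks roughly like this (paraphrased from Varnish's builtin.vcl; the exact code varies slightly between versions):

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}

Since our own vcl_hash falls through to this code, the body hash from bodyaccess.hash_req_body() is simply added on top of the normal URL and host hashing.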

Step 4 - Make sure the backend gets a POST request

The default behavior of Varnish is to pass POST requests to the backend. When we override this in vcl_recv, Varnish will still change the request method to GET before calling sub vcl_backend_fetch. We need to undo this, as follows:

sub vcl_backend_fetch {
    if (bereq.http.X-Body-Len) {
        set bereq.method = "POST";
    }
}

Step 5 - Test

First of all, you must verify that the logic in sub vcl_recv works as intended. This is best done with varnishtest, although writing test cases is outside the scope of this tutorial.
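
For readers who do want to go the varnishtest route, a minimal test case could look roughly like the sketch below. This is only a sketch, not a definitive test: it assumes that the bodyaccess VMOD is installed where varnishd can find it, and it inlines a stripped-down version of the VCL from the previous steps.

varnishtest "Cache POST requests keyed on the request body"

server s1 {
    rxreq
    expect req.method == "POST"
    txresp -body "backend reply"
} -start

varnish v1 -vcl+backend {
    import std;
    import bodyaccess;

    sub vcl_recv {
        unset req.http.X-Body-Len;
        if (req.method == "POST" && req.url ~ "search.html$") {
            if (!std.cache_req_body(500KB)) {
                return (hash);
            }
            set req.http.X-Body-Len = bodyaccess.len_req_body();
            return (hash);
        }
    }

    sub vcl_hash {
        if (req.http.X-Body-Len) {
            bodyaccess.hash_req_body();
        } else {
            hash_data("");
        }
    }

    sub vcl_backend_fetch {
        if (bereq.http.X-Body-Len) {
            set bereq.method = "POST";
        }
    }
} -start

client c1 {
    # First request with this body: expect a miss and a backend fetch.
    txreq -req POST -url "/search.html" -body "FOO"
    rxresp
    expect resp.status == 200

    # Same body again: expect a cache hit, served without a new fetch.
    txreq -req POST -url "/search.html" -body "FOO"
    rxresp
    expect resp.status == 200
} -run

# Exactly one of the two identical POSTs should have been a cache hit.
varnish v1 -expect cache_hit == 1

Run the file with varnishtest; if any expectation fails, the test aborts with an error.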

If you are unfamiliar with varnishtest, you can use the tool varnishlog to test your VCL. You should see log lines with the string Will cache POST for: on the requests that you want to cache, and not on any other requests.

Here we give an example of how to use curl and varnishlog to check that you get a cache hit or a miss when you should. The following example curl commands show how you can make POST requests with different bodies:

$ curl -d "FOO" -v https://example.com/search.html
$ curl -d "BAR" -v https://example.com/search.html
$ curl -d "FOO" -v https://example.com/search.html
$ curl -d "BAR" -v https://example.com/search.html

While doing the curl commands, you should run varnishlog to see if things work as expected. It might be a good idea to only show fields that are interesting, like this:

$ varnishlog -i VCL_call,ReqMethod,BereqMethod,ReqURL,BereqURL,VCL_Log
*   << BeReq    >> 3
-   BereqMethod    POST
-   BereqURL       /search.html
-   BereqMethod    GET
-   VCL_call       BACKEND_FETCH
-   BereqMethod    POST
-   VCL_call       BACKEND_RESPONSE

*   << Request  >> 2
-   ReqMethod      POST
-   ReqURL         /search.html
-   VCL_call       RECV
-   VCL_Log        Will cache POST for: example.com/search.html
-   VCL_call       HASH
-   VCL_call       MISS
-   VCL_call       DELIVER

*   << Session  >> 1

*   << BeReq    >> 6
-   BereqMethod    POST
-   BereqURL       /search.html
-   BereqMethod    GET
-   VCL_call       BACKEND_FETCH
-   BereqMethod    POST
-   VCL_call       BACKEND_RESPONSE

*   << Request  >> 5
-   ReqMethod      POST
-   ReqURL         /search.html
-   VCL_call       RECV
-   VCL_Log        Will cache POST for: example.com/search.html
-   VCL_call       HASH
-   VCL_call       MISS
-   VCL_call       DELIVER

*   << Session  >> 4

*   << Request  >> 32770
-   ReqMethod      POST
-   ReqURL         /search.html
-   VCL_call       RECV
-   VCL_Log        Will cache POST for: example.com/search.html
-   VCL_call       HASH
-   VCL_call       HIT
-   VCL_call       DELIVER

*   << Session  >> 32769

*   << Request  >> 32772
-   ReqMethod      POST
-   ReqURL         /search.html
-   VCL_call       RECV
-   VCL_Log        Will cache POST for: example.com/search.html
-   VCL_call       HASH
-   VCL_call       HIT
-   VCL_call       DELIVER

*   << Session  >> 32771

As you can see, the first two requests were misses and resulted in backend fetches, while the last two were hits, so no backend fetch was needed.
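
You can also see the same thing from the client side: on a cache hit, the X-Varnish response header contains two transaction IDs (the current request and the one that originally inserted the object), while on a miss it contains only one. For example, to look at just the response headers:

$ curl -s -o /dev/null -D - -d "FOO" https://example.com/search.html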

Furthermore, you should test the size limit: decrease the parameter of std.cache_req_body, supply big request bodies to your URL, and check that you get the expected behavior.
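
One way to do this is to generate a body that is just over the limit and POST it with curl. This is only an illustration; the file name bigbody.txt is arbitrary, and the sizes assume the 500000 byte / 500KB limits used in Step 2:

$ head -c 600000 /dev/urandom | base64 > bigbody.txt
$ curl --data-binary @bigbody.txt -v https://example.com/search.html

With the Content-Length check from Step 2 in place, you should see the 413 response coming back; if you also lower both limits, you can exercise the cache_req_body limit in the same way.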

Finally, when you are finished testing, you probably want to remove the logging we introduced in Step 2.

Conclusion

Caching POST requests requires very few lines of code, but some care is necessary to get it right. The key is to get vcl_hash to also use the request body in the hashing function.

Read about hashing in VCL if you want to know more about the subject.