Banning content from Varnish
Banning is a concept in Varnish that allows expression-based cache invalidation. This means that you can invalidate multiple objects from the cache without the need for individual purge calls.
A ban is created by adding a ban expression to the ban list. All objects in the cache will be evaluated against the expressions in the ban list before being served. If the object is banned Varnish will mark it as expired and fetch new content from the backend.
Ban expressions
A ban expression consists of fields, operators, and arguments. Expressions can be chained using the && operator. Only logical AND operations can be performed. Logical OR operations are done by evaluating multiple ban expressions.
Ban expression format
This is the format of ban expressions:
<field> <operator> <arg> [&& <field> <oper> <arg> ...]
The following fields are supported:
req.url: the request URL
req.http.*: any request header
obj.status: the cache object status
obj.http.*: the response headers stored in the cached object
The operator can be:
==: the field equals the arg
!=: the field is not equal to the arg
~: the field matches a regular expression defined by the arg
!~: the field doesn’t match a regular expression defined by the arg
The argument of a ban expression is either a literal string or a regular expression pattern. Strings are not delimited by double quotes " or the long string format {"…"}.
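To make the format concrete, here is a minimal Python sketch that splits a ban expression into its conditions and checks each field and operator against the lists above. It is an illustration only, not Varnish's own parser: the function name parse_ban_expression is hypothetical, and the sketch assumes that arguments contain neither whitespace nor &&.

```python
import re

# Operators and fields as listed in this section.
OPERATORS = ("==", "!=", "~", "!~")
FIELD_PATTERN = re.compile(r"^(req\.url|obj\.status|req\.http\.\S+|obj\.http\.\S+)$")


def parse_ban_expression(expression):
    """Split a ban expression into (field, operator, arg) tuples.

    Simplified sketch: assumes arguments contain no whitespace and no '&&'.
    """
    conditions = []
    for part in expression.split("&&"):
        tokens = part.split()
        if len(tokens) != 3:
            raise ValueError(f"expected '<field> <operator> <arg>', got {part.strip()!r}")
        field, operator, arg = tokens
        if not FIELD_PATTERN.match(field):
            raise ValueError(f"unsupported field: {field}")
        if operator not in OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        conditions.append((field, operator, arg))
    return conditions


for condition in parse_ban_expression("obj.status == 404 && obj.http.Content-Type ~ ^image/"):
    print(condition)
```

Every tuple in the result must hold for an object to be banned, which mirrors the AND-only semantics of the && operator.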
Ban expression examples
Let’s start with a very basic example that is the ban equivalent of a regular purge:
req.url == / && req.http.host == example.com
This expression matches objects for which the request’s URL equals / and the request’s Host header equals example.com.
In another example we’ll invalidate all objects from the cache that have an HTTP 404 status:
obj.status == 404
We can also create an expression that uses response headers that are stored in the object. Let’s say we want to invalidate all images at once. We’d use the following expression:
obj.http.Content-Type ~ ^image/
This expression looks at the Content-Type response header and invalidates all items that match the ^image/ regular expression.
For the last example, we’ll match on a URL pattern instead of an individual URL:
req.url ~ ^/products(/.+|$) && req.http.host == example.com
This pattern will match all objects where the URL starts with /products/... or equals /products.
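If you want to check which URLs a pattern like this covers before issuing the ban, you can try it against a few sample URLs. The sketch below uses Python's re module as a stand-in for the regular expression engine in Varnish; for a simple pattern like this one the two should behave the same, but treat that as an assumption for more exotic syntax. The sample URLs are made up.

```python
import re

# The URL pattern from the ban expression above.
PRODUCTS = re.compile(r"^/products(/.+|$)")


def url_matches(url):
    """Return True if the URL would be matched by the pattern."""
    return PRODUCTS.search(url) is not None


# Hypothetical sample URLs, for illustration only.
for url in ("/products", "/products/shoes", "/products/", "/productsheet"):
    print(url, url_matches(url))
```

Note that /products/ with nothing after the trailing slash is not matched, because /.+ requires at least one character after the slash.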
Executing a ban from the command line
Now that you know what a ban is and what ban expressions look like, it’s time to explain how to execute a ban.
The quickest way to do this is by using the varnishadm program. This program makes a connection to the CLI interface of varnishd.
You can choose to call the varnishadm program without any arguments. This opens an interactive session where you can enter individual commands. This is what happens in the example below:
$ varnishadm
200
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Type 'help' for command list.
Type 'quit' to close CLI session.
varnish> ban obj.status == 404
200
The ban obj.status == 404 command will issue a ban that aims to invalidate all objects with an HTTP 404 status code.
Another way you can ban using varnishadm is by adding the ban expression as an argument. Here’s an example of this:
varnishadm ban obj.status == 404
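Some characters in a ban expression have a special meaning to your Linux shell. In the example below, the shell expands the unquoted ~ into the home directory of the current user (/root) before varnishadm receives it: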
$ varnishadm ban obj.http.Content-Type ~ ^image/
expected conditional (~, !~, == or !=) got "/root"
Command failed with error code 106
In this case, you’re better off using quotes to avoid errors:
varnishadm ban "obj.http.Content-Type ~ ^image/"
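When you call varnishadm from a script rather than from an interactive shell, you can sidestep shell interpretation entirely by passing the arguments as a list. The Python sketch below assumes varnishadm is on the PATH and that the script has permission to reach the varnishd CLI; the helper names ban_command, shell_line, and run_ban are hypothetical.

```python
import shlex
import subprocess


def ban_command(expression):
    """Build the argument list for varnishadm; the expression is one argument."""
    return ["varnishadm", "ban", expression]


def shell_line(expression):
    """Return the equivalent, safely quoted shell command line."""
    return shlex.join(ban_command(expression))


def run_ban(expression):
    """Run varnishadm without a shell, so ~ and ^ are passed through untouched."""
    return subprocess.run(ban_command(expression), capture_output=True, text=True, check=False)


print(shell_line("obj.http.Content-Type ~ ^image/"))
```

Because no shell is involved in run_ban, no tilde expansion takes place. shell_line is only there to show what a correctly quoted command line looks like if you do need to paste it into a terminal.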
Performing bans via HTTP
Although banning can be done using varnishadm and doesn’t require any VCL code, it would be nice to invalidate objects via an HTTP interface.
The ban() function in VCL can be used to add new bans to the ban list. Because it is a VCL function, it can be triggered via HTTP.
Ban VCL code
Here’s the VCL code you need to perform bans via HTTP:
vcl 4.1;
acl purge {
    "localhost";
    "192.168.55.0"/24;
}
sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ purge) {
            return (synth(405));
        }
        if (!req.http.x-invalidate-pattern) {
            return (purge);
        }
        ban("obj.http.x-url ~ " + req.http.x-invalidate-pattern
            + " && obj.http.x-host == " + req.http.host);
        return (synth(200,"Ban added"));
    }
}
sub vcl_backend_response {
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;
}
sub vcl_deliver {
    unset resp.http.x-url;
    unset resp.http.x-host;
}
The ACL is named purge and is identical to the way we restrict access in our purge tutorial. The similarities do not end there: we even use the BAN request method to invoke a ban.
If the client IP address doesn’t match the ACL, an HTTP 405 Method Not Allowed error is returned. We chose the 405 status because the request doesn’t use a regular HTTP method like GET, but a custom BAN method.
The logic that captures the BAN request method must be explicitly defined in your VCL code, otherwise the built-in VCL behavior will not recognize the request method and will trigger a return (pipe).
Piping will send the request directly to the backend and will abandon any notion of caching and of HTTP. The piped request will be treated as bytes shuffled over the wire. Although the backend may respond with a 200 OK status, it will not be a confirmation of a successful ban.
This VCL example uses the x-invalidate-pattern request header to pass the ban expression. If this header is not set, a regular purge is executed.
However, if the x-invalidate-pattern is set, it will be used to compose the ban expression, along with the Host header. The snippet below illustrates this:
ban("obj.http.x-url ~ " + req.http.x-invalidate-pattern
+ " && obj.http.x-host == " + req.http.host);
If the value of x-invalidate-pattern is set to \.jpg$ and the Host header’s value is example.com, the ban expression would be the following:
obj.http.x-url ~ \.jpg$ && obj.http.x-host == example.com
This would remove all JPG images from the cache for the example.com domain.
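The decision flow of the vcl_recv logic can be modelled in a few lines of Python. This is only a sketch for reasoning about the behavior, not something Varnish runs: the function names are hypothetical, and the "localhost" ACL entry is assumed to resolve to 127.0.0.1 and ::1.

```python
import ipaddress

# Model of the "purge" ACL. Assumption: "localhost" resolves to 127.0.0.1 and ::1.
ALLOWED_NETWORKS = [
    ipaddress.ip_network(network)
    for network in ("127.0.0.1/32", "::1/128", "192.168.55.0/24")
]


def client_allowed(client_ip):
    """Return True if the client IP address matches the ACL model."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def compose_ban(pattern, host):
    """Mirror the string concatenation passed to ban() in the VCL."""
    return "obj.http.x-url ~ " + pattern + " && obj.http.x-host == " + host


def handle_ban_request(client_ip, host, pattern=None):
    """Return the action the VCL would take for a BAN request."""
    if not client_allowed(client_ip):
        return ("synth", "405")
    if not pattern:
        return ("purge", None)  # no x-invalidate-pattern header: regular purge
    return ("ban", compose_ban(pattern, host))


print(handle_ban_request("192.168.55.10", "example.com", r"\.jpg$"))
```

The model makes the three outcomes explicit: a 405 response for clients outside the ACL, a regular purge when no pattern is supplied, and a composed ban expression otherwise.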
Triggering a ban via HTTP
Bans can be triggered via HTTP by using the BAN request method and by specifying an invalidation pattern in the x-invalidate-pattern header. As for any HTTP/1.1 request, a Host header must be supplied as well.
Here’s an example:
BAN / HTTP/1.1
x-invalidate-pattern: ^/product/[0-9]+\.html
Host: example.com
This HTTP request will add the following ban expression to the ban list:
obj.http.x-url ~ ^/product/[0-9]+\.html && obj.http.x-host == example.com
This will remove product pages from the cache, for example http://example.com/product/14456.html and http://example.com/product/226701.html.
And this is the corresponding HTTP response you may receive:
HTTP/1.1 200 Ban added
Date: Tue, 20 Oct 2020 13:30:12 GMT
Server: Varnish
X-Varnish: 32770
Content-Type: text/html; charset=utf-8
Retry-After: 5
Content-Length: 246
Accept-Ranges: bytes
Connection: keep-alive
<!DOCTYPE html>
<html>
<head>
<title>200 Ban added</title>
</head>
<body>
<h1>Error 200 Ban added</h1>
<p>Ban added</p>
<h3>Guru Meditation:</h3>
<p>XID: 32770</p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
return (synth(200,"Ban added")) will use the output template of the vcl_synth subroutine to return a synthetic HTTP response. If you want to change the output, you have to override the vcl_synth subroutine and update the resp.body variable.
The ban list
Unlike purges, bans will not immediately remove objects from the cache. The synthetic message from the VCL examples already gave it away: bans are added.
When you execute a ban, the ban expression is added to the ban list. This is a list containing all the bans that need to be evaluated, and matched against all the objects in cache.
The easiest way to see what the ban list contains is by running varnishadm ban.list.
There is always an item on the ban list
Here’s the output of the ban list when the varnishd process was just started:
$ varnishadm ban.list
Present bans:
1603270370.244746 0 C
Although no bans were issued, and no objects are stored in the cache, there is already an item on the list. Let’s break it down:
1603270370.244746 is the time at which the ban was added. It is in Unix timestamp format and has microsecond precision.
0 is the refcount. There are currently 0 objects that refer to this ban.
C stands for completed. This means the ban is fully evaluated.
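If you want to inspect the ban list from a script, the three columns described above are easy to parse. The following Python sketch assumes the column layout shown in this tutorial (timestamp, refcount, flag, and an optional ban expression as a fourth column); the exact output may differ between Varnish versions, and the names BanEntry and parse_ban_list are hypothetical.

```python
from typing import NamedTuple, Optional


class BanEntry(NamedTuple):
    timestamp: float           # Unix timestamp at which the ban was added
    refcount: int              # number of objects that refer to this ban
    completed: bool            # True when the flag column contains "C"
    expression: Optional[str]  # ban expression, or None when the column is absent


def parse_ban_list(output):
    """Parse the text printed by 'varnishadm ban.list' (layout assumed as above)."""
    entries = []
    for line in output.splitlines():
        columns = line.split(None, 3)
        if len(columns) < 3:
            continue  # skips the "Present bans:" header and blank lines
        try:
            timestamp = float(columns[0])
            refcount = int(columns[1])
        except ValueError:
            continue  # not a ban line
        expression = columns[3].strip() if len(columns) == 4 else None
        entries.append(BanEntry(timestamp, refcount, "C" in columns[2], expression))
    return entries


sample = "Present bans:\n1603270370.244746 0 C\n"
print(parse_ban_list(sample))
```

Summing the refcount over all entries gives the number of objects that currently refer to a ban.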
The reason there is already a ban on the list is because every object in cache needs to refer to the last ban it saw when entering the cache. This allows bans that are older than the object to be disregarded.
So as soon as the first object is stored in cache, the refcount increases to 1:
$ varnishadm ban.list
Present bans:
1603270370.244746 1 C
The refcount will increase as objects get inserted.
Adding a first ban
The ban list will change as soon as the first ban is added.
The following example may be a bit confusing:
varnishadm ban obj.status != 0
We’re banning all objects that do not have a 0 status. That’s literally every object in the cache.
When we consult the ban list, we see it has been added:
$ varnishadm ban.list
Present bans:
1603272627.622051 0 - obj.status != 0
1603270370.244746 3 C
Initially all three objects still refer to the initial ban as the one they have seen last. But with the addition of the new ban, that will change.
After a short while, the ban list will look like this:
$ varnishadm ban.list
Present bans:
1603272627.622051 0 C
1603270370.244746 0 C
The newly added ban is completed, and no objects refer to it because we just removed all objects from the cache. The initial ban is also still around.
As soon as a new object enters the cache, it refers to the last one it has seen:
$ varnishadm ban.list
Present bans:
1603272627.622051 1 C
If you look at the timestamp, it is 1603272627.622051, which matches the ban we just executed.
Adding multiple bans
Let’s have a look at a ban list that already has some ban expressions in it:
$ varnishadm ban.list
Present bans:
1603273224.960953 2 - req.url ~ ^/[a-z]$
1603273216.857785 0 - req.url ~ ^/[a-z]+/[0-9]+
1603272627.622051 9 C
Nine objects saw 1603272627.622051 as their last ban. This means that up to two ban expressions should be evaluated for those objects.
For two objects, 1603273224.960953 was the last one they saw. These objects aren’t subject to any invalidation. These were objects that were inserted into the cache after the two recent bans were added.
There are zero objects that saw 1603273216.857785 as their last ban. This makes sense: the difference between the last and the second-to-last ban is only about eight seconds, and during those eight seconds no new objects were added to the cache.
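The eight-second gap can be verified by subtracting the two timestamps. The small Python sketch below uses Decimal rather than float so the microsecond digits are kept exactly; the function name seconds_between is hypothetical.

```python
from decimal import Decimal


def seconds_between(older, newer):
    """Return the exact difference in seconds between two ban timestamps."""
    return Decimal(newer) - Decimal(older)


gap = seconds_between("1603273216.857785", "1603273224.960953")
print(gap)  # 8.103168
```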
As time progresses, you’ll see that the req.url ~ ^/[a-z]+/[0-9]+ evaluation has completed, and that those nine objects have been processed:
$ varnishadm ban.list
Present bans:
1603273224.960953 2 - req.url ~ ^/[a-z]$
This means that nine objects were invalidated since they are no longer referenced.
Any future bans that are executed will apply to the two remaining objects, as long as they have not expired.
The ban lurker
The ban lurker is a Varnish thread that inspects the ban list and matches the ban expression to the right objects.
The ban lurker only considers bans that are at least one minute old. It examines objects in batches of 1000 and sleeps for 0.010 seconds between batches. For each object, it looks for the position of that object on the ban list and applies the most recent bans up until the point when a ban expression matches.
When a match is found that object is put on the expiry list and is removed from the cache shortly thereafter.
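The matching step can be pictured with a simplified Python model. This is a sketch of the idea only, not how varnishd is implemented: it handles single-condition bans on obj.status and obj.http.* fields, ignores timing and batching, and assumes header names are stored in lowercase. The treatment of missing headers is also an assumption of this sketch. All class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Ban:
    added: float   # time at which the ban was added
    field: str     # "obj.status" or "obj.http.<header>"
    operator: str  # one of ==, !=, ~, !~
    arg: str


@dataclass
class CachedObject:
    last_ban_seen: float  # timestamp of the newest ban when the object was stored
    status: int
    headers: dict = field(default_factory=dict)  # lowercase header names assumed


def field_value(obj, name):
    """Look up the value a ban field refers to; None if the header is absent."""
    if name == "obj.status":
        return str(obj.status)
    if name.startswith("obj.http."):
        return obj.headers.get(name[len("obj.http."):].lower())
    raise ValueError(f"field not available without a request: {name}")


def condition_holds(ban, obj):
    value = field_value(obj, ban.field)
    if value is None:
        # Assumption of this sketch: a missing header only satisfies != and !~.
        return ban.operator in ("!=", "!~")
    if ban.operator == "==":
        return value == ban.arg
    if ban.operator == "!=":
        return value != ban.arg
    if ban.operator in ("~", "!~"):
        matched = re.search(ban.arg, value) is not None
        return matched if ban.operator == "~" else not matched
    raise ValueError(f"unsupported operator: {ban.operator}")


def is_banned(obj, ban_list):
    """Test the object against bans newer than the last ban it saw, newest first."""
    for ban in sorted(ban_list, key=lambda b: b.added, reverse=True):
        if ban.added <= obj.last_ban_seen:
            break  # older bans were already in place when the object was stored
        if condition_holds(ban, obj):
            return True
    return False
```

The early break is the point of the refcount bookkeeping: an object never has to be tested against bans that are older than the object itself.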
The ban lurker has some runtime parameters that control its behavior:
ban_lurker_age is the minimum age a ban should have before it is processed by the ban lurker. The default value is 60 seconds.
ban_lurker_sleep is the number of seconds the ban lurker sleeps before processing another batch. The default value is 0.010 seconds.
ban_lurker_batch is the number of objects the ban lurker examines before going back to sleep. The default value is 1000.
ban_lurker_holdoff sets the number of seconds the ban lurker holds off when lock contention occurs during a cache lookup. The default value is 0.010 seconds.
ban_cutoff caps the length of the ban list that the ban lurker inspects: the lurker works through the list as usual, but once it reaches the ban_cutoff limit it treats all objects as if they matched a ban and removes them from cache. The default value is 0, which disables the cutoff.
ban_dup eliminates older identical bans when a new ban is added. The default value is on.
Asynchronous bans
You may have noticed that our ban expressions contain fields like obj.http.x-url and obj.http.x-host instead of req.url and req.http.host.
We don’t use req.url and req.http.host in our ban expressions because the ban lurker has no knowledge of any incoming HTTP request. Its scope is limited to the object context.
Any ban expression that refers to an obj.http.* or an obj.status field can be processed by the ban lurker. Basically only the response information that is part of the object is available to the ban lurker.
When the request context is used in a ban expression, the worker thread that handles the incoming request is responsible for evaluating that expression during the cache lookup.
This means that such bans aren’t processed asynchronously and that space is only freed from the cache when a request comes in that matches one of these ban expressions.
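A quick way to check whether an expression can be handled asynchronously is to look at the fields it uses. The Python sketch below treats an expression as lurker-friendly only when every condition uses obj.status or an obj.http.* field; the function name is hypothetical and the parsing is deliberately simplistic (it assumes arguments do not contain &&).

```python
def is_lurker_friendly(expression):
    """Return True if every condition only refers to the object context."""
    for condition in expression.split("&&"):
        tokens = condition.split()
        if not tokens:
            raise ValueError("empty condition in ban expression")
        name = tokens[0]
        if name != "obj.status" and not name.startswith("obj.http."):
            return False  # req.* fields need an incoming request
    return True


print(is_lurker_friendly(r"obj.http.x-url ~ \.jpg$ && obj.http.x-host == example.com"))
print(is_lurker_friendly("req.url ~ ^/products(/.+|$) && req.http.host == example.com"))
```

A single req.* condition is enough to take the whole expression out of the ban lurker's reach.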
The following VCL snippet contains the code that is used to store and remove the custom x-url and x-host response headers:
sub vcl_backend_response {
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;
}
sub vcl_deliver {
    unset resp.http.x-url;
    unset resp.http.x-host;
}
Integrating bans in your application
Just like purges, you can call bans using command line HTTP clients:
# HTTPie
http BAN "www.example.com" "x-invalidate-pattern:^/contact"
# curl
curl -X BAN -H "x-invalidate-pattern:^/contact" "www.example.com"
But as we’ve seen earlier, there are also other command line tools in place to trigger bans:
varnishadm ban "obj.http.Content-Type ~ ^image/"
For frameworks like WordPress, Drupal, Magento, and many others, there are community-maintained plugins available that perform purge and ban calls to Varnish.
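If your application has no such plugin, sending the BAN request yourself takes only a few lines. The Python sketch below uses the standard library's http.client and assumes Varnish runs the VCL from this tutorial; the function names and the address you pass to send_ban are hypothetical and depend on your setup.

```python
import http.client


def build_ban_request(site_host, pattern):
    """Return the method, path, and headers for a BAN request."""
    headers = {
        "Host": site_host,                # combined with the pattern by the VCL
        "x-invalidate-pattern": pattern,  # regular expression matched against x-url
    }
    return "BAN", "/", headers


def send_ban(varnish_host, varnish_port, site_host, pattern, timeout=5.0):
    """Send the BAN request to Varnish and return (status, reason)."""
    method, path, headers = build_ban_request(site_host, pattern)
    connection = http.client.HTTPConnection(varnish_host, varnish_port, timeout=timeout)
    try:
        connection.request(method, path, headers=headers)
        response = connection.getresponse()
        response.read()  # drain the synthetic response body
        return response.status, response.reason
    finally:
        connection.close()
```

For example, send_ban("127.0.0.1", 80, "www.example.com", "^/contact") would issue the same request as the curl command above, provided the calling machine is allowed by the ACL. A 200 status means the ban was added to the ban list, not that the objects have already been removed.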