Varnish Purge

Varnish Purging or Banning

Everything

varnishadm "ban req.url ~ /"

Everything for one domain

varnishadm "ban req.http.host == blaataap.com"

Specific URL on one domain

varnishadm "ban req.http.host == example.com && req.url == /some/url/"

URLs starting with /blaat (any domain)

varnishadm "ban req.url ~ ^/blaat"

https://www.hipex.io/docs/nl/varnish/flushen/

More complex

The first command bans every cached object whose Content-Type response header matches the regular expression ^image/, so all cached images are invalidated immediately. The second narrows the ban to images requested under /feature. The whole ban expression is quoted so that the shell does not expand ~ or interpret &&.

varnishadm "ban obj.http.content-type ~ ^image/"

varnishadm "ban obj.http.content-type ~ ^image/ && req.url ~ ^/feature"

HTTP Purging

HTTP Purging is the most straightforward of these methods. Instead of sending a GET /url to Varnish, you would send PURGE /url. Varnish would then discard that object from the cache.
Add an access control list to Varnish so that not just anyone can purge objects from your cache; other than that, though, you’re home free.

Shortcomings of Purging

HTTP purging falls short when a piece of content has a complex relationship to the URLs it appears on. A news article, for instance, might show up on a number of URLs.
The article might have a desktop view and a mobile view, and it might show up on a section page and on the front page. Therefore, you would have to either get the content management system to keep track of all of these manifestations or let Varnish do it for you.
To let Varnish do it, you would use bans, which we’ll get into now.

Bans

A ban is a feature specific to Varnish and one that is frequently misunderstood. It enables you to ban Varnish from serving certain content in memory, forcing Varnish to fetch new versions of these pages.

An interesting aspect is how you specify which pages to ban. Varnish has a language that provides quite a bit of flexibility. You could tell Varnish to ban by giving the ban command in the command-line interface, typically connecting to it with varnishadm.
You could also do it through the Varnish configuration language (VCL), which provides a simple way to implement HTTP-based banning.

Let’s start with an example. Suppose we need to purge our website of all images.

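varnishadm "ban obj.http.content-type ~ ^image/"

The result of this is that, for all objects in memory, the HTTP response header Content-Type is compared against the regular expression ^image/, and every object that matches is invalidated immediately. The ban expression is quoted as a whole so that the shell does not expand the ~ itself.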

Here’s what happens in Varnish. First, the ban command puts the ban on the “ban list.” When this command is on the ban list, every cache hit that serves an object older than the ban itself will start to look at the ban list and compare the object to the bans on the list. If the object matches, then Varnish kills it and fetches a newer one. If the object doesn’t match, then Varnish will make a note of it so that it does not check again.

Let’s build on our example. Now, we’ll only ban images that are placed somewhere in the /feature URL. Note the logical “and” operator, &&.

varnishadm "ban obj.http.content-type ~ ^image/ && req.url ~ ^/feature"

You’ll notice that it says obj.http.content-type and req.url. In the first part of the ban, we refer to an attribute of an object stored in Varnish. In the latter, we refer to a part of a request for an object. This might be a bit unconventional, but you can actually use attributes on the request to invalidate objects in cache. Now, req.url isn’t normally stored in the object, so referring to the request is the only thing we can do here.
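
When a ban combines several conditions, the shell is the first thing that can get in the way: an unquoted ~ is expanded to the home directory and an unquoted && ends the command. Below is a minimal sketch of a helper that joins conditions into one expression, so the result can be handed to varnishadm as a single quoted argument. The function name ban_expr is made up for illustration and is not part of Varnish.

```shell
# ban_expr is a hypothetical helper, not part of Varnish.
# It joins its arguments with " && " and prefixes the "ban" CLI command.
ban_expr() {
    joined=""
    for cond in "$@"; do
        if [ -n "$joined" ]; then
            joined="$joined && $cond"
        else
            joined="$cond"
        fi
    done
    printf 'ban %s\n' "$joined"
}

# Usage, assuming varnishadm can reach a running Varnish:
#   varnishadm "$(ban_expr 'obj.http.content-type ~ ^image/' 'req.url ~ ^/feature')"
```

Each condition is passed in single quotes, so the shell never sees the ~ or the regular expression as something to expand.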

Issuing bans that depend on the request opens up some interesting possibilities. However, there is one downside to the process: A very long list of bans could slow down content delivery.

There is a worker thread assigned to the task of shortening the list of bans, “the ban lurker”. The ban lurker tries to match a ban against applicable objects. When a ban has been matched against all objects older than itself, it is discarded.

As the ban lurker iterates through the bans, it doesn’t have an HTTP request that it is trying to serve. So, any bans that rely on data from the request cannot be tested by the ban lurker. To keep ban performance up, then, we would recommend not using request data in the bans. If you need to ban something that is typically in the request, like the URL, you can copy the data from the request to the object when it is fetched. On Varnish 4 and later this is done in vcl_backend_response (on Varnish 3 it was vcl_fetch, using req.url instead of bereq.url), like this:

set beresp.http.x-url = bereq.url;

Now, you’ll be able to use bans on obj.http.x-url. Remember that beresp turns into obj once the response is stored in the cache.
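
To see which bans on the list the ban lurker cannot work off, one rough approach is to filter the output of varnishadm ban.list for expressions that mention req. The sketch below assumes each ban is printed on one line with its expression as whitespace-separated text. The function name request_bans is hypothetical.

```shell
# request_bans is a hypothetical helper: it reads ban.list output on stdin
# and prints only the lines whose expression refers to request data (req.*).
request_bans() {
    # Match "req." at the start of a line or right after whitespace, so that
    # "req." appearing inside another name is not flagged by accident.
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -E '(^|[[:space:]])req\.' || true
}

# Usage, assuming varnishadm can reach a running Varnish:
#   varnishadm ban.list | request_bans
```

Anything this prints is a candidate for rewriting against obj.http.x-url, so that the ban lurker can test it and discard it.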

Graceful Cache Invalidations

Imagine purging something from Varnish and then the origin server that was supposed to replace the content suddenly crashes. You’ve just thrown away your only workable copy of the content. What have you done?! Turns out that quite a few content management systems crash on a regular basis.

Ideally, we would want to put the object in a third state — to invalidate it on the condition that we’re able to get some new content. This third state exists in Varnish: It is called “grace,” and it is used with TTL-based invalidations. After an object expires, it is kept in memory in case the back-end server crashes. If Varnish can’t talk to the back end, then it checks to see whether any graced objects match, and it serves those instead.

One Varnish module (or VMOD), named softpurge, allows you to invalidate an object by putting it into the grace state. Using it is simple. Just replace the PURGE VCL with the VCL that uses the softpurge VMOD.

import softpurge;

# req.method implies Varnish 4 or later, where "error" was replaced by synth().
sub vcl_hit {
   if (req.method == "PURGE") {
      softpurge.softpurge();
      return (synth(200, "Successful softpurge"));
   }
}

sub vcl_miss {
   if (req.method == "PURGE") {
      softpurge.softpurge();
      return (synth(200, "Successful softpurge"));
   }
}

Distributing Cache Invalidations Events

All of the methods listed above describe the process of invalidating content on a single cache server. Most serious configurations would have more than one Varnish server. If you have two, which should give enough oomph for most websites, then you would want to issue one invalidation event for each server. However, if you have 20 or 30 Varnish servers, then you really wouldn’t want to bog down the application by having it loop through a huge list of servers.

Instead, you would want a single API endpoint to which you can send your purges, having it distribute the invalidation event to all of your Varnish servers. For reference, here is a very simple invalidation service written in shell script. It listens on port 2000, reads one URL per line, and purges each URL on three different servers (alfa, beta and gamma) using cURL, with each server acting as the HTTP proxy for the request.

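# Listen on port 2000; every line received is a URL to purge.
# The loop ends when the connection closes and read hits end of input.
nc -l 2000 | while read -r url
do
    # Send the PURGE through each Varnish server, used here as an HTTP proxy.
    for srv in "alfa" "beta" "gamma"
    do
        curl -m 2 -x "$srv" -X PURGE "$url"
    done
done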

It might not be suitable for production because the error handling leaves something to be desired!
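
As a sketch of what slightly better error handling could look like, the loop below records which servers failed and reports an overall status. It is only an illustration: the names send_purge, purge_all and PURGE_CMD are assumptions introduced here, and the server names are the ones from the example above.

```shell
# Server names from the example above; replace with your own Varnish hosts.
SERVERS="alfa beta gamma"

# send_purge: send one PURGE for URL $2 through Varnish server $1 (as proxy).
# -f makes curl return non-zero on HTTP errors, -m 2 caps the wait at 2 seconds.
send_purge() {
    curl -fsS -m 2 -o /dev/null -x "$1" -X PURGE "$2"
}

# PURGE_CMD is a hypothetical hook so the loop can be exercised without a network.
PURGE_CMD=${PURGE_CMD:-send_purge}

# purge_all: try every server, report failures on stderr,
# and return non-zero if at least one server could not be purged.
purge_all() {
    url=$1
    failed=0
    for srv in $SERVERS; do
        if ! "$PURGE_CMD" "$srv" "$url"; then
            echo "purge of $url failed on $srv" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Usage: purge_all "http://example.com/some/url/"
```

A failure on one server does not stop the loop, so the remaining servers still get purged and the caller can decide whether to retry.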

Cache invalidation is almost as important as caching. Therefore, having a sound strategy for invalidating the content is crucial to maintaining high performance and having a high cache-hit ratio. If you maintain a high hit rate, then you’ll need fewer servers and will have happier users and probably less downtime. With this, you’re hopefully more comfortable using tools like these to get stale content out of your cache.

Purging/Banning via an HTTP request

You can use the following template to write ban lurker friendly bans:

# Clients allowed to purge, ban or refresh.
# Assumption: localhost only; adjust this list to your own network.
acl purge {
  "localhost";
  "127.0.0.1";
}

sub vcl_backend_response {
  # For banning/purging: copy request data onto the object,
  # so bans can match on obj.http.x-url and obj.http.x-host.
  set beresp.http.x-url = bereq.url;
  set beresp.http.x-host = bereq.http.host;
}

sub vcl_deliver {
  # For banning/purging
  # We remove resp.http.x-* HTTP header fields,
  # because the client does not need them
  unset resp.http.x-url;
  unset resp.http.x-host;
}

sub vcl_recv {
  # For banning/purging
  if (req.method == "PURGE") {
    if (client.ip !~ purge) {
      return (synth(403, "Not allowed"));
    }
    return (purge);
  }

  if (req.method == "BAN") {
    if (client.ip !~ purge) {
      return (synth(403, "Not allowed"));
    }
    # Match on the x-url header stored with the object (see vcl_backend_response).
    # Assumes req.url is a regex. This might be a bit too simple
    ban("obj.http.x-url ~ " + req.url);
    # Throw a synthetic page so the request won't go to the backend.
    return (synth(200, "Ban added"));
  }

  if (req.method == "REFRESH") {
    if (client.ip !~ purge) {
      return (synth(403, "Not allowed"));
    }
    # Turn the request into a GET that bypasses the cached copy and refetches it.
    set req.method = "GET";
    set req.hash_always_miss = true;
  }
}

View BAN logging

varnishlog -g request -q 'ReqMethod eq "PURGE"'

varnishlog -g request -q 'ReqMethod eq "BAN"'

varnishlog -g request -q 'ReqMethod eq "REFRESH"'

Actual PURGE/BAN/REFRESH

curl -X PURGE https://technotes.adelerhof.eu/test/add-test/

curl -X BAN https://technotes.adelerhof.eu/test/add-test/

curl -X REFRESH https://technotes.adelerhof.eu/test/add-test/

Response headers

curl -I https://technotes.adelerhof.eu/test/add-test/

Check BAN list

varnishadm ban.list

https://varnish-cache.org/docs/6.3/users-guide/purging.html

https://info.varnish-software.com/blog/wiki-highlights-cache-invalidation-varnish

http://book.varnish-software.com/4.0/chapters/Cache_Invalidation.html

http://book.varnish-software.com/4.0/chapters/Appendix_G__Solutions.html#solution-write-a-vcl-program-using-purge-and-ban

https://www.smashingmagazine.com/2014/04/cache-invalidation-strategies-with-varnish-cache/

Last updated on 31 Jan 2021
Published on 19 Mar 2020