Varnish Architecture

Varnish has quite a few moving parts, and it’s important that you familiarize yourself with its workings before you start customizing and optimizing its behaviour for your individual use case.

After finishing this chapter, you will have an understanding of the following topics:

  • The components of the Varnish proxy
  • How Varnish interacts with backend servers and clients
  • Benefits of the Varnish Shared Memory (VSM) system
  • Storage backend options
  • The use of threads and pools

Overview

As a piece of software, Varnish Cache consists of a number of components that interact to provide the user with flexible caching at high performance.

[Figure: A high-level overview of Varnish]

These are the high-level components of the Varnish proxy:

  • varnishd: The software program that handles HTTP requests, manages the content cache, and interacts with the system administrator.
  • VCL: The Varnish Configuration Language allows the user to define in minute detail how Varnish will process and cache HTTP content that’s passing through. Crafting caching logic is the most important skill when it comes to running an effective caching proxy.
  • Backends: The servers to which Varnish passes on incoming HTTP requests, and from which it receives the web content it is supposed to serve at high speed. These servers are also known as “origin servers”.
  • Directors: A director is a set of backend servers from which Varnish can choose based on specific distribution or load balancing algorithms.
  • Varnish Shared Memory (VSM): The VSM allows Varnish to communicate with other applications, for example logging utilities, without sacrificing speed for protocol overhead.
  • Storage backends: Varnish offers a choice of storage backends that it can use to store cached content.
  • Threads and pools: Varnish derives much of its superior speed from the way it uses multithreading to handle tasks concurrently. Understanding how this works is essential for effective performance tuning.

In the following topics and lessons, we’ll discuss all these in detail.

Components of the Varnish proxy service

varnishd, the Varnish daemon, is the software that runs as a service in the background, handles HTTP requests and responses, and applies rules for how to modify and cache them.

There are, in fact, two processes in the varnishd program. The first process is called “the manager”, and its job is to provide a control interface. After connecting to a special TCP port and authenticating with a secret pre-shared key (PSK), the system administrator can interact with the proxy via a command line. This command line interface (CLI) offers almost full control over how Varnish handles HTTP traffic. Changes made via the CLI only affect the running process and won’t persist through a restart of the service, but they also do not require such a restart1, as a configuration file modification would. This means that configuration changes made this way not only take effect immediately, they also cause no service downtime.

The second process is “the worker”, and as the name implies, it is the more important of the two: it does all the work of handling HTTP traffic. When varnishd is started, the manager process comes up first, and once it has processed all the command line flags, it starts the worker process. This is why the worker process is also called “the child process”; it is spawned by the manager process. The manager process will always make sure that there is a worker process: should the worker process die, a new one will immediately take its place.

This split between manager and worker process not only implements the common “Separation of Concerns” pattern, it also has security benefits. Since the manager process needs system-level access, for example to open TCP port 80, it will typically run with root user permissions. The worker process, the one that’s exposed to the internet, does not need any special access, and is therefore started with minimal permissions only.

VCL

Everything that Varnish does with HTTP requests, for example how it decides what to cache or which HTTP headers to remove, is specified using a special programming language called VCL — the Varnish Configuration Language.

The VCL syntax is not very complex, as you can see from the following code snippet that removes the cookie information from an HTTP request to make it anonymous and cacheable:

sub vcl_recv {
    # Remove the cookie HTTP header from the current request
    unset req.http.cookie;
}

The manager process will first compile the VCL program while checking it for errors. It is then the worker process that runs it for each and every HTTP request that comes in.

This may sound like an expensive operation that is likely to have a negative impact on overall performance. But don’t worry: Even very complex VCL programs execute in just a few microseconds. This is because VCL code is first compiled to C code, which in turn gets compiled into machine instructions. It’s this hardware-level machine code that is executed by the worker process.

Not only can Varnish run VCL programs very quickly, its command line interface even allows you to load new VCL code at runtime and switch between the loaded VCL programs instantly, without restarting the worker process and without missing a single HTTP request.

In the rare case that the VCL syntax is too limiting or complicated for implementing some special functionality, you can extend your caching logic using inline C code. But the more maintainable approach is to extract this special code into external modules that you can load in. These so-called VMODs make it easy to re-use common functionality or handle more complex logic separately.
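
For example, VMODs are loaded and used directly from VCL. The following sketch uses the std VMOD that ships with Varnish to normalize the Host header to lower case; the normalization itself is just an illustration:

import std;

sub vcl_recv {
    # Normalize the Host header using the bundled std VMOD
    set req.http.host = std.tolower(req.http.host);
}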

How Varnish interacts with clients and origin servers

Accepting client requests

Backends and directors

Varnish calls a server that it talks to in order to get the requested content a “backend” or “origin” server. There can be multiple backends, and with VCL, you can assign a specific one for every incoming request based on custom rules. You’ll learn how to write these rules in a later lesson.
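
To give an idea of what such rules look like, here is a minimal sketch with two backends, where the Host header decides which one serves the request. The names and addresses are placeholders:

backend app1 {
    .host = "192.0.2.10";
    .port = "8080";
}

backend app2 {
    .host = "192.0.2.11";
    .port = "8080";
}

sub vcl_recv {
    # Send requests for the API host to a dedicated backend,
    # everything else to the default application backend
    if (req.http.host == "api.example.com") {
        set req.backend_hint = app2;
    } else {
        set req.backend_hint = app1;
    }
}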

However, most of the time when you have multiple backends, you don’t really care which specific one gets to serve the content for a request. You just want Varnish to choose one that is reachable, ideally spreading the load evenly between all backends. For this case, Varnish has a feature called a “director”. A director is a group of backends that, when you tell it to handle a request, will pick one of its backends using a specific distribution algorithm. By default, Varnish uses a “round-robin” algorithm that picks the next backend in the list for any new request, starting over after the last one. Alternatively, there’s also a “random director” that picks backends, well, randomly. You can even write your own director.
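
For illustration, a round-robin director is typically set up with the directors VMOD that is bundled with Varnish, along these lines (backend names and addresses are again placeholders):

import directors;

backend web1 { .host = "192.0.2.10"; .port = "8080"; }
backend web2 { .host = "192.0.2.11"; .port = "8080"; }

sub vcl_init {
    # Create a round-robin director and register the backends with it
    new rr = directors.round_robin();
    rr.add_backend(web1);
    rr.add_backend(web2);
}

sub vcl_recv {
    # Let the director pick a backend for this request
    set req.backend_hint = rr.backend();
}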

It is even possible to “layer” directors. Let’s assume you have backend servers in data centre A, and you prefer to have them handle incoming requests. But for the rare case that all these backends, or even the whole data centre, go offline, you have also set up emergency backends in data centre B. In this scenario, you can define a director “A” that distributes traffic to the backends in data centre A, and a second director “B” that does the same for the other data centre. You can then layer a director “Main” on top of the two that, using the “fallback” algorithm, will use director “A” whenever possible, and otherwise fall back on director “B”.
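
A sketch of such a layered setup could look as follows; it assumes that the backends a1, a2, b1 and b2 are defined elsewhere in the VCL program:

import directors;

sub vcl_init {
    # Director for the preferred data centre A
    new dc_a = directors.round_robin();
    dc_a.add_backend(a1);
    dc_a.add_backend(a2);

    # Director for the emergency data centre B
    new dc_b = directors.round_robin();
    dc_b.add_backend(b1);
    dc_b.add_backend(b2);

    # Layered director: use data centre A while it has healthy
    # backends, otherwise fall back to data centre B
    new dc_main = directors.fallback();
    dc_main.add_backend(dc_a.backend());
    dc_main.add_backend(dc_b.backend());
}

sub vcl_recv {
    set req.backend_hint = dc_main.backend();
}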

Health checks

How does Varnish know if a backend server is down in the first place? It would of course notice when it tries to submit a content request, but by then it is too late: the client will get a delayed response or, more likely, an error message. In order to proactively keep track of all available backends, Varnish uses health checks. To ensure that only healthy backends get to play, you can tell Varnish to connect to a special URL on each backend on a regular basis, and only if this “probe” is successful will Varnish use the backend to handle incoming requests.
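
In VCL, such a health check is declared as a probe and attached to a backend. A minimal sketch, with a placeholder /health URL and example timing values:

# A backend is considered healthy once 3 of the last 5 probes succeeded
probe healthcheck {
    .url = "/health";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
}

backend origin1 {
    .host = "192.0.2.10";
    .port = "8080";
    .probe = healthcheck;
}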

Connection pools

Having many backends is definitely an advantage in terms of availability and load distribution. But it comes at the price of more backend connections. Establishing backend connections, especially over the network via TCP and/or TLS, takes a significant amount of time, and that time adds up. A common way to minimize this overhead is to maintain a connection pool, and Varnish pools backend connections by default. Whenever a backend task is finished, the used connection is not closed but rather added to a connection pool for reuse with a future request.

The Varnish Shared Memory system

In order to monitor and troubleshoot Varnish’s behaviour, the system administrator needs information on its activities and status. However, providing this information without a performance penalty impacting its main job of serving web content isn’t trivial. This is why many network services don’t provide this information in their standard mode of operation. In order to get insight into the inner workings of these services, you have to enable a special “debug mode” that might have the downside of not being suitable for production use.

Varnish solves this problem more elegantly by reporting and logging events and statistics via a segment of shared memory. For every HTTP request, it appends a number of very detailed records to this log memory segment. Other processes can attach themselves to this shared memory segment, and fetch, filter and process log records asynchronously. This decoupling ensures that Varnish has to expend only minimal effort to provide a stream of internal details, and doesn’t even need to know if and how they’ll be used.

Two examples of applications that use the Varnish Shared Memory (VSM) segment are varnishlog and varnishncsa. The former outputs a stream of microscopic details of the HTTP traffic handled by Varnish. We’ll cover varnishlog in detail in the lesson “Troubleshooting”. The latter writes an access log with records in the traditional Apache/NCSA log format, a format that can be processed by many web server analytics applications.

Another tool that comes with Varnish is varnishstat. It reads from a separate segment in shared memory where Varnish publishes continuously updated statistics counters such as the cache hit rate, resource usage and various performance metrics.

Varnish even includes an API library that allows users to develop their own tools for consuming the treasure trove of information that is the VSM.

Storage backend options

Varnish can store its cache content in a number of ways that have different performance characteristics. These storage mechanisms are called “storage backends”; don’t confuse them with the HTTP backend servers from which Varnish will request web content.

In most cases, memory-based storage backends such as “malloc” are the best choice because of their superior performance. Of course, the server running Varnish needs to have sufficient RAM available to hold enough content objects for the cache to be effective.

In cases when the amount of cache storage required exceeds the available RAM resources, Varnish can also use virtual memory backed by a file on disk. It will rely on the operating system to handle the necessary paging, the swapping in and out of data between memory and disk storage. Because of this overhead, using the “file” storage backend can severely diminish the cache performance.
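
As an aside, recent Varnish versions (5.1 and later) even allow VCL to steer individual objects to a specific storage backend. The following sketch assumes varnishd was started with two named storage backends, “mem” and “disk”; the names and the routing rule are only examples:

sub vcl_backend_response {
    # Assumes varnishd was started with e.g.
    #   -s mem=malloc,1g
    #   -s disk=file,/var/cache/varnish/cache.bin,10g
    # Keep video objects on the disk-backed store, everything else in RAM.
    if (beresp.http.Content-Type ~ "^video/") {
        set beresp.storage = storage.disk;
    } else {
        set beresp.storage = storage.mem;
    }
}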

Multithreading

Network services have to do many things at the same time. Among the tasks Varnish has to handle are accepting new client connections, processing incoming requests, receiving web content from backend servers, storing content objects in a cache backend, updating the VSM with its latest activities and statistics, and additional internal housekeeping.

The classic way to allow software to keep multiple plates spinning at the same time is to have the operating system manage a multitude of processes. Since the number of running processes will always exceed the number of hardware units that can actually execute code in parallel, even with today’s multi-core CPUs, processes have to share their CPU with others. The moment when the operating system switches between two running processes is called a “context switch”. Because the operating system has to keep each process safely isolated from every other process, context switching requires a lot of work behind the scenes; it’s a very expensive operation. The same applies to starting new processes and decommissioning ones that have outlived their purpose.

For this reason, the concept of threads was introduced. Threads are also called “light-weight processes” (LWP): like processes, they virtually run in parallel, but their management overhead is far lower than that of full-fledged processes. All threads share the resources of the process to which they belong, which is why spawning new threads and switching between them are much cheaper operations than process creation and context switching.

You will already have guessed that Varnish uses threads to organize all its concurrent tasks. Up until version 4.0, Varnish used a central thread to accept new client connections. For each request, a dedicated thread would then handle all the necessary tasks, from processing the request and delivering cached content to fetching the requested content from a backend.

This system of dedicated threads had a disadvantage: when no cached content was available and the backend was slow to answer, the affected client had to wait for Varnish to fetch and cache the response, sometimes for a relatively long time. That’s why Varnish 4.0 introduced asynchronous fetches, made possible by splitting work between frontend and backend threads. When the so-called “grace mode” is enabled, a frontend thread can satisfy the client with slightly outdated cache content while a backend thread hurries to update the cache with fresh content.
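
Grace mode itself is controlled from VCL by defining for how long an expired object may still be served while a fresh copy is being fetched. A minimal sketch, with an example grace period of ten minutes:

sub vcl_backend_response {
    # Allow objects to be served for up to 10 minutes past their TTL
    # while a backend thread fetches a fresh copy in the background
    set beresp.grace = 10m;
}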

Having only a single thread for establishing new client connections also has its limitations. It becomes a bottleneck when a huge number of requests start coming in, with “huge” meaning many thousands of requests per second. For these rare high-demand scenarios, Varnish offers the option of running multiple thread pools. Each of these pools maintains its own thread for accepting client connections. To get the most out of the available hardware capacity, it’s recommended to set the number of thread pools equal to the number of CPU cores of the server running Varnish.

Footnotes

  1. There are, however, a few exceptions. Some very fundamental changes will always require a service restart.