Introducing Varnish
The need for speed
On the web, the only thing more important than the speed at which a website delivers its content is the value that content creates for its users.
With attention spans decreasing, many people nowadays will simply abandon a website that takes too long to display in their browser and try their luck elsewhere. And it’s not just human visitors who give bad marks for sluggishness. When search engines determine the rank of a website, its response time is a substantial factor in how high it appears in search results.
The problem is that, depending on the complexity a web application has to deal with behind the scenes, reaching minimal response times is a challenge. Common websites using a content management system like Drupal or WordPress, for example, often make hundreds or even thousands of database requests to render just a single page. Even if a single database roundtrip only takes a millisecond or two, those add up quickly: at 2 milliseconds each, 500 queries alone account for a full second of response time.
Another issue is plain server load. To render a single web page, a browser doesn’t just have to get its HTML code from the web server. It has to make additional requests for all the other necessary pieces, such as images, CSS stylesheets, fonts, and JavaScript logic. As a consequence, a web server often has to process a dozen or more HTTP requests for every single page that gets visited. While a web server can handle incoming requests in parallel, the number of “workers”, i.e. server resources that process incoming requests, is limited. When all workers are busy, for example due to a spike in concurrent visitors, newly arriving requests can’t be handled right away. They are parked until a worker becomes available, which adds more waiting on top of the time it will still take the web application to come up with a response. In the worst case, requests that can’t be handled by a worker get dropped, and the visitor sees either an incomplete web page or just an error message.
If you look closely, you’ll see that these issues are intertwined. The faster a web server can answer incoming requests, the earlier its workers are freed up to handle new ones. That’s why web server performance optimization is an important task in web operations. The optimal way to deliver content, however, is to not involve the web server in the first place.
Wait, how do you serve web content without a web server? Well, there still needs to be a web server, but by adding another service that can serve web content in front of it, we can reduce the amount of web traffic it has to handle substantially.
One such service is a Content Delivery Network (CDN). By putting a CDN between the web application and its visitors, you can lower both the average content delivery time and the load on your web server significantly. Maybe now you’re thinking: “Wait, we’re adding another component to the content delivery pipeline, thus increasing infrastructure complexity. How does that make things faster?” The answer is caching.
What a CDN does is forward incoming requests to the web application, receive its response, and then store this response for reuse with subsequent requests for the same content. Answering requests with pre-baked content takes orders of magnitude less time than the initial response that the web application had to render and serve. A CDN can be an integral part of a web performance optimization strategy.
But as always, there’s a catch: not only the performance but also the cost of a CDN can be substantial. What if you could benefit from the advantages of cached web requests without increasing the cost of running your website?
You use Varnish.
What is Varnish? Varnish is an open source HTTP reverse proxy with high-performance caching capabilities.
I know, that’s a mouthful. Let’s break it into more digestible pieces.
Okay, I’m not going to explain “open source”. And since we’re talking about web traffic, it’s no surprise that the underlying protocol in play here is going to be HTTP.
How proxies work
A simple HTTP transaction looks like this.
Sequence diagrams like this one are read from top to bottom. The horizontal arrows show which component is interacting with which other component.
This diagram shows that in the simplest HTTP roundtrip, a web browser first sends an HTTP request to a server. The server then answers with its HTTP response, and the interaction is finished.
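If you prefer code over diagrams, the same roundtrip can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the host name is just a stand-in for any reachable web server:

```python
# A minimal HTTP roundtrip with Python's standard library.
# "example.com" is just a placeholder; any reachable web server will do.
import http.client

connection = http.client.HTTPConnection("example.com", 80)
connection.request("GET", "/")           # the client's part: send a request
response = connection.getresponse()      # the server's part: one response back

print(response.status, response.reason)  # e.g. "200 OK"
print(response.read()[:200])             # the first bytes of the response body
connection.close()
```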
A proxy is what you could call an HTTP middleman. It is an application that receives HTTP requests and passes them on to their original destination, usually a website or web application.
The website sends its response back to the proxy. The proxy then takes the HTTP response and forwards it to the client that made the original request.
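To make that data flow concrete, here is a toy reverse proxy sketched with Python’s standard library. It is nothing like a production proxy, and the backend address is a made-up placeholder, but it shows the essential receive, forward, relay loop:

```python
# A toy reverse proxy built on Python's standard library (error handling omitted).
# BACKEND is a hypothetical placeholder; point it at your own web application.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://127.0.0.1:8080"  # hypothetical backend server

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 1. forward the incoming request to the backend server
        with urlopen(BACKEND + self.path) as backend_response:
            status = backend_response.status
            body = backend_response.read()
        # 2. relay the backend's response to the original client
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), ProxyHandler).serve_forever()
```

From the client’s point of view, nothing changed: it sends an HTTP request and gets an HTTP response. It just never talks to the backend directly anymore.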
Of course, you don’t just insert another infrastructure component for the fun of it. The value of a proxy lies in its ability to analyze, modify, and store HTTP requests and their related responses when they pass through.
The most common functions performed by proxies are:
- filtering
- load balancing
- authentication
- logging
- caching
Before we go on, let me introduce two terms that will help me maintain clarity about the components I’m referencing. Up until now, I have used a web browser as the origin of the requests submitted to the proxy in my examples. But a proxy can just as well sit in front of an API service that receives queries not from a human visitor but from another piece of software. To cover both scenarios, I’ll use “client” as an umbrella term for all kinds of applications that speak HTTP to the proxy.
In the same vein, I’ll refer to the server to which the proxy passes on incoming requests as the proxy’s “backend server”.
Caching proxies
It is between receiving content from the backend and passing it on that Varnish adds its unique value: caching the web application’s responses in its local storage.
In this scenario, the proxy receives a request from the client. The proxy tries to retrieve content that was delivered for the same request in the past, but there isn’t any in the cache storage. That’s why the proxy has to forward the request to the backend server, which returns its response to the proxy. If there isn’t a condition that would prevent caching this response (we’ll talk about conditions like this later on), the proxy will store the response in its cache storage. And of course, it also delivers it to the client that requested it in the first place.
The increased complexity from adding Varnish to the mix pays off as soon as the same content gets requested again. If cached content is available for a specific request, Varnish will handle that request itself by serving the response from its cache storage instead of requesting it from the backend server another time.
As you can see, the backend server is not involved in this interaction at all; the proxy with its cache storage can take care of the client’s request all by itself.
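Sticking with the toy proxy from before, this hit/miss logic can be sketched by adding a simple in-memory dictionary as cache storage. Again, this is only an illustration; Varnish’s actual caching behavior (TTLs, Cache-Control headers, invalidation) is far richer:

```python
# The toy proxy again, now with naive in-memory caching added.
# This only illustrates the hit/miss decision, not Varnish's real behavior.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://127.0.0.1:8080"  # hypothetical backend server
cache = {}                         # request path -> (status, body)

class CachingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in cache:
            # cache hit: the backend is not involved at all
            status, body = cache[self.path]
        else:
            # cache miss: fetch from the backend, then store the response
            with urlopen(BACKEND + self.path) as backend_response:
                status = backend_response.status
                body = backend_response.read()
            cache[self.path] = (status, body)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), CachingProxyHandler).serve_forever()
```

The decisive part is the dictionary lookup at the top: on a cache hit, the whole backend roundtrip simply never happens.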
Because Varnish usually uses RAM as its cache storage, both storing and retrieving cached content happens extremely fast: faster than a web server can, for example, read an image file from disk, and much, much faster than it could render an HTML page using information from a database. That’s where the “high-performance caching capabilities” part comes from.
Content caching can improve the perceived website performance tremendously, and this is the reason why the instigator of Varnish chose this name. When he found himself staring at an art poster with the word “Vernissage” on it, he checked a dictionary and found the following definitions for “varnish”:
- To cover with varnish.
- To give a smooth and glossy finish to.
- To give a deceptively attractive appearance to; gloss over.
And there’s no denying that by speeding up your website, Varnish does make it much more attractive, and deceptively so!
The history of Varnish
Before Varnish entered the scene, another popular caching proxy already existed: Squid. Squid was mostly used as a forward proxy; in other words, it was located on the client side. Companies, for example, would funnel the web traffic of all their employees through Squid to accelerate access to commonly used content. Used as a reverse proxy in front of a single website, Squid wasn’t quite as effective and reliable.
That’s why Varnish was designed to fill this gap: to specifically increase the scalability and capacity of content-heavy dynamic websites and heavily consumed APIs. Sites like these usually run on content management systems that assemble content from information stored in databases and external sources. Varnish’s job is not to create content but to make the delivery of said content lightning fast. Following the Unix philosophy of DOTADIW (“Do One Thing, And Do It Well”), it focuses exclusively on the HTTP protocol and doesn’t, for example, deal with SSL/TLS encryption.
Development of Varnish was started in early 2006 in Oslo, Norway, by Poul-Henning Kamp, Anders Berg, and Dag-Erling Smørgrav. Anders was a sysadmin at Norway’s biggest online newspaper, vg.no, and Dag-Erling was a consultant at Linpro, at the time Norway’s biggest FOSS (Free and Open Source Software) consultancy. Varnish 1.0 was released in September 2006. In 2010, the company Varnish Software was founded to maintain the project and provide commercial add-ons and support.
Many years later, Varnish has become an essential part of the content delivery pipelines of websites both small and big, including social media and content sites such as Facebook, Twitter, Vimeo, and Tumblr. Varnish can also be found as the central building block of global content delivery networks like Fastly.