Monday, December 18, 2017

Configuring better load balancing and health checks for Plone with HAProxy

In the previous post of this blog I enumerated some problems we face when using normal load balancer health checks on Plone instances, and I described a possible solution using five.z2monitor. In this post I'm going to show you how to configure HAProxy and other components of the stack in order to get the most out of it.

As mentioned previously, we can do load balancing using nginx, Varnish or HAProxy. Both nginx and Varnish provide more features on their commercial versions, but I'm going to focus only on the community versions. In my opinion, if you ever need more features in your load balancer you should try HAProxy before buying one of those subscriptions.

On nginx, load balancing is handled by the ngx_http_upstream_module; it's very easy to enable and configure and it has some nice features (like backup servers and the least connections method), but also many limitations (like basic health checks and no queue control). This works very well on small sites.
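A minimal upstream block using those features looks roughly like this (the instance addresses and ports are just placeholders):

upstream plone {
    least_conn;
    # passive health checks only: a server is marked unavailable
    # after max_fails failed requests within fail_timeout
    server 127.0.0.1:8081 max_fails=2 fail_timeout=10s;
    server 127.0.0.1:8082 max_fails=2 fail_timeout=10s;
    # only used when all the other servers are down
    server 127.0.0.1:8083 backup;
}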

On Varnish, load balancing is handled by the directors module, and I have to admit that I'm not impressed by its features, as it lacks support for least connections, which is the preferred method with Plone instances. On the other hand, backend definitions and health checks are more configurable, but they support HTTP requests only. You may consider Varnish as a load balancer if you already use it on high-traffic sites, and it's going to do a good and honest job.

(Previously, I was arguing in favor of the hash method, but later tests showed that the standard round robin is less prone to instance overloads.)

As you can see, both nginx and Varnish can only handle HTTP requests for health checks, and we have very limited control of request queues, something that in my experience has proven to be an issue.

(I never worry about the nginx queue, but in Varnish I always configure a max_connections directive to avoid flooding instances with requests that could make them unreachable for a long time.)

Let's get back on track: HAProxy is a very complex piece of software; it's so complex, and its documentation so huge, that I preferred just to remove it from the equation when I first had to take care of the infrastructure in our company some years ago.

That choice proved to be right at the time, but I always like to investigate and act as my own devil's advocate from time to time. So, when one of our main sites started receiving some very nasty DoS attacks, I started playing with it again.

It took me a whole week to prepare, test and review the configuration I'm going to share with you, but first I'm going to talk a little bit about our infrastructure and rationale.

The sites I'm talking about use (at least) 2 servers with a full stack: nginx, Varnish, HAProxy and Zope/Plone; both servers are configured using ZODB Replicated Storage (ZRS), one acting as the master ZEO server and the other as the slave. All Plone instances point to both ZEO servers. When the master ZEO server fails, all instances connect automatically to the slave in read-only mode. When all instances in one server fail, Varnish connects to the HAProxy in the other server as a backup. When Varnish fails, nginx tries to use the Varnish on the other server. That's more or less how it works.
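A sketch of the relevant part of the resulting zope.conf ZEO client configuration, with placeholder addresses, looks roughly like this:

<zeoclient>
  # the master ZEO server is tried first, then the slave
  server 192.168.1.1:8100
  server 192.168.1.2:8100
  # if only the slave answers, connect to it in read-only mode
  read-only-fallback true
  storage 1
  name zeostorage
</zeoclient>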

This is our working haproxy.cfg, and I'm going to explain some of the choices I made:
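A simplified sketch along those lines (bind addresses, ports, server names and the monitor URI are placeholders you'll need to adapt) looks roughly like this:

global
    # more than 256 connections means we need more instances anyway
    maxconn 256

defaults
    mode http
    timeout connect 5s
    timeout client  2m
    timeout server  2m

listen plone
    bind 127.0.0.1:8080
    # least connections works well with long-running Plone requests
    balance leastconn
    # keep the queue size under control
    timeout queue 1m
    # report our own health to Varnish: the monitor URI fails
    # when no instance is alive
    monitor-uri /haproxy_up
    acl no_backends nbsrv(plone) lt 1
    monitor fail if no_backends
    # layer 4 health check against the five.z2monitor port:
    # send the "ok" probe and expect the string "OK" back
    option tcp-check
    tcp-check send ok\r\n
    tcp-check expect string OK
    server instance1 127.0.0.1:8081 check port 8881 inter 2s slowstart 1m maxconn 4
    server instance2 127.0.0.1:8082 check port 8882 inter 2s slowstart 1m maxconn 4

listen stats
    bind 127.0.0.1:8999
    stats enable
    stats uri /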


First, the standard listen section: we use option tcp-check (layer 4) for the health checks, as it is way faster than doing an HTTP (layer 7) check: we ask Zope, on the alternate port bound for monitoring, for the ok command and expect the OK string as a result.

In my tests, layer 4 checks took just a couple of milliseconds to finish, two orders of magnitude less than layer 7 checks. Faster, non-blocking responses mean you can check more frequently and detect any failure on the instance sooner.

Health checks are done every 2 seconds and, in case of failure, the slowstart parameter avoids flooding an instance during its defined warm-up period of one minute.

We report the status of the backend to Varnish using the monitor-uri directive: if no backends are alive, HAProxy's monitor will also fail.

In Varnish we have configured the backend like this:
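A sketch of such a backend definition in Varnish 4 VCL (names, addresses and the probe URL are placeholders) would be:

vcl 4.0;

import directors;

probe haproxy_up {
    # poll HAProxy's monitor URI; it returns 503 when no instance is alive
    .url = "/haproxy_up";
    .interval = 1s;
    .timeout = 1s;
    # two consecutive failures mark the backend sick: 2 seconds at worst
    .window = 2;
    .threshold = 1;
}

backend local_haproxy {
    .host = "127.0.0.1";
    .port = "8080";
    .max_connections = 8;
    .probe = haproxy_up;
}

backend remote_haproxy {
    .host = "192.168.1.2";
    .port = "8080";
    .max_connections = 8;
    .probe = haproxy_up;
}

sub vcl_init {
    # use the local HAProxy; fall back to the one on the other server
    new cluster = directors.fallback();
    cluster.add_backend(local_haproxy);
    cluster.add_backend(remote_haproxy);
}

sub vcl_recv {
    set req.backend_hint = cluster.backend();
}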


Varnish will detect any backend failure in 2 seconds, at worst; Varnish will then try to use the HAProxy backend on the other server as a backup.

Back in HAProxy's configuration, we set maxconn 4 on the backend servers to avoid sending more than 4 requests to any instance; similarly, we set timeout queue 1m to keep the queue size under control.

Finally, the global section: you'll see we're using maxconn 256; in my opinion there is no reason to use a bigger value: if you have more than 256 requests queued, you obviously have a problem and you need more instances to serve the traffic.

The typical stats screen of HAProxy with this configuration looks like this:

As you can see, over a period of almost 25 days we have had no backend failures on this server, even with a large number of individual instance failures.

Last, but not least: if you want to cut the startup time of your instances you have to include the following directives in your buildout configuration (more about that in this Plone forum thread):

[instance]

# play with this value depending on the number of objects in your ZODB
zeo-client-cache-size = 256MB
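# naming the ZEO client cache makes it persistent, so it survives restarts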
zeo-client-client = zeoclient

Share and enjoy!

Monday, December 11, 2017

We have been doing health checks wrong in Plone

In the previous post of this blog I discussed how to increase the performance of high-traffic Plone sites. As mentioned there, we have different ways to do so: increasing the number of server threads, increasing the number of instances, or both.

Increasing the number of threads is easier and consumes less memory but, as mentioned, it is less resilient and can be affected by the Python GIL on multicore servers. Increasing the number of instances, on the other hand, will increase the complexity of our stack, as we will need to install a load balancer: a piece of hardware or software that distributes the incoming traffic across all the backend instances.

Over the years we have tried different software load balancers depending on the traffic of a site: we use nginx alone as web server and load balancer on smaller sites, and we add Varnish as web accelerator and load balancer when the load increases; lately we started using HAProxy again to solve extreme problems on sites being continuously attacked (I'll write about that in a different post).

When you use a load balancer you have to do health checking, as the load balancer needs to know when one of the backend instances has become unavailable because it is too busy to answer further requests, has been restarted, is out for maintenance, or is simply dead. And, in my opinion, we have been doing it wrong.

The typical configuration for health checking is to send requests to the same port used by the Zope HTTP server, and this has some fundamental problems. First, the Zope HTTP server is slow to answer requests: it can take hundreds of milliseconds to answer even the simplest HEAD request.

To make things worse, the requests that are answered by the Zope HTTP server are the slowest ones (content not in the ZODB cache, search results, you name it…), as the most common requests are already being served by the intermediate caches. In the tests I made, I found that even a well-configured server running a mature code base can take as long as 10 seconds to answer this kind of request. This is a huge problem, as health check requests start queuing and timing out, taking perfectly functional instances out of the pool and making things even worse.

To avoid this problem we normally configure health checks with long intervals (typically 10 seconds) and windows of 3 failures for every 5 checks. And this, of course, creates another problem, as the load balancer takes, in our case, up to 30 seconds to discover that an instance has been restarted when using things like Supervisor and its memmon plugin, leading to a lot of 503 Service Unavailable errors in the meantime.
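As an illustration, a Varnish probe following that recipe looks something like this (the values are typical, not a recommendation):

probe instance_alive {
    # layer 7 check against the Zope HTTP port: slow and easily queued
    .url = "/";
    .interval = 10s;
    .timeout = 5s;
    # marked sick after 3 failures out of the last 5 checks
    .window = 5;
    .threshold = 3;
}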

So, it's a complete mess no matter how you analyze it, and I needed to find a way to solve it: enter five.z2monitor.

five.z2monitor plugs zc.monitor and zc.z3monitor into Zope 2, enabling another thread and port to handle monitoring. To install it you just need to add something like this into your buildout configuration:

[buildout]
eggs =
    …
    five.z2monitor
zcml =
    …
    five.z2monitor

[instance]

zope-conf-additional =
    <product-config five.z2monitor>
        bind 127.0.0.1:8881
    </product-config>


After running buildout and restarting your instance you can communicate with your Zope server over the new port using different commands, called probes:

$ bin/instance monitor help
Supported commands:
  dbinfo -- Get database statistics
  help -- Get help about server commands
  interactive -- Turn on monitor's interactive mode
  monitor -- Get general process info
  ok -- Return the string 'OK'.
  quit -- Quit the monitor
  zeocache -- Get ZEO client cache statistics
  zeostatus -- Get ZEO client status information


Suppose you want to get the ZEO client cache statistics for this instance; all you have to do is use the following command:

$ bin/instance monitor zeocache main
417554 895451465 435095 900622900 35429160


You can also use the Netcat utility to get the same information:

$ echo 'zeocache main' | nc -i 1 127.0.0.1 8881
417753 896710955 435422 901905068 35467686


It's easy to extend the list of supported commands by writing your own probes; in the list above I have added to the default command set one that I created to know whether the server is running or not; it's called "ok", and here is its source code:
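(The sketch below follows the zc.monitor convention: the probe receives the client connection as its first argument and writes its answer to it; the first line of the docstring is what the help command shows.)

def ok(connection):
    """Return the string 'OK'."""
    # whatever we write to the connection is sent back over the monitor port
    connection.write('OK\n')

The probe is then registered in ZCML as a named utility (typically providing zc.z3monitor.interfaces.IZ3MonitorPlugin) under the command name you want to expose.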


We now have a dedicated port and thread that can be used for health checking:

$ echo 'ok' | nc -i 1 127.0.0.1 8881
OK


With this we solved most of the problems I mentioned above: we have faster response times and no queuing; we can decrease the health check interval to a couple of seconds and be almost sure that a failure is a failure and not just a timeout.

Note that we can't use this with nginx or Varnish, as their health checks are limited and expect the same port used for HTTP requests; only HAProxy supports this configuration.

So, in the next post I'll show you how to configure HAProxy health checks to use this probe and how to reduce the latency to a couple of milliseconds.