Posted By: cody
Last Updated: Sunday May 24, 2009
This is just going to be a quick overview of some common methods of load balancing your web services via software load balancers. This post will have several parts, each covering a different software load balancer. This part covers Perlbal.
In this post I’ll be using two different VPS accounts as an example since we will need to spread the load across something. I’ll be referencing them as VPS1 and VPS2. In the examples they will be running Ubuntu, though most of the tutorial can be applied to any distribution. One thing to note is I’ll be using Aptitude for package management, which is fairly specific to Debian-based distributions. Usually Yum will be your alternative - and in the worst-case scenario you will have to compile the packages manually.
Perlbal is a single-threaded event-based server supporting HTTP load balancing, web serving, and a mix of the two.
Perlbal is a software load balancer created by Danga Interactive for use on LiveJournal, Typepad, and other ventures. It’s a lightweight reverse proxy that can also serve as a web server, though that is outside the scope of this post.
According to Wikipedia a reverse proxy is:
A reverse proxy or surrogate is a proxy server that is installed in a server network. Typically, reverse proxies are used in front of Web servers. All connections coming from the Internet addressed to one of the Web servers are routed through the proxy server, which may either deal with the request itself or pass the request wholly or partially to the main web servers.
I believe this description is very fitting - it’s essentially a “middle man” for a request (in this case HTTP request) that decides where it should be routed to. This could mean simply deciding to put it on VPS1, or VPS2 - or it could mean if the HTTP request is for an image we’ll send it to Lighty, and if it’s a normal request to a PHP page we’ll send it to Apache. Think of it as a “router” in the literal sense.
I’ll give two simple scenarios to try and explain the usefulness of reverse proxies:
You’re serving 100 requests/second and are running out of memory on your server - you notice that a good 80% of the requests are simply for static files, such as images and stylesheets - unfortunately Apache still takes a large amount of memory / CPU for these requests causing you to exhaust your resources. One way you could reduce the usage is simply setting up an alternative web server such as Lighty that has a lighter footprint to serve your static files, and keep Apache for the requests serving your application.
We’ll also use the example of serving 100 requests/second - but let’s just say your server simply can’t handle the traffic. You could set up another server identical to your current one and have the reverse proxy listen for incoming requests and direct the traffic to both servers seamlessly - allowing you to split the load across both machines and maintain a fast / stable website.
These examples may have some flaws, though they’re simply here to illustrate common uses / the benefits of using a reverse proxy (or software load balancer).
First you need to find where the bottleneck is in your application. Is it the web server or is it the database server? Could it simply be slow SQL queries that could be fixed by adding an index? Ultimately you want to make sure that your application is optimized as much as possible - and make sure the bottleneck can’t simply be fixed by a few tweaks.
In this setup we’re going to use Apache to serve all HTTP requests relating to PHP (your application) and Lighty to serve static files such as images. The goal is to take advantage of Lighty’s low footprint to lower the overall resource usage of your server.
You will need the following packages: Apache, Lighttpd, and Perlbal.
In Ubuntu each of these should be in the repositories, so you can install them via Aptitude:
aptitude install apache2 lighttpd perlbal -y
I believe most package repositories have Apache and Lighttpd, though Perlbal could be a hit or miss. Luckily Perlbal is in CPAN so you can install it fairly easily:
perl -MCPAN -e 'install Perlbal'
Once you’ve installed the packages you will need to configure both of the web servers to listen on a port other than port 80. The reason for this is that Perlbal will be listening on port 80 and deciding what to do with the requests. In this example I will have Apache listen on port 8080 and Lighty listen on port 8181:
Apache2 Configuration:
Listen 8080
It’s up to you to configure Apache completely; the point here is simply to have it listen on port 8080 instead of 80.
Lighttpd Configuration:
server.port = 8181
Once again it’s up to you to configure Lighty; we’re just having it listen on port 8181 instead of 80.
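Before moving on, it can help to confirm that both servers really are answering on their new ports. Here’s a minimal Python sketch of that check. The `port_open` helper name and the one-second timeout are my own choices for illustration, and it assumes both web servers run on the same machine, as in this example.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage with the ports from this post:
#   port_open("127.0.0.1", 8080)   # Apache
#   port_open("127.0.0.1", 8181)   # Lighty
```

This only tells you that something is accepting connections on the port, not that it is the server you expect, so treat it as a quick sanity check.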
Now we have to set up Perlbal to handle the requests. What we are going to do is have every request for the subdomain “static.domain.com” served via Lighttpd, and everything else via Apache. A quick note before we begin - Perlbal is capable of serving files, so Lighttpd is not required. I’m simply using it to illustrate how one would go about doing it.
Perlbal configuration:
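LOAD vhosts

# Backends: each pool holds the address of one local web server.
CREATE POOL apache_pool
  POOL apache_pool ADD 127.0.0.1:8080

CREATE POOL lighttpd_pool
  POOL lighttpd_pool ADD 127.0.0.1:8181

# VHOST maps hostnames to services, so each pool is wrapped in a reverse_proxy service.
CREATE SERVICE apache_server
  SET role = reverse_proxy
  SET pool = apache_pool
ENABLE apache_server

CREATE SERVICE lighttpd_server
  SET role = reverse_proxy
  SET pool = lighttpd_pool
ENABLE lighttpd_server

# Selector: listens on port 80 (all interfaces) and picks a service by hostname.
CREATE SERVICE select
  SET listen = 0.0.0.0:80
  SET role = selector
  SET plugins = vhosts
  VHOST *.domain.com = apache_server
  VHOST static.domain.com = lighttpd_server
ENABLE select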
In a nutshell this configuration does three important things - it tells Perlbal to listen on port 80 so it can accept the normal HTTP requests, it then says any request going to “*.domain.com” will be sent to our Apache instance, and anything going to “static.domain.com” goes to our Lighty instance. For more information on Perlbal’s configuration please check out their documentation & mailing list.
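To make the hostname routing a bit more concrete, here’s a rough Python sketch of the idea: look at the request’s Host header and pick a target from a table. This is my own simplification for illustration and not Perlbal’s actual matching code. The `VHOSTS` table and the `pick_service` function are hypothetical names that just mirror the VHOST lines above.

```python
# Hypothetical routing table mirroring the VHOST lines in the Perlbal config.
VHOSTS = {
    "static.domain.com": "lighttpd_server",
    "*.domain.com": "apache_server",
}


def pick_service(host, vhosts=VHOSTS):
    """Return the service name for a Host header, or None if nothing matches.

    An exact hostname wins; otherwise the first label is swapped for '*'
    and looked up again. This is a simplification, not Perlbal's real code.
    """
    host = host.lower().split(":")[0]  # normalise case, drop any :port suffix
    if host in vhosts:
        return vhosts[host]
    _, _, rest = host.partition(".")
    return vhosts.get(f"*.{rest}") if rest else None
```

With this table, `pick_service("static.domain.com")` gives `"lighttpd_server"` and `pick_service("www.domain.com")` gives `"apache_server"`. The more specific entry wins because exact matches are checked before the wildcard.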
Now we simply need to start up our web servers and Perlbal accordingly:
/etc/init.d/apache2 start
/etc/init.d/lighttpd start
/etc/init.d/perlbal start
Last but not least we need to make sure everything works. Simply visit “domain.com” and make sure it loads, then visit “static.domain.com” and verify that it loads too. If both load fine it should be working - if you want to make sure, you can take a peek at the logs to see incoming requests (/var/log/apache2/* , /var/log/lighttpd/*).
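Another quick way to see which web server answered is to look at the Server response header. Here’s a small Python sketch of that. It assumes the backends send a Server header and that it reaches the client unchanged through the proxy, which I haven’t verified for every setup. The `server_header` function name is just something I made up for this example.

```python
import urllib.request


def server_header(url, timeout=5.0):
    """Fetch url and return the response's Server header, or None if absent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.headers.get("Server")


# Example usage with the hostnames from this post:
#   server_header("http://domain.com/")
#   server_header("http://static.domain.com/")
```

If the two hostnames report different server software, the requests are being routed to different backends. Note that `urlopen` raises an exception on 4xx/5xx responses, so an error page will show up as a traceback rather than a header value.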
In this setup we’re simply going to split the requests across two VPS servers. One of the VPS instances will house the Perlbal instance as well as a web server, and the other VPS will simply house another web server.
I’m going to assume you already have a web server of your choice set up on both VPS1 and VPS2. All you will need to install is Perlbal:
On Ubuntu:
aptitude install perlbal -y
or installing Perlbal via CPAN:
perl -MCPAN -e 'install Perlbal'
The only configuration required on the web server is simply changing what port it is listening on. Keep in mind you only have to do this if Perlbal is sharing the same system as a web server - if you’ve decided to give Perlbal its own environment this is not necessary. In these examples I’ll assume you changed the web servers to listen on port 8080.
Now we have to set up Perlbal to listen on port 80 and direct the requests among VPS1 and VPS2:
Perlbal Configuration:
CREATE POOL web_servers
  POOL web_servers ADD 1.2.3.4:8080
  POOL web_servers ADD 1.2.3.5:8080

CREATE SERVICE balancer
  SET listen = 0.0.0.0:80
  SET role = reverse_proxy
  SET pool = web_servers
  SET persist_client = on
  SET persist_backend = on
  SET verify_backend = on
  SET balance_method = random
ENABLE balancer
The configuration is fairly self-explanatory: we’re creating a “pool” and adding our web servers to it. We then create a “service” for Perlbal saying we want it to listen on port 80 and act as a reverse proxy, and we tell it to use the pool we just created. You don’t need to worry too much about the other options, though for the “balance_method” option you have two choices: random and round-robin. The difference between them should be fairly obvious - random will choose a random server out of the pool for each request, while round-robin will cycle through the servers in the pool in order.
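If the difference between the two strategies isn’t obvious, here’s a small Python sketch that simulates both. It only illustrates the general idea of random versus round-robin selection and says nothing about how Perlbal implements either internally. The function names and the `POOL` list are my own, using the two addresses from the pool above.

```python
import itertools
import random
from collections import Counter

# Backend addresses from the web_servers pool in this post.
POOL = ["1.2.3.4:8080", "1.2.3.5:8080"]


def random_picker(pool, rng=random):
    """Yield backends chosen independently at random for each request."""
    while True:
        yield rng.choice(pool)


def round_robin_picker(pool):
    """Yield backends in a fixed, repeating order."""
    return itertools.cycle(pool)


def tally(picker, requests):
    """Count how many of the first `requests` picks went to each backend."""
    return Counter(itertools.islice(picker, requests))


# Example usage:
#   tally(round_robin_picker(POOL), 100)   # exactly 50 per backend
#   tally(random_picker(POOL), 100)        # roughly even, varies per run
```

Round-robin gives a perfectly even split over a full cycle, while random only evens out over a large number of requests. For a pool of identical servers either is usually fine.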
Now all you have to do is start perlbal:
/etc/init.d/perlbal start
Simply visit your domain and see if the page loads - you can refresh a few times to make sure that it’s hitting both servers and that both servers are responding. Once again you may want to check your web servers’ logs to verify they’re getting the requests.
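Refreshing by hand gets tedious, so here’s a rough Python sketch that makes a batch of requests and counts which backend answered each one. Since both servers are identical, it relies on a hypothetical `X-Backend` response header that you would have to configure each web server to send with its own name (say “vps1” and “vps2”). Neither that header nor the `tally_backends` function comes from Perlbal; they’re assumptions for this example.

```python
import urllib.request
from collections import Counter


def tally_backends(url, requests=20, header="X-Backend", timeout=5.0):
    """Request url repeatedly and count the values seen in `header`.

    `X-Backend` is a hypothetical header: each web server must be configured
    to send its own name in it for this to tell the servers apart.
    """
    seen = Counter()
    for _ in range(requests):
        # A new connection is opened for every request.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            seen[resp.headers.get(header, "unknown")] += 1
    return seen


# Example usage (replace with your own domain):
#   tally_backends("http://domain.com/", requests=50)
```

If every request lands in the same bucket, one of the servers probably isn’t receiving traffic, and the logs on that server are the next place to look.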
This post is meant to be only a primer and not used in a production environment. I understand there are some principles I either skimmed over or completely omitted. This post was done solely off of memory and past experiences - so please be careful if you try to use a setup similar to the ones outlined in this post. If you come across any errors please let me know so I can fix them.