[DOC] update architecture guide

Add several practical examples to the architecture guide : hints on
"source" load balancing and server weights, a keepalived configuration
for LB1/LB2, SSL health-checks (ssl-hello-chk), an alternate HTTPS
setup based on stunnel, and a new section on managing high loads with
per-server maxconn/minconn and request queueing.
diff --git a/doc/architecture.txt b/doc/architecture.txt
index 8b04f99..7d80f1b 100644
--- a/doc/architecture.txt
+++ b/doc/architecture.txt
@@ -117,7 +117,7 @@
    below).
 
  - LB1 becomes a very sensible server. If LB1 dies, nothing works anymore.
-   => you can back it up using keepalived.
+   => you can back it up using keepalived (see below).
 
  - if the application needs to log the original client's IP, use the
    "forwardfor" option which will add an "X-Forwarded-For" header with the
@@ -134,6 +134,29 @@
         LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b " combined
         CustomLog /var/log/httpd/access_log combined
 
+Hints :
+-------
+Sometimes on the internet, you will find a few percent of clients who disable
+cookies on their browsers. Obviously they have trouble everywhere on the web,
+but you can still help them access your site by using the "source" balancing
+algorithm instead of "roundrobin". It ensures that a given IP address always
+reaches the same server as long as the number of servers remains unchanged.
+Never use this behind a proxy or in a small network, because many clients
+would share the same source address and the distribution would be unfair.
+However, in large internal networks and on the internet, it works quite well.
+Clients with a dynamic address will not be affected as long as they accept
+the cookie, because the cookie always has precedence over load balancing :
+
+    listen webfarm 192.168.1.1:80
+       mode http
+       balance source
+       cookie SERVERID insert indirect
+       option httpchk HEAD /index.html HTTP/1.0
+       server webA 192.168.1.11:80 cookie A check
+       server webB 192.168.1.12:80 cookie B check
+       server webC 192.168.1.13:80 cookie C check
+       server webD 192.168.1.14:80 cookie D check
+
 
 ==================================================================
 2. HTTP load-balancing with cookie prefixing and high availability
@@ -191,10 +214,35 @@
 use keep-alive (eg: Apache 1.3 in reverse-proxy mode), you can remove this
 option.
 
+
+Configuration for keepalived on LB1/LB2 :
+-----------------------------------------
+
+    vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
+        script "killall -0 haproxy"     # cheaper than pidof
+        interval 2                      # check every 2 seconds
+        weight 2                        # add 2 points of prio if OK
+    }
+
+    vrrp_instance VI_1 {
+        interface eth0
+        state MASTER
+        virtual_router_id 51
+        priority 101                    # 101 on master, 100 on backup
+        virtual_ipaddress {
+            192.168.1.1
+        }
+        track_script {
+            chk_haproxy
+        }
+    }
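+
+    The same configuration (including the chk_haproxy script above) can
+    typically be reused on LB2, changing only the instance's state and
+    priority, for example :
+
+        state BACKUP                    # LB2 starts as backup
+        priority 100                    # lower than the master's 101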
+
 
 Description :
 -------------
- - LB1 is VRRP master (keepalived), LB2 is backup.
+ - LB1 is VRRP master (keepalived), LB2 is backup. Both monitor the haproxy
+   process, and lower their priority if it fails, leading to a failover to
+   the other node.
  - LB1 will receive clients requests on IP 192.168.1.1.
  - both load-balancers send their checks from their native IP.
  - if a request does not contain a cookie, it will be forwarded to a valid
@@ -240,6 +288,21 @@
   <-- HTTP/1.0 200 OK ---------------< |
                                     ( ... )
 
+Hints :
+-------
+Sometimes, there will be some powerful servers in the farm, and some smaller
+ones. In this situation, it may be desirable to tell haproxy to respect the
+difference in performance. Let's consider that WebA and WebB are two old
+1.2 GHz P3 machines while WebC and WebD are shiny new 2.6 GHz Opterons. If your
+application scales with CPU, you may assume a very rough 2.6/1.2 performance
+ratio between the servers. You can inform haproxy about this using the "weight"
+keyword, with values between 1 and 256. It will then spread the load as
+smoothly as possible while respecting those ratios. Here, the weights 12 and
+26 are simply the clock frequencies multiplied by ten :
+
+       server webA 192.168.1.11:80 cookie A weight 12 check
+       server webB 192.168.1.12:80 cookie B weight 12 check
+       server webC 192.168.1.13:80 cookie C weight 26 check
+       server webD 192.168.1.14:80 cookie D weight 26 check
 
 
 ========================================================
@@ -392,6 +455,27 @@
            group 10
 
 
+Special handling of SSL :
+-------------------------
+Sometimes, you want to send health-checks to remote systems, even in TCP mode,
+in order to be able to fail over to a backup server in case the first one is
+dead. Of course, you can simply enable TCP health-checks, but it sometimes
+happens that intermediate firewalls between the proxies and the remote servers
+acknowledge the TCP connection themselves, showing an always-up server. Since
+this is generally encountered on long-distance communications, which often
+involve SSL, an SSL health-check has been implemented to work around this
+issue. It sends SSL Hello messages to the remote server, which in turn replies
+with SSL Hello messages. Setting it up is very easy :
+
+    listen tcp-syslog-proxy
+       bind :1514      # listen to TCP syslog traffic on this port (SSL)
+       mode tcp
+       balance roundrobin
+       option ssl-hello-chk
+       server syslog-prod-site 192.168.1.10 check
+       server syslog-back-site 192.168.2.10 check backup
+
+
 =========================================================
 3. Simple HTTP/HTTPS load-balancing with cookie insertion
 =========================================================
@@ -499,6 +583,73 @@
 
 
 ========================================
+3.1. Alternate solution using Stunnel
+========================================
+
+When only SSL is required and cache is not needed, stunnel is a cheaper
+solution than Apache+mod_ssl. By default, stunnel does not process HTTP and
+does not add any X-Forwarded-For header, but there is a patch on the official
+haproxy site to provide this feature to recent stunnel versions.
+
+This time, stunnel will only process HTTPS and not HTTP. This means that
+haproxy will get all HTTP traffic, so haproxy will have to add the
+X-Forwarded-For header for HTTP traffic, but not for HTTPS traffic since
+stunnel will already have done it. We will use the "except" keyword to tell
+haproxy that connections from the local host already have a valid header.
+
+
+  192.168.1.1    192.168.1.11-192.168.1.14   192.168.1.2
+ -------+-----------+-----+-----+-----+--------+----
+        |           |     |     |     |       _|_db
+     +--+--+      +-+-+ +-+-+ +-+-+ +-+-+    (___)
+     | LB1 |      | A | | B | | C | | D |    (___)
+     +-----+      +---+ +---+ +---+ +---+    (___)
+     stunnel        4 cheap web servers
+     haproxy 
+
+
+Config on stunnel (LB1) :
+-------------------------
+
+    cert=/etc/stunnel/stunnel.pem
+    setuid=stunnel
+    setgid=proxy
+
+    socket=l:TCP_NODELAY=1
+    socket=r:TCP_NODELAY=1
+
+    [https]
+    accept=192.168.1.1:443
+    connect=192.168.1.1:80
+    xforwardedfor=yes
+
+
+Config on haproxy (LB1) :
+-------------------------
+
+    listen webfarm 192.168.1.1:80
+       mode http
+       balance roundrobin
+       option forwardfor except 192.168.1.1
+       cookie SERVERID insert indirect nocache
+       option httpchk HEAD /index.html HTTP/1.0
+       server webA 192.168.1.11:80 cookie A check
+       server webB 192.168.1.12:80 cookie B check
+       server webC 192.168.1.13:80 cookie C check
+       server webD 192.168.1.14:80 cookie D check
+
+Description :
+-------------
+ - stunnel on LB1 will receive client requests on port 443
+ - it forwards them to haproxy bound to port 80
+ - haproxy will receive HTTP client requests on port 80 and decrypted SSL
+   requests from stunnel on the same port.
+ - stunnel will add the X-Forwarded-For header
+ - haproxy will add the X-Forwarded-For header for everyone except the local
+   address (stunnel).
+
+
+========================================
 4. Soft-stop for application maintenance
 ========================================
 
@@ -1124,3 +1275,165 @@
        server from7to1 10.1.1.1:80 source 10.1.2.7
        server from8to1 10.1.1.1:80 source 10.1.2.8
 
+
+=============================================
+7. Managing high loads on application servers
+=============================================
+
+One of the roles often expected from a load balancer is to mitigate the load on
+the servers during traffic peaks. More and more often, we see heavy frameworks
+used to deliver flexible and evolving web designs, at the cost of high loads on
+the servers, or very low concurrency. Sometimes, response times are also rather
+high. People developing web sites relying on such frameworks very often look
+for a load balancer able to distribute the load as evenly as possible and to
+be gentle with the servers.
+
+There is a powerful feature in haproxy which achieves exactly this : request
+queueing associated with concurrent connections limit.
+
+Let's say you have an application server which supports at most 20 concurrent
+requests. You have 3 servers, so you can accept up to 60 concurrent HTTP
+connections, which often means 30 concurrent users in case of keep-alive (2
+persistent connections per user).
+
+Even if you disable keep-alive, if the server takes a long time to respond,
+you still have a high risk of multiple users clicking at the same time and
+having their requests unserved because of server saturation. To work around
+the problem, you increase the concurrent connection limit on the servers,
+but their performance stalls under higher loads.
+
+The solution is to limit the number of connections between the clients and the
+servers. You set haproxy to limit the number of connections on a per-server
+basis, and you let all the users you want connect to it. It will then fill all
+the servers up to the configured connection limit, and will put the remaining
+connections in a queue, waiting for a connection to be released on a server.
+
+This ensures five essential principles :
+
+  - all clients can be served whatever their number, without crashing the
+    servers; the only impact is that response times may increase.
+
+  - the servers can be used at full throttle without the risk of stalling,
+    and fine tuning can lead to optimal performance.
+
+  - response times can be reduced by making the servers work below the
+    congestion point, effectively leading to shorter response times even
+    under moderate loads.
+
+  - no domino effect when a server goes down or starts up. Requests will be
+    queued more or less, always respecting the servers' limits.
+
+  - it's easy to achieve high performance even on memory-limited hardware.
+    Indeed, heavy frameworks often consume huge amounts of RAM without always
+    using all the available CPU. In case of wrong sizing, reducing the number of
+    concurrent connections will protect against memory shortages while still
+    ensuring optimal CPU usage.
+
+
+Example :
+---------
+
+Haproxy is installed in front of an application server farm. It will limit
+the concurrent connections to 4 per server (one thread per CPU), thus ensuring
+very fast response times.
+
+
+  192.168.1.1   192.168.1.11-192.168.1.13   192.168.1.2
+ -------+-------------+-----+-----+------------+----
+        |             |     |     |           _|_db
+     +--+--+        +-+-+ +-+-+ +-+-+        (___)
+     | LB1 |        | A | | B | | C |        (___)
+     +-----+        +---+ +---+ +---+        (___)
+     haproxy       3 application servers
+                   with heavy frameworks
+
+
+Config on haproxy (LB1) :
+-------------------------
+
+    listen appfarm 192.168.1.1:80
+       mode http
+       maxconn 10000
+       option httpclose
+       option forwardfor
+       balance roundrobin
+       cookie SERVERID insert indirect
+       option httpchk HEAD /index.html HTTP/1.0
+       server railsA 192.168.1.11:80 cookie A maxconn 4 check
+       server railsB 192.168.1.12:80 cookie B maxconn 4 check
+       server railsC 192.168.1.13:80 cookie C maxconn 4 check
+       contimeout 60000
+
+
+Description :
+-------------
+The proxy listens on IP 192.168.1.1, port 80, and expects HTTP requests. It
+can accept up to 10000 concurrent connections on this socket. It follows the
+roundrobin algorithm to assign servers to connections as long as servers are
+not saturated.
+
+It allows up to 4 concurrent connections per server, and will queue the
+requests above this value. The "contimeout" parameter is used to set the
+maximum time a connection may take to establish on a server, but here it
+is also used to set the maximum time a connection may stay unserved in the
+queue (1 minute here).
+
+If the servers can each process 4 requests in 10 ms on average, then at 3000
+connections, response times will be delayed by at most :
+
+   3000 / 3 servers / 4 conns * 10 ms = 2.5 seconds
+
+This is not that dramatic considering the huge number of users for such a low
+number of servers.
+
+When connection queues fill up and application servers are saturated, response
+times will grow and users might abort by clicking on the "Stop" button. It is
+very undesirable to send aborted requests to servers, because they will eat
+CPU cycles for nothing.
+
+An option has been added to handle this specific case : "option abortonclose".
+By specifying it, you tell haproxy that if an input channel is closed on the
+client side AND the request is still waiting in the queue, then it is highly
+likely that the user has stopped, so we remove the request from the queue
+before it gets served.
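+
+For example, with the configuration shown above, this would simply give (only
+the "option abortonclose" line is new) :
+
+    listen appfarm 192.168.1.1:80
+       mode http
+       maxconn 10000
+       option httpclose
+       option abortonclose
+       option forwardfor
+       balance roundrobin
+       cookie SERVERID insert indirect
+       option httpchk HEAD /index.html HTTP/1.0
+       server railsA 192.168.1.11:80 cookie A maxconn 4 check
+       server railsB 192.168.1.12:80 cookie B maxconn 4 check
+       server railsC 192.168.1.13:80 cookie C maxconn 4 check
+       contimeout 60000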
+
+
+Managing unfair response times
+------------------------------
+
+Sometimes, the application server will be very slow for some requests (eg:
+login page) and faster for other requests. This may cause excessive queueing
+of otherwise fast requests when all the threads on the server are blocked on
+requests to the database. Then the only solution is to increase the number of
+concurrent connections, so that the server can handle a large average number
+of slow connections with threads left to handle faster connections.
+
+But as we have seen, increasing the number of connections on the servers can
+be detrimental to performance (eg: Apache processes fighting for the accept()
+lock). To improve this situation, the "minconn" parameter has been introduced.
+When it is set, the maximum connection concurrency on the server starts bound
+by this value, and the limit increases with the number of clients waiting in
+the queue, until the number of clients connected to haproxy reaches the
+proxy's maxconn, at which point each server is allowed its own maxconn. This
+means that during low-to-medium loads, the minconn applies, and during surges
+the maxconn applies. This ensures both optimal response times under normal
+loads, and availability under very high loads.
+
+Example :
+---------
+
+    listen appfarm 192.168.1.1:80
+       mode http
+       maxconn 10000
+       option httpclose
+       option abortonclose
+       option forwardfor
+       balance roundrobin
+       # The servers will get 4 concurrent connections under low
+       # loads, and 12 when there are 10000 clients.
+       server railsA 192.168.1.11:80 minconn 4 maxconn 12 check
+       server railsB 192.168.1.12:80 minconn 4 maxconn 12 check
+       server railsC 192.168.1.13:80 minconn 4 maxconn 12 check
+       contimeout 60000
+
+