MEDIUM: lua-thread: Add the lua-load-per-thread directive

The goal is to allow one main Lua state to run per thread.

This patch contains the main part of the work. The Lua init is done
using these steps:
 - "lua-load-per-thread" loads the Lua code in the first thread
 - it creates the structs
 - it stores the loaded files
 - the first-step load is completed (execution of hlua_post_init)
   and the number of threads is now known
 - we initialize Lua states for all remaining threads
 - for each one, we load the Lua file
 - for each one, we execute post-init

Once everything is loaded, we check the consistency of function
references. The rules are:
 - a function reference cannot be in the shared Lua state and in
   a per-thread Lua state at the same time.
 - if a function reference is declared in a per-thread Lua state, it
   must be declared in all per-thread Lua states.
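
As an illustration only, a global section using both directives might
look like the following sketch. The file paths and the nbthread value
are hypothetical and not part of this patch:

    global
        nbthread 4
        # shared state, visible to all threads; core.thread is always 0
        lua-load /etc/haproxy/shared.lua
        # one independent copy per thread; core.thread is 1..nbthread
        lua-load-per-thread /etc/haproxy/per-thread.lua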
diff --git a/doc/configuration.txt b/doc/configuration.txt
index bc2ad01..9ca2436 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -837,6 +837,7 @@
    - log-tag
    - log-send-hostname
    - lua-load
+   - lua-load-per-thread
    - lua-prepend-path
    - mworker-max-reloads
    - nbproc
@@ -1363,9 +1364,31 @@
   running on the same host. See also the per-proxy "log-tag" directive.
 
 lua-load <file>
-  This global directive loads and executes a Lua file. This directive can be
+  This global directive loads and executes a Lua file in the shared context
+  that is visible to all threads. Any variable set in such a context is visible
+  from any thread. This is the easiest and recommended way to load Lua programs
+  but it will not scale well if a lot of Lua calls are performed, as only one
+  thread may be running on the global state at a time. A program loaded this
+  way will always see 0 in the "core.thread" variable. This directive can be
   used multiple times.
 
+lua-load-per-thread <file>
+  This global directive loads and executes a Lua file into each started thread.
+  Any global variable has a thread-local visibility so that each thread could
+  see a different value. As such it is strongly recommended not to use global
+  variables in programs loaded this way. An independent copy is loaded and
+  initialized for each thread, everything is done sequentially and in the
+  thread's numeric order from 1 to nbthread. If some operations need to be
+  performed only once, the program should check the "core.thread" variable to
+  figure what thread is being initialized. Programs loaded this way will run
+  concurrently on all threads and will be highly scalable. This is the
+  recommended way to load simple functions that register sample-fetches,
+  converters, actions or services once it is certain the program doesn't depend
+  on global variables. For the sake of simplicity, the directive is available
+  even if only one thread is used and even if threads are disabled (in which
+  case it will be equivalent to lua-load). This directive can be used multiple
+  times.
+
 lua-prepend-path <string> [<type>]
   Prepends the given string followed by a semicolon to Lua's package.<type>
   variable.