MEDIUM: cpu-map: replace the process number with the thread group number

The principle remains the same, but instead of having a single process
and ignoring extra ones, now we set the affinity masks for the respective
threads of all groups.

The doc was updated with a few extra examples.
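
As a minimal illustration (a sketch, not taken from the patch: the "nbthread"
and "thread-groups" values are assumed for the example and are expected to
yield two groups of four threads each), thread numbers restart at 1 within
each group:

    global
        nbthread 8
        thread-groups 2
        # group 1, threads 1-4 -> CPUs 0-3
        cpu-map auto:1/1-4 0-3
        # group 2, threads 1-4 -> CPUs 4-7
        cpu-map auto:2/1-4 4-7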
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 5988438..8ba4ce8 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -1210,58 +1210,57 @@
 
   See also: grace, hard-stop-after, idle-close-on-response
 
-cpu-map [auto:]<process-set>[/<thread-set>] <cpu-set>...
-  On some operating systems, it is possible to bind a process or a thread to a
-  specific CPU set. This means that the process or the thread will never run on
-  other CPUs. The "cpu-map" directive specifies CPU sets for process or thread
-  sets. The first argument is a process set, eventually followed by a thread
-  set. These sets have the format
+cpu-map [auto:]<thread-group>[/<thread-set>] <cpu-set>...
+  On some operating systems, it is possible to bind a thread group or a thread
+  to a specific CPU set. This means that the designated threads will never run
+  on other CPUs. The "cpu-map" directive specifies CPU sets for individual
+  threads or thread groups. The first argument is a thread group range,
+  optionally followed by a thread set. These ranges have the following format:
 
       all | odd | even | number[-[number]]
 
   <number> must be a number between 1 and 32 or 64, depending on the machine's
-  word size. Any process IDs above 1 and any thread IDs above nbthread are
-  ignored. It is possible to specify a range with two such number delimited by
-  a dash ('-'). It also is possible to specify all thraeds at once using
-  "all", only odd numbers using "odd" or even numbers using "even", just like
-  with the bind "thread" directive. The second and forthcoming arguments are
-  CPU sets. Each CPU set is either a unique number starting at 0 for the first
-  CPU or a range with two such numbers delimited by a dash ('-'). Outside of
-  Linux and BSDs, there may be a limitation on the maximum CPU index to either
-  31 or 63. Multiple CPU numbers or ranges may be specified, and the processes
-  or threads will be allowed to bind to all of them. Obviously, multiple
-  "cpu-map" directives may be specified. Each "cpu-map" directive will replace
-  the previous ones when they overlap. A thread will be bound on the
-  intersection of its mapping and the one of the process on which it is
-  attached. If the intersection is null, no specific binding will be set for
-  the thread.
+  word size. Any group IDs above 'thread-groups' and any thread IDs above the
+  machine's word size are ignored. All thread numbers are relative to the group
+  they belong to. It is possible to specify a range with two such numbers
+  delimited by a dash ('-'). It is also possible to specify all threads at once
+  using "all", only odd numbers using "odd" or even numbers using "even", just
+  like with the "thread" bind directive. The second and forthcoming arguments
+  are CPU sets. Each CPU set is either a unique number starting at 0 for the
+  first CPU or a range with two such numbers delimited by a dash ('-'). Outside
+  of Linux and BSDs, there may be a limitation on the maximum CPU index to
+  either 31 or 63. Multiple CPU numbers or ranges may be specified, and the
+  thread groups or threads will be allowed to bind to all of them. Obviously,
+  multiple "cpu-map" directives may be specified. Each "cpu-map" directive will
+  replace the previous ones when they overlap.
 
   Ranges can be partially defined. The higher bound can be omitted. In such
   case, it is replaced by the corresponding maximum value, 32 or 64 depending
   on the machine's word size.
 
-  The prefix "auto:" can be added before the process set to let HAProxy
-  automatically bind a process or a thread to a CPU by incrementing threads and
+  The prefix "auto:" can be added before the thread set to let HAProxy
+  automatically bind a set of threads to a CPU by incrementing threads and
   CPU sets. To be valid, both sets must have the same size. No matter the
   declaration order of the CPU sets, it will be bound from the lowest to the
-  highest bound. Having both a process and a thread range with the "auto:"
+  highest bound. Having both a group and a thread range with the "auto:"
   prefix is not supported. Only one range is supported, the other one must be
   a fixed number.
 
-  Note that process ranges are supported for historical reasons. Nowadays, a
-  lone number designates a process and must be 1, and specifying a thread range
-  or number requires to prepend "1/" in front of it. Finally, "1" is strictly
-  equivalent to "1/all" and designates all threads on the process.
+  Note that group ranges are supported for historical reasons. Nowadays, a lone
+  number designates a thread group and must be 1 if thread groups are not used,
+  and specifying a thread range or number then requires prepending "1/" to it.
+  Finally, "1" is strictly equivalent to "1/all" and designates all threads in
+  the group.
 
   Examples:
-      cpu-map 1/all 0-3 # bind all threads of the first process on the
+      cpu-map 1/all 0-3 # bind all threads of the first group on the
                         # first 4 CPUs
 
       cpu-map 1/1- 0-   # will be replaced by "cpu-map 1/1-64 0-63"
                         # or "cpu-map 1/1-32 0-31" depending on the machine's
                         # word size.
 
-      # all these lines bind the thread 1 to the cpu 0, the thread 2 to cpu 1
+      # all these lines bind thread 1 to cpu 0, thread 2 to cpu 1
       # and so on.
       cpu-map auto:1/1-4   0-3
       cpu-map auto:1/1-4   0-1 2-3
@@ -1276,6 +1275,21 @@
       cpu-map auto:1/1-4   0    # invalid
       cpu-map auto:1/1     0-3  # invalid
 
+      # map 40 threads of those 4 groups to individual CPUs
+      cpu-map auto:1/1-10   0-9
+      cpu-map auto:2/1-10   10-19
+      cpu-map auto:3/1-10   20-29
+      cpu-map auto:4/1-10   30-39
+
+      # Map 80 threads to one physical socket and 80 others to another socket
+      # without forcing assignment. These are split into 4 groups since no
+      # group may have more than 64 threads.
+      cpu-map 1/1-40   0-39 80-119    # node0, siblings 0 & 1
+      cpu-map 2/1-40   0-39 80-119
+      cpu-map 3/1-40   40-79 120-159  # node1, siblings 0 & 1
+      cpu-map 4/1-40   40-79 120-159
+
+
 crt-base <dir>
   Assigns a default directory to fetch SSL certificates from when a relative
   path is used with "crtfile" or "crt" directives. Absolute locations specified