MINOR: thread: provide an alternative to pthread's rwlock

Since version 1.1.0, OpenSSL's libcrypto ignores the provided locking
mechanism and uses pthread's rwlocks instead. The problem is that for
some code paths (e.g. async engines) this results in a huge amount of
syscalls on systems facing a bit of contention, to the point where more
than 80% of the CPU can be spent in the system dealing with spinlocks
just for futex_wake().

This patch provides an alternative by redefining the relevant pthread
rwlocks from the low-overhead version of the progressive rw locks. This
way there will be no more syscalls in case of contention, and CPU will
be burnt in userland. Doing this saves massive amounts of CPU, where
the locks only take 12-15% vs 80% before, which allows SSL to work much
faster on large thread counts (e.g. 24 or more).

The tryrdlock and trywrlock variants have been implemented using a CAS
since their goal is only to succeed on no contention and never to wait.
The pthread_rwlock API is complete except that the timed versions of
the rdlock and wrlock do not wait and simply fall back to trylock
versions.

Since the gains have only been observed with async engines for now,
this option remains disabled by default. It can be enabled at build
time using USE_PTHREAD_EMULATION=1.
2 files changed