MINOR: tools: improve the popcount() operation

We'll call popcount() more often so better use a parallel method
than an iterative one. One optimal design is proposed at the site
below. It requires a fast multiplication though, but even without
it will still be faster than the iterative one, and all relevant
64 bit platforms do have a multiply unit.

     https://graphics.stanford.edu/~seander/bithacks.html
1 file changed