[libcxx] Call __count_bool_true for bitset count

This patch aims to help clang with better information so it can inline
__bit_reference count function usage for both std::biset. Current clang
inliner can not infer that the passed typed will be used only to select
the optimized variant, it evaluates the type argument and type check as
a load plus compare (although later optimization phases correctly
optimized this out).

It is mainly to help llvm inliner to generate better code for std::bitset
count for aarch64. It helps on both runtime and code size, since if inline
decides that _VSTD::count should not be inlined the vectorization will
create both aligned and unaligned variants (which add both code size and
runtime costs)

llvm-svn: 350936
Cr-Mirrored-From: sso://chromium.googlesource.com/_direct/external/github.com/llvm/llvm-project
Cr-Mirrored-Commit: 63ea9585210fb7aa24a78576297e84c620cc3c74
diff --git a/include/bitset b/include/bitset
index 6e28596..98947e0 100644
--- a/include/bitset
+++ b/include/bitset
@@ -991,7 +991,7 @@
 size_t
 bitset<_Size>::count() const _NOEXCEPT
 {
-    return static_cast<size_t>(_VSTD::count(base::__make_iter(0), base::__make_iter(_Size), true));
+    return static_cast<size_t>(__count_bool_true(base::__make_iter(0), _Size));
 }
 
 template <size_t _Size>