[libcxx] Fix a data race in call_once

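call_once uses a relaxed atomic load on the fast path of its double-checked locking, which is a data race: a thread that observes the completed flag has no happens-before edge with the writes made by the initializer. The fast-path load has to be an acquire atomic load, paired with a release store when the flag is marked complete.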
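
For illustration only, the acquire/release pairing can be sketched with standard C++ atomics. This is not the libc++ implementation: OnceFlag, call_once_sketch and run_demo are hypothetical names, and the sketch holds the mutex while the initializer runs instead of using the pending state and condition variable that mutex.cpp uses.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the once flag; not the libc++ type.
struct OnceFlag {
    std::atomic<unsigned long> state{0};  // 0 = not run, ~0ul = complete
    std::mutex mut;
};

template <class F>
void call_once_sketch(OnceFlag& flag, F&& func) {
    // Fast path: the acquire load pairs with the release store below, so a
    // thread that sees ~0ul also sees every write made by func().
    if (flag.state.load(std::memory_order_acquire) == ~0ul)
        return;

    std::lock_guard<std::mutex> lock(flag.mut);
    // Slow path: the mutex already orders this re-check, so relaxed suffices.
    if (flag.state.load(std::memory_order_relaxed) != ~0ul) {
        func();
        // Publish func()'s side effects to future fast-path readers.
        flag.state.store(~0ul, std::memory_order_release);
    }
}

// Calls call_once_sketch from nthreads threads and returns how many times
// the initializer actually ran.
int run_demo(int nthreads) {
    OnceFlag flag;
    int payload = 0;  // deliberately non-atomic
    int calls = 0;    // only written inside the initializer
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i) {
        threads.emplace_back([&] {
            call_once_sketch(flag, [&] { payload = 42; ++calls; });
            // Safe to read: ordered by the acquire load or by the mutex.
            assert(payload == 42);
        });
    }
    for (auto& t : threads)
        t.join();
    return calls;
}
```

With a relaxed fast-path load, the read of payload after the call would race with the write inside the initializer, which is the bug this change fixes.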

Differential Revision: https://reviews.llvm.org/D24028

llvm-svn: 280621
Cr-Mirrored-From: sso://chromium.googlesource.com/_direct/external/github.com/llvm/llvm-project
Cr-Mirrored-Commit: 224264ade0674f8ce120432614abfd880323f105
diff --git a/src/mutex.cpp b/src/mutex.cpp
index 9f808ca..7226abc 100644
--- a/src/mutex.cpp
+++ b/src/mutex.cpp
@@ -199,9 +199,6 @@
 static __libcpp_condvar_t cv = _LIBCPP_CONDVAR_INITIALIZER;
 #endif
 
-/// NOTE: Changes to flag are done via relaxed atomic stores
-///       even though the accesses are protected by a mutex because threads
-///       just entering 'call_once` concurrently read from flag.
 void
 __call_once(volatile unsigned long& flag, void* arg, void(*func)(void*))
 {
@@ -238,7 +235,7 @@
             __libcpp_mutex_unlock(&mut);
             func(arg);
             __libcpp_mutex_lock(&mut);
-            __libcpp_relaxed_store(&flag, ~0ul);
+            __libcpp_atomic_store(&flag, ~0ul, _AO_Release);
             __libcpp_mutex_unlock(&mut);
             __libcpp_condvar_broadcast(&cv);
 #ifndef _LIBCPP_NO_EXCEPTIONS