[BZ #15215] This unifies various pthread_once architecture-specific implementations which were using the same algorithm with slightly different implementations. It also adds missing memory barriers that are required for correctness.