glibc/nptl
Adhemerval Zanella 0edbf12301 nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
The current allocate_stack logic for creating stacks first mmaps all
the required memory with the desired protection and then mprotects the
guard area with PROT_NONE if required.  Although this works as expected,
it pessimizes the allocation because it requires the kernel to actually
increase the commit charge (which counts against the physical/swap
memory available to the system).

This patch inverts that order: allocate_stack now calls mmap with
PROT_NONE and then mprotects only the required area.
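
The inverted order named in the commit title and in the ChangeLog entry
(mmap with PROT_NONE, then mprotect the required area) can be sketched
roughly as below.  This is only an illustration, not the code in
nptl/allocatestack.c: reserve_stack is a hypothetical helper, and it
assumes a downward-growing stack (guard at the low end) and sizes that
are multiples of the page size.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not glibc's allocate_stack.  Reserves
   GUARD_SIZE + STACK_SIZE bytes with no access rights, then enables
   read/write access only on the usable stack part.  Assumes both sizes
   are multiples of the page size and that the stack grows down, so the
   guard sits at the lowest addresses.  Returns the base of the mapping
   or NULL on failure.  */
static void *
reserve_stack (size_t stack_size, size_t guard_size)
{
  size_t total = stack_size + guard_size;

  /* Step 1: map everything as PROT_NONE.  */
  char *mem = mmap (NULL, total, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Step 2: open up only the stack area; the guard stays PROT_NONE.  */
  if (mprotect (mem + guard_size, stack_size,
                PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, total);
      return NULL;
    }

  return mem;
}
```

A caller would release the region with munmap (base, stack_size + guard_size).
With the previous order the whole region, guard included, would be mapped
readable and writable first, which is the step the commit message identifies
as raising the commit charge.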

The only difficulty is verifying this change: the side effects are
Linux specific, and accounting for them would require a kernel-specific
test that parses system-wide information.  On the kernel I checked,
/proc/self/statm does not show any meaningful difference in VM size
and/or RSS before and after thread creation.  I could only see
meaningful information by checking the system-wide /proc/meminfo
between thread creations: MemFree, MemAvailable, and Committed_AS show
a large difference without the patch.  I think relying on this kind of
information in a testcase is fragile.
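
For manual inspection, the /proc/meminfo fields mentioned above can be
read with something like the following sketch.  meminfo_kb is a
hypothetical helper introduced here for illustration; it is not part of
the patch, and it assumes the usual "Name:   value kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value of FIELD (for example
   "Committed_AS") from /proc/meminfo, in the unit printed there
   (normally kB), or -1 if the file or the field is not available.  */
static long
meminfo_kb (const char *field)
{
  FILE *f = fopen ("/proc/meminfo", "r");
  if (f == NULL)
    return -1;

  char line[256];
  long value = -1;
  size_t len = strlen (field);

  while (fgets (line, sizeof line, f) != NULL)
    /* Match "FIELD:" exactly at the start of the line.  */
    if (strncmp (line, field, len) == 0 && line[len] == ':')
      {
        if (sscanf (line + len + 1, "%ld", &value) != 1)
          value = -1;
        break;
      }

  fclose (f);
  return value;
}
```

Sampling meminfo_kb ("Committed_AS") before and after a batch of
pthread_create calls gives a rough view of the effect, but the value is
system wide, so other processes can change it between samples.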

The BZ#18988 report shows that the committed pages are easily seen with
mlockall (MCL_FUTURE) (which locks all pages that become mapped in the
process); however, a more straightforward testcase shows that
pthread_create can be faster with this patch:

--
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static const int inner_count = 256;
static const int outer_count = 128;

/* Attributes shared by every thread; initialized in main.  */
static pthread_attr_t a;

static
void *thread1(void *arg)
{
  return NULL;
}

static
void *sleeper(void *arg)
{
  pthread_t ts[inner_count];
  for (int i = 0; i < inner_count; i++)
    pthread_create (&ts[i], &a, thread1, NULL);
  for (int i = 0; i < inner_count; i++)
    pthread_join (ts[i], NULL);

  return NULL;
}

int main(void)
{
  pthread_attr_init(&a);
  pthread_attr_setguardsize(&a, 1<<20);
  pthread_attr_setstacksize(&a, 1134592);

  pthread_t ts[outer_count];
  for (int i = 0; i < outer_count; i++)
    pthread_create(&ts[i], &a, sleeper, NULL);
  for (int i = 0; i < outer_count; i++)
    {
      int r = pthread_join(ts[i], NULL);
      assert(r == 0);
    }
  return 0;
}

--

On x86_64 (4.4.0-45-generic, gcc 5.4.0), running this small test
without the patch I see:

$ time ./test

real	0m3.647s
user	0m0.080s
sys	0m11.836s

While with the patch I see:

$ time ./test

real	0m0.696s
user	0m0.040s
sys	0m1.152s

So I added a pthread_create benchtest (thread_create) which checks
the thread creation latency.  As with the simple test above, I saw
improvements in thread creation on all architectures on which I tested
the change.
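
Thread creation latency of the kind described above could be measured
along the following lines.  This is a minimal sketch and not the source
of benchtests/thread_create-source.c; create_latency_ns and noop_thread
are hypothetical names, and timing each call with CLOCK_MONOTONIC is an
assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Thread body that returns immediately.  */
static void *
noop_thread (void *arg)
{
  return arg;
}

/* Hypothetical helper, not the glibc benchtest: average time in
   nanoseconds spent inside pthread_create over COUNT threads created
   with ATTR (ATTR may be NULL for default attributes).  Each thread is
   joined before the next one is created.  Returns -1 if COUNT is not
   positive or a thread cannot be created.  */
static double
create_latency_ns (const pthread_attr_t *attr, int count)
{
  if (count <= 0)
    return -1.0;

  double total = 0.0;
  for (int i = 0; i < count; i++)
    {
      pthread_t t;
      struct timespec start, end;

      clock_gettime (CLOCK_MONOTONIC, &start);
      int r = pthread_create (&t, attr, noop_thread, NULL);
      clock_gettime (CLOCK_MONOTONIC, &end);
      if (r != 0)
        return -1.0;
      pthread_join (t, NULL);

      total += (end.tv_sec - start.tv_sec) * 1e9
               + (end.tv_nsec - start.tv_nsec);
    }
  return total / count;
}
```

Calling it with attributes set up as in the testcase above (for example
a guard size of 1<<20) on a glibc built with and without the patch gives
a per-call figure to compare.  Older toolchains need -pthread when
linking.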

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu,
and sparcv9-linux-gnu.

	[BZ #18988]
	* benchtests/thread_create-inputs: New file.
	* benchtests/thread_create-source.c: Likewise.
	* support/xpthread_attr_setguardsize.c: Likewise.
	* support/Makefile (libsupport-routines): Add
	xpthread_attr_setguardsize object.
	* support/xthread.h: Add xpthread_attr_setguardsize prototype.
	* benchtests/Makefile (bench-pthread): Add thread_create.
	* nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
	then mprotect the required area.
2017-06-14 17:22:35 -03:00
..
alloca_cutoff.c
allocatestack.c nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988) 2017-06-14 17:22:35 -03:00
Banner
cancellation.c
ChangeLog.old
cleanup_compat.c
cleanup_defer_compat.c
cleanup_defer.c
cleanup_routine.c
cleanup.c
cond-perf.c
createthread.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
default-sched.h
descr.h Remove __need_list_t and __need_res_state. 2017-05-20 19:01:46 -04:00
DESIGN-systemtap-probes.txt
eintr.c
elision-conf.h
errno-loc.c
events.c
forward.c
herrno.c
libc_multiple_threads.c
libc_pthread_init.c
libc-cancellation.c
libc-cleanup.c
libc-lowlevellock.c
lll_timedlock_wait.c
lll_timedwait_tid.c
lowlevellock.c
Makefile Move tst-mutex*8* to tests-internal 2017-05-25 14:53:40 -03:00
nptl_lock_constants.pysym New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
nptl-init.c Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
nptl-printers.py Fix mutex pretty printer test and pretty printer output. 2017-01-20 14:56:39 +01:00
old_pthread_atfork.c
old_pthread_cond_broadcast.c
old_pthread_cond_destroy.c
old_pthread_cond_init.c
old_pthread_cond_signal.c
old_pthread_cond_timedwait.c
old_pthread_cond_wait.c
perf.c
pt-allocrtsig.c
pt-cleanup.c
pt-crti.S
pt-fork.c
pt-interp.c
pt-longjmp.c
pt-raise.c
pt-system.c
pt-vfork.c
pthread_atfork.c
pthread_attr_destroy.c
pthread_attr_getaffinity.c
pthread_attr_getdetachstate.c
pthread_attr_getguardsize.c
pthread_attr_getinheritsched.c
pthread_attr_getschedparam.c
pthread_attr_getschedpolicy.c
pthread_attr_getscope.c
pthread_attr_getstack.c
pthread_attr_getstackaddr.c
pthread_attr_getstacksize.c
pthread_attr_init.c
pthread_attr_setaffinity.c
pthread_attr_setdetachstate.c
pthread_attr_setguardsize.c
pthread_attr_setinheritsched.c
pthread_attr_setschedparam.c
pthread_attr_setschedpolicy.c
pthread_attr_setscope.c
pthread_attr_setstack.c
pthread_attr_setstackaddr.c
pthread_attr_setstacksize.c
pthread_barrier_destroy.c
pthread_barrier_init.c
pthread_barrier_wait.c
pthread_barrierattr_destroy.c
pthread_barrierattr_getpshared.c
pthread_barrierattr_init.c
pthread_barrierattr_setpshared.c
pthread_cancel.c
pthread_clock_gettime.c Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
pthread_clock_settime.c Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
pthread_cond_broadcast.c
pthread_cond_common.c Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
pthread_cond_destroy.c
pthread_cond_init.c
pthread_cond_signal.c
pthread_cond_wait.c
pthread_condattr_destroy.c
pthread_condattr_getclock.c
pthread_condattr_getpshared.c
pthread_condattr_init.c
pthread_condattr_setclock.c
pthread_condattr_setpshared.c
pthread_create.c Bug 20116: Clarify behaviour of PD->lock. 2017-05-03 15:24:43 -04:00
pthread_detach.c
pthread_equal.c
pthread_exit.c
pthread_getaffinity.c
pthread_getattr_default_np.c
pthread_getattr_np.c
pthread_getconcurrency.c
pthread_getcpuclockid.c
pthread_getname.c
pthread_getschedparam.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
pthread_getspecific.c
pthread_join.c
pthread_key_create.c
pthread_key_delete.c
pthread_kill_other_threads.c
pthread_kill.c
pthread_mutex_cond_lock.c robust mutexes: Fix broken x86 assembly by removing it 2017-01-13 17:16:07 +01:00
pthread_mutex_consistent.c
pthread_mutex_destroy.c
pthread_mutex_getprioceiling.c
pthread_mutex_init.c
pthread_mutex_lock.c Add compiler barriers around modifications of the robust mutex list. 2017-01-13 23:12:32 +01:00
pthread_mutex_setprioceiling.c
pthread_mutex_timedlock.c Add compiler barriers around modifications of the robust mutex list. 2017-01-13 23:12:32 +01:00
pthread_mutex_trylock.c
pthread_mutex_unlock.c Add compiler barriers around modifications of the robust mutex list. 2017-01-13 23:12:32 +01:00
pthread_mutexattr_destroy.c
pthread_mutexattr_getprioceiling.c
pthread_mutexattr_getprotocol.c
pthread_mutexattr_getpshared.c
pthread_mutexattr_getrobust.c
pthread_mutexattr_gettype.c
pthread_mutexattr_init.c
pthread_mutexattr_setprioceiling.c
pthread_mutexattr_setprotocol.c
pthread_mutexattr_setpshared.c
pthread_mutexattr_setrobust.c
pthread_mutexattr_settype.c
pthread_once.c
pthread_rwlock_common.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_destroy.c
pthread_rwlock_init.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_rdlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_timedrdlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_timedwrlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_tryrdlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_trywrlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_unlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlock_wrlock.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
pthread_rwlockattr_destroy.c
pthread_rwlockattr_getkind_np.c
pthread_rwlockattr_getpshared.c
pthread_rwlockattr_init.c
pthread_rwlockattr_setkind_np.c
pthread_rwlockattr_setpshared.c
pthread_self.c
pthread_setaffinity.c
pthread_setattr_default_np.c
pthread_setcancelstate.c
pthread_setcanceltype.c
pthread_setconcurrency.c
pthread_setegid.c
pthread_seteuid.c
pthread_setgid.c
pthread_setname.c
pthread_setregid.c
pthread_setresgid.c
pthread_setresuid.c
pthread_setreuid.c
pthread_setschedparam.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
pthread_setschedprio.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
pthread_setspecific.c
pthread_setuid.c
pthread_sigmask.c
pthread_sigqueue.c
pthread_spin_destroy.c
pthread_spin_init.c Optimize generic spinlock code and use C11 like atomic macros. 2017-06-06 09:41:56 +02:00
pthread_spin_lock.c Optimize generic spinlock code and use C11 like atomic macros. 2017-06-06 09:41:56 +02:00
pthread_spin_trylock.c Optimize generic spinlock code and use C11 like atomic macros. 2017-06-06 09:41:56 +02:00
pthread_spin_unlock.c Optimize generic spinlock code and use C11 like atomic macros. 2017-06-06 09:41:56 +02:00
pthread_testcancel.c
pthread_timedjoin.c
pthread_tryjoin.c
pthread_yield.c
pthread-errnos.sym
pthread-pi-defines.sym
pthread-pids.h
pthreadP.h Remove __ASSUME_REQUEUE_PI 2017-04-04 18:02:02 -03:00
register-atfork.c
res.c
sem_close.c
sem_destroy.c
sem_getvalue.c
sem_init.c
sem_open.c
sem_post.c
sem_timedwait.c
sem_unlink.c
sem_wait.c
sem_waitcommon.c
semaphoreP.h
shlib-versions
sigaction.c
smp.h
sockperf.c
stack-aliasing.h nptl: Remove COLORING_INCREMENT 2017-02-06 15:58:32 -02:00
test-cond-printers.c
test-cond-printers.py
test-condattr-printers.c
test-condattr-printers.py
test-mutex-printers.c
test-mutex-printers.py Fix mutex pretty printer test and pretty printer output. 2017-01-20 14:56:39 +01:00
test-mutexattr-printers.c
test-mutexattr-printers.py
test-rwlock-printers.c
test-rwlock-printers.py New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
test-rwlockattr-printers.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
test-rwlockattr-printers.py New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
TODO
TODO-kernel
TODO-testing
tpp.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
tst-_res1.c
tst-_res1mod1.c
tst-_res1mod2.c
tst-abstime.c
tst-align3.c
tst-align.c
tst-atfork1.c
tst-atfork2.c
tst-atfork2mod.c
tst-attr1.c
tst-attr2.c
tst-attr3.c
tst-backtrace1.c
tst-bad-schedattr.c
tst-barrier1.c
tst-barrier2.c
tst-barrier3.c
tst-barrier4.c
tst-barrier5.c
tst-basic1.c
tst-basic2.c
tst-basic3.c
tst-basic4.c
tst-basic5.c
tst-basic6.c
tst-basic7.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cancel1.c
tst-cancel2.c
tst-cancel3.c
tst-cancel4_1.c nptl: Using libsupport for tst-cancel4* 2017-05-01 15:41:46 -03:00
tst-cancel4_2.c nptl: Using libsupport for tst-cancel4* 2017-05-01 15:41:46 -03:00
tst-cancel4-common.c nptl: Using libsupport for tst-cancel4* 2017-05-01 15:41:46 -03:00
tst-cancel4-common.h nptl: Using libsupport for tst-cancel4* 2017-05-01 15:41:46 -03:00
tst-cancel4.c posix: Implement preadv2 and pwritev2 2017-05-31 17:35:46 -03:00
tst-cancel5.c
tst-cancel6.c
tst-cancel7.c
tst-cancel8.c
tst-cancel9.c
tst-cancel10.c
tst-cancel11.c
tst-cancel12.c
tst-cancel13.c
tst-cancel14.c
tst-cancel15.c
tst-cancel16.c
tst-cancel17.c
tst-cancel18.c
tst-cancel19.c
tst-cancel20.c
tst-cancel21-static.c
tst-cancel21.c
tst-cancel22.c
tst-cancel23.c
tst-cancel24-static.cc
tst-cancel24.cc
tst-cancel25.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cancel26.c
tst-cancel27.c
tst-cancel-self-cancelstate.c
tst-cancel-self-canceltype.c
tst-cancel-self-cleanup.c
tst-cancel-self-testcancel.c
tst-cancel-self.c
tst-cancel-wrappers.sh
tst-cancelx1.c
tst-cancelx2.c
tst-cancelx3.c
tst-cancelx4.c
tst-cancelx5.c
tst-cancelx6.c
tst-cancelx7.c
tst-cancelx8.c
tst-cancelx9.c
tst-cancelx10.c
tst-cancelx11.c
tst-cancelx12.c
tst-cancelx13.c
tst-cancelx14.c
tst-cancelx15.c
tst-cancelx16.c
tst-cancelx17.c
tst-cancelx18.c
tst-cancelx20.c
tst-cancelx21.c
tst-cleanup0.c
tst-cleanup0.expect
tst-cleanup1.c
tst-cleanup2.c
tst-cleanup3.c
tst-cleanup4.c
tst-cleanup4aux.c
tst-cleanupx0.c
tst-cleanupx0.expect
tst-cleanupx1.c
tst-cleanupx2.c
tst-cleanupx3.c
tst-cleanupx4.c
tst-cleanupx4aux.c
tst-clock1.c
tst-clock2.c
tst-cond1.c
tst-cond2.c
tst-cond3.c
tst-cond4.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cond5.c
tst-cond6.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cond7.c
tst-cond8-static.c
tst-cond8.c
tst-cond9.c
tst-cond10.c
tst-cond11.c
tst-cond12.c
tst-cond13.c
tst-cond14.c
tst-cond15.c
tst-cond16.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cond17.c
tst-cond18.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-cond19.c
tst-cond20.c
tst-cond21.c
tst-cond22.c
tst-cond23.c
tst-cond24.c
tst-cond25.c
tst-cond-except.c
tst-context1.c
tst-create-detached.c Bug 20116: Fix use after free in pthread_create() 2017-01-28 19:21:44 -05:00
tst-default-attr.c
tst-detach1.c
tst-dlsym1.c
tst-eintr1.c
tst-eintr2.c
tst-eintr3.c
tst-eintr4.c
tst-eintr5.c
tst-exec1.c
tst-exec2.c
tst-exec3.c
tst-exec4.c
tst-exec5.c
tst-execstack-mod.c
tst-execstack.c
tst-exit1.c
tst-exit2.c
tst-exit3.c
tst-fini1.c
tst-fini1mod.c
tst-flock1.c
tst-flock2.c
tst-fork1.c Use test-driver in ntpl/tst-fork1.c 2017-05-10 09:38:18 +02:00
tst-fork2.c
tst-fork3.c Use test-driver in nptl/tst-fork3.c 2017-05-31 09:45:38 +02:00
tst-fork4.c
tst-getpid3.c
tst-initializers1-c11.c
tst-initializers1-c89.c
tst-initializers1-c99.c
tst-initializers1-gnu11.c
tst-initializers1-gnu89.c
tst-initializers1-gnu99.c
tst-initializers1.c
tst-join1.c
tst-join2.c
tst-join3.c
tst-join4.c
tst-join5.c
tst-join6.c
tst-join7.c
tst-join7mod.c Miscellaneous low-risk changes preparing for _ISOMAC testsuite. 2017-03-01 20:32:50 -05:00
tst-key1.c
tst-key2.c
tst-key3.c
tst-key4.c
tst-kill1.c
tst-kill2.c
tst-kill3.c
tst-kill4.c
tst-kill5.c
tst-kill6.c
tst-locale1.c
tst-locale2.c
tst-mutex1.c Split DIAG_* macros to new header libc-diag.h. 2017-02-25 09:59:46 -05:00
tst-mutex2.c
tst-mutex3.c
tst-mutex4.c
tst-mutex5.c
tst-mutex5a.c
tst-mutex6.c
tst-mutex7.c
tst-mutex7a.c
tst-mutex8-static.c
tst-mutex8.c
tst-mutex9.c
tst-mutex-errorcheck.c
tst-mutexpi1.c
tst-mutexpi2.c
tst-mutexpi3.c
tst-mutexpi4.c
tst-mutexpi5.c
tst-mutexpi5a.c
tst-mutexpi6.c
tst-mutexpi7.c
tst-mutexpi7a.c
tst-mutexpi8-static.c
tst-mutexpi8.c
tst-mutexpi9.c
tst-mutexpp1.c
tst-mutexpp6.c
tst-mutexpp10.c
tst-oddstacklimit.c
tst-once1.c
tst-once2.c
tst-once3.c
tst-once4.c
tst-once5.cc
tst-oncex3.c
tst-oncex4.c
tst-popen1.c
tst-pthread-attr-affinity.c
tst-pthread-getattr.c
tst-pthread-mutexattr.c
tst-raise1.c
tst-robust1.c
tst-robust2.c
tst-robust3.c
tst-robust4.c
tst-robust5.c
tst-robust6.c
tst-robust7.c
tst-robust8.c
tst-robust9.c
tst-robust10.c
tst-robust-fork.c nptl: Add tst-robust-fork 2017-01-27 06:53:20 +01:00
tst-robustpi1.c
tst-robustpi2.c
tst-robustpi3.c
tst-robustpi4.c
tst-robustpi5.c
tst-robustpi6.c
tst-robustpi7.c
tst-robustpi8.c
tst-robustpi9.c
tst-rwlock1.c
tst-rwlock2.c
tst-rwlock2a.c
tst-rwlock2b.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock3.c
tst-rwlock4.c
tst-rwlock5.c
tst-rwlock6.c
tst-rwlock7.c
tst-rwlock8.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock9.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock10.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock11.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock12.c
tst-rwlock13.c
tst-rwlock14.c
tst-rwlock15.c
tst-rwlock16.c
tst-rwlock17.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock18.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-rwlock19.c New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
tst-sched1.c
tst-sem1.c
tst-sem2.c
tst-sem3.c
tst-sem4.c
tst-sem5.c
tst-sem6.c
tst-sem7.c
tst-sem8.c
tst-sem9.c
tst-sem10.c
tst-sem11-static.c
tst-sem11.c
tst-sem12-static.c
tst-sem12.c
tst-sem13.c
tst-sem14.c
tst-sem15.c
tst-sem16.c
tst-setuid1-static.c
tst-setuid1.c
tst-setuid2.c
tst-setuid3.c
tst-signal1.c
tst-signal2.c
tst-signal3.c
tst-signal4.c
tst-signal5.c
tst-signal6.c
tst-signal7.c
tst-spin1.c
tst-spin2.c
tst-spin3.c
tst-spin4.c
tst-stack1.c
tst-stack2.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-stack3.c
tst-stack4.c
tst-stack4mod.c
tst-stackguard1-static.c
tst-stackguard1.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-stdio1.c
tst-stdio2.c
tst-sysconf.c
tst-thread_local1.cc
tst-tls1.c
tst-tls2.c
tst-tls3-malloc.c
tst-tls3.c
tst-tls3mod.c
tst-tls4.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-tls4moda.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-tls4modb.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-tls5.c
tst-tls5.h Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-tls5mod.c
tst-tls5moda.c
tst-tls5modb.c
tst-tls5modc.c
tst-tls5modd.c
tst-tls5mode.c
tst-tls5modf.c
tst-tls6.sh
tst-tpp.h
tst-tsd1.c
tst-tsd2.c
tst-tsd3.c
tst-tsd4.c
tst-tsd5.c
tst-tsd6.c
tst-typesizes.c
tst-umask1.c
tst-unload.c
tst-vfork1.c
tst-vfork1x.c
tst-vfork2.c
tst-vfork2x.c
unregister-atfork.c
unwind.c
unwindbuf.sym
vars.c
version.c Update copyright dates not handled by scripts/update-copyrights. 2017-01-01 00:26:24 +00:00
Versions