H.J. Lu fb0f7a6755 X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is a transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is a transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid the SSE transition penalty, if only the lower 128 bits of the
first 8 vector registers are non-zero, we can preserve %xmm0 - %xmm7
registers with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.
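
The check described above can be sketched in portable C.  This is an
illustration only, not the assembly in dl-trampoline.S: the macro and
function names are assumptions, and the value passed in stands for what
XGETBV with ECX == 1 returns in EAX.  It relies on the architectural
state-component numbering, where bit 2 covers the upper 128 bits of the
YMM registers and bit 6 covers the upper 256 bits of ZMM0-ZMM15.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* State-component bits in the value XGETBV returns in EAX.
   The macro names are assumptions made for this sketch.  */
#define BIT_YMM_STATE     (1u << 2)   /* upper 128 bits of YMM0-YMM15 */
#define BIT_ZMM0_15_STATE (1u << 6)   /* upper 256 bits of ZMM0-ZMM15 */

/* Return how many bits of each vector register must be saved and
   restored, given the result of XGETBV with ECX == 1.  A clear bit
   means that state component is in its initial (all-zero)
   configuration, so a narrower store/load is sufficient.  */
unsigned int
vector_bits_to_preserve (uint32_t xinuse)
{
  if (xinuse & BIT_ZMM0_15_STATE)
    return 512;
  if (xinuse & BIT_YMM_STATE)
    return 256;
  return 128;
}
```

For example, vector_bits_to_preserve (0x3) yields 128 because only the
x87 and SSE components are in use, while vector_bits_to_preserve (0x7)
yields 256 because the upper YMM halves are also in use.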

	[BZ #20495]
	[BZ #20508]
	* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
	processors, set Use_dl_runtime_resolve_slow and set
	Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	New.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_Use_dl_runtime_resolve_opt): Likewise.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
	_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
	if Use_dl_runtime_resolve_opt is set.  Use
	_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
	(_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
	(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
	New.
	(_dl_runtime_resolve_opt): Likewise.
	(_dl_runtime_profile): Define only if _dl_runtime_profile is
	defined.
2016-09-06 08:51:07 -07:00
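
The selection described in the ChangeLog entry for
elf_machine_runtime_setup can be sketched as follows.  This is a
hypothetical model, not the code in dl-machine.h: the struct and
function are invented for illustration, the trampolines are represented
by their names as strings, and the plain _dl_runtime_resolve_avx512 and
_dl_runtime_resolve_sse fallbacks are assumed rather than taken from the
commit message.

```c
#include <assert.h>   /* only needed for checks like the one in the usage note */
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant cpu_features bits.  */
struct cpu_flags
{
  bool avx512f_usable;
  bool avx_usable;
  bool resolve_opt;    /* Use_dl_runtime_resolve_opt */
  bool resolve_slow;   /* Use_dl_runtime_resolve_slow */
};

/* Return the name of the lazy-binding trampoline that would be
   installed for the given feature flags.  */
const char *
select_resolver (const struct cpu_flags *f)
{
  if (f->avx512f_usable)
    /* No _slow variant exists for AVX512.  */
    return (f->resolve_opt
            ? "_dl_runtime_resolve_avx512_opt"
            : "_dl_runtime_resolve_avx512");
  if (f->avx_usable)
    {
      if (f->resolve_opt)
        return "_dl_runtime_resolve_avx_opt";
      if (f->resolve_slow)
        return "_dl_runtime_resolve_avx_slow";
      return "_dl_runtime_resolve_avx";
    }
  return "_dl_runtime_resolve_sse";
}
```

With avx_usable and resolve_slow set but resolve_opt clear, the sketch
selects "_dl_runtime_resolve_avx_slow"; setting resolve_opt as well
selects "_dl_runtime_resolve_avx_opt", since the opt check comes first.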
argp argp: Do not override GCC keywords with macros [BZ #16907] 2016-08-18 11:15:42 +02:00
assert
benchtests Clear destination buffer updated by the previous run 2016-05-18 05:51:59 -07:00
bits Support __STDC_WANT_IEC_60559_FUNCS_EXT__ feature test macro. 2016-08-03 22:21:37 +00:00
catgets
conform conform tests: call perl with '-I.' 2016-09-05 22:53:22 +02:00
crypt
csu elf: Avoid using memalign for TLS allocations [BZ #17730] 2016-08-03 16:15:38 +02:00
ctype
debug Add tests for fortification of bcopy and bzero. 2016-08-19 09:04:35 -04:00
dirent
dlfcn tst-rec-dlopen: Fix build fail due to missing inclusion of string.h 2016-06-06 11:03:04 +02:00
elf Set NODELETE flag after checking for NULL pointer 2016-09-03 20:25:59 +02:00
gmon
gnulib
grp Don't install the internal header grp-merge.h 2016-07-18 09:33:21 -03:00
gshadow
hesiod hesiod: Avoid heap overflow in get_txt_records [BZ #20031] 2016-05-02 16:04:32 +02:00
hurd
iconv Fix UTF-16 surrogate handling. [BZ #19727] 2016-05-25 17:18:06 +02:00
iconvdata Fix UTF-16 surrogate handling. [BZ #19727] 2016-05-25 17:18:06 +02:00
include malloc: Simplify static malloc interposition [BZ #20432] 2016-08-26 23:20:41 +02:00
inet Fix macro API for __USE_KERNEL_IPV6_DEFS. 2016-06-02 23:52:06 -04:00
intl
io 2016-06-11 Paul Pluzhnikov <ppluzhnikov@google.com> 2016-06-11 14:50:16 -07:00
libidn
libio Support __STDC_WANT_LIB_EXT2__ feature test macro. 2016-08-02 17:40:35 +00:00
locale S390: Fix relocation of _nl_current_LC_CATETORY_used in static build. [BZ #19860] 2016-06-28 12:28:53 +02:00
localedata localedata: lt_LT: use hyphens in d_fmt [BZ #20497] 2016-08-24 16:07:02 -04:00
login 2016-06-11 Paul Pluzhnikov <ppluzhnikov@google.com> 2016-06-11 14:50:16 -07:00
mach mach: Add mach_print sycsall declaration 2016-06-09 01:43:49 +02:00
malloc malloc: Automated part of conversion to __libc_lock 2016-09-06 12:49:54 +02:00
manual Add fetestexceptflag. 2016-08-29 11:47:21 +00:00
math Remove unneeded stubs for k_rem_pio2l. 2016-09-01 09:31:06 -05:00
mathvec
misc Reduce memory size of tsearch red-black tree. 2016-08-25 23:48:05 +02:00
nis Return proper status from _nss_nis_initgroups_dyn (bug 20262) 2016-06-30 13:55:36 +02:00
nptl malloc: Simplify static malloc interposition [BZ #20432] 2016-08-26 23:20:41 +02:00
nptl_db Update and install proc_service.h [BZ #20311] 2016-08-03 16:26:32 +02:00
nscd Fix incorrect double-checked locking related to _res_hconf.initialized. 2016-08-18 20:53:37 +02:00
nss Fix incorrect double-checked locking related to _res_hconf.initialized. 2016-08-18 20:53:37 +02:00
po Update PO files. 2016-08-04 11:41:27 -04:00
posix Deprecate inclusion of <sys/sysmacros.h> by <sys/types.h> 2016-08-03 15:28:49 -04:00
pwd
resolv Fix incorrect double-checked locking related to _res_hconf.initialized. 2016-08-18 20:53:37 +02:00
resource
rt Fix rt/tst-aio64.c as well, and mention login/tst-utmp.c in ChangeLog 2016-06-11 14:59:27 -07:00
scripts mach: Add more allowed external headers 2016-08-21 03:24:55 +02:00
setjmp
shadow
signal
socket
soft-fp Fix soft-fp extended.h unpacking (GCC bug 77265). 2016-08-16 17:11:46 +00:00
stdio-common vfscanf: Avoid multiple reads of multi-byte character width 2016-09-02 15:59:34 +02:00
stdlib Add tst-wcstod-round 2016-08-19 11:17:07 -05:00
streams
string string: More tests for strcmp, strcasecmp, strncmp, strncasecmp 2016-08-26 14:28:46 +02:00
sunrpc CVE-2016-4429: sunrpc: Do not use alloca in clntudp_call [BZ #20112] 2016-05-23 20:18:34 +02:00
sysdeps X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508] 2016-09-06 08:51:07 -07:00
sysvipc
termios Declare tcgetsid for XPG4 (bug 20055). 2016-05-11 18:05:37 +00:00
time
timezone
wcsmbs Add tst-wcstod-round 2016-08-19 11:17:07 -05:00
wctype
.gitattributes
.gitignore
abi-tags
aclocal.m4
BUGS
ChangeLog X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508] 2016-09-06 08:51:07 -07:00
ChangeLog.1
ChangeLog.2
ChangeLog.3
ChangeLog.4
ChangeLog.5
ChangeLog.6
ChangeLog.7
ChangeLog.8 ChangeLog: change Winblowz to Windows 2016-08-10 00:49:28 +08:00
ChangeLog.9
ChangeLog.10
ChangeLog.11 ChangeLog: change Winblowz to Windows 2016-08-10 00:49:28 +08:00
ChangeLog.12
ChangeLog.13
ChangeLog.14
ChangeLog.15
ChangeLog.16
ChangeLog.17
ChangeLog.old-ports
ChangeLog.old-ports-aarch64
ChangeLog.old-ports-aix
ChangeLog.old-ports-alpha
ChangeLog.old-ports-am33
ChangeLog.old-ports-arm
ChangeLog.old-ports-cris
ChangeLog.old-ports-hppa
ChangeLog.old-ports-ia64
ChangeLog.old-ports-linux-generic
ChangeLog.old-ports-m68k
ChangeLog.old-ports-microblaze
ChangeLog.old-ports-mips
ChangeLog.old-ports-powerpc
ChangeLog.old-ports-tile
config.h.in S390: Do not set FE_INEXACT with feraiseexcept (FE_OWERFLOW|FE_UNDERFLOW). 2016-08-31 14:54:55 +02:00
config.make.in
configure x86-64: Properly align stack in _dl_tlsdesc_dynamic [BZ #20309] 2016-07-12 06:30:08 -07:00
configure.ac x86-64: Properly align stack in _dl_tlsdesc_dynamic [BZ #20309] 2016-07-12 06:30:08 -07:00
CONFORMANCE
COPYING
COPYING.LIB
cppflags-iterator.mk
extra-lib.mk
extra-modules.mk
gen-locales.mk
INSTALL 2016-06-05 Paul Pluzhnikov <ppluzhnikov@google.com> 2016-06-05 08:41:13 -07:00
libc-abis
LICENSES
Makeconfig Revert "Add pretty printers for the NPTL lock types" 2016-07-11 20:32:12 +05:30
Makefile Support __STDC_WANT_LIB_EXT2__ feature test macro. 2016-08-02 17:40:35 +00:00
Makefile.in New make target to only build benchmark binaries 2016-04-20 10:23:28 +05:30
Makerules Revert "Add pretty printers for the NPTL lock types" 2016-07-11 20:32:12 +05:30
NAMESPACE
NEWS Base <sys/quota.h> on Linux kernel headers [BZ #20525] 2016-09-01 15:53:13 +02:00
o-iterator.mk
PROJECTS
README
Rules Revert "Add pretty printers for the NPTL lock types" 2016-07-11 20:32:12 +05:30
shlib-versions
test-skeleton.c Fix test-skeleton C99 designed initialization 2016-08-26 17:33:47 -03:00
version.h Open development for 2.25. 2016-08-01 23:00:21 -04:00
WUR-REPORT

This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.

The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system.  It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.

In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.

The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu.  The current
GNU/Hurd support requires out-of-tree patches that will eventually be
incorporated into an official GNU C Library release.

When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later on all architectures except
i[4567]86 and x86_64, where Linux kernel version 2.6.32 or later
suffices.

Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.

The GNU C Library supports these configurations for using Linux kernels:

	aarch64*-*-linux-gnu
	alpha*-*-linux-gnu
	arm-*-linux-gnueabi
	hppa-*-linux-gnu	Not currently functional without patches.
	i[4567]86-*-linux-gnu
	x86_64-*-linux-gnu	Can build either x86_64 or x32
	ia64-*-linux-gnu
	m68k-*-linux-gnu
	microblaze*-*-linux-gnu
	mips-*-linux-gnu
	mips64-*-linux-gnu
	powerpc-*-linux-gnu	Hardware or software floating point, BE only.
	powerpc64*-*-linux-gnu	Big-endian and little-endian.
	s390-*-linux-gnu
	s390x-*-linux-gnu
	sh[34]-*-linux-gnu
	sparc*-*-linux-gnu
	sparc64*-*-linux-gnu
	tilegx-*-linux-gnu
	tilepro-*-linux-gnu

If you are interested in doing a port, please contact the glibc
maintainers; see http://www.gnu.org/software/libc/ for more
information.

See the file INSTALL to find out how to configure, build, and install
the GNU C Library.  You might also consider reading the WWW pages for
the C library at http://www.gnu.org/software/libc/.

The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory.  The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like.  For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below.  Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.

Please see http://www.gnu.org/software/libc/bugs.html for bug reporting
information.  We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.

The GNU C Library is free software.  See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed.  License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.