Commit Graph

727 Commits

Author SHA1 Message Date
Andreas Jaeger
1e3cdfda74 Merge branch 'elf-move'
Conflicts:
	debug/backtracesymsfd.c
	sysdeps/generic/elf/backtracesymsfd.c
	sysdeps/i386/configure.in
2012-03-27 21:35:36 +02:00
H.J. Lu
1532c7ac9a Make sure x86_64 GOT entry slot is always 8 bytes 2012-03-23 11:06:57 -07:00
Joseph Myers
7c69cd143b Fix cexp overflow (bug 13892). 2012-03-22 19:38:09 +00:00
H.J. Lu
81b035fe63 Replace Elf64_XXX with ElfW(XXX) in dl-irel.h 2012-03-22 10:17:05 -07:00
H.J. Lu
1da7940c77 Replace unsigned long with uint64_t 2012-03-22 10:02:57 -07:00
H.J. Lu
2ff87f3f18 Add sysdeps/x86_64/preconfigure 2012-03-22 08:28:39 -07:00
H.J. Lu
c8e43ba739 Add x32 support to dynamic linker audit 2012-03-21 17:14:49 -07:00
Andreas Schwab
7998fa7899 Disable use of FMA instructions in branred 2012-03-21 23:58:50 +01:00
Joseph Myers
1a4ac776eb Remove inaccurate x86 cexp implementations (bug 13883). 2012-03-21 15:28:05 +00:00
Joseph Myers
2460d3aa21 Fix pow of zero and infinity to large powers. 2012-03-21 12:16:00 +00:00
Andreas Jaeger
d6373f9ce3 Merge branch 'master' into elf-move 2012-03-20 20:40:16 +01:00
H.J. Lu
d1af992d0d Check __x86_64__ instead of __WORDSIZE in mathinline.h 2012-03-20 08:54:58 -07:00
H.J. Lu
114883e00a Support x86-64 __jmp_buf with __WORDSIZE != 64 2012-03-20 08:53:42 -07:00
H.J. Lu
95443d88af Use atomic64_t with 64bit atomic macros 2012-03-19 17:29:26 -07:00
H.J. Lu
490df6c441 Check __x86_64__ instead of __WORDSIZE for fenv_t 2012-03-19 16:10:51 -07:00
H.J. Lu
5e52b189f0 Check __x86_64__ instead of __WORDSIZE in mathdef.h 2012-03-19 16:00:52 -07:00
H.J. Lu
b4c35121c4 Use int64_t in x86_64/fpu/math_private.h 2012-03-19 15:17:48 -07:00
H.J. Lu
56965fd71c Cast _Unwind_GetCFA return to _Unwind_Ptr first 2012-03-19 13:33:20 -07:00
Joseph Myers
1897ad4432 Fix clog overflow/underflow (bug 13629). 2012-03-19 20:14:26 +00:00
Andreas Jaeger
6870c2ed1f Fix last line in configure.in. 2012-03-19 21:09:50 +01:00
Andreas Jaeger
8f599ddb67 Fix last line. 2012-03-19 21:06:14 +01:00
Andreas Jaeger
1221efbda9 Remove now obsolete elf/configure file. 2012-03-19 20:58:43 +01:00
Andreas Jaeger
0894643164 Merge contents from sysdeps/i386/configure.in (without i686 check) 2012-03-19 20:58:05 +01:00
Andreas Jaeger
9d639b9918 Move x86_64/elf files to x86_64 2012-03-19 20:55:26 +01:00
Richard Henderson
bd37f2ee31 Optimize private 387 fenv access; share code between i386 and x86_64. 2012-03-19 06:51:39 -07:00
Richard Henderson
d0adc92230 i386/x86_64: Optimize feholdexcept. 2012-03-19 06:51:06 -07:00
Richard Henderson
0fe0f1f86f Create and use libc_feupdateenv_test.
We can reduce the number of STMXCSR, and often we can avoid the
call to __feraiseexcept.
2012-03-19 06:50:41 -07:00
Richard Henderson
eb92c487b3 Create and use SET_RESTORE_ROUND{,_NOEX,_53BIT}{,F,L}. 2012-03-19 06:49:44 -07:00
Richard Henderson
b4dabbb47a Convert libc_feholdexcept et al from macros to inline functions. 2012-03-19 06:48:27 -07:00
Richard Henderson
4851a949b4 Make inline __isnan, __isinf_ns, __finite generic.
For code generation to stay identical on x86_64, this requires that
we define the fp word manipulation macros before including the
generic header.
2012-03-19 06:47:43 -07:00
H.J. Lu
473c3ef325 Define x86-64 ffsl alias only if __LP64__ is defined 2012-03-16 15:20:45 -07:00
Joseph Myers
c36e1d2369 Disable Bessel function TLOSS errors in POSIX mode. 2012-03-16 20:08:02 +00:00
Joseph Myers
11b90b9f50 Fix tan, tanl for large inputs. 2012-03-16 20:05:37 +00:00
Jan Kratochvil
6a1bd2a100 * sysdeps/x86_64/elf/start.S: Include <sysdep.h>.
(_start): Add cfi_startproc, cfi_undefined for rip and cfi_endproc.
2012-03-16 20:49:23 +01:00
Joseph Myers
8848d99dce Implement ldbl-96 sinl / cosl / sincosl (bug 13851). 2012-03-16 12:30:05 +00:00
Andreas Jaeger
356a10ee3e Merge branch 'master' into bug13658-branch 2012-03-14 16:36:17 +01:00
Joseph Myers
e456826d7a Fix csqrt overflow/underflow (bug 13841). 2012-03-14 11:53:32 +00:00
Paul Eggert
c524201ab0 Replace FSF snail mail address with URL in miscellaneous files. 2012-03-10 00:45:35 +00:00
Richard Henderson
b8c036204f Use include_next to chain math_private.h headers. 2012-03-09 16:11:26 -08:00
Richard Henderson
15194b4b3d x86_64: Convert __rint* and __floor* from macros to inlines. 2012-03-09 11:15:47 -08:00
Richard Henderson
64e21edef1 x86_64: Convert __ieee754_sqrt{,f,l} from macros to inlines. 2012-03-09 11:15:19 -08:00
Joseph Myers
d1d3431a3a Fix signs of zeros from casinh, cacosh etc. (bug 10716). 2012-03-07 15:15:19 +00:00
Andreas Jaeger
b35fe25ed9 [BZ #13658]
* sysdeps/x86_64/fpu/s_sincos.S: Delete.

	* math/libm-test.inc (sincos_test): Add test for large input.
2012-03-07 14:51:39 +01:00
Marek Polacek
a53b7a4e4b Fix up long double fphex. 2012-03-06 22:08:16 +01:00
Joseph Myers
b7cd39e8f8 Fix pow in non-default rounding modes (bug 3976). 2012-03-05 12:22:46 +00:00
Joseph Myers
ca811b2256 Test cosh, sinh in non-default rounding modes (bug 3976). 2012-03-05 12:20:24 +00:00
Marek Polacek
bc957d531c Remove oldish __GNUC_PREREQ. 2012-03-03 22:57:00 +01:00
Joseph Myers
804360ed83 Fix sin, cos, tan in non-default rounding modes (bug 3976). 2012-03-02 20:51:39 +00:00
Joseph Myers
28afd92dbd Fix exp in non-default rounding modes (bug 3976). 2012-03-02 15:12:53 +00:00
Joseph Myers
7b1902cb3e Improve erfc accuracy. 2012-03-01 21:15:38 +00:00
Joseph Myers
0fcad3e243 Add test for bug 5794 (incorrect expm1 overflow). 2012-02-29 20:49:20 +00:00
Joseph Myers
5b8a4d4a09 Reduce large expected errors from libm tests on x86 and x86_64. 2012-02-29 20:40:50 +00:00
Ulrich Drepper
39adf059fc Optimized expf for x86-64 2012-02-28 20:06:39 -05:00
Joseph Myers
5ad91f6e6f Resort ULPs files with gen-libm-test.pl -n in C locale. 2012-02-20 18:06:05 +00:00
Marek Polacek
ed656b4065 Provide crt[in].S for x86-64. 2012-02-14 13:50:44 +01:00
Paul Eggert
59ba27a63a Replace FSF snail mail address with URLs. 2012-02-09 23:18:22 +00:00
Andreas Schwab
6c6dbc6300 Reduce ldouble ULPs for jn tests on x86 2012-02-08 22:25:15 +01:00
Marek Polacek
622c86f480 Remove __ELF__ conditionals 2012-02-07 00:41:11 +01:00
Joseph Myers
3b1004624e Fix makefile/configure problems with sse2avx changes. 2012-01-30 19:55:15 +00:00
Ulrich Drepper
96bc5b45a6 Optimize x86-64 math inline header a bit 2012-01-28 21:20:06 -05:00
Ulrich Drepper
56f6f6a240 Use -msse2avx option for x86-64 libm functions 2012-01-28 14:48:46 -05:00
Ulrich Drepper
73139a7628 Simplify use of AVX instructions in internal math macros 2012-01-28 11:19:06 -05:00
Ulrich Drepper
08cf777f9e Really fix AVX tests
There is no problem with strcmp, it doesn't use the YMM registers.
The math routines might since gcc perhaps generates such code.
Introduce bit_YMM_USBALE and use it in the math routines.
2012-01-26 09:45:54 -05:00
Ulrich Drepper
afc5ed09cb Reset bit_AVX in __cpu_features is OS support is missing 2012-01-26 07:45:14 -05:00
Ulrich Drepper
a0da5fe1e4 More fallout from supporting only ELF 2012-01-08 00:45:01 -05:00
Ulrich Drepper
a784e50247 Remove pre-ISO C support
No more __const.
2012-01-07 23:57:22 -05:00
Ulrich Drepper
0269750ca6 Remove non-ELF support 2012-01-07 20:30:26 -05:00
Ulrich Drepper
305502f69d Fix up a comment 2012-01-07 11:49:33 -05:00
Ulrich Drepper
e40bca1ef9 Yet more ia64 removal fallout 2012-01-07 11:44:02 -05:00
Marek Polacek
530a32499a Fix typos in comments 2011-12-23 13:59:40 -05:00
Ulrich Drepper
67371b5666 Prevent warnings due to long long constants 2011-12-23 13:52:59 -05:00
Liubov Dmitrieva
15db4de19d Fix overrun in destination buffer 2011-12-23 12:02:15 -05:00
Ulrich Drepper
21eaf3a5f9 Use __REDIRECT_NTH for __feraiseexcept_renamed 2011-12-22 08:05:21 -05:00
Rafael Ávila de Espíndola
d2daaa1eb6 Define x86_64 feraiseexcept inline only under __USE_EXTERN_INLINES. 2011-12-21 13:27:09 -08:00
Ulrich Drepper
370a7d88f7 WP fixes 2011-12-17 14:41:05 -05:00
Ulrich Drepper
1d3e4b618a Optimized wcschr and wcscpy for x86-64 and x86-32 2011-12-17 14:39:23 -05:00
Ulrich Drepper
aff2453df7 Fix more warnings 2011-12-03 21:49:35 -05:00
Ulrich Drepper
34372fc6d3 Fix test of non-ASCII locales in x86-64 strcasecmp et.al. 2011-11-01 16:46:23 -04:00
Ulrich Drepper
52e4b9eb62 More cleanups of x86-64 strstr 2011-10-28 19:01:48 -04:00
Ulrich Drepper
fd52bc6dc4 Clean up x86-64 strcasestr
Actually describe in the C code what is going on.
2011-10-28 18:18:04 -04:00
Ulrich Drepper
a5b81e1fb7 Remove code without too much effects
Some of the AVX-specific code is not giving enough speed-up to
justify the extra code.
2011-10-28 16:55:01 -04:00
Ulrich Drepper
e0016b11d6 Add AVX optimized versions for some x86-64 math functions 2011-10-25 21:34:55 -04:00
Ulrich Drepper
618280a192 Optimize x86-64 SSE4.2+ strcmp a bit more 2011-10-25 14:50:31 -04:00
Ulrich Drepper
31ea014d8b Use VEX encoding in inline math functions on x86-64 when possible 2011-10-25 08:17:57 -04:00
Ulrich Drepper
31d3cc00b0 Cleanup FMA4 patch
Move the FMA4 code into its own section.  Avoid some of the duplication
of data resulting from the double use of source files.
2011-10-25 00:56:33 -04:00
Ulrich Drepper
202c9deb15 Better DLA_FMS
It's better to use __builtin_fma if it works.  Use it for gcc 4.6 and
higher.  Move the x86-64 dla.h to the correct place.
2011-10-24 22:11:21 -04:00
Ulrich Drepper
a0cf1edd4c Use inline asm for DLA_FMS because of broken old compilers 2011-10-24 21:17:10 -04:00
Ulrich Drepper
af968f62f2 Optimize accurate 64-bit routines for FMA4 on x86-64 2011-10-24 20:19:17 -04:00
Ulrich Drepper
09229f3e1b Fix WS 2011-10-23 14:57:28 -04:00
Liubov Dmitrieva
ce7dd29f28 Optimized strnlen and wcscmp for x86-64 2011-10-23 14:56:04 -04:00
Ulrich Drepper
f17424ed53 Fix WS 2011-10-23 13:35:24 -04:00
Liubov Dmitrieva
95584d3b33 Fix signedness in wcscmp comparison 2011-10-23 13:34:15 -04:00
Ulrich Drepper
774a2669af Clean up FMA use
The macro's name should reflect that subtraction is being done.  And
use __builtin_fma, it seems to work after all.
2011-10-23 13:31:01 -04:00
Ulrich Drepper
c8b3296bbe Clean up last dla.h change 2011-10-23 12:50:28 -04:00
Ulrich Drepper
fb24de5932 Fix typo in last change 2011-10-22 20:09:58 -04:00
Ulrich Drepper
0d355eb7c7 Update ULPs for x86-64 2011-10-22 20:06:23 -04:00
Ulrich Drepper
c196fed8f0 Fix compilation problems in x86-64 init-arch 2011-10-21 20:47:20 -04:00
Ulrich Drepper
1a97a8c78f Don't use NULL in last s_fma{,f} change 2011-10-21 07:39:28 -04:00
Ulrich Drepper
ed72b6545f Check for FMA4 support and generate appropriate fma functions 2011-10-20 22:43:15 -04:00
Ulrich Drepper
8d4f46c613 Move fma routines to right place 2011-10-20 21:55:41 -04:00
Ulrich Drepper
855d156018 Optimize x86-64 rawmemchr and add test 2011-10-19 22:22:29 -04:00
Ulrich Drepper
d9a4d2ab27 Add optimized str{,n}casecmp for AVX on x86-64 2011-10-19 12:42:38 -04:00
Andreas Schwab
8f3b1ffefa Fix PLT use for feraiseexcept on x86_64 2011-10-19 13:03:31 +02:00
Ulrich Drepper
d9a8d0abcc Use new internal libc_fe* interfaces in more functions 2011-10-18 15:11:31 -04:00
Ulrich Drepper
4855e3ddf5 Provide combined internal feholdexcept/fesetround interface 2011-10-18 09:59:04 -04:00
Ulrich Drepper
23ce562780 Pretty print last change to x86-64 mathinline.h 2011-10-18 09:38:47 -04:00
Ulrich Drepper
581d30e386 Add optimized nearbyint{,f} for x86-64 2011-10-18 09:13:23 -04:00
Ulrich Drepper
d38f1dba00 Start optimizing the use of the fenv interfaces in libm itself 2011-10-18 09:00:46 -04:00
Andreas Schwab
83c7615c2d Fix last change 2011-10-18 14:11:29 +02:00
Andreas Schwab
caa6c9d845 Fix linkage conflict with feraiseexcept 2011-10-18 11:46:51 +02:00
Ulrich Drepper
228a984d54 Relax asm requirements for recently added x86-64 math interfaces 2011-10-17 20:30:52 -04:00
Ulrich Drepper
c8553a6a6f Makr x86-64 math_private.h more robust 2011-10-17 16:00:39 -04:00
Ulrich Drepper
ed22dcf691 Provide internal optimizations on x86-64 with SSE4.1
Provide macros so that the internal users can, if possible, directly use
the new instructions.

Also fix up the mathinline.h header when compiling with SSE4.1 enabled.
2011-10-17 11:23:40 -04:00
Ulrich Drepper
b171c13768 Fix last x86-64 mathinline change
Use correct function names.
2011-10-17 10:37:00 -04:00
Ulrich Drepper
ad0f5cad15 Use rounds{s,d} for x86 rint, ceil, floor 2011-10-16 20:58:17 -04:00
Ulrich Drepper
2d1f3a4db6 Fix WS 2011-10-15 11:11:12 -04:00
Liubov Dmitrieva
be13f7bff6 Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
Andreas Schwab
6b1f68c91f Fix lost feraiseexcept symbol 2011-10-14 11:21:23 +02:00
Andreas Schwab
714fad23c6 Fix PLT use in feupdateenv on x86_64 2011-10-13 15:26:45 +02:00
Andreas Schwab
81dcc7fb74 Check for zero size in memrchr for x86_64 2011-10-13 13:34:41 +02:00
Ulrich Drepper
0ac5ae2335 Optimize libm
libm is now somewhat integrated with gcc's -ffinite-math-only option
and lots of the wrapper functions have been optimized.
2011-10-12 11:27:51 -04:00
Ulrich Drepper
7edb55ce06 Optimize use of isnan, isinf, finite 2011-10-08 10:18:26 -04:00
Ulrich Drepper
66fb11b1da Fix whitespace 2011-10-07 11:50:21 -04:00
Liubov Dmitrieva
093ecf9299 Improve 64 bit memchr, memrchr, rawmemchr with SSE2 2011-10-07 11:49:10 -04:00
Andreas Schwab
3a62d00d40 Don't call ifunc functions in trace mode 2011-10-05 14:35:40 +02:00
Andreas Schwab
bf972c9dfc Fix parse error in bits/mathinline.h with --std=c99 2011-09-26 14:01:30 +02:00
Ulrich Drepper
4c1a1f71c0 Add fmax and fmin inlines for x86-64 2011-09-15 13:11:08 -04:00
Ulrich Drepper
ee4d03150a Use correct section to allow merging 2011-09-14 13:43:24 -04:00
Ulrich Drepper
cd20565401 Optimized lrint and llrint for x86-64 2011-09-14 12:58:43 -04:00
Andreas Schwab
e529793b50 Avoid macro clash between <sys/select.h> and <linux/posix_types.h> 2011-09-13 15:16:38 +02:00
Ulrich Drepper
83cd142045 Remove --wth-tls option, TLS support is required 2011-09-11 15:02:01 -04:00
Ulrich Drepper
d063d16433 Remove support for !USE___THREAD 2011-09-10 16:50:28 -04:00
Petr Baudis
1248c1c415 Fix jn precision 2011-09-09 22:16:10 -04:00
H.J. Lu
08a300c956 Simplify AVX check 2011-09-07 21:38:23 -04:00
Ulrich Drepper
ceaa0c5dc3 Move Atom-optimized code out of the way and together 2011-09-06 21:53:03 -04:00
Ulrich Drepper
8e1294e83f Remove now-wrong comment 2011-09-06 17:20:33 -04:00
Ulrich Drepper
6d18b67f4d Fix whitespaces 2011-09-05 21:42:12 -04:00
Liubov Dmitrieva
a5f524e479 Add Atom-optimized strchr and strrchr for x86-64 2011-09-05 21:34:03 -04:00
Ulrich Drepper
49d42c37ba Add optimized x86-64 wcscmp 2011-09-05 14:08:23 -04:00
Ulrich Drepper
0276a718c0 Fix minor CFI problem in regular x86-64 trampoline 2011-08-20 08:58:44 -04:00
Ulrich Drepper
c88f17668b Fix CFI info in x86-64 trampolines for non-AVX code 2011-08-20 08:56:30 -04:00
Ulrich Drepper
8e999d2962 Minor optimization of popcount in l10nflist 2011-08-11 14:07:04 -04:00
Andreas Schwab
8c1a459f9a Fix inline strncat/strncmp on x86 2011-08-04 14:59:25 -04:00
Ulrich Drepper
bba33c289b One more typo in AVX test 2011-07-23 15:18:13 -04:00
Ulrich Drepper
2ee5518515 Merge branch 'master' of ssh://sourceware.org/git/glibc
Conflicts:
	ChangeLog
2011-07-23 00:04:15 -04:00
Ulrich Drepper
1aae088a8a One more change to XSAVE patch 2011-07-22 23:33:22 -04:00
Andreas Schwab
1d002f2539 Fix AVX check 2011-07-22 14:33:47 -04:00
Ulrich Drepper
21137f89c5 Fix overflow bug is optimized strncat for x86-64 2011-07-21 12:32:36 -04:00
Ulrich Drepper
5644ef5461 Fix check for AVX enablement
The AVX bit is set if the CPU supports AVX.  But this doesn't mean the
kernel does.  Add checks according to Intel's documentation.
2011-07-20 21:21:03 -04:00
Ulrich Drepper
6986b98a18 Force :a_x86_64_ymm to be 16-byte aligned 2011-07-20 14:20:00 -04:00
Ulrich Drepper
8002999481 Fix whitespaces 2011-07-19 17:27:09 -04:00
Liubov Dmitrieva
99710781cc Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
Ulrich Drepper
ecaddd6699 Rebuild configure scripts 2011-07-06 21:29:02 -04:00
H.J. Lu
8912479f9e Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
H.J. Lu
0b1cbaaef5 Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
David S. Miller
42675c6ff0 Add an elf_ifunc_invoke interface so that architectures can implement
the ifunc resolver calls however they wish.
2011-06-20 19:56:40 -07:00
H.J. Lu
3d29045b5e Assume Intel Core i3/i5/i7 processor if AVX is available 2011-06-03 07:01:25 -04:00
H.J. Lu
8db736347c Fix typo in x86-64 powl 2011-05-18 19:50:48 -04:00
Mike Frysinger
4c559bcdf3 Fix static linking with checking x86/x86-64 memcpy. 2011-04-17 22:20:47 -04:00
Ulrich Drepper
e6c6149412 Fix memory leak in TLS of loaded objects. 2011-04-10 22:43:01 -04:00
Ulrich Drepper
dedc7c7b05 Fix typo in cache information table for x86-{32,64}. 2011-04-03 09:32:31 -04:00
H.J. Lu
0354e35501 Work around old buggy program which cannot cope with memcpy semantics. 2011-04-01 19:38:21 -04:00
Ulrich Drepper
bb2420590c Last change caused infinite loops because of missing loop increment. 2011-03-22 01:52:43 -04:00
H.J. Lu
c97a1282a4 Handle page boundaries in x86 SSE4.2 strncmp. 2011-03-21 05:35:38 -04:00
Ulrich Drepper
2a11560107 Implement x86 cpuid handling of leaf4 for cache information. 2011-03-20 08:14:30 -04:00
Harsha Jagasia
7e4ba49cd3 Enable SSE2 memset for AMD'supcoming Orochi processor.
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
This patch also fixes the following bug:
For misaligned blocks larger than > 144 Bytes, memset branches into
the integer code path depending on the value of misalignment even if
the startup code chooses the SSE2 code path upfront, when multiarch
is enabled.
2011-03-04 23:30:08 -05:00
Ulrich Drepper
baa6c69a57 Work around empty line at end file generated by autoconf. 2011-02-17 01:26:07 -05:00
Ulrich Drepper
e943389325 Remove use of ranlib. 2011-02-15 14:52:29 -05:00
Roland McGrath
a0bf67cca2 Fix some warning nits. 2011-02-04 10:53:51 -08:00
Ulrich Drepper
f257bbd77d Clean up some bits/select.h headers. 2011-01-09 16:49:17 -05:00
Ryan S. Arnold
30950a5fd2 Make PowerPC64 default to nonexecutable stack 2010-12-19 22:49:01 -05:00
H.J. Lu
13b695749a Support Intel processor model 6 and model 0x2. 2010-11-12 03:48:52 -05:00
H.J. Lu
8ca52c6e3b Fix one exit path in x86-64 SSE4.2 str{,n}casecmp. 2010-11-10 03:05:37 -05:00
Ulrich Drepper
69da074d7a Fix warnings in __bswap_16. 2010-11-10 02:38:35 -05:00
H.J. Lu
ff02d5280b Use IFUNC on x86-64 memset 2010-11-08 03:41:34 -05:00
Ulrich Drepper
c0dde15b5d 32bit memset-sse2.S fails with uneven cache size
32bit memset-sse2.S assumes cache size is multiple of 128 bytes.  If
it isn't true, memset-sse2.S will fail.  For example, a processor can
have 24576 KB L3 cache and 20 cores. That is 2516582 byte per core. Half
of it is 1258291, which isn't helpful for vector instructions.  This
patch rounds cache sizes to multiple of 256 bytes and adds "raw" cache
sizes.
2010-11-05 07:57:46 -04:00
Richard Li
dbf3a06904 Fix x86-64 strchr propagation of search byte into all bytes of SSE register 2010-10-25 14:13:17 -04:00
Ulrich Drepper
18edac4857 Provide FP_FAST_FMA{,F,L} definitions for x86/x86-64. 2010-10-19 12:56:42 -04:00
Jakub Jelinek
5e908464b9 Implement accurate fma. 2010-10-13 22:27:03 -04:00
Jakub Jelinek
9ff8d36f27 Correct implementation of fmaf. 2010-10-11 09:27:05 -04:00
Ulrich Drepper
45db99c7d0 Fix handling of tail bytes of buffer in SSE2/SSSE3 x86-64 version strn{,case}cmp 2010-10-03 22:10:30 -04:00
Ulrich Drepper
015a4c6193 Re-enable all strncasecmp versions. 2010-09-20 20:18:00 -07:00
Ulrich Drepper
8ffcee4a04 Fix limit detection in x86-64 SSE2 strncasecmp. 2010-09-20 14:02:23 -07:00
Ulrich Drepper
0959ffc97b Update x86-64 mpn routines from GMP 5.0.1. 2010-09-02 23:36:25 -07:00
Ulrich Drepper
01d2601561 Fix typo in last commit. 2010-08-26 22:35:42 -07:00
Ulrich Drepper
9ea3de11f1 Move slow Atom code to separate section. 2010-08-26 22:17:03 -07:00
Ulrich Drepper
107b2fa56c Shorten x86-64 strlen a bit. 2010-08-26 22:12:16 -07:00
H.J. Lu
623aac7f84 Unroll x86-64 strlen 2010-08-26 22:09:34 -07:00
H.J. Lu
b416a90085 Missing comma in last commit. 2010-08-26 13:18:46 -07:00
Roland McGrath
8b2b771538 Clean up warnings in new x86_64/multiarch code. 2010-08-25 12:13:08 -07:00
H.J. Lu
e73015f2d6 Unroll 32bit SSE strlen and handle slow bsf 2010-08-25 10:07:37 -07:00
Ulrich Drepper
1cdfe7242f Add missing copyright year updated and pretty printing. 2010-08-24 11:42:19 -07:00
Richard Henderson
73f27d5e72 Clean up SSE variable shifts 2010-08-24 11:35:01 -07:00
Ulrich Drepper
9da4bb316f Fix two typos in x86-64 SSE4.2 strncasecmp implementation. 2010-08-19 09:20:44 -07:00
Ulrich Drepper
1feccb6caf Fix fourth parameter of SSE4.2 strcmp for x86-64. 2010-08-15 20:46:09 -07:00
Ulrich Drepper
28c90b2cf5 Use correct register for fourth parameter of x86-64 strncasecmp_l. 2010-08-15 17:42:12 -07:00
Ulrich Drepper
25244f174f Undo inccorect change. 2010-08-15 10:34:33 -07:00
Ulrich Drepper
e9f82e0d1d Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
Ulrich Drepper
ca6bb004eb Fix x86-64 build without multiarch. 2010-08-14 14:56:32 -07:00
Andi Kleen
d22e4cc939 x86: Add support for frame pointer less mcount 2010-08-07 21:24:05 -07:00
Ulrich Drepper
73507d3ae0 Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64. 2010-07-31 21:41:09 -07:00
Ulrich Drepper
66f6765a47 Pretty printing x86-64 SSE4.3 strcmp. 2010-07-30 12:54:37 -07:00
Ulrich Drepper
42e08a5438 Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
Ulrich Drepper
fe36dd025e Fix tolower operation in strcasestr. 2010-07-30 00:09:07 -07:00
Ulrich Drepper
880113d91e Avoid compiling unneeded file in ld.so. 2010-07-27 21:12:59 -07:00
Ulrich Drepper
24fb0f88ed Add optimized x86-64 implementation of strnlen.
While at it, beef up the test suite for strnlen and add performance
tests for it, too.
2010-07-26 08:37:08 -07:00
Ulrich Drepper
8e96b93aa7 Speed up x86-64 strcasestr a bit moew.
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
2010-07-24 08:34:44 -07:00
Andreas Schwab
f6a31e0eb6 Add strcasestr-nonascii to i386 build 2010-07-21 07:26:18 -07:00
Ulrich Drepper
d02dc4ba08 Fix non-ASCII case of SSE4.2 strcasstr. 2010-07-16 16:00:22 -07:00
Ulrich Drepper
cc9f2e47a0 Speed up SSE4.2 strcasestr by avoiding indirect function call. 2010-07-16 15:37:38 -07:00
H.J. Lu
6fb8cbcb58 Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7.  It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7.  It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
2010-06-30 08:26:11 -07:00
H.J. Lu
3c88fe1e3a Incorrect x86 CPU family and model check. 2010-05-27 11:14:18 -07:00
Ulrich Drepper
94a27fabeb Whitespace fix. 2010-04-14 22:29:51 -07:00
H.J. Lu
a11ec63713 Add x86-32 FMA support 2010-04-14 22:27:59 -07:00
H.J. Lu
df87f54923 Check DATA_CACHE_SIZE_HALF 2010-04-14 22:18:27 -07:00
H.J. Lu
dd37cd1a12 Optimie x86-64 SSE4 memcmp for unaligned data. 2010-04-14 17:53:44 -07:00
H.J. Lu
404a6e3201 x86-64 SSE4 optimized memcmp
This is 64bit SSE4 optimized memcmp. It improves memcmp by upto 3X
on Intel Core i7.
2010-04-14 00:12:53 -07:00
Ulrich Drepper
bbbdd77809 Update x86-64 cpu multiarch selection header. 2010-04-13 19:17:10 -07:00
Ulrich Drepper
22f4f44b67 Fix concurrent handling of __cpu_features. 2010-04-04 00:25:46 -07:00
H.J. Lu
7d9335ecd7 Don't define __strpbrk_sse42 in static library 2010-03-24 12:16:24 -07:00
Richard Guenther
e39acb1f16 Fix R_X86_64_PC32 overflow detection 2010-03-04 19:33:41 -08:00
Ulrich Drepper
4a1297d761 We can use the 64-bit register versions of the double functions. 2010-02-24 20:00:30 -08:00
Andreas Schwab
7eb22e757e Avoid PLT call to fegetenv on s390 2010-02-09 22:34:17 -08:00
Ulrich Drepper
f69190e74a Prevent silent errors should x86-64 strncmp be needed outside libc. 2010-01-14 08:09:32 -08:00
H.J. Lu
5a7af22fbb Unroll the loop x86-64 SSE4.2 strlen. 2010-01-13 07:51:48 -08:00
H.J. Lu
3af48cbdfa Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
H.J. Lu
2510d01ddb Define bit_SSE2 and index_SSE2. 2009-12-13 15:23:02 -08:00
H.J. Lu
51ddd2c01e Define bit_XXX and index_XXX.
This patch defines bit_XXX and index_XXX and use them to check processor
feature in assembly code.  It can prevent typos in processor feature
check.
2009-12-13 09:47:02 -08:00
Ulrich Drepper
823bc6da65 Fix whitespaces. 2009-10-22 22:50:00 -07:00
H.J. Lu
001659f4d5 Implement SSE4.2 optimized strchr and strrchr. 2009-10-22 22:47:12 -07:00
Roland McGrath
b0f3a2e43f Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions. 2009-10-06 20:01:23 -07:00
Roland McGrath
9d6982d5d2 Clean up x86 multiarch HAS_FOO macros. 2009-10-06 19:59:03 -07:00
Roland McGrath
7967983fd4 configure tweaks, support $libc_add_on_config_subdirs 2009-09-15 14:14:42 -07:00
Jakub Jelinek
22bb992d51 Fix strstr/strcasestr/fma/fmaf on x86_64. 2009-09-02 19:43:04 -07:00
Jakub Jelinek
240441038f Fix x86_64 bits/mathinline.h for -m32 compilation. 2009-09-01 15:30:12 -07:00
Andreas Schwab
c2735e958a Fix parse error in bits/mathinline.h with --std=c99 2009-08-31 17:26:14 +02:00
H.J. Lu
5a4eb7282e Remove ENABLE_SSSE3_ON_ATOM.
It turns that SSSE3 isn't slow on Atom. The problem is bsf. This patch
removes ENABLE_SSSE3_ON_ATOM.
2009-08-28 14:54:46 -07:00
Ulrich Drepper
65b14bcee2 Optimize out duplicated scalbln code for x86-64. 2009-08-25 16:46:34 -07:00
Ulrich Drepper
7423a3456a Optimized signbit{,f} for x86-64. 2009-08-25 14:54:12 -07:00
Ulrich Drepper
84088310ce Handle AVX saving on x86-64 in interrupted smbol lookups.
If a signal arrived during a symbol lookup and the signal handler also
required a symbol lookup, the end of the lookup in the signal handler reset
the flag whether restoring AVX/SSE registers is needed.  Resetting means
in this case that the tail part of the outer lookup code will try to
restore the registers and this can fail miserably.  We now restore to the
previous value which makes nesting calls possible.
2009-08-25 10:42:30 -07:00
Ulrich Drepper
cf00cc00bc Add ceil implementation for 64-bit machines.
On 64-bit machines we should not split doubles into two 32 bit
integer and handle the words separately.  We have wide registers.
This patch implements a 64-bit ceil version.  Ideally all other
functions will be converted over time.
2009-08-24 18:05:48 -07:00
Ulrich Drepper
9a1ea1525e Optimize float construction/extraction on x86-64. 2009-08-24 14:52:49 -07:00
Ulrich Drepper
ef72d5f1b9 Optimize x86-64 signbit{,f} a bit. 2009-08-24 10:20:58 -07:00
H.J. Lu
4e1e2f4247 Support mixed SSE/AVX audit and check AVX only once.
This patch fixes mixed SSE/AVX audit and checks AVX only once in
_dl_runtime_profile. When an AVX or SSE register value in pltenter is
modified, we have to make sure that the SSE part value is the same in both
lr_xmm and lr_vector fields so that pltexit will get the correct value
from either lr_xmm or lr_vector fields. AVX-enabled pltenter should
update both lr_xmm and lr_vector fields to support stacked AVX/SSE
pltenter functions.
2009-08-08 10:54:42 -07:00
Ulrich Drepper
8e436522e1 Move SSE4.2 functions together. 2009-08-08 09:38:32 -07:00
Ulrich Drepper
0fda545d5f Add SSSE3-optimized implementation of str{,n}cmp for x86-64. 2009-08-07 22:51:02 -07:00
Ulrich Drepper
57b378ac89 Avoid warning through fake initialization. 2009-08-07 16:19:54 -07:00
Ulrich Drepper
3aa2588d4a Fix whitespaces in last checkin. 2009-08-07 09:47:12 -07:00
H.J. Lu
a546baa9cd Properly count number of logical processors on Intel CPUs.
The meaning of the 25-14 bits in EAX returned from cpuid with EAX = 4
has been changed from "the maximum number of threads sharing the cache"
to "the maximum number of addressable IDs for logical processors sharing
the cache" if cpuid takes EAX = 11.  We need to use results from both
EAX = 4 and EAX = 11 to get the number of threads sharing the cache.

The 25-14 bits in EAX on Core i7 is 15 although the number of logical
processors is 8.  Here is a white paper on this:

http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/

This patch correctly counts number of logical processors on Intel CPUs
with EAX = 11 support on cpuid.  Tested on Dinnington, Core i7 and
Nehalem EX/EP.

It also fixed Pentium Ds workaround since EBX may not have the right
value returned from cpuid with EAX = 1.
2009-08-07 09:39:36 -07:00
H.J. Lu
02cea47161 Add x86 32-bit SSE4.2 string functions.
This patch adds 32bit SSE4.2 string functions.  It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long.  Tested
on 32bit Core i7 and Core 2.
2009-08-04 12:13:43 -07:00