Atomic ops are issued directly from the core, rather than
potentially sitting in the write buffer, so can improve the
performance of other waiters. In addition, if we didn't end
up pulling a copy of the cache line where the lock is into cache,
by using an atomic op we don't have to acquire the cache line
before we can unlock.
With gcc 4.8 tilegx has support for -mcmodel=large, to tolerate very
large shared objects. This option changes the compiler output to
not include direct jump instructions, which have a range of only
2^30, i.e +/- 512MB. Instead the compiler marshalls the target PCs
into registers and then uses jump- or call-to-register instructions.
For glibc, the upshot is that we need to arrange for a few functions
to tolerate the possibility of a large range between the PC and
the target. In particular, the crti.S and start.S code needs
to be able to reach from .init to the PLT, as does gmon-start.c.
The elf-init.c code has the reverse problem, needing to call from
libc_nonshared.a (linked at the end of shared objects) back to the
_init section at the beginning.
No other functions in *_nonshared.a need to be built this way, as
they only call the PLT (or potentially each other), but all of that
code is linked at the very end of the shared object.
We don't build the standard -static archives with this option as the
performance cost is high enough and the use case is rare enough that
it doesn't seem worthwhile. Instead, we would encourage developers
who need the -static model with huge executables to build a private
copy of glibc and configure it with -mcmodel=large.
Note that libc.so et al don't need any changes; the only changes
are for code that is statically linked into user code built with
-mcmodel=large.
For the assembly code, I just rewrote it so that it unconditionally
uses the large model. To be able to pass -mcmodel=large to
csu/elf-init.c and csu/gmon-start.c, I need to check to see if the
compiler supports that flag, since gcc 4.7 doesn't; I added the
support by creating a small Makefile fragment that just runs the
compiler to check.
Normally, the simulator is notified of absolute pathnames by the
_dl_load_hook hook. However, when a relative pathname is used, the
simulator may not know that the relative path matches a path that
it could figure out in the file system that it has access to.
Instead we provide a simplified version of the realpath function
so we can pass a plausible absolute pathname to the simulator.
Since we're now doing more work at object load time, we also add
a guard so we do no work at all if we're not running on the simulator.
- Override <memcopy.h> so we use full 8-byte word copies on tilegx32
for memmove, then use op_t in memcpy instead of the previous
locally-defined word_t just to avoid proliferating identical types.
- Fix bug in memcpy prefetch that caused us to never prefetch past
the first cache line.
- Optimize misaligned memcpy by inlining _wordcopy_fwd_dest_aligned
instead of just doing a dumb word-at-a-time copy.
- Make memcpy safe for forward copies by doing all the loads from
a given cache line prior to doing a wh64 (cache line zero-fill)
on the destination. Remove now-redundant src == dst check.
- Copy and optimize the generic wordcopy.c routines to use the tile
"double align" instruction instead of the MERGE macro; to avoid
offset addressing mode (which tile doesn't have) by rewriting the
pointer math to load and store with a zero index; and to use
post-increment addresses in the inner loops to improve scheduling.
This hook is useful for any arch-specific functionality that
should be done on loaded objects. For the tile architecture,
the hook is already provided (though we switch to using the new
macro name with this commit) and implements a simulator notifier
so that the simulator can load Elf symbols to match the object
and generate better error messages for PC's.
Also, remove a spurious definition of DL_UNMAP in dl-runtime.c
We must save and restore r19 in both PIC and non-PIC
situations since the kernel paths that clobber r19
are independent of that PIC-ness of userspace.
In addition we choose r4 as the temporary register over
r3 which is being used by recent gcc's as the frame
pointer.
* sysdeps/unix/sysv/linux/x86/bits/fcntl.h (__O_LARGEFILE)
[!__x86_64]: Do not define, take value from <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/s390/bits/fcntl.h (__O_LARGEFILE):
[__WORDSIZE != 64]: Likewise.
* sysdeps/unix/sysv/linux/generic/bits/fcntl.h: (__O_LARGEFILE)
[__WORDSIZE != 64]: Do not define, take value from
<bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/hppa/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
(__O_PATH): Define.
* sysdeps/unix/sysv/linux/m68k/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/generic/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/ia64/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/mips/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/arm/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
We can discover our x,y coordinate in the core mesh with an
mfspr instruction, multiply y by the core mesh width, and have
the core number without needing to ask the kernel.
Updates the hppa-specific pthread.h from the generic version.
After this update the only difference between the generic
version and the hppa version is the footer protected by the
_PTHREAD_H_HPPA_ guard.
The new strtod function wants rounding information from the C lib, so
move the guts of the ia64 version into a header file for it to use.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The ia64 gcc port has never shipped a crtbeginT.o, so keep using the
old crtbegin.o object when static linking.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Fix a build failure by using __prlimit64 as the internal
function name for the versioned symbol prlimit64. Without
this patch the build system attempts to alias prlimit64
to itself and that is invalid.
The convert_bit macro allows the compiler to translate the bit
positions more efficiently. The assumption of only running at
program startup allows eliding the __ieee_get_fp_control call.
The previous dummy definition (as type int) was fine in general, since
tile doesn't have floating-point registers, but it confused gdb's
configure, leading to later compile errors. This change also makes
prfpregset_t parallel to prgregset_t, which seems like generally the
right thing regardless of the non-existence of the actual registers :-)
We've been carrying this in Gentoo for quite a long time to fix some test
failures that people hit.
Original message:
> make[4]: *** [/glibc/glibc-package-2.3/mips-linux/obj/math/test-fpucw.out] Error 1
This test fails since the read back fpu control word is 0x80000 instead
of 0x0. I wonder if this patch is correct:
...
which additionally masks out the condition bit 23 - note that the other
condition bits (25-31) are masked out too?
URL: http://sourceware.org/ml/libc-alpha/2002-10/msg00392.html
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
While most arches have had the fdatasync syscall for a long time, the
alpha port didn't add it until the 2.6.22 release.
This is heavily based on Aurelien Jarno's initial work.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>