binutils-gdb/ld/testsuite/ld-x86-64/ibt-plt-3c-x32.d
Andrew Burgess 202be274a4 opcodes/i386: remove trailing whitespace from insns with zero operands
While working on another patch[1] I had need to touch this code in
i386-dis.c:

  ins->obufp = ins->mnemonicendp;
  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
    oappend (ins, " ");
  oappend (ins, " ");
  (*ins->info->fprintf_styled_func)
    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);

What this code does is add whitespace after the instruction mnemonic
and before the instruction operands.

The problem I ran into when working on this code can be seen by
assembling this input file:

    .text
    nop
    retq

Now, when I disassemble, here's the output.  I've replaced trailing
whitespace with '_' so that the issue is clearer:

    Disassembly of section .text:

    0000000000000000 <.text>:
       0:	90                   	nop
       1:	c3                   	retq___

Notice that there's no trailing whitespace after 'nop', but there are
three spaces after 'retq'!

What happens is that instruction mnemonics are emitted into a buffer
instr_info::obuf, then instr_info::mnemonicendp is setup to point to
the '\0' character at the end of the mnemonic.

When we emit the whitespace, this is then added starting at the
mnemonicendp position.  Lets consider 'retq', first the buffer is
setup like this:

  'r' 'e' 't' 'q' '\0'

Then we add whitespace characters at the '\0', converting the buffer
to this:

  'r' 'e' 't' 'q' ' ' ' ' ' ' '\0'

However, 'nop' is actually an alias for 'xchg %rax,%rax', so,
initially, the buffer is setup like this:

  'x' 'c' 'h' 'g' '\0'

Then in NOP_Fixup we spot that we have an instruction that is an alias
for 'nop', and adjust the buffer to this:

  'n' 'o' 'p' '\0' '\0'

The second '\0' is left over from the original buffer contents.
However, when we rewrite the buffer, we don't afjust mnemonicendp,
which still points at the second '\0' character.

Now, when we insert whitespace we get:

  'n' 'o' 'p' '\0' ' ' ' ' ' ' ' ' '\0'

Notice the whitespace is inserted after the first '\0', so, when we
print the buffer, the whitespace is not printed.

The fix for this is pretty easy, I can change NOP_Fixup to adjust
mnemonicendp, but now a bunch of tests start failing, we now produce
whitespace after the 'nop', which the tests don't expect.

So, I could update the tests to expect the whitespace....

...except I'm not a fan of trailing whitespace, so I'd really rather
not.

Turns out, I can pretty easily update the whitespace emitting code to
spot instructions that have zero operands and just not emit any
whitespace in this case.  So this is what I've done.

I've left in the fix for NOP_Fixup, I think updating mnemonicendp is
probably a good thing, though this is not really required any more.

I've then updated all the tests that I saw failing to adjust the
expected patterns to account for the change in whitespace.

[1] https://sourceware.org/pipermail/binutils/2022-April/120610.html
2022-05-27 14:12:33 +01:00

44 lines
1.7 KiB
D

#source: ibt-plt-3.s
#as: --x32
#ld: -shared -m elf32_x86_64 -z ibt --hash-style=sysv -z max-page-size=0x200000 -z noseparate-code
#objdump: -dw
.*: +file format .*
Disassembly of section .plt:
[a-f0-9]+ <.plt>:
+[a-f0-9]+: ff 35 ([0-9a-f]{2} ){4}[ ]+push 0x[a-f0-9]+\(%rip\) # [a-f0-9]+ <_GLOBAL_OFFSET_TABLE_\+0x8>
+[a-f0-9]+: ff 25 ([0-9a-f]{2} ){4}[ ]+jmp \*0x[a-f0-9]+\(%rip\) # [a-f0-9]+ <_GLOBAL_OFFSET_TABLE_\+0x10>
+[a-f0-9]+: 0f 1f 40 00 nopl 0x0\(%rax\)
+[a-f0-9]+: f3 0f 1e fa endbr64
+[a-f0-9]+: 68 00 00 00 00 push \$0x0
+[a-f0-9]+: e9 e2 ff ff ff jmp [a-f0-9]+ <bar1@plt-0x30>
+[a-f0-9]+: 66 90 xchg %ax,%ax
+[a-f0-9]+: f3 0f 1e fa endbr64
+[a-f0-9]+: 68 01 00 00 00 push \$0x1
+[a-f0-9]+: e9 d2 ff ff ff jmp [a-f0-9]+ <bar1@plt-0x30>
+[a-f0-9]+: 66 90 xchg %ax,%ax
Disassembly of section .plt.sec:
[a-f0-9]+ <bar1@plt>:
+[a-f0-9]+: f3 0f 1e fa endbr64
+[a-f0-9]+: ff 25 ([0-9a-f]{2} ){4}[ ]+jmp \*0x[a-f0-9]+\(%rip\) # [a-f0-9]+ <bar1>
+[a-f0-9]+: 66 0f 1f 44 00 00 nopw 0x0\(%rax,%rax,1\)
[a-f0-9]+ <bar2@plt>:
+[a-f0-9]+: f3 0f 1e fa endbr64
+[a-f0-9]+: ff 25 ([0-9a-f]{2} ){4}[ ]+jmp \*0x[a-f0-9]+\(%rip\) # [a-f0-9]+ <bar2>
+[a-f0-9]+: 66 0f 1f 44 00 00 nopw 0x0\(%rax,%rax,1\)
Disassembly of section .text:
[a-f0-9]+ <foo>:
+[a-f0-9]+: 48 83 ec 08 sub \$0x8,%rsp
+[a-f0-9]+: e8 e7 ff ff ff call [a-f0-9]+ <bar2@plt>
+[a-f0-9]+: 48 83 c4 08 add \$0x8,%rsp
+[a-f0-9]+: e9 ce ff ff ff jmp [a-f0-9]+ <bar1@plt>
#pass