It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as
an immediate if a constraint permits it. So the restriction posed by
such a predicate will be happily ignored, and moreover if a splitter is
added, such as required for MODE_CC support, the new instructions will
reject the original operands supplied, causing an ICE like below:
.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: Error: could not split insn
(insn 90 662 663 (set (reg:DI 10 %r10 [orig:97 _235 ] [97])
(mult:DI (sign_extend:DI (mem/c:SI (plus:SI (reg/f:SI 13 %fp)
(const_int -800 [0xfffffffffffffce0])) [14 %sfp+-800 S4 A32]))
(sign_extend:DI (const_int -51 [0xffffffffffffffcd])))) 299 {mulsidi3}
(expr_list:REG_EQUAL (mult:DI (sign_extend:DI (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 13 %fp)
(const_int -800 [0xfffffffffffffce0])) [14 %sfp+-800 S8 A32]) 0))
(const_int -51 [0xffffffffffffffcd]))
(nil)))
during RTL pass: final
.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: internal compiler error: in final_scan_insn_1, at final.c:3073
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
Change the predicates used with the widening multiply and multiply-add
insns to allow immediates then, just as the constraints and the machine
instructions produced permit.
Also give the insns names, for easier reference here and elsewhere.
gcc/
* config/vax/vax.md (mulsidi3): Fix the multiplicand predicates.
(*maddsidi4, *maddsidi4_const): Likewise. Name insns.
It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload keep it happily as a
memory reference if a constraint permits it. So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE. An actual
example will be given with a subsequent change.
Therefore, similarly to EXTV/EXTZV/INSV insns, remove inconsistencies
with predicates and constraints of bit-field comparison insns, observing
that a bit-field located in memory is byte-addressed by the respective
machine instructions and therefore SImode may only be used with a
register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one).
Also give the insns names, for easier reference here and elsewhere.
gcc/
* config/vax/vax.md (*cmpv_2): Name insn.
(*cmpv, *cmpzv, *cmpzv_2): Likewise. Fix location predicate and
constraint.
With the `*insv_aligned', `*extzv_aligned' and `*extv_aligned' insns we
are going to adjust the bit-field location if it is in memory, so only
allow such location addresses that can be offset, excluding external
symbol references in the PIC mode in particular.
This fixes an ICE like:
during RTL pass: final
In file included from .../gcc/testsuite/gcc.dg/torture/vshuf-v16qi.c:11:
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc: In function 'test_13':
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:27:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
.../gcc/testsuite/gcc.dg/torture/vshuf-16.inc:16:1: note: in expansion of macro 'T'
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:28:1: note: in expansion of macro 'TESTS'
0x10a34b33 change_address_1
.../gcc/emit-rtl.c:2275
0x10a358af adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x11d2505f output_97
.../gcc/config/vax/vax.md:806
0x10adec4b get_insn_template(int, rtx_insn*)
.../gcc/final.c:2070
0x10ae1c5b final_scan_insn_1
.../gcc/final.c:3039
0x10ae2257 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.c:3152
0x10ade9a3 final_1
.../gcc/final.c:2020
0x10ae6157 rest_of_handle_final
.../gcc/final.c:4658
0x10ae6697 execute
.../gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gcc.dg/torture/vshuf-v16qi.c -O2 (internal compiler error)
triggered by an RTL instruction like:
(insn 97 96 98 (set (reg:SI 5 %r5 [88])
(zero_extract:SI (mem/c:SI (symbol_ref:SI ("b") <var_decl 0x7ffff7f801b0 b>) [0 b+0 S4 A128])
(const_int 8 [0x8])
(const_int 24 [0x18]))) ".../gcc/testsuite/gcc.dg/torture/vshuf-main.inc":28:1 97 {*extzv_aligned}
(nil))
and removes these regressions:
FAIL: gcc.dg/torture/vshuf-v16qi.c -O2 (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v16qi.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v4hi.c -O2 (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v4hi.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8hi.c -O2 (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8hi.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 (test for excess errors)
However expand typically presents pseudo-registers rather than memory
references to these insns, so a further rework is required to make a
better use of the code variant they are supposed to produce. This at
least fixes the problem at hand.
gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned): Also make sure the memory address of a bit-field
location can be adjusted in the PIC mode.
It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload keep it happily as a
memory reference if a constraint permits it. So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE. An actual
example will be given with a subsequent change.
Remove such inconsistencies we have with the EXTV/EXTZV/INSV insns then,
observing that a bit-field located in memory is byte-addressed by the
respective machine instructions and therefore SImode may only be used
with a register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one), which has already been taken
into account with the constraints currently used, except for `*insv_2'.
The QI machine mode may be used for the bit-field location with any kind
of memory operand, but we got the constraint wrong, although harmlessly
in reality, with `*insv'. Fix that for consistency though.
Also give the insns names, for easier reference here and elsewhere.
gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned, *extv_non_const, *extzv_non_const): Name insns.
Fix location predicate.
(*extzv): Name insn.
(*insv): Likewise. Fix location constraint.
(*insv_2): Likewise, and the predicate.
The MOVC3 machine instruction has `memmove' semantics[1]:
"The operation of the instruction is such that overlap of the source and
destination strings does not affect the result."
so use it to provide the `movmemhi' instruction as well.
References:
[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
3.10 "Character-String Instructions", p. 3-162
gcc/
* config/vax/vax.md (cpymemhi1): Rename insn to...
(movmemhi1): ... this.
(cpymemhi): Update accordingly. Remove constraints.
(movmemhi): New expander.
gcc/testsuite/
* gcc.target/vax/movmem.c: New test.
The middle end does not refer to `ctzqi2'/`ctzhi2' or `ffsqi2'/`ffshi2'
patterns by name where `__builtin_ctz' or `__builtin_ffs' respectively
is invoked for an argument of the QImode or HImode type, and instead it
extends the data type before passing it to `ctzsi2' or `ffssi2'.
Avoid the redundant operation and use a peephole2 to convert it to the
right RTL expression that will collapse the two operations into a single
machine instruction instead unless we need the extended intermediate
result for another purpose.
gcc/
* config/vax/builtins.md: Add a peephole2 for QImode and HImode
`ctz' operations.
(any_extend): New code iterator.
gcc/testsuite/
* gcc.target/vax/ctzhi.c: New test.
* gcc.target/vax/ctzqi.c: New test.
* gcc.target/vax/ffshi.c: New test.
* gcc.target/vax/ffsqi.c: New test.
The FFS machine instruction provides for arbitrary input bit-field widths
so take advantage of this and convert `ffssi2' and `ctzsi2' to templates
for all the three of QI, HI, SI machine modes.
Test cases will be added separately.
gcc/
* config/vax/builtins.md (width): New mode attribute.
(ffssi2): Rework expander into...
(ffs<mode>2): ... this.
(ctzsi2): Rework insn into...
(ctz<mode>2): ... this.
Our `ffssi2_internal' pattern and the machine FFS instruction, which
technically is a bit-field operation, match the `ctz' operation exactly,
with the result produced for the bit-field source operand of zero equal
to its width as specified with another machine instruction operand, not
directly expressed in RTL and currently hardcoded in the assembly code
produced. In our terms this is the bit size of the machine mode used,
and although it's SImode now let's be flexible for an upcoming change.
The operation also sets the Z condition code according to the value of
the source operand.
gcc/
* config/vax/builtins.md (ffssi2_internal): Rename insn to...
(ctzsi2): ... this. Update the RTL operation.
(ffssi2): Update accordingly.
* config/vax/vax.c (vax_notice_update_cc): Handle CTZ.
* config/vax/vax.h (CTZ_DEFINED_VALUE_AT_ZERO): New macro.
gcc/testsuite/
* gcc.target/vax/ctzsi.c: New test.
Remove an ICE like:
during RTL pass: expand
.../libatomic/tas_n.c: In function 'libat_test_and_set_1':
.../libatomic/tas_n.c:39:1: internal compiler error: in patch_jump_insn, at cfgrtl.c:1298
39 | }
| ^
0x108a09ff patch_jump_insn
.../gcc/cfgrtl.c:1298
0x108a0b07 redirect_branch_edge
.../gcc/cfgrtl.c:1325
0x108a124b rtl_redirect_edge_and_branch
.../gcc/cfgrtl.c:1458
0x1087f6d3 redirect_edge_and_branch(edge_def*, basic_block_def*)
.../gcc/cfghooks.c:373
0x11d6264b try_forward_edges
.../gcc/cfgcleanup.c:562
0x11d6b0eb try_optimize_cfg
.../gcc/cfgcleanup.c:2960
0x11d6ba4f cleanup_cfg(int)
.../gcc/cfgcleanup.c:3174
0x10870b3f execute
.../gcc/cfgexpand.c:6763
triggered with an RTL pattern like:
(jump_insn 8 7 20 2 (parallel [
(set (pc)
(if_then_else (ne (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1 S1 A8])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 0 [0]))
(label_ref 10)
(pc)))
(set (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1 S1 A8])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
]) ".../libatomic/tas_n.c":38:12 -1
(nil)
-> 10)
caused by a volatile memory reference used that is not accepted by the
`memory_operand' predicate of the `jbbssiqi' insn explicitly referred
from the `sync_lock_test_and_setqi' expander. Also seen with:
FAIL: gcc.dg/pr61756.c (internal compiler error)
Define a new `any_memory_operand' predicate accepting both ordinary and
volatile memory references and use it with the `jbb<ccss>i<mode>' insn,
so as to address the ICE.
Also remove useless operations from the `sync_lock_test_and_set<mode>'
and `sync_lock_release<mode>' expanders as those always either complete
or fail and therefore never fall through to using their template other
than to match operands. Wrap `jbb<ccss>i<mode>' into `unspec_volatile'
instead so that the jump does not get removed or reordered. Share one
index to avoid a complication around the iterators since the index is
nowhere referred to anyway and the pattern required pulled by its name.
Test cases will be added separately.
gcc/
* config/vax/predicates.md (volatile_mem_operand)
(any_memory_operand): New predicates.
* config/vax/builtins.md (VUNSPEC_UNLOCK): Remove constant.
(sync_lock_test_and_set<mode>): Remove `set' and `unspec'
operations, match operands only. Reformat.
(sync_lock_release<mode>): Likewise. Remove cruft.
(jbb<ccss>i<mode>): Wrap into `unspec_volatile', use
`any_memory_operand' predicate.
With mode-specific interlocked branch insns already folded into iterated
templates now fold the two templates into one too, observing that the
only difference between them is the value of the bit branched on, which
is of course reflected both in the RTL expression and the instruction
produced. Use an int iterator to iterate over the bit value, making use
of the newly-added wide integer support, and substituting patterns as
necessary to produce equivalent individual insns. No functional change.
gcc/
* config/vax/builtins.md (bit): New int iterator.
(ccss): New int attribute.
(jbbssi<mode>, jbbcci<mode>): Fold insns into...
(jbb<ccss>i<mode>): ... this.
Regardless of the machine mode all the interlocked branches of the same
kind, one of the two provided by the ISA, use the same RTL patterns and
machine instructions, except for the memory operand's constraint.
Remove code duplication then and make use of a mode iterator combined
with an attribute to expand the same insn patterns with the constraint
suitably substituted from a single template. No functional change.
gcc/
* config/vax/builtins.md (bb_mem): New mode attribute.
(jbbssiqi, jbbssihi, jbbssisi): Fold insns into...
(jbbssi<mode>): ... this.
(jbbcciqi, jbbccihi, jbbccisi): Likewise...
(jbbcci<mode>): ... this.
VAX has interlocked branch instructions used for atomic operations and
we want to have them wrapped in UNSPEC_VOLATILE so as not to have code
carried across. This however breaks with jump optimization and leads
to an ICE in the build of libbacktrace like:
.../libbacktrace/mmap.c:190:1: internal compiler error: in fixup_reorder_chain, at cfgrtl.c:3934
190 | }
| ^
0x1087d46b fixup_reorder_chain
.../gcc/cfgrtl.c:3934
0x1087f29f cfg_layout_finalize()
.../gcc/cfgrtl.c:4447
0x1087c74f execute
.../gcc/cfgrtl.c:3662
on RTL like:
(jump_insn 18 17 150 4 (unspec_volatile [
(set (pc)
(if_then_else (eq (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1 S4 A32])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
(label_ref 20)
(pc)))
(set (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1 S4 A32])
(const_int 1 [0x1])
(const_int 0 [0]))
(const_int 1 [0x1]))
] 101) ".../libbacktrace/mmap.c":135:14 158 {jbbssisi}
(nil)
-> 20)
when those branches are enabled with a follow-up change. Also showing
with:
FAIL: gcc.dg/pr61756.c (internal compiler error)
Handle branches wrapped in UNSPEC_VOLATILE then and, for consistency,
also in UNSPEC. The presence of UNSPEC_VOLATILE will prevent such
branches from being removed as they won't be accepted by `onlyjump_p',
we just need to let them through.
gcc/
* jump.c (pc_set): Also accept a jump wrapped in UNSPEC or
UNSPEC_VOLATILE.
(any_uncondjump_p, any_condjump_p): Update comment accordingly.
If any unconditional jumps within a block have side effects then the
block cannot be considered empty.
gcc/
* cfgrtl.c (rtl_block_empty_p): Return false if `!onlyjump_p'
too.
Do not try to remove a conditional jump if it has side effects.
gcc/
* sel-sched-ir.c (maybe_tidy_empty_bb): Only try to remove a
conditional jump if `onlyjump_p'.
Ignore jumps that have side effects in loop processing as pasting the
body of a loop multiple times within is semantically equivalent to jump
deletion (between the iterations unrolled) even if we do not physically
delete the jump RTL insn.
gcc/
* loop-iv.c (simplify_using_initial_values): Only process jumps
that match `onlyjump_p'.
(check_simple_exit): Likewise.
Do not convert a conditional jump into conditional execution (and remove
the jump as a consequence) if the jump has side effects.
gcc/
* ifcvt.c (dead_or_predicable) [!IFCVT_MODIFY_TESTS]: Bail out
if `!onlyjump_p'.
Add wide integer aka 'w' rtx format support to int iterators so that
machine description can iterate over `const_int' expressions.
This is made by expanding standard integer aka 'i' format support,
observing that any standard integer already present in any of our
existing RTL code will also fit into HOST_WIDE_INT, so there is no need
for a separate handler. Any truncation of the number parsed is made by
the caller. An assumption is made however that no place relies on
capping out of range values to INT_MAX.
Now the 'p' format is handled explicitly rather than being implied by
rtx being a SUBREG, so actually assert that it is, just to play safe.
gcc/
* read-rtl.c: Add a page-feed separator at the start of iterator
code.
(struct iterator_group): Change the return type to HOST_WIDE_INT
for the `find_builtin' member. Likewise the second parameter
type for the `apply_iterator' member.
(atoll) [!HAVE_ATOQ]: Reorder.
(find_mode, find_code): Change the return type to HOST_WIDE_INT.
(apply_mode_iterator, apply_code_iterator)
(apply_subst_iterator): Change the second parameter type to
HOST_WIDE_INT.
(find_int): Handle input suitable for HOST_WIDE_INT output.
(apply_int_iterator): Rewrite in terms of explicit format
interpretation.
(rtx_reader::read_rtx_operand) <'w'>: Fold into...
<'i', 'n', 'p'>: ... this.
* doc/md.texi (Int Iterators): Document 'w' rtx format support.
The `builtins.md' machine description fragment is not included anywhere
and is therefore dead code, which has become bitrotten due to non-use.
If actually enabled, it does not build due to the use of an unknown `t'
constraint:
.../gcc/config/vax/builtins.md:42:1: error: undefined machine-specific constraint at this point: "t"
.../gcc/config/vax/builtins.md:42:1: note: in operand 1
which came from commit becb93d02cc1 ("builtins.md (ffssi2_internal):
Correct constraint."), which was not applied as posted and reviewed; `T'
was meant to be used instead.
Once this has been fixed this code still fails building:
.../gcc/config/vax/builtins.md: In function 'rtx_def* gen_ffssi2(rtx, rtx)':
.../gcc/config/vax/builtins.md:35:19: error: 'gen_bne' was not declared in this
scope; did you mean 'gen_use'?
35 | emit_jump_insn (gen_bne (label));
| ^~~~~~~
| gen_use
make[2]: *** [Makefile:1122: insn-emit.o] Error 1
Finally the FFS machine instruction sets the Z condition code according
to the comparison of the value held in the source operand against zero
rather than the value held in the target operand. If the source operand
is found hold zero, then the target operand is set to the width of the
source operand, 32 for SImode (FFS supports arbitrary widths).
Correct the build issues then and update RTL to match the operation of
the machine instruction. A test case will be added separately.
gcc/
* config/vax/builtins.md (ffssi2): Make preparation statements
actually buildable.
(ffssi2_internal): Fix input constraints; make the RTL pattern
match reality for `cc0'.
Expression costs are required to be given in terms of COSTS_N_INSNS (n),
which is defined to stand for the count of single fast instructions, and
actually returns `n * 4'. The VAX backend however instead operates on
naked numbers, causing an anomaly for the integer const zero rtx, where
the cost given is 4 as opposed to 1 for integers in the [1:63] range, as
well as -1 for comparisons. This is because the value of 0 returned by
`vax_rtx_costs' is converted to COSTS_N_INSNS (1) in `pattern_cost':
return cost > 0 ? cost : COSTS_N_INSNS (1);
Consequently, where feasible, 1 or -1 are preferred over 0 by the middle
end causing code pessimization, e.g. rather than producing this:
subl2 $4,%sp
movl 4(%ap),%r0
jgtr .L2
addl2 $2,%r0
.L2:
ret
or this:
subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
jlss .L6
addl2 $2,%r0
.L6:
ret
code is produced like this:
subl2 $4,%sp
movl 4(%ap),%r0
cmpl %r0,$1
jgeq .L2
addl2 $2,%r0
.L2:
ret
or this:
subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
cmpl %r0,$-1
jleq .L6
addl2 $2,%r0
.L6:
ret
from this:
int
compare_mov (int x)
{
if (x > 0)
return x;
else
return x + 2;
}
and this:
int
compare_add (int x, int y)
{
int z;
z = x + y;
if (z < 0)
return z;
else
return z + 2;
}
respectively, which is slower and larger both at a time.
Furthermore once the backend is converted to MODE_CC this anomaly makes
it usually impossible to remove redundant comparisons in the comparison
elimination pass, because most VAX instructions set the condition codes
as per the relation of the instruction's result to 0 and not -1.
The middle end has some other assumptions as to rtx costs being given in
terms of COSTS_N_INSNS, so wrap all the VAX rtx costs then as they stand
into COSTS_N_INSNS invocations, effectively scaling the costs by 4 while
preserving their relative values, except for the integer const zero rtx
given the value of `COSTS_N_INSNS (1) / 2', half of a fast instruction
(this can be further halved if needed in the future).
Adjust address costs likewise so that they remain proportional to the
new absolute values of rtx costs.
Code size stats are as follows, collected from 17639 executables built
in `check-c' GCC testing:
samples average median
--------------------------------------
regressions 1420 0.400% 0.195%
unchanged 13811 0.000% 0.000%
progressions 2408 -0.504% -0.201%
--------------------------------------
total 17639 -0.037% 0.000%
with a small number of outliers only (over 5% size change):
old new change %change filename
----------------------------------------------------
4991 5249 258 5.1693 981001-1.exe
2637 2777 140 5.3090 interchange-6.exe
2187 2307 120 5.4869 sprintf.x7
3969 4197 228 5.7445 pr28982a.exe
8264 8816 552 6.6795 vector-compare-1.exe
5199 5575 376 7.2321 pr28982b.exe
2113 2411 298 14.1031 20030323-1.exe
2113 2411 298 14.1031 20030323-1.exe
2113 2411 298 14.1031 20030323-1.exe
so it seems we are looking good, and we have complementing reductions
to compensate:
old new change %change filename
----------------------------------------------------
2919 2631 -288 -9.8663 pr57521.exe
3427 3167 -260 -7.5868 sabd_1.exe
2985 2765 -220 -7.3701 ssad-run.exe
2985 2765 -220 -7.3701 ssad-run.exe
2985 2765 -220 -7.3701 usad-run.exe
2985 2765 -220 -7.3701 usad-run.exe
4509 4253 -256 -5.6775 vshuf-v2sf.exe
4541 4285 -256 -5.6375 vshuf-v2si.exe
4673 4417 -256 -5.4782 vshuf-v2df.exe
2993 2841 -152 -5.0785 abs-2.x4
2993 2841 -152 -5.0785 abs-3.x4
This actually causes `loop-8.c' to regress:
FAIL: gcc.dg/loop-8.c scan-rtl-dump-times loop2_invariant "Decided" 1
FAIL: gcc.dg/loop-8.c scan-rtl-dump-not loop2_invariant "without introducing a new temporary register"
but upon a closer inspection this is a red herring. Old code looks as
follows:
.file "loop-8.c"
.text
.align 1
.globl f
.type f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl 8(%ap),%r3
movl $42,(%r2)
clrl %r0
movl $42,%r1
movl %r1,%r4
jbr .L2
.L5:
movl %r4,%r1
.L2:
movl %r1,(%r3)[%r0]
incl %r0
cmpl %r0,$100
jeql .L6
movl $42,(%r2)[%r0]
bicl3 $-2,%r0,%r1
jeql .L5
movl %r0,%r1
jbr .L2
.L6:
ret
.size f, .-f
while new one is like below:
.file "loop-8.c"
.text
.align 1
.globl f
.type f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl $42,(%r2)+
movl 8(%ap),%r1
clrl %r0
movl $42,%r3
movzbl $100,%r4
movl %r3,%r5
jbr .L2
.L5:
movl %r5,%r3
.L2:
movl %r3,(%r1)+
incl %r0
cmpl %r0,%r4
jeql .L6
movl $42,(%r2)+
bicl3 $-2,%r0,%r3
jeql .L5
movl %r0,%r3
jbr .L2
.L6:
ret
.size f, .-f
and is clearly better: not only it is smaller, but it also uses the
post-increment rather than indexed addressing mode in the loop, of
which the former comes for free in terms of both performance and code
size while the latter causes an extra byte per operand to be produced
for the index register and also incurs an execution penalty for the
extra address calculation.
Exclude the case from VAX testing then, as already done for some other
targets and discussed with commit d242fdaec186 ("gcc.dg/loop-8.c: Skip
for mmix.").
gcc/
* config/vax/vax.c (vax_address_cost): Express the cost in terms
of COSTS_N_INSNS.
(vax_rtx_costs): Likewise.
gcc/testsuite/
* gcc.dg/loop-8.c: Exclude for `vax-*-*'.
* gcc.target/vax/compare-add-zero.c: New test.
* gcc.target/vax/compare-mov-zero.c: New test.
It makes sense to use what other targets do and run all the VAX test
cases over all the usual optimization levels, so make `vax.exp' use our
`gcc-dg-runtest' rather than the generic `dg-runtest' test driver.
This breaks `pr56875.c' however, which is optimized away at levels above
`-O0' as a result of how it has been written for calculations to make no
effect:
FAIL: gcc.target/vax/pr56875.c -O1 scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c -O2 scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c -O3 -g scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c -Os scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler ashq .*,\\$0xffffffffffffffff,
Rather than keeping it at `-O0' update the test case for its code to do
make effect while retaining its sense. Also reformat it according to
our requirements.
gcc/testsuite/
* gcc.target/vax/vax.exp: Use `gcc-dg-runtest' rather than
`dg-runtest'.
* gcc.target/vax/pr56875.c (dg-options): Make empty.
(a): Rewrite for calculations to make effect. Reformat.
The VAX ELF psABI does not permit the use of all hardware operand modes
for PIC symbol references due to the need to use PC-relative addressing
for symbols that end up local and the need to make references indirect
symbols that end up global.
Therefore symbols referred as immediates may only be used with the move
and push address (MOVA and PUSHA) instructions and their PC-relative
displacement address mode, as there is no genuine PC-relative immediate
available that all the other instructions would have to use.
Furthermore global symbol references must not have an offset applied,
which has to be added with a separate instruction, because there is no
support now for GOT entries for external `symbol+offset' references, so
any indirect GOT references made by the static linker from the original
direct symbol references must not have an addend applied. Consequently
no addend is allowed even if a given external symbol turns out local,
for whatever reason, at the static link time.
Define the LEGITIMATE_PIC_OPERAND_P macro then, a corresponding function
and predicate to exclude the relevant expressions as required, and then
a constraint so that reloads are produced where needed, and use the new
facilities in the machine description, folding corresponding duplicated
patterns for local and external symbols together. Rewrite predicates to
make use of the new function, rename them to match their sense and also
remove ones no longer used.
All this fixing an ICE like this:
during RTL pass: postreload
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c: In function 'testE':
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:89:1: internal compiler error: in reload_combine_note_use, at postreload.c:1559
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:96:65: note: in expansion of macro 'T'
0x10fe84cb reload_combine_note_use
.../gcc/postreload.c:1559
0x10fe8857 reload_combine_note_use
.../gcc/postreload.c:1621
0x10fe8303 reload_combine_note_use
.../gcc/postreload.c:1517
0x10fe7c7b reload_combine
.../gcc/postreload.c:1408
0x10fe3417 reload_cse_regs
.../gcc/postreload.c:67
0x10feaf9f execute
.../gcc/postreload.c:2358
due to the presence of a pseudo register post-reload:
(insn 435 228 229 13 (set (reg:SI 1 %r1)
(mem/c:SI (reg/f:SI 341) [25 sE+12 S4 A8])) ".../gcc/testsuite/gcc.c-torture/execute/20040709-2.c":96:65 12 {movsi_2}
(nil))
(due to the use of an offset `sE+12' symbol reference) and removing
these regressions:
FAIL: gcc.c-torture/execute/20040709-2.c -O2 (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c -O2 (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c -O3 -g (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c -O3 -g (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c -Os (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c -Os (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c -O3 -g (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O3 -g (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c -Os (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -Os (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/pr52028.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error)
FAIL: gcc.dg/torture/pr52028.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
gcc/
* config/vax/constraints.md (A): New constraint.
* config/vax/predicates.md (external_symbolic_operand)
(external_const_operand): Remove predicates.
(local_symbolic_operand): Rename to...
(pic_symbolic_operand): ... this, and rework.
(external_memory_operand): Rename to...
(non_pic_external_memory_operand): ... this, and rework.
(illegal_blk_memory_operand, illegal_addsub_di_memory_operand):
Update accordingly.
* config/vax/vax-protos.h (vax_acceptable_pic_operand_p): New
prototype.
* config/vax/vax.c (vax_acceptable_pic_operand_p): New function.
(vax_output_int_add): Update according to predicate rework.
* config/vax/vax.h (LEGITIMATE_PIC_OPERAND_P): New macro.
* config/vax/vax.md (pushlclsymreg, pushextsymreg): Fold
together, and rename to...
(*pushsymreg): ... this. Use the `pic_symbolic_operand'
predicate and the `A' constraint for the displacement operand.
(movlclsymreg, movextsymreg): Fold together, and rename to...
(*movsymreg): ... this. Use the `pic_symbolic_operand'
predicate and the `A' constraint for the displacement operand.
(pushextsym, pushlclsym): Fold together, and rename to...
(*pushsym): ... this. Use the `pic_symbolic_operand' predicate
and the `A' constraint for the displacement operand.
(movextsym, movlclsym): Fold together, and rename to...
(*movsym): ... this. Use the `pic_symbolic_operand' predicate
and the `A' constraint for the displacement operand.
The `c' operand format specifier is handled directly by the middle end
in `output_asm_insn':
%cN means require operand N to be a constant
and print the constant expression with no punctuation.
however it resorts to the target for constants that are not valid
addresses:
else if (letter == 'c')
{
if (CONSTANT_ADDRESS_P (operands[opnum]))
output_addr_const (asm_out_file, operands[opnum]);
else
output_operand (operands[opnum], 'c');
}
The VAX backend expects the fallback never to happen and overloads `c'
with the branch condition code. This is confusing however and it is not
like we are short of letters, so instead make the branch condition code
use `k', and then for consistency make `K' the reverse branch condition
code format specifier. This is safe to do as we provide no means to use
a computed branch condition code in user `asm'.
gcc/
* config/vax/vax.c (print_operand): Replace `c' and `C' with
`k' and `K' respectively.
* config/vax/vax.md (*branch, *branch_reversed): Update
accordingly.
Do not allow direct adjustment of pre-header initialization instruction for
count register if is read in some instruction below in that basic block.
gcc/ChangeLog:
PR rtl-optimization/97421
* modulo-sched.c (generate_prolog_epilog): Remove forward
declaration, adjust last argument name and type.
(const_iteration_count): Add bool pointer parameter to return
whether count register is read in pre-header after its
initialization.
(sms_schedule): Fix count register initialization adjustment
procedure according to what const_iteration_count said.
gcc/testsuite/ChangeLog:
PR rtl-optimization/97421
* gcc.c-torture/execute/pr97421-1.c: New test.
* gcc.c-torture/execute/pr97421-2.c: New test.
* gcc.c-torture/execute/pr97421-3.c: New test.
2020-12-05 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/98016
* resolve.c (resolve_symbol): Set formal_arg_flag before
resolving an array spec and restore value afterwards.
gcc/testsuite/
PR fortran/98016
* gfortran.dg/pr98016.f90: New test.
The change in major version (and the increment from Darwin19 to 20)
caused libtool tests to fail which resulted in incorrect build settings
for shared libraries.
We take this opportunity to sort out the shared undefined symbols state
rather than propagating the current unsound behaviour into a new rev.
This change means that we default to the case that missing symbols are
considered an error, and if one wants to allow this intentionally, the
confiuration for that case should be set appropriately.
Three existing cases need undefined dynamic lookup:
libitm, where there is already a configuration mechanism to add the
flags.
libcc1, where we add simple configuration to add the flags for Darwin.
libsanitizer, where we can add to the existing extra flags.
libcc1/ChangeLog:
PR target/97865
* Makefile.am: Add dynamic_lookup to LD flags for Darwin.
* configure.ac: Test for Darwin host and set a flag.
* Makefile.in: Regenerate.
* configure: Regenerate.
libitm/ChangeLog:
PR target/97865
* configure.tgt: Add dynamic_lookup to XLDFLAGS for Darwin.
* configure: Regenerate.
libsanitizer/ChangeLog:
PR target/97865
* configure.tgt: Add dynamic_lookup to EXTRA_CXXFLAGS for
Darwin.
* configure: Regenerate.
ChangeLog:
PR target/97865
* libtool.m4: Update handling of Darwin platform link flags
for Darwin20.
gcc/ChangeLog:
PR target/97865
* configure: Regenerate.
libatomic/ChangeLog:
PR target/97865
* configure: Regenerate.
libbacktrace/ChangeLog:
PR target/97865
* configure: Regenerate.
libffi/ChangeLog:
PR target/97865
* configure: Regenerate.
libgfortran/ChangeLog:
PR target/97865
* configure: Regenerate.
libgomp/ChangeLog:
PR target/97865
* configure: Regenerate.
libhsail-rt/ChangeLog:
PR target/97865
* configure: Regenerate.
libobjc/ChangeLog:
PR target/97865
* configure: Regenerate.
libphobos/ChangeLog:
PR target/97865
* configure: Regenerate.
libquadmath/ChangeLog:
PR target/97865
* configure: Regenerate.
libssp/ChangeLog:
PR target/97865
* configure: Regenerate.
libstdc++-v3/ChangeLog:
PR target/97865
* configure: Regenerate.
libvtv/ChangeLog:
PR target/97865
* configure: Regenerate.
zlib/ChangeLog:
PR target/97865
* configure: Regenerate.
Here is the patch to simplify the newly added combine splitters,
when we split into 2 insns anyway, no reason to split into the masking
define_insn_and_split we'd be splitting shortly after.
2020-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/96226
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
splitter after *<rotate_insn><mode>3_mask_1): Drop the masking from
the patterns to split into.
We currently incorrectly reject the first testcase, because
cxx_fold_indirect_ref_1 doesn't attempt to handle UNION_TYPEs.
As the second testcase shows, it isn't that easy, because I believe we need
to take into account the active member and prefer that active member over
other members, because if we pick a non-active one, we might reject valid
programs.
2020-12-05 Jakub Jelinek <jakub@redhat.com>
PR c++/98122
* constexpr.c (cxx_union_active_member): New function.
(cxx_fold_indirect_ref_1): Add ctx argument, pass it through to
recursive call. Handle UNION_TYPE.
(cxx_fold_indirect_ref): Add ctx argument, pass it to recursive calls
and cxx_fold_indirect_ref_1.
(cxx_eval_indirect_ref): Adjust cxx_fold_indirect_ref calls.
* g++.dg/cpp1y/constexpr-98122.C: New test.
* g++.dg/cpp2a/constexpr-98122.C: New test.
We were using the old name, but nothing noticed because it is a weak
reference that is permitted to be nil, so that it works with code that
does not use the field tracking library.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/275449
The check in do_class_deduction to handle passing one class placeholder
template parm as an argument for itself needed to be extended to also handle
equivalent parms from other templates.
gcc/cp/ChangeLog:
PR c++/93083
* pt.c (convert_template_argument): Handle equivalent placeholders.
(do_class_deduction): Look through EXPR_PACK_EXPANSION, too.
gcc/testsuite/ChangeLog:
PR c++/93083
* g++.dg/cpp2a/nontype-class40.C: New test.
It looks cleaner if we can use a vec* directly as a range for the C++11
range-based 'for' loop, without needing to indirect from it, and also works
with null pointers.
The change in cp_parser_late_parsing_default_args is an example of how this
can be used to simplify a simple loop over a vector. Reverse or subset
iteration will require adding range adaptors.
I deliberately didn't format the new overloads for etags since they are
trivial.
gcc/ChangeLog:
* vec.h (begin, end): Add overloads for vec*.
* tree.c (build_constructor_from_vec): Remove *.
gcc/cp/ChangeLog:
* decl2.c (clear_consteval_vfns): Remove *.
* pt.c (do_auto_deduction): Remove *.
* parser.c (cp_parser_late_parsing_default_args): Change loop
to use range 'for'.
The recent change to rs6000.c for DWARF in AIX references the macro
PTR_SIZE that only is defined in dwarf2out.c. This patch changes the
reference to the equivalent POINTER_SIZE_UNITS defined in defaults.h.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Change PTR_SIZE to POINTER_SIZE_UNITS.
We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads). Let's lets say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.
There's an equal-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, as specifying the
behavior of a standard qualifier.
gcc:
2020-12-04 Hans-Peter Nilsson <hp@axis.com>
Martin Sebor <msebor@redhat.com>
PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.
As mentioned in the PR, we shouldn't treat non-replaceable operator
new/delete (e.g. with the placement new) as replaceable ones.
There is some pending discussion that perhaps operator delete called from
delete if not replaceable should return some other fnspec, but can we handle
that incrementally, fix this wrong-code and then deal with a missed
optimization? I really don't know what exactly should be returned.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
PR c++/98130
* gimple.c (gimple_call_fnspec): Only return ".co " for replaceable
operator delete or ".mC" for replaceable operator new called from
new/delete.
* g++.dg/opt/pr98130.C: New test.
As mentioned in the PR, we can combine ~(1 << x) into -2 r<< x, but we give
up in the ~(1 << (x & 31)) cases, as *<rotate_insn><mode>3_mask* don't allow
immediate operand 1 and find_split_point prefers to split (x & 31) instead
of the constant.
With these combine splitters we help combine decide how to split those
insns.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/96226
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
splitter after *<rotate_insn><mode>3_mask_1): New combine splitters.
* gcc.target/i386/pr96226.c: New test.
The following testcase is rejected, because when trying to encode a zeroing
CONSTRUCTOR, the code was using build_constructor to build initializers for
the elements but when recursing the function handles CONSTRUCTOR only for
aggregate types.
The following patch fixes that by using build_zero_cst instead for
non-aggregates. Another option would be add handling CONSTRUCTOR for
non-aggregates in native_encode_initializer. Or we can do both, I guess
the middle-end generally doesn't like CONSTRUCTORs for scalar variables, but
am not 100% sure if the FE doesn't produce those sometimes.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/93121
* fold-const.c (native_encode_initializer): Use build_zero_cst
instead of build_constructor.
* g++.dg/cpp2a/bit-cast6.C: New test.
The changes reverted here are exposing an existing problem with alias
template comparisons. The typename_type changes are also incomplete,
possibly for similar reasons. It seems safer to revert them, fix the
underlying issue and then move forwards.
The testcases is adjusted to more robustly check the specialization
table, and ICEs with and without the c++ changes.
Revert:
62fb1b9e0da c++: Fix array type dependency [PR 98107]
07589ca2b2c c++: typename_type structural comparison
29ae1d7751 c++: Extend build_array_type API
PR c++/98116
gcc/cp/
* cp-tree.h (comparing_typenames): Delete.
(cplus_build_array_type): Remove default parm.
* pt.c (comparing_typenames): Delete.
(spec_hasher::equal): Don't increment it.
* tree.c (set_array_type_canon): Remove dep parm.
(build_cplus_array_type): Remove dep parm changes.
(cp_build_qualified_type_real): Remove dependent array type
changes.
(strip_typedefs): Likewise.
* typeck.c (structural_comptypes): Revert comparing_typename
changes.
gcc/testsuite/
* g++.dg/template/pr98116.C: Enable robust checking.
This provides the inline predicates about module state, and declares
the functions to be provided.
gcc/cp/
* cp-tree.h: Add various inline module state predicates, and
declare the API that will be provided by modules.cc
The PR88587 fix changes DECL_MODE of vars with vector type during inlining/cloning
when the vars are copied, so that their DECL_MODE matches their TYPE_MODE in
the new function. Unfortunately, the following testcase still ICEs, the var
isn't really used in the new function and so it isn't copied, but becomes
just a nonlocalized var. So we can't adjust its DECL_MODE because it
appears in multiple functions and needs different modes in between them.
The following patch changes the DEBUG_INSN creation to use TYPE_MODE instead
of DECL_MODE for vars with vector types.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/98100
* cfgexpand.c (expand_gimple_basic_block): For vars with
vector type, use TYPE_MODE rather than DECL_MODE.
* gcc.target/i386/pr98100.c: New test.
The following patch makes the choice between 32-bit and 64-bit DWARF formats
selectable by command line switch, rather than being hardcoded through
DWARF_OFFSET_SIZE macro.
The options themselves don't turn on debug info themselves, so one needs
to use -g -gdwarf64 or similar.
2020-12-04 Jakub Jelinek <jakub@redhat.com>
* common.opt (-gdwarf32, -gdwarf64): New options.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Default
dwarf_offset_size to 8 if not overridden from the command line.
* dwarf2out.c: Change all occurrences of DWARF_OFFSET_SIZE to
dwarf_offset_size.
* doc/invoke.texi (-gdwarf32, -gdwarf64): Document.
gcc/testsuite/ChangeLog:
PR testsuite/98123
* gcc.dg/tree-ssa/if-to-switch-4.c: Add param to make the test
stable on all architectures.
* gcc.dg/tree-ssa/if-to-switch-6.c: Likewise.
* gcc.dg/tree-ssa/if-to-switch-8.c: Likewise.