On Thu, Jun 16, 2022 at 09:32:02PM +0100, Jonathan Wakely wrote:
> It looks like clang has addressed this deficiency now:
>
> https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#usage
Thanks, that is roughly what I'd implement anyway and apparently they have
it already since 2015, we've added the -fsanitize-undefined-trap-on-error
support back in 2014 and didn't change it since then.
As a small divergence from clang, I chose -fsanitize-undefined-trap-on-error
to be a (deprecated) alias for -fsanitize-trap aka -fsanitize-trap=all
rather thn -fsanitize-trap=undefined which seems to be what clang does,
because for a deprecated option it is IMHO more important backwards
compatibility with what gcc did over the past 8 years rather than clang
compatibility.
Some sanitizers (e.g. asan, lsan, tsan) don't support traps,
-fsanitize-trap=address etc. will be rejected (if enabled at the end of
command line), -fno-sanitize-trap= can be specified even for them.
This is similar behavior to -fsanitize-recover=.
One complication is vptr sanitization, which can't easily trap,
as the whole slow path of the checking is inside of libubsan.
Previously, -fsanitize=vptr -fsanitize-undefined-trap-on-error
silently ignored vptr sanitization.
This patch similarly to what clang does will accept
-fsanitize-trap=all or -fsanitize-trap=undefined which enable
the vptr bit as trapping and again that causes silent disabling
of vptr sanitization, while -fsanitize-trap=vptr is rejected
(already during option processing).
2022-06-18 Jakub Jelinek <jakub@redhat.com>
gcc/
* common.opt (flag_sanitize_trap): New variable.
(fsanitize-trap=, fsanitize-trap): New options.
(fsanitize-undefined-trap-on-error): Change into deprecated alias
for -fsanitize-trap=all.
* opts.h (struct sanitizer_opts_s): Add can_trap member.
* opts.cc (finish_options): Complain about unsupported
-fsanitize-trap= options.
(sanitizer_opts): Add can_trap values to all entries.
(get_closest_sanitizer_option): Ignore -fsanitize-trap=
options which have can_trap false.
(parse_sanitizer_options): Add support for -fsanitize-trap=.
For -fsanitize-trap=all, enable
SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT. Disallow
-fsanitize-trap=vptr here.
(common_handle_option): Handle OPT_fsanitize_trap_ and
OPT_fsanitize_trap.
* sanopt.cc (maybe_optimize_ubsan_null_ifn): Check
flag_sanitize_trap & SANITIZE_{NULL,ALIGNMENT} instead of
flag_sanitize_undefined_trap_on_error.
* gcc.cc (sanitize_spec_function): Use
flag_sanitize & ~flag_sanitize_trap instead of flag_sanitize
and drop use of flag_sanitize_undefined_trap_on_error in
"undefined" handling.
* ubsan.cc (ubsan_instrument_unreachable): Use
flag_sanitize_trap & SANITIZE_??? instead of
flag_sanitize_undefined_trap_on_error.
(ubsan_expand_bounds_ifn, ubsan_expand_null_ifn,
ubsan_expand_objsize_ifn, ubsan_expand_ptr_ifn,
ubsan_build_overflow_builtin, instrument_bool_enum_load,
ubsan_instrument_float_cast, instrument_nonnull_arg,
instrument_nonnull_return, instrument_builtin): Likewise.
* doc/invoke.texi (-fsanitize-trap=, -fsanitize-trap): Document.
(-fsanitize-undefined-trap-on-error): Document as deprecated
alias of -fsanitize-trap.
gcc/c-family/
* c-ubsan.cc (ubsan_instrument_division, ubsan_instrument_shift):
Use flag_sanitize_trap & SANITIZE_??? instead of
flag_sanitize_undefined_trap_on_error. If 2 sanitizers are involved
and flag_sanitize_trap differs for them, emit __builtin_trap only
for the comparison where trap is requested.
(ubsan_instrument_vla, ubsan_instrument_return): Use
lag_sanitize_trap & SANITIZE_??? instead of
flag_sanitize_undefined_trap_on_error.
gcc/cp/
* cp-ubsan.cc (cp_ubsan_instrument_vptr_p): Use
flag_sanitize_trap & SANITIZE_VPTR instead of
flag_sanitize_undefined_trap_on_error.
gcc/testsuite/
* c-c++-common/ubsan/nonnull-4.c: Use -fsanitize-trap=all
instead of -fsanitize-undefined-trap-on-error.
* c-c++-common/ubsan/div-by-zero-4.c: Use
-fsanitize-trap=signed-integer-overflow instead of
-fsanitize-undefined-trap-on-error.
* c-c++-common/ubsan/overflow-add-4.c: Use -fsanitize-trap=undefined
instead of -fsanitize-undefined-trap-on-error.
* c-c++-common/ubsan/pr56956.c: Likewise.
* c-c++-common/ubsan/pr68142.c: Likewise.
* c-c++-common/ubsan/pr80932.c: Use
-fno-sanitize-trap=all -fsanitize-trap=shift,undefined
instead of -fsanitize-undefined-trap-on-error.
* c-c++-common/ubsan/align-8.c: Use -fsanitize-trap=alignment
instead of -fsanitize-undefined-trap-on-error.
The following testcase ICEs because there is NON_LVALUE_EXPR (location
wrapper) around a VAR_DECL and has TYPE_MODE V2SImode and
SCALAR_INT_TYPE_MODE on that ICEs. Or for -m32 -march=i386 TYPE_MODE
is DImode, but SCALAR_INT_TYPE_MODE still uses the raw V2SImode and ICEs
too.
2022-06-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/105998
* varasm.cc (narrowing_initializer_constant_valid_p): Check
SCALAR_INT_MODE_P instead of INTEGRAL_MODE_P, also break on
! INTEGRAL_TYPE_P and do the same check also on op{0,1}'s type.
* c-c++-common/pr105998.c: New test.
This patch resolves PR tree-optimization/105835, which is a code quality
(dead code elimination) regression at -O1 triggered/exposed by a recent
change to canonicalize X&-Y as X*Y. The new (shorter) form exposes some
missed optimization opportunities that can be handled by adding some
extra simplifications to match.pd.
One transformation is to simplify "(short)(x ? 65535 : 0)" into the
equivalent "x ? -1 : 0", or more accurately x ? (short)-1 : (short)0",
as INTEGER_CSTs record their type, and integer conversions can be
pushed inside COND_EXPRs reducing the number of gimple statements.
The other transformation is that (short)(X * 65535), where X is [0,1],
into the equivalent (short)X * -1, (or again (short)-1 where tree's
INTEGER_CSTs encode their type). This is valid because multiplications
where one operand is [0,1] are guaranteed not to overflow, and hence
integer conversions can also be pushed inside these multiplications.
These narrowing conversion optimizations can be identified by range
analyses, such as EVRP, but these are only performed at -O2 and above,
which is why this regression is only visible with -O1.
2022-06-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/105835
* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)):
Narrow integer multiplication by a zero_one_valued_p operand.
(convert (cond @1 INTEGER_CST@2 INTEGER_CST@3)): Push integer
conversions inside COND_EXPR where both data operands are
integer constants.
gcc/testsuite/ChangeLog
PR tree-optimization/105835
* gcc.dg/pr105835.c: New test case.
In this case the STATIC_CAST_EXPR expressions in the call aren't
type nor value dependent, but maybe_constant_value still ICEs on those
when processing_template_decl. Calling fold_non_dependent_expr on it
instead fixes the ICE and folds them to INTEGER_CSTs.
2022-06-17 Jakub Jelinek <jakub@redhat.com>
PR c++/106001
* typeck.cc (build_x_shufflevector): Use fold_non_dependent_expr
instead of maybe_constant_value.
* g++.dg/ext/builtin-shufflevector-4.C: New test.
This patch introduces alpha-specific version of store_data_bypass_p that
ignores TRAP_IF that would result in assertion failure (and internal
compiler error) in the generic store_data_bypass_p function.
While at it, also remove ev4_ist_c reservation, store_data_bypass_p
can handle the patterns with multiple sets since some time ago.
2022-06-17 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105209
* config/alpha/alpha-protos.h (alpha_store_data_bypass_p): New.
* config/alpha/alpha.cc (alpha_store_data_bypass_p): New function.
(alpha_store_data_bypass_p_1): Ditto.
* config/alpha/ev4.md: Use alpha_store_data_bypass_p instead
of generic store_data_bypass_p.
(ev4_ist_c): Remove insn reservation.
gcc/testsuite/ChangeLog:
PR target/105209
* gcc.target/alpha/pr105209.c: New test.
The mode of pointer argument should equal ptr_mode, not Pmode.
2022-06-17 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105970
* config/i386/i386.cc (ix86_function_arg): Assert that
the mode of pointer argumet is equal to ptr_mode, not Pmode.
gcc/testsuite/ChangeLog:
PR target/105970
* gcc.target/i386/pr105970.c: New test.
REGNO should not be used with register_operand before reload because
subregs of registers or even subregs of memory match the predicate.
The build with RTL checking enabled does not tolerate REGNO with
non-reg operand.
The patch splits the splitter into two related splitters and uses
(match_dup ...) RTXes instead of REGNO comparisons.
2022-06-17 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/105993
* config/i386/sse.md (vpmov splitter): Use (match_dup ...)
instead of REGNO comparisons in combine splitter.
gcc/testsuite/ChangeLog:
PR target/105993
* gcc.target/i386/pr105993.c: New test.
Sigh, another instance where I incorrectly used XUINT instead of
UINTVAL.
I've also made the code here a little more robust (although I think
this case can't in fact be reached) if the 32-bit clear mask includes
bit 31. This case, if reached, would print out an out-of-range value
based on the size of the compiler's HOST_WIDE_INT type due to
sign-extension. We avoid this by masking the value after inversion.
gcc/ChangeLog:
PR target/106004
* config/arm/arm.cc (arm_print_operand, case 'V'): Use UINTVAL.
Clear bits in the mask above bit 31.
Somehow I pushed a different version of this test to the one I actually
tested.
libstdc++-v3/ChangeLog:
* testsuite/21_strings/basic_string/cons/char/105995.cc: Add
missing #include.
A bug in the ordering of the operands in the mve_mov<mode> pattern
meant that all literal values were being pushed to the literal pool.
This patch fixes that and simplifies some of the logic slightly so
that we can use as simple switch statement.
For example:
void f (uint32_t *a)
{
int i;
for (i = 0; i < 100; i++)
a[i] += 1;
}
Now compiles to:
push {lr}
mov lr, #25
vmov.i32 q2, #0x1 @ v4si
...
instead of
push {lr}
mov lr, #25
vldr.64 d4, .L6
vldr.64 d5, .L6+8
...
.L7:
.align 3
.L6:
.word 1
.word 1
.word 1
.word 1
gcc/ChangeLog:
* config/arm/mve.md (*mve_mov<mode>): Re-order constraints
to avoid spilling trivial literals to the constant pool.
gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/cde-mve-full-assembly.c: Adjust expected
output.
Whilst working on SARIF output I noticed some places where followup notes
weren't being properly associated with their warnings in
gcc/gimple-ssa-warn-access.cc.
Fixed thusly.
gcc/ChangeLog:
* gimple-ssa-warn-access.cc (warn_string_no_nul): Add
auto_diagnostic_group to group any warning with its note.
(maybe_warn_for_bound): Likewise.
(check_access): Likewise.
(warn_dealloc_offset): Likewise.
(pass_waccess::maybe_warn_memmodel): Likewise.
(pass_waccess::maybe_check_dealloc_call): Likewise.
(pass_waccess::warn_invalid_pointer): Likewise.
(pass_waccess::check_dangling_stores): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Whilst working on SARIF output I noticed some places where followup notes
weren't being properly associated with their errors/warnings in c-decl.cc.
Whilst fixing those I noticed some places where we "inform" after a
"warning" without checking that the warning was actually emitted.
Fixed the various issues seen in gcc/c/c-decl.cc thusly.
gcc/c/ChangeLog:
* c-decl.cc (implicitly_declare): Add auto_diagnostic_group to
group the warning with any note.
(warn_about_goto): Likewise to group error or warning with note.
Bail out if the warning wasn't emitted, to avoid emitting orphan
notes.
(lookup_label_for_goto): Add auto_diagnostic_group to
group the error with the note.
(check_earlier_gotos): Likewise.
(c_check_switch_jump_warnings): Likewise for any error/warning.
Conditionalize emission of the notes.
(diagnose_uninitialized_cst_member): Likewise for warning,
conditionalizing emission of the note.
(grokdeclarator): Add auto_diagnostic_group to group the "array
type has incomplete element type" error with any note.
(parser_xref_tag): Add auto_diagnostic_group to group warnings
with their notes. Conditionalize emission of notes.
(start_struct): Add auto_diagnostic_group to group the
"redefinition of" errors with any note.
(start_enum): Likewise for "redeclaration of %<enum %E%>" error.
(check_for_loop_decls): Likewise for pre-C99 error.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
* varargs.cc (va_arg_type_mismatch::emit): Associate the warning
with CWE-686 ("Function Call With Incorrect Argument Type").
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/stdarg-1.c
(__analyzer_called_by_test_type_mismatch_1): Verify that
-Wanalyzer-va-arg-type-mismatch is associated with CWE-686.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
* varargs.cc: Include "diagnostic-metadata.h".
(va_list_exhausted::emit): Associate the warning with
CWE-685 ("Function Call With Incorrect Number of Arguments").
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/stdarg-1.c
(__analyzer_called_by_test_not_enough_args): Verify that
-Wanalyzer-va-list-exhausted is associated with CWE-685.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
* sm-file.cc (double_fclose::emit): Associate the warning with
CWE-1341 ("Multiple Releases of Same Resource or Handle").
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/file-1.c (test_1): Verify that double-fclose is
associated with CWE-1341.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
While working on PR104642 I noticed this wasn't getting set.
gcc/ChangeLog:
* opts.cc (common_handle_option) [OPT_fsanitize_]: Set
opts_set->x_flag_sanitize.
Current implementation checks whether it has to generate a stub method for a
promoted method of an embedded struct field in Type::build_stub_methods(). If
the promoted method is ambiguous it's simply skipped. But struct types that
can fit in an interface value (e.g. structs that consist of a single pointer
field) get a second chance in Type::build_direct_iface_stub_methods().
This patch adds the same check used by Type::build_stub_methods() to
Type::build_direct_iface_stub_methods().
Fixesgolang/go#52870
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/405974
I don't think this is required by the standard, but it's easy to
support.
libstdc++-v3/ChangeLog:
PR libstdc++/105995
* include/bits/basic_string.h (_M_use_local_data): Initialize
the entire SSO buffer.
* testsuite/21_strings/basic_string/cons/char/105995.cc: New test.
As recently done for std::basic_string, __gnu_cxx::__versa_string
equality comparisons can check lengths first for any character type and
traits type, not only for std::char_traits<char>.
libstdc++-v3/ChangeLog:
PR libstdc++/101482
* include/ext/vstring.h (operator==): Always check lengths
before comparing.
There's no point adding no-op initializer fns (that a module might
have) to the static initializer list. Also, we can add any objc
initializer call to a partial initializer function and simplify some
control flow.
gcc/cp/
* decl2.cc (finish_objects): Add startp parameter, adjust.
(generate_ctor_or_dtor_function): Detect empty fn, and don't
generate unnecessary code. Remove objc startup here ...
(c_parse_final_cleanyps): ... do it here.
gcc/testsuite/
* g++.dg/modules/init-2_b.C: Add init check.
* g++.dg/modules/init-2_c.C: Add init check.
The range of an invariant SSA (no outgoing edge range anywhere) is not tracked.
If an inferred range is registered, remove the invariant flag.
* gimple-range-cache.cc (ranger_cache::apply_inferred_ranges): If name
was invaraint before, clear the invariant bit.
* gimple-range-gori.cc (gori_map::set_range_invariant): Add a flag.
* gimple-range-gori.h (gori_map::set_range_invariant): Adjust prototype.
When evaluating the LHS of a stmt, its more efficent/better to call
value_of_stmt directly rather than value_of_expr.
* tree-ssa-propagate.cc (before_dom_children): Call value_of_stmt.
On the following testcase, we only optimize bar where this optimization
is performed at GENERIC folding time, but on GIMPLE it doesn't trigger
anymore, as we actually don't see
(bit_and (ne @1 min_value) (ge @0 @1))
but
(bit_and (ne @1 min_value) (le @1 @0))
genmatch handles :c modifier not just on commutative operations, but
also comparisons and in that case it means it swaps the comparison.
2022-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105983
* match.pd (y == XXX_MIN || x < y -> x <= y - 1,
y != XXX_MIN && x >= y -> x > y - 1): Use :cs instead of :s
on non-equality comparisons.
* gcc.dg/tree-ssa/pr105983.c: New test.
Earlier in the simplification pattern, we require that @0 has compatible
type to the type of IMAGPART_EXPR, but for @1 which is a non-zero constant
all we require is that it the constant fits into that type.
Later the code checks if the constant is negative, because when min / max
values are divided by negative divisor, lo will be higher than hi.
In the following testcase, @1 has unsigned char type, while @0 has
int type, so @1 which is 254 is wi::neg_p and we were swapping lo and hi,
even when @1 cast to int isn't negative.
We could use tree_int_cst_sgn (@1) < 0 as the check instead and it would
work both for narrower types of @1 and even same or wider ones, but
I've noticed we probably don't want to call fold_convert (TREE_TYPE (@0), @1)
twice and when we save that result in a temporary, we can just use wi::neg_p
on that temporary.
2022-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105984
* match.pd (__builtin_mul_overflow_p (x, cst, (stype) 0) ->
x > stype_max / cst || x < stype_min / cst): fold_convert @1
to TREE_TYPE (@0) just once and test for negative divisor
also on that folded constant instead of on @1.
* gcc.c-torture/execute/pr105984.c: New test.
Both IFN_ATOMIC_BIT_TEST_AND_* and IFN_ATOMIC_*_FETCH_CMP_0 ifns
are matched if their corresponding optab is implemented for the particular
mode. The fact that those optabs are implemented doesn't guarantee
they will succeed though, they can just FAIL in their expansion.
The expansion in that case uses expand_atomic_fetch_op as fallback, but
as has been reported and and can be reproduced on the testcases,
even those can fail and we didn't have any fallback after that.
For IFN_ATOMIC_BIT_TEST_AND_* we actually have such calls. One is
done whenever we lost lhs of the ifn at some point in between matching
it in tree-ssa-ccp.cc and expansion. The following patch for that case
just falls through and expands as if there was a lhs, creates a temporary
for it. For the other expand_atomic_fetch_op call in the same expander
and for the only expand_atomic_fetch_op call in the other, this falls
back the hard way, by constructing a CALL_EXPR to the call from which
the ifn has been matched and expanding that. Either it is lucky and manages
to expand inline, or it emits a libatomic API call.
So that we don't have to rediscover which builtin function to call in the
fallback, we record at tree-ssa-ccp.cc time gimple_call_fn (call) in
an extra argument to the ifn.
2022-06-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/105951
* tree-ssa-ccp.cc (optimize_atomic_bit_test_and,
optimize_atomic_op_fetch_cmp_0): Remember gimple_call_fn (call)
as last argument to the internal functions.
* builtins.cc (expand_ifn_atomic_bit_test_and): Adjust for the
extra call argument to ifns. If expand_atomic_fetch_op fails for the
lhs == NULL_TREE case, fall through into the optab code with
gen_reg_rtx (mode) as target. If second expand_atomic_fetch_op
fails, construct a CALL_EXPR and expand that.
(expand_ifn_atomic_op_fetch_cmp_0): Adjust for the extra call argument
to ifns. If expand_atomic_fetch_op fails, construct a CALL_EXPR and
expand that.
* gcc.target/i386/pr105951-1.c: New test.
* gcc.target/i386/pr105951-2.c: New test.
This patch adds V1TI mode into a new mode iterator used in vector comparison,shift and rotation expands. It also merges some vector comparison, shift and rotation expands for V1T1 and other vector integer modes as they have the similar patterns. The expands for V1TI only are removed.
gcc/
PR target/103316
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
* config/rs6000/vector.md (VEC_IC): New mode iterator. Add support
for new Power10 V1TI instructions.
(vec_cmp<mode><mode>): Set mode iterator to VEC_IC.
(vec_cmpu<mode><mode>): Likewise.
(vector_nlt<mode>): Set mode iterator to VEC_IC.
(vector_nltv1ti): Remove.
(vector_gtu<mode>): Set mode iterator to VEC_IC.
(vector_gtuv1ti): Remove.
(vector_nltu<mode>): Set mode iterator to VEC_IC.
(vector_nltuv1ti): Remove.
(vector_geu<mode>): Set mode iterator to VEC_IC.
(vector_ngt<mode>): Likewise.
(vector_ngtv1ti): Remove.
(vector_ngtu<mode>): Set mode iterator to VEC_IC.
(vector_ngtuv1ti): Remove.
(vector_gtu_<mode>_p): Set mode iterator to VEC_IC.
(vector_gtu_v1ti_p): Remove.
(vrotl<mode>3): Set mode iterator to VEC_IC. Emit insns for V1TI.
(vrotlv1ti3): Remove.
(vashr<mode>3): Set mode iterator to VEC_IC. Emit insns for V1TI.
(vashrv1ti3): Remove.
gcc/testsuite/
PR target/103316
* gcc.target/powerpc/pr103316.c: New.
* gcc.target/powerpc/fold-vec-cmp-int128.c: New.
Right now, when a \$x escape sequence occures, the
next character after $x is skipped, which is bogus.
The code has very low coverage right now.
gcc/ChangeLog:
* gengtype-state.cc (read_a_state_token): Do not skip extra
character after escaped sequence.
In case where we have 2 equally good candidates like
-ftrivial-auto-var-init=
-Wtrivial-auto-var-init
for -ftrivial-auto-var-init, we should take the candidate that
has a difference in trailing sign symbol.
PR driver/105564
gcc/ChangeLog:
* spellcheck.cc (test_find_closest_string): Add new test.
* spellcheck.h (class best_match): Prefer a difference in
trailing sign symbol.
In rv32 regression test, this cases will report an error:
"cc1: error: ABI requires '-march=rv32'"
Add '-mabi' option will fix this.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105666.c: New options.
Similar for (v + B) * C + D -> C * v + BCD.
Don't simplify it when there's overflow and overflow is UB for type v.
gcc/ChangeLog:
PR tree-optimization/53533
* match.pd: Simplify (B * v + C) * D -> BD * v + CD and
(v + B) * C + D -> C * v + BCD when B,C,D are all INTEGER_CST,
and there's no overflow or !TYPE_OVERFLOW_UNDEFINED.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr53533-1.c: New test.
* gcc.target/i386/pr53533-2.c: New test.
* gcc.target/i386/pr53533-3.c: New test.
* gcc.target/i386/pr53533-4.c: New test.
* gcc.target/i386/pr53533-5.c: New test.
* gcc.dg/vect/slp-11a.c: Adjust testcase.
RTL expansion of substitution to [DS]Cmode hard register includes obstructive
register clobber.
A simplest example:
double _Complex test(double _Complex c) {
return c;
}
will be converted to:
(set (reg:DF 42 [ c ]) (reg:DF 2 a2))
(set (reg:DF 43 [ c+8 ]) (reg:DF 4 a4))
(clobber (reg:DC 2 a2))
(set (reg:DF 2 a2) (reg:DF 42 [ c ]))
(set (reg:DF 4 a4) (reg:DF 43 [ c+8 ]))
(use (reg:DC 2 a2))
(return)
and then finally:
test:
mov a8, a2
mov a9, a3
mov a6, a4
mov a7, a5
mov a2, a8
mov a3, a9
mov a4, a6
mov a5, a7
ret
As you see, it is so ridiculous.
This patch eliminates such clobber in order to prune away the wasted move
instructions by the optimizer:
test:
ret
gcc/ChangeLog:
* config/xtensa/xtensa.md (DSC): New split pattern and mode iterator.
When spilled DFmode registers are reloaded in, once loaded into a pair of
SImode regs and then copied from that regs. Such unwanted reg-reg moves
seems not to be eliminated at the "cprop_hardreg" stage, despite no problem
in output reloads.
Luckily it is easy to resolve such inefficiencies, with the use of peephole2
pattern.
gcc/ChangeLog:
* config/xtensa/predicates.md (reload_operand):
New predicate.
* config/xtensa/xtensa.md: New peephole2 pattern.
This patch offers better RTL representations against straightforward
derivations from some tree optimizers' canonicalized forms.
- rounding up to even, such as '(x + (x & 1))', is canonicalized to
'((x + 1) & -2)', but the former is one instruction less than the latter
in Xtensa ISA.
- signed greater or equal to zero as logical value '((signed)x >= 0)',
is canonicalized to '((unsigned)(x ^ -1) >> 31)', but the equivalent
'(((signed)x >> 31) + 1)' is one instruction less.
gcc/ChangeLog:
* config/xtensa/xtensa.md (*round_up_to_even):
New insn-and-split pattern.
(*signed_ge_zero): Ditto.
This patch introduces support for sibling call optimization, when call0
ABI is in effect.
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h (xtensa_prepare_expand_call,
xtensa_emit_sibcall): New prototypes.
(xtensa_expand_epilogue): Add new argument that specifies whether
or not sibling call.
* config/xtensa/xtensa.cc (TARGET_FUNCTION_OK_FOR_SIBCALL):
New macro definition.
(xtensa_prepare_expand_call): New function in order to share
the common code.
(xtensa_emit_sibcall, xtensa_function_ok_for_sibcall):
New functions.
(xtensa_expand_epilogue): Add new argument sibcall_p and use it
for sibling call handling.
* config/xtensa/xtensa.md (call, call_value):
Use xtensa_prepare_expand_call.
(call_internal, call_value_internal):
Add the condition in order to be disabled if sibling call.
(sibcall, sibcall_value, sibcall_epilogue): New expansions.
(sibcall_internal, sibcall_value_internal): New insn patterns,
and split ones in order to take care of the indirect sibcalls.
gcc/testsuite/ChangeLog:
* gcc.target/xtensa/sibcalls.c: New.
-fanalyzer runs late compared to other code analysis tools, in that in
runs on the partially-optimized gimple-ssa representation. I chose this
point to run in the hope of easy integration with LTO.
As PR analyzer/105962 notes, this means that function inlining can occur
before the -fanalyzer "sees" the user's code. For example given:
void foo (void *p)
{
__builtin_free (p);
}
void bar (void *q)
{
foo (q);
foo (q);
}
Below -O2, -fanalyzer shows the calls and returns:
inline-1.c: In function ‘foo’:
inline-1.c:3:3: warning: double-‘free’ of ‘p’ [CWE-415] [-Wanalyzer-double-free]
3 | __builtin_free (p);
| ^~~~~~~~~~~~~~~~~~
‘bar’: events 1-2
|
| 6 | void bar (void *q)
| | ^~~
| | |
| | (1) entry to ‘bar’
| 7 | {
| 8 | foo (q);
| | ~~~~~~~
| | |
| | (2) calling ‘foo’ from ‘bar’
|
+--> ‘foo’: events 3-4
|
| 1 | void foo (void *p)
| | ^~~
| | |
| | (3) entry to ‘foo’
| 2 | {
| 3 | __builtin_free (p);
| | ~~~~~~~~~~~~~~~~~~
| | |
| | (4) first ‘free’ here
|
<------+
|
‘bar’: events 5-6
|
| 8 | foo (q);
| | ^~~~~~~
| | |
| | (5) returning to ‘bar’ from ‘foo’
| 9 | foo (q);
| | ~~~~~~~
| | |
| | (6) passing freed pointer ‘q’ in call to ‘foo’ from ‘bar’
|
+--> ‘foo’: events 7-8
|
| 1 | void foo (void *p)
| | ^~~
| | |
| | (7) entry to ‘foo’
| 2 | {
| 3 | __builtin_free (p);
| | ~~~~~~~~~~~~~~~~~~
| | |
| | (8) second ‘free’ here; first ‘free’ was at (4)
|
but at -O2, -fanalyzer "sees" this gimple:
void bar (void * q)
{
<bb 2> [local count: 1073741824]:
__builtin_free (q_2(D));
__builtin_free (q_2(D));
return;
}
where "foo" has been inlined away, leading to this unhelpful output:
In function ‘foo’,
inlined from ‘bar’ at inline-1.c:9:3:
inline-1.c:3:3: warning: double-‘free’ of ‘q’ [CWE-415] [-Wanalyzer-double-free]
3 | __builtin_free (p);
| ^~~~~~~~~~~~~~~~~~
‘bar’: events 1-2
|
| 3 | __builtin_free (p);
| | ^~~~~~~~~~~~~~~~~~
| | |
| | (1) first ‘free’ here
| | (2) second ‘free’ here; first ‘free’ was at (1)
where the stack frame information in the execution path suggests that these
events are happening in "bar", in the top stack frame.
This is what the analyzer sees, but I find it hard to decipher such
output. Hence, as a workaround for the fact that -fanalyzer runs so
late, this patch attempts to reconstruct the "true" stack frame
information, and to inject events showing inline calls, based on the
inlining chain information recorded in the location_t values for the events.
Doing so leads to this output at -O2 on the above example (with
-fdiagnostics-show-path-depths):
In function ‘foo’,
inlined from ‘bar’ at inline-1.c:9:3:
inline-1.c:3:3: warning: double-‘free’ of ‘q’ [CWE-415] [-Wanalyzer-double-free]
3 | __builtin_free (p);
| ^~~~~~~~~~~~~~~~~~
‘bar’: events 1-2 (depth 1)
|
| 6 | void bar (void *q)
| | ^~~
| | |
| | (1) entry to ‘bar’
| 7 | {
| 8 | foo (q);
| | ~
| | |
| | (2) inlined call to ‘foo’ from ‘bar’
|
+--> ‘foo’: event 3 (depth 2)
|
| 3 | __builtin_free (p);
| | ^~~~~~~~~~~~~~~~~~
| | |
| | (3) first ‘free’ here
|
<------+
|
‘bar’: event 4 (depth 1)
|
| 9 | foo (q);
| | ^
| | |
| | (4) inlined call to ‘foo’ from ‘bar’
|
+--> ‘foo’: event 5 (depth 2)
|
| 3 | __builtin_free (p);
| | ^~~~~~~~~~~~~~~~~~
| | |
| | (5) second ‘free’ here; first ‘free’ was at (3)
|
reconstructing the calls and returns.
The patch also adds a new option, -fno-analyzer-undo-inlining, which can
be used to disable this reconstruction, restoring the output listed
above (this time with -fdiagnostics-show-path-depths):
In function ‘foo’,
inlined from ‘bar’ at inline-1.c:9:3:
inline-1.c:3:3: warning: double-‘free’ of ‘q’ [CWE-415] [-Wanalyzer-double-free]
3 | __builtin_free (p);
| ^~~~~~~~~~~~~~~~~~
‘bar’: events 1-2 (depth 1)
|
| 3 | __builtin_free (p);
| | ^~~~~~~~~~~~~~~~~~
| | |
| | (1) first ‘free’ here
| | (2) second ‘free’ here; first ‘free’ was at (1)
|
gcc/analyzer/ChangeLog:
PR analyzer/105962
* analyzer.opt (fanalyzer-undo-inlining): New option.
* checker-path.cc: Include "diagnostic-core.h" and
"inlining-iterator.h".
(event_kind_to_string): Handle EK_INLINED_CALL.
(class inlining_info): New class.
(checker_event::checker_event): Move here from checker-path.h.
Store original fndecl and depth, and calculate effective fndecl
and depth based on inlining information.
(checker_event::dump): Emit original depth as well as effective
depth when they differ; likewise for fndecl.
(region_creation_event::get_desc): Use m_effective_fndecl.
(inlined_call_event::get_desc): New.
(inlined_call_event::get_meaning): New.
(checker_path::inject_any_inlined_call_events): New.
* checker-path.h (enum event_kind): Add EK_INLINED_CALL.
(checker_event::checker_event): Make protected, and move
definition to checker-path.cc.
(checker_event::get_fndecl): Use effective fndecl.
(checker_event::get_stack_depth): Use effective stack depth.
(checker_event::get_logical_location): Use effective stack depth.
(checker_event::get_original_stack_depth): New.
(checker_event::m_fndecl): Rename to...
(checker_event::m_original_fndecl): ...this.
(checker_event::m_depth): Rename to...
(checker_event::m_original_depth): ...this.
(checker_event::m_effective_fndecl): New field.
(checker_event::m_effective_depth): New field.
(class inlined_call_event): New checker_event subclass.
(checker_path::inject_any_inlined_call_events): New decl.
* diagnostic-manager.cc: Include "inlining-iterator.h".
(diagnostic_manager::emit_saved_diagnostic): Call
checker_path::inject_any_inlined_call_events.
(diagnostic_manager::prune_for_sm_diagnostic): Handle
EK_INLINED_CALL.
* engine.cc (tainted_args_function_custom_event::get_desc): Use
effective fndecl.
* inlining-iterator.h: New file.
gcc/testsuite/ChangeLog:
PR analyzer/105962
* gcc.dg/analyzer/inlining-1-multiline.c: New test.
* gcc.dg/analyzer/inlining-1-no-undo.c: New test.
* gcc.dg/analyzer/inlining-1.c: New test.
* gcc.dg/analyzer/inlining-2-multiline.c: New test.
* gcc.dg/analyzer/inlining-2.c: New test.
* gcc.dg/analyzer/inlining-3-multiline.c: New test.
* gcc.dg/analyzer/inlining-3.c: New test.
* gcc.dg/analyzer/inlining-4-multiline.c: New test.
* gcc.dg/analyzer/inlining-4.c: New test.
* gcc.dg/analyzer/inlining-5-multiline.c: New test.
* gcc.dg/analyzer/inlining-5.c: New test.
* gcc.dg/analyzer/inlining-6-multiline.c: New test.
* gcc.dg/analyzer/inlining-6.c: New test.
* gcc.dg/analyzer/inlining-7-multiline.c: New test.
* gcc.dg/analyzer/inlining-7.c: New test.
gcc/ChangeLog:
PR analyzer/105962
* doc/invoke.texi: Add -fno-analyzer-undo-inlining.
* tree-diagnostic-path.cc (default_tree_diagnostic_path_printer):
Extend -fdiagnostics-path-format=separate-events so that with
-fdiagnostics-show-path-depths it prints fndecls as well as stack
depths.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/ChangeLog:
* value-relation.h: Add "final" and "override" to relation_oracle
vfunc implementations as appropriate.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
I've been using this tweak to the output of
-fdump-analyzer-exploded-graph in my working copies for a while;
the extra red nodes make it *much* easier to find the places where
diagnostics are being emitted (or rejected by the diagnostic_manager).
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (saved_diagnostic::dump_dot_id): New.
(saved_diagnostic::dump_as_dot_node): New.
* diagnostic-manager.h (saved_diagnostic::dump_dot_id): New decl.
(saved_diagnostic::dump_as_dot_node): New decl.
* engine.cc (exploded_node::dump_dot): Add nodes for saved
diagnostics.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>