maybe_set_nonzero_bits calls set_nonzero_bits, which asserts that
var doesn't have pointer type. While we could punt for those
cases, I think we can handle at least some easy cases.
Earlier in maybe_set_nonzero_bits we've already checked that this is the
(var & cst) == 0
edge and that the other edge leads to __builtin_unreachable, so if cst
is, say, 3 as in the testcase, we want to turn it into 4-byte alignment
of the pointer.
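As a minimal sketch of that arithmetic (a hypothetical helper, not the patch's code): on the (var & cst) == 0 edge every bit set in cst is known to be zero in var, but only the run of one bits starting at bit 0 of cst gives an alignment guarantee, and that run ends at the lowest set bit of ~cst.

```c
#include <stdint.h>

/* Hypothetical helper: the alignment implied by (var & cst) == 0.
   Only the one bits of cst starting at bit 0 matter; the run ends at
   the lowest set bit of ~cst, which is the resulting alignment.  */
static uintptr_t pointer_alignment_from_mask(uintptr_t cst)
{
    uintptr_t inv = ~cst;
    if (inv == 0)
        return 0;                /* cst is all ones: no alignment to record */
    return inv & (~inv + 1);     /* isolate the lowest set bit of ~cst */
}
```

With cst = 5 only bit 0 qualifies, so the sketch yields 2-byte alignment even though bit 2 is also known to be zero.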
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108253
* tree-vrp.cc (maybe_set_nonzero_bits): Handle var with pointer
types.
* g++.dg/opt/pr108253.C: New test.
We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
   where ~Y + 1 == pow2 and Z = ~Y. */
(for cst (VECTOR_CST INTEGER_CST)
 (for cmp (eq ne)
      icmp (le gt)
  (simplify
   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
    (with { tree csts = bitmask_inv_cst_vector_p (@1); }
     (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
      (with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
                         ? optab_vector : optab_default;
              tree utype = unsigned_type_for (TREE_TYPE (@1)); }
       (if (target_supports_op_p (utype, icmp, optab)
            || (optimize_vectors_before_lowering_p ()
                && (!target_supports_op_p (type, cmp, optab)
                    || !target_supports_op_p (type, BIT_AND_EXPR, optab))))
        (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
         (icmp @0 { csts; })
         (icmp (view_convert:utype @0) { csts; })))))))))
and the part guarded by optimize_vectors_before_lowering_p () there
already deals with this problem by not folding a supported comparison
into an unsupported one. The reason it doesn't work in this case is that
this isn't done by GIMPLE folding, but by GENERIC folding during
forwprop4 (forward_propagate_into_comparison -> forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify),
and we simply assumed that GENERIC folding happens only before
gimplification.
The following patch fixes that by checking cfun properties instead of
always returning true in those cases.
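A minimal model of the predicate change, using hypothetical stand-ins (function_model, PROP_gimple_lvec_model) for GCC's struct function and the PROP_gimple_lvec property bit; the real definitions and values differ. It only illustrates the logic the ChangeLog describes: return false once cfun exists and has the property set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's property bits and struct function;
   the real ones live in tree-pass.h and function.h.  */
enum {
    PROP_gimple_lvec_model     = 1u << 0,
    PROP_gimple_opt_math_model = 1u << 1
};

struct function_model {
    unsigned curr_properties;
};

/* Before: GENERIC folding was assumed to run only before
   gimplification, so the predicate was unconditionally true.  */
static bool optimize_vectors_before_lowering_p_old(const struct function_model *cfun)
{
    (void) cfun;
    return true;
}

/* After: false once vector lowering has run for the current function,
   e.g. when forwprop4 calls into generic_simplify.  */
static bool optimize_vectors_before_lowering_p_new(const struct function_model *cfun)
{
    return !(cfun && (cfun->curr_properties & PROP_gimple_lvec_model));
}
```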
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108237
* generic-match-head.cc: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.
* gcc.c-torture/compile/pr108237.c: New test.
We shouldn't narrow multiplications originally done in signed types,
because the original multiplication might overflow, while the narrowed
one is done in unsigned arithmetic and never overflows, so
-fsanitize=signed-integer-overflow would no longer diagnose it.
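A C sketch of the problem for a product truncated to 16 bits (helper names are hypothetical): with a = b = 65536 the 32-bit signed product overflows, which the sanitizer should report, while the narrowed 16-bit unsigned product is simply 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would the original 32-bit signed multiplication overflow?  Checked
   by computing the exact product in 64 bits.  */
static bool int32_mul_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t) a * (int64_t) b;
    return wide > INT32_MAX || wide < INT32_MIN;
}

/* Narrowed form of (uint16_t) (a * b): multiply only the low 16 bits in
   unsigned arithmetic, which wraps and never overflows.  The operands
   are held in uint32_t so the product is not done in signed int after
   integer promotion.  */
static uint16_t narrowed_mul(int32_t a, int32_t b)
{
    uint32_t lo_a = (uint16_t) a;
    uint32_t lo_b = (uint16_t) b;
    return (uint16_t) (lo_a * lo_b);
}
```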
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/108256
* convert.cc (do_narrow): Punt for MULT_EXPR if original
type doesn't wrap around and -fsanitize=signed-integer-overflow
is on.
* fold-const.cc (fold_unary_loc) <CASE_CONVERT>: Likewise.
* c-c++-common/ubsan/pr108256.c: New test.
C++ Modules do not work reliably on AIX. This patch disables the
modules portion of the testsuite on AIX.
The IBM128 float keywords are not enabled on AIX, so skip this test.
gcc/testsuite/ChangeLog:
* g++.dg/modules/modules.exp: Skip on AIX.
* gcc.target/powerpc/pr99708.c: Skip on AIX.
SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them. Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.
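A rough model of the scheme, assuming a toy call graph (struct node and expand_all are hypothetical, not cgraph_node or expand_all_functions): non-candidates are expanded first, expanding a function clears gc_candidate on everything it calls, candidates are revisited until nothing changes, and whatever is still unexpanded is deleted. In GCC the bit is cleared by vectorizable_simd_clone_call when a clone is used; here a plain call stands in for that.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical stand-in for a call-graph node.  */
struct node {
    bool gc_candidate;            /* local SIMD clone, not yet known to be used */
    bool expanded;
    size_t ncallees;
    size_t callees[MAX_CALLEES];  /* indices into the node array */
};

/* Expanding a node marks every function it calls as used.  */
static void expand(struct node *nodes, size_t i)
{
    nodes[i].expanded = true;
    for (size_t k = 0; k < nodes[i].ncallees; k++)
        nodes[nodes[i].callees[k]].gc_candidate = false;
}

/* Expand non-candidates first, then iterate to pick up candidates that
   became used (including through chains of calls).  Returns the number
   of leftover candidates that would be deleted.  */
static size_t expand_all(struct node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!nodes[i].gc_candidate)
            expand(nodes, i);

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++)
            if (!nodes[i].expanded && !nodes[i].gc_candidate) {
                expand(nodes, i);
                changed = true;
            }
    }

    size_t deleted = 0;
    for (size_t i = 0; i < n; i++)
        if (!nodes[i].expanded)
            deleted++;
    return deleted;
}
```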
gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
default constructor to initialize it.
* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
for last and iterate to handle recursive calls. Delete leftover
candidates at the end.
* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
on local clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
gc_candidate bit when a clone is used.
gcc/testsuite/ChangeLog
* g++.dg/gomp/target-simd-clone-1.C: Tweak to test
that the unused clone is GC'ed.
* gcc.dg/gomp/target-simd-clone-1.c: Likewise.
The parameters fs->data_align and fs->code_align always have fixed
values for a particular target in GCC-generated code. Specialize
execute_cfa_program for these values, to avoid multiplications.
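A minimal sketch of the specialization pattern, applied to a single offset computation rather than the whole of execute_cfa_program, and covering only data_align. ASSUMED_DATA_ALIGN stands in for __LIBGCC_DWARF_CIE_DATA_ALIGNMENT__; -8 is only an assumed value, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed compile-time data alignment factor for this sketch.  */
#define ASSUMED_DATA_ALIGN (-8)

/* Generic path: data_align is only known at run time, so a real
   multiplication is needed.  */
static int64_t cfa_offset_generic(uint64_t operand, int64_t data_align)
{
    return (int64_t) operand * data_align;
}

/* Specialized path: the factor is a constant, so the compiler can
   strength-reduce the multiplication.  */
static int64_t cfa_offset_specialized(uint64_t operand)
{
    return (int64_t) operand * ASSUMED_DATA_ALIGN;
}

/* Dispatch: use the specialized code when the CIE matches the
   expected value, otherwise fall back to the generic code.  */
static int64_t cfa_offset(uint64_t operand, int64_t data_align)
{
    if (data_align == ASSUMED_DATA_ALIGN)
        return cfa_offset_specialized(operand);
    return cfa_offset_generic(operand, data_align);
}
```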
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Define
__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__.
libgcc/
* unwind-dw2-execute_cfa.h: New file. Extracted from
the execute_cfa_program function in unwind-dw2.c.
* unwind-dw2.c (execute_cfa_program_generic): New function.
(execute_cfa_program_specialized): Likewise.
(execute_cfa_program): Call execute_cfa_program_specialized
or execute_cfa_program_generic, as appropriate.
Break the _FORTIFY_SOURCE-specific builtins out into a separate
subsection from Object Size Checking built-ins and mention
_FORTIFY_SOURCE in there so that the link between the object size
checking builtins, the helper builtins (e.g. __builtin___memcpy_chk) and
_FORTIFY_SOURCE is clearer.
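To illustrate that link, here is a sketch of the kind of wrapper _FORTIFY_SOURCE headers provide: it passes __builtin_object_size of the destination as the extra argument of __builtin___memcpy_chk, which aborts at run time if the copy would overflow. The checked_memcpy macro and fill_and_compare function are hypothetical; glibc's actual wrappers are more involved.

```c
#include <string.h>

/* Pass the compiler's view of the destination size to the checking
   variant.  If the size is unknown, __builtin_object_size returns
   (size_t) -1 and the check can never fire.  */
#define checked_memcpy(dst, src, n) \
    __builtin___memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))

/* Copies "abc" (with its terminator) into an 8-byte buffer, which the
   check accepts, and verifies the result.  */
static int fill_and_compare(void)
{
    char buf[8];
    checked_memcpy(buf, "abc", 4);
    return strcmp(buf, "abc") == 0;
}
```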
gcc/ChangeLog:
PR tree-optimization/105043
* doc/extend.texi (Object Size Checking): Split out into two
subsections and mention _FORTIFY_SOURCE.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
This patch modifies the way that ix86_expand_int_movcc generates RTL,
to allow the condition mask to be shared/reused between multiple
conditional move sequences. Such redundancy is common when RTL
if-conversion transforms non-trivial basic blocks.
As a motivating example, consider the new test case:
int a, b, c, d;
int foo(int x)
{
  if (x == 0) {
    a = 3;
    b = 1;
    c = 4;
    d = 1;
  } else {
    a = 5;
    b = 9;
    c = 2;
    d = 7;
  }
  return x;
}
This is currently compiled, with -O2, to:
foo:    cmpl $1, %edi
        movl %edi, %eax
        sbbl %edi, %edi
        andl $-2, %edi
        addl $5, %edi
        cmpl $1, %eax
        sbbl %esi, %esi
        movl %edi, a(%rip)
        andl $-8, %esi
        addl $9, %esi
        cmpl $1, %eax
        sbbl %ecx, %ecx
        movl %esi, b(%rip)
        andl $2, %ecx
        addl $2, %ecx
        cmpl $1, %eax
        sbbl %edx, %edx
        movl %ecx, c(%rip)
        andl $-6, %edx
        addl $7, %edx
        movl %edx, d(%rip)
        ret
Notice that the if-then-else blocks have been if-converted into four
conditional move sequences/assignments, each consisting of cmpl, sbbl,
andl and addl. However, as the conditions are the same, the cmpl and
sbbl instructions used to generate the mask could be shared by CSE.
This patch enables that so that we now generate:
foo:    cmpl $1, %edi
        movl %edi, %eax
        sbbl %edx, %edx
        movl %edx, %edi
        movl %edx, %esi
        movl %edx, %ecx
        andl $-6, %edx
        andl $-2, %edi
        andl $-8, %esi
        andl $2, %ecx
        addl $7, %edx
        addl $5, %edi
        addl $9, %esi
        addl $2, %ecx
        movl %edx, d(%rip)
        movl %edi, a(%rip)
        movl %esi, b(%rip)
        movl %ecx, c(%rip)
        ret
Notice that the code now contains only a single cmpl and a single sbbl,
with the result being shared (via movl).
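The mask trick can be modeled in C as follows (all names are hypothetical). cmpl $1 sets the carry flag exactly when (unsigned) x < 1, i.e. x == 0, so sbbl of a register with itself yields all ones or zero; andl with (then - else) and addl of else then select one of the two constants. Computing the mask once and reusing it is the sharing the patch exposes to CSE.

```c
#include <stdint.h>

/* cmpl $1, x; sbbl r, r: all ones when x == 0, zero otherwise.  */
static uint32_t zero_mask(uint32_t x)
{
    return x < 1u ? UINT32_MAX : 0u;
}

/* andl $(then - else), r; addl $else, r: selects then_val when the
   mask is all ones and else_val when it is zero (modulo 2^32).  */
static int32_t select_with_mask(uint32_t mask, int32_t then_val, int32_t else_val)
{
    uint32_t diff = (uint32_t) then_val - (uint32_t) else_val;
    return (int32_t) ((mask & diff) + (uint32_t) else_val);
}

struct abcd { int32_t a, b, c, d; };

/* One mask, four selections, matching the constants in foo.  */
static struct abcd foo_model(int32_t x)
{
    uint32_t mask = zero_mask((uint32_t) x);
    struct abcd r = {
        select_with_mask(mask, 3, 5),   /* andl $-2, addl $5 */
        select_with_mask(mask, 1, 9),   /* andl $-8, addl $9 */
        select_with_mask(mask, 4, 2),   /* andl $2,  addl $2 */
        select_with_mask(mask, 1, 7),   /* andl $-6, addl $7 */
    };
    return r;
}
```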
2023-01-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_movcc): Rewrite
RTL expansion to allow condition (mask) to be shared/reused,
by avoiding overwriting pseudos and adding REG_EQUAL notes.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmov10.c: New test case.
This patch addresses PR target/108229, which is a change in code
generation during the STV pass, due to the recently approved patch
to handle vec_select (reductions) in the vector unit. The recent
change is innocent, but exposes a latent STV "gain" calculation issue
that is benign (or closely balanced) on most microarchitectures.
The issue arises when STV considers converting a PLUS with a MEM operand.
On TARGET_64BIT (m=1):
        addq 24(%rdi), %rdx             // 4 bytes
or with -m32 (m=2):
        addl 24(%esi), %eax             // 3 bytes
        adcl 28(%esi), %edx             // 3 bytes
is being converted by STV to
        vmovq 24(%rdi), %xmm5           // 5 bytes
        vpaddq %xmm5, %xmm4, %xmm4      // 4 bytes
The current code in general_scalar_chain::compute_convert_gain
considers that scalar unit addition is replaced with a vector
unit addition (usually about the same cost), but doesn't consider
anything special about MEM operands, assuming that a scalar load
gains/costs nothing compared to a vector load. We can allow the
backend slightly better fine tuning by including in the gain
calculation that m scalar loads are being replaced by one vector
load, and when optimizing for size including that we're increasing
code size (e.g. an extra vmovq instruction for a MEM operand).
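A sketch of the two extra terms, with hypothetical function names and cost inputs (the real numbers come from the i386 cost tables, which are not reproduced here). Plugging in the byte counts from the listings above, the 64-bit case grows from 4 to 9 bytes (a size gain of -5) and the -m32 case from 6 to 9 bytes (-3).

```c
/* Speed: one MEM operand replaces m scalar loads with one vector load.
   Positive means the conversion pays off.  */
static int mem_operand_speed_gain(int m, int scalar_load_cost, int vector_load_cost)
{
    return m * scalar_load_cost - vector_load_cost;
}

/* Size: bytes saved by the conversion (negative means growth), summed
   over the per-instruction encoding sizes.  */
static int size_gain(const int *scalar_bytes, int n_scalar,
                     const int *vector_bytes, int n_vector)
{
    int gain = 0;
    for (int i = 0; i < n_scalar; i++)
        gain += scalar_bytes[i];
    for (int i = 0; i < n_vector; i++)
        gain -= vector_bytes[i];
    return gain;
}
```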
This patch is a win on the CSiBE benchmark when compiled with -Os.
2023-01-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/108229
* config/i386/i386-features.cc
(general_scalar_chain::compute_convert_gain) <case PLUS>: Consider
the gain/cost of converting a MEM operand.
The following testcase ICEs on s390x-linux (e.g. with -march=z13).
The problem is that target is (subreg/s/u:SI (reg/v:DI 66 [ x+-4 ]) 4)
and we call convert_move from temp to the SUBREG_REG of that, expecting
to extend the value properly. That works nicely if temp has some
scalar integer mode (or partial one), but ICEs when temp has V4QImode
on the assertion that from and to modes have the same bitsize.
store_expr generally allows, say, a store from V4QI to an SI target because
they have the same size, and if temp is a CONST_INT, we already have code
to convert the constant properly, so the following patch just adds handling
of non-scalar integer modes by converting them to the mode of target
first, before convert_move extends them.
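A C analogue of the two steps, with v4qi as a hypothetical stand-in for a V4QImode value and sign extension assumed for the promoted SUBREG: first reinterpret the same-sized vector in the target's 32-bit outer mode, then extend that value into the 64-bit register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a V4QImode value.  */
typedef struct { uint8_t b[4]; } v4qi;

static int64_t store_promoted_signed(v4qi v)
{
    int32_t si;
    /* Step 1: same-size reinterpretation in the outer (SImode-like)
       mode, which is what the patch adds before extending.  */
    memcpy(&si, &v, sizeof si);
    /* Step 2: extend to the 64-bit promoted register, as convert_move
       would do for a scalar integer source.  */
    return (int64_t) si;
}
```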
2023-01-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108264
* expr.cc (store_expr): For stores into SUBREG_PROMOTED_* targets
from source which doesn't have scalar integral mode first convert
it to outer_mode.
* gcc.dg/pr108264.c: New test.
The following testcase distilled from Linux kernel on ppc64le ICEs,
because fixup_reorder_chain sees a bb with a single fallthru edge
falling into a bb with simple return and decides to redirect
that fallthru edge to EXIT. That is possible if the bb ending
in the fallthru edge doesn't end with a jump or ends with a normal
unconditional jump, but not when the bb ends with an asm goto, which despite
a single fallthru edge can have multiple labels referring to the fallthrough basic block.
The following patch makes sure we never try to redirect such cases to EXIT.
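A minimal C example of that shape (not the testcase from the PR): the asm goto falls through into the block labelled out, and its only label refers to that same block, so the CFG has a single fallthru edge even though the asm also names its destination as a label.

```c
/* The fallthrough path and the asm goto label reach the same block.  */
static int asm_goto_to_fallthru(int x)
{
    __asm__ goto ("" : : : : out);
out:
    return x;
}
```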
2023-01-03 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108263
* cfgrtl.cc (fixup_reorder_chain): Avoid trying to redirect
asm goto to EXIT.
* gcc.dg/pr108263.c: New test.
This commit fixes a small bug where GNAT would emit unescaped quotes in
its -fdiagnostics-format=json output when using -gnatdJ and emitting
messages about operator functions (e.g. "=").
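A sketch of the escaping involved, written in C rather than Ada and with a hypothetical helper: operator function names contain quotes, e.g. "=", so the quotes (and any backslashes) must be escaped before the name is embedded in a JSON string. Control characters, which JSON also requires escaping, are not handled here.

```c
#include <stddef.h>
#include <string.h>

/* Escape '"' and '\\' for use inside a JSON string.  Returns 0 if out
   is too small, 1 on success.  */
static int json_escape(const char *in, char *out, size_t out_size)
{
    size_t j = 0;
    for (; *in; in++) {
        size_t need = (*in == '"' || *in == '\\') ? 2 : 1;
        if (j + need >= out_size)   /* keep room for the terminator */
            return 0;
        if (need == 2)
            out[j++] = '\\';
        out[j++] = *in;
    }
    if (j >= out_size)
        return 0;
    out[j] = '\0';
    return 1;
}
```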
gcc/ada/
* errout.adb (Write_JSON_Span): Escape subprogram name.
Before this commit, when GNAT needed to emit lines longer than
the buffer, it accidentally inserted a newline in its output when
attempting to flush its buffer.
We fix this by using Flush_Buffer instead of Write_Eol in Write_Char.
gcc/ada/
* output.adb (Write_Buffer): Use Flush_Buffer instead of Write_Eol.
Before this patch, passing a width and a precision through
arguments with the "*" syntax always failed for real values in
GNAT.Formatted_String's routines.
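For reference, this is the C printf behaviour that GNAT.Formatted_String's printf-style formats follow: "*" takes the width or the precision from the argument list. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Width 8 and precision 2 both come from the arguments, so 3.14159
   is printed as "3.14" padded on the left to 8 characters.  */
static int star_real_example_ok(void)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%*.*f", 8, 2, 3.14159);
    return strcmp(buf, "    3.14") == 0;
}
```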
gcc/ada/
* libgnat/g-forstr.adb (P_Flt_Format): Add "*" syntax handling.
Before this patch, GNAT.Formatted_String.Formatted_String failed to
handle format strings with two or more specifiers whose widths were
specified with the "*" syntax. This patch makes the parser
correctly reset its bits of state related to width and precision
parsing when needed.
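A sketch of the idea behind the fix, in C with hypothetical names (flags such as '-' or '0' are not parsed): all per-specifier state is reset at each '%', so a "*" width seen in one specifier cannot leak into the next one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed-specifier state.  */
struct spec {
    bool width_star;
    bool precision_star;
    char conversion;
};

/* Advance *p past the next specifier and parse it into *s, skipping
   "%%" as a literal.  Returns false when no specifier remains.  */
static bool next_spec(const char **p, struct spec *s)
{
    const char *f = *p;
    for (;;) {
        while (*f && *f != '%')
            f++;
        if (!*f)
            return false;
        if (f[1] == '%') {
            f += 2;
            continue;
        }
        break;
    }
    f++;                                        /* skip '%' */
    *s = (struct spec){ false, false, '\0' };   /* reset per specifier */
    if (*f == '*') {
        s->width_star = true;
        f++;
    } else {
        while (*f >= '0' && *f <= '9')
            f++;
    }
    if (*f == '.') {
        f++;
        if (*f == '*') {
            s->precision_star = true;
            f++;
        } else {
            while (*f >= '0' && *f <= '9')
                f++;
        }
    }
    if (!*f) {                                  /* trailing incomplete spec */
        *p = f;
        return false;
    }
    s->conversion = *f++;
    *p = f;
    return true;
}
```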
gcc/ada/
* libgnat/g-forstr.adb (P_Int_Format): Fix parsing bug.
Various parts of the expander and the code generator must have a consistent
view on which temporaries generated for return statements must be finalized
because they are regular temporaries, and which ones must not be since they
are allocated on the return stack directly. The Is_Related_To_Func_Return
predicate is used for this purpose and needs to be tested consistently.
gcc/ada/
* exp_ch6.adb (Expand_Simple_Function_Return): Make sure that a
captured function call also verifies Is_Related_To_Func_Return.
Do not generate an actual subtype for special return objects.
* exp_util.ads (Is_Related_To_Func_Return): Add commentary.
Before this patch, format strings ending with "%%" (two consecutive
percent signs) caused GNAT.Formatted_String."-" to give the wrong
output, and caused the various GNAT.Formatted_String."&" to raise
exceptions with misleading error messages.
Also before this patch, a bug in GNAT.Formatted_String."-" caused
characters from the format string to be dropped. Calling
GNAT.Formatted_String."-" on an instance of
GNAT.Formatted_String.Formatted_String caused subsequent uses of
that instance to return wrong results.
In addition to fixing the parsing of format strings, this patch
centralizes the detection of format specifiers in a single
procedure.
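A C analogue of such a centralized scanning step (hypothetical name and signature): it copies literal text, turns "%%" into a single '%', and stops at the next real specifier, so "%%" at the end of the format is handled like any other literal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy literal text from *p into out until the next real specifier or
   the end of the format.  Returns true if a specifier was found; *p is
   then left pointing at its '%'.  Output is truncated to fit.  */
static bool advance_and_accumulate(const char **p, char *out, size_t out_size)
{
    assert(out_size > 0);
    const char *f = *p;
    size_t j = 0;
    while (*f) {
        char c;
        if (f[0] == '%' && f[1] == '%') {
            c = '%';            /* "%%" is a literal percent sign */
            f += 2;
        } else if (f[0] == '%') {
            break;              /* start of a real specifier */
        } else {
            c = *f++;
        }
        if (j + 1 < out_size)   /* keep room for the terminator */
            out[j++] = c;
    }
    out[j] = '\0';
    *p = f;
    return *f == '%';
}
```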
gcc/ada/
* libgnat/g-forstr.adb
(Advance_And_Accumulate_Until_Next_Specifier): New procedure.
("-"): Replace inline code with call to
Advance_And_Accumulate_Until_Next_Specifier.
(Next_Format): Likewise.
The predicate implements the rules of the language so it needs to cope with
constructs rewritten by the expander, in particular explicit dereferences
that the expander uses liberally for various purposes.
This change makes the detection of rewritten calls more robust and adds the
detection of rewritten return objects.
gcc/ada/
* checks.adb (Apply_Discriminant_Check.Denotes_Explicit_Dereference):
Return false for artificial dereferences generated by the expander.
Such functions use neither Ada 2005's build-in-place mechanism nor Ada 95's
return-by-reference mechanism, but instead the common calling convention of
functions returning a nonlimited by-reference type.
gcc/ada/
* exp_ch6.adb (Is_Build_In_Place_Function): Adjust comment.
* sem_util.adb (Compute_Returns_By_Ref): Do not set Returns_By_Ref
on functions with foreign convention.
The frontend currently relies on gigi to use efficient assignment in
particular cases like:
Some_Var.all := (others => (others => 0));
gigi would use memset to clear memory pointed to by Some_Var.
In the case of an access with a Designated_Storage_Model aspect with a Copy_To
procedure, memset can't be used directly. Instead of simply disabling this
frontend/gigi optimization and having the frontend emit several assignments, a
temporary is used (through the new Build_Assignment_With_Temporary): gigi can
still memset it, and this temporary is then copied into the original
target (and the regular storage model mechanism handles it).
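A C sketch of the approach, with a hypothetical Copy_To callback type and helper names: the temporary can be cleared with memset, and the target is then written only through Copy_To.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Copy_To procedure of a storage model: the target memory
   cannot be written directly, so every store goes through it.  */
typedef void (*copy_to_fn)(void *target, const void *src, size_t n);

/* Clear a local temporary with memset, then hand it to Copy_To in one
   piece instead of using memset on the target itself.  */
static void clear_through_storage_model(void *target, size_t n,
                                        copy_to_fn copy_to,
                                        unsigned char *scratch)
{
    memset(scratch, 0, n);
    copy_to(target, scratch, n);
}

/* Trivial host-memory Copy_To used for demonstration.  */
static void host_copy_to(void *target, const void *src, size_t n)
{
    memcpy(target, src, n);
}
```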
gcc/ada/
* exp_aggr.adb (Build_Assignment_With_Temporary): New.
(Expand_Array_Aggregate): Tune backend optimization
and insert a temporary in the case of an access with
Designated_Storage_Model aspect.
(Convert_Array_Aggr_In_Allocator): Likewise.
This goes back to the original implementation but keeps the special size
test with universal_integer to cope with its limited range.
gcc/ada/
* sem_res.adb (Resolve_Membership_Op): Adjust again latest change.
The predicate implements the rules of the language so it needs to cope with
constructs rewritten by the expander, in particular explicit dereferences
that the expander uses liberally for various purposes.
This change makes the detection of rewritten calls more robust, plugging an
existing loophole for specific objects and exposing a missing propagation of
the Is_Aliased flag for certain build-in-place objects, as well as adds the
detection of rewritten return objects.
It also contains a small enhancement to Set_Debug_Info_Defining_Id aimed at
making it easier to debug the generated code by means of -gnatD.
gcc/ada/
* sem_util.ads (Set_Debug_Info_Defining_Id): Adjust comment.
* sem_util.adb (Is_Aliased_View) <N_Explicit_Dereference>: Return
false for more artificial dereferences generated by the expander.
(Set_Debug_Info_Defining_Id): Set Debug_Info_Needed unconditionally
in -gnatD mode.
* exp_ch6.adb (Replace_Renaming_Declaration_Id): Also preserve the
Is_Aliased flag.
The wording of the introduction paragraph specified an incomplete
list of OSes. Rather than trying to update the list, this commit
changes the text to make it more general. For those parts of
this chapter that only apply to specific OSes, the documentation
makes clear which OS they apply to.
gcc/ada/
* doc/gnat_ugn/platform_specific_information.rst
(_Platform_Specific_Information): Minor rewording of intro text.
* gnat_ugn.texi: Regenerate.
The current code has relied on Original_Node to detect rewritten function
calls in object declarations but that's not robust enough in the presence
of function calls written in object notation.
gcc/ada/
* exp_util.ads (Is_Captured_Function_Call): Declare.
* exp_util.adb (Is_Captured_Function_Call): New predicate.
* exp_ch3.adb (Expand_N_Object_Declaration): Use it to detect a
rewritten function call as the initializing expression.
* exp_ch6.adb (Expand_Simple_Function_Return): Use it to detect a
rewritten function call as the returned expression.
Make Small_Integer_Type_For call Integer_Type_For,
so they share most of the code.
Remove Standard_Long_Integer from consideration,
because that's different on different machines (32- or 64-bit).
Standard_Integer or Standard_Long_Long_Integer will be
chosen.
gcc/ada/
* exp_util.adb (Integer_Type_For): Assertion and comment.
(Small_Integer_Type_For): Remove some code and call
Integer_Type_For instead.
* sem_util.ads (Rep_To_Pos_Flag): Improve comments. "Standard_..."
seems overly pedantic here.
* exp_attr.adb (Succ, Pred): Clean up: make the code as similar as
possible.
* exp_ch4.adb: Minor: named notation.
gcc/ada/
* ghost.adb (Is_OK_Declaration): A reference to a Ghost entity may
appear within the class-wide precondition of a helper subprogram.
This context is treated as suitable because it was already
verified when we were analyzing the original class-wide
precondition.
The support of the Default_Component_Value aspect on derived constrained
array types is broken because of a couple of issues: 1) the derived types
incorrectly inherit the initialization procedure of the ancestor types
and 2) the propagation of the aspect does not work for constrained array
types (unlike for unconstrained array types).
gcc/ada/
* exp_tss.adb (Base_Init_Proc): Do not return the Init_Proc of the
ancestor type for a derived array type.
* sem_ch13.adb (Inherit_Aspects_At_Freeze_Point): Factor out the
common processing done on representation items.
For Default_Component_Value and Default_Value, look into the first
subtype to find out the representation items.
Model the divider in Lujiazui processors as a separate automaton to
significantly reduce the overall model size. This should also result
in improved accuracy, as pipe 0 should be able to accept new
instructions while the divider is occupied.
It is unclear why integer divisions are modeled as if pipes 0-3 are all
occupied. I've opted to keep a single-cycle reservation of all four
pipes together, so GCC should continue trying to pack instructions
around a division accordingly.
Currently the top three symbols in insn-automata.o are:
106102 r lujiazui_core_check
106102 r lujiazui_core_transitions
196123 r lujiazui_core_min_issue_delay
This patch shrinks all lujiazui tables to:
     3 r lujiazui_decoder_min_issue_delay
    20 r lujiazui_decoder_transitions
    32 r lujiazui_agu_min_issue_delay
   126 r lujiazui_agu_transitions
   304 r lujiazui_div_base
   352 r lujiazui_div_check
   352 r lujiazui_div_transitions
  1152 r lujiazui_core_min_issue_delay
  1592 r lujiazui_agu_translate
  1592 r lujiazui_core_translate
  1592 r lujiazui_decoder_translate
  1592 r lujiazui_div_translate
  3952 r lujiazui_div_min_issue_delay
  9216 r lujiazui_core_transitions
This continues the work on reducing i386 insn-automata.o size started
with similar fixes for division and multiplication instructions in
znver.md.
gcc/ChangeLog:
PR target/87832
* config/i386/lujiazui.md (lujiazui_div): New automaton.
(lua_div): New unit.
(lua_idiv_qi): Correct unit in the reservation.
(lua_idiv_qi_load): Ditto.
(lua_idiv_hi): Ditto.
(lua_idiv_hi_load): Ditto.
(lua_idiv_si): Ditto.
(lua_idiv_si_load): Ditto.
(lua_idiv_di): Ditto.
(lua_idiv_di_load): Ditto.
(lua_fdiv_SF): Ditto.
(lua_fdiv_SF_load): Ditto.
(lua_fdiv_DF): Ditto.
(lua_fdiv_DF_load): Ditto.
(lua_fdiv_XF): Ditto.
(lua_fdiv_XF_load): Ditto.
(lua_ssediv_SF): Ditto.
(lua_ssediv_load_SF): Ditto.
(lua_ssediv_V4SF): Ditto.
(lua_ssediv_load_V4SF): Ditto.
(lua_ssediv_V8SF): Ditto.
(lua_ssediv_load_V8SF): Ditto.
(lua_ssediv_SD): Ditto.
(lua_ssediv_load_SD): Ditto.
(lua_ssediv_V2DF): Ditto.
(lua_ssediv_load_V2DF): Ditto.
(lua_ssediv_V4DF): Ditto.
(lua_ssediv_load_V4DF): Ditto.
And use that to speed up the libgcc unwinder.
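A sketch of the idea, with hypothetical names and the assumption that 0 means "sizes differ": if every entry of the register-size table is the same, that size can be used directly instead of a table lookup.

```c
#include <stddef.h>

/* Returns the common size if all entries are equal, else 0.  */
static unsigned reg_sizes_constant(const unsigned char *sizes, size_t n)
{
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (sizes[i] != sizes[0])
            return 0;
    return sizes[0];
}

/* Skip the table when the size is known to be constant.  */
static unsigned reg_size(const unsigned char *table, unsigned constant, int regno)
{
    return constant != 0 ? constant : table[regno];
}
```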
gcc/
* debug.h (dwarf_reg_sizes_constant): Declare.
* dwarf2cfi.cc (dwarf_reg_sizes_constant): New function.
gcc/c-family/
* c-cppbuiltin.cc (__LIBGCC_DWARF_REG_SIZES_CONSTANT__):
Define if constant is known.
libgcc/
* unwind-dw2.c (dwarf_reg_size): New function.
(_Unwind_GetGR, _Unwind_SetGR, _Unwind_SetGRPtr)
(_Unwind_SetSpColumn, uw_install_context_1): Use it.
(uw_init_context_1): Do not initialize dwarf_reg_size_table
if not in use.
The sizes are compile-time constants. Create a vector with them,
so that they can be inspected at compile time.
gcc/
* dwarf2cfi.cc (init_return_column_size): Remove.
(init_one_dwarf_reg_size): Adjust.
(generate_dwarf_reg_sizes): New function. Extracted
from expand_builtin_init_dwarf_reg_sizes.
(expand_builtin_init_dwarf_reg_sizes): Call
generate_dwarf_reg_sizes.
* target.def (init_dwarf_reg_sizes_extra): Adjust
hook signature.
* config/msp430/msp430.cc
(msp430_init_dwarf_reg_sizes_extra): Adjust.
* config/rs6000/rs6000.cc
(rs6000_init_dwarf_reg_sizes_extra): Likewise.
* doc/tm.texi: Update.
Normally, GCC executables are built with -static-libstdc++ -static-libgcc
on Darwin. This is fine in most cases, because GCC executables typically
do not use exceptions. However, gnat1 does use exceptions and also pulls
in system libraries that are linked against the installed shared libgcc
which contains the system unwinder. This means that gnat1 effectively has
two unwinder instances (which does not work reliably since the unwinders
have global state).
A recent change in the initialization of FDEs has now made this a hard error
on Darwin 8 and 9, which have libgcc installed in /usr/lib (gnat1
hangs when an exception is thrown).
The solution is to link libgcc dynamically, picking up the installed
system version. To do this we strip -static-libgcc from the link flags.
PR ada/108202
gcc/ada/ChangeLog:
* gcc-interface/Make-lang.in (GCC_LINKERFLAGS, GCC_LDFLAGS):
Versions of ALL_LINKERFLAGS, LDFLAGS with -Werror and
-static-libgcc filtered out for Darwin8 and 9 (-Werror is filtered
out for other hosts).
This is another step towards a possible solution for PR 105137.
This patch introduces a define_insn for extendditi2 that allows
DImode to TImode sign-extension to be represented in the early
RTL optimizers, before being split post-reload into the exact
same idiom as currently produced by RTL expansion.
Typically this produces the identical code, so the first new
test case:
__int128 foo(long long x) { return (__int128)x; }
continues to generate:
foo:    movq %rdi, %rax
        cqto
        ret
The "magic" is that this representation allows combine and the
other RTL optimizers to do a better job. Hence, the second
test case:
__int128 foo(__int128 a, long long b) {
  a += ((__int128)b) << 70;
  return a;
}
which mainline with -O2 currently generates as:
foo:    movq %rsi, %rax
        movq %rdx, %rcx
        movq %rdi, %rsi
        salq $6, %rcx
        movq %rax, %rdi
        xorl %eax, %eax
        movq %rcx, %rdx
        addq %rsi, %rax
        adcq %rdi, %rdx
        ret
with this patch now becomes:
foo:    movl $0, %eax
        salq $6, %rdx
        addq %rdi, %rax
        adcq %rsi, %rdx
        ret
i.e. the same code for the signed and unsigned extension variants.
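A C model on 64-bit halves (hypothetical names) of why the two variants coincide: for (__int128)b << 70 the low half of the shifted value is zero and its high half is (uint64_t)b << 6, so the sign-extension word is shifted out entirely and only the low 58 bits of b matter.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit halves.  */
struct u128 { uint64_t lo, hi; };

/* DImode -> TImode sign extension: the high half replicates the sign
   bit, which is what cqto produces in %rdx.  */
static struct u128 sext_di_ti(int64_t x)
{
    struct u128 r = { (uint64_t) x, x < 0 ? UINT64_MAX : 0 };
    return r;
}

/* a + ((__int128) b << 70): the shifted value's low half is zero, so
   the low-half add cannot carry, and its high half is b << 6.  */
static struct u128 add_shifted_70(struct u128 a, int64_t b)
{
    struct u128 r = { a.lo, a.hi + ((uint64_t) b << 6) };
    return r;
}
```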
2023-01-01 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (extendditi2): New define_insn.
(define_split): Use DWIH mode iterator to treat new extendditi2
identically to existing extendsidi2_1.
(define_peephole2): Likewise.
(define_peephole2): Likewise.
(define_split): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/extendditi2-1.c: New test case.
* gcc.target/i386/extendditi2-2.c: Likewise.
The symbols for module registration constructors need to be external
or we get wrong code generated for targets that allow direct access to
local symbol definitions.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR modula2/108183
gcc/m2/ChangeLog:
* gm2-compiler/M2GCCDeclare.mod: Module registration constructors are
externs to the builder of m2_link.
Co-Authored-By: Gaius Mulley <gaiusmod2@gmail.com>