While sadly 5262831592 doesn't say anything on why these would have
been needed, the latest with the removal of most of the opcode vs
operands distinction in the scrubber these shouldn't be needed anymore.
The implementation was a little questionable anyway, in moving back to
states expecting labels, when clearly labels shouldn't really be
following predicates (in practice, due to another bug, at least ia64
permits such).
According to the description of the state machine, the expectation
appears to be that (leaving aside labels) any insn mnemonic or
directive would be followed by a comma separated list of operands. That
may have been true very long ago, but the latest with the advent of more
elaborate macros this isn't rhe case anymore. Neither macro parameters
in macro definitions nor macro arguments in macro invocations are
required to be separated by commas. Hence whitespace serves a crucial
role there. Plus even without "real" macros issues exist, in e.g.
.irp n, ...
insn\n\(suffix) operand1, operand2
.endr
Whitespace following the closing parenthesis would have been removed
(ahead of even processing the .irp), as the "opcode" was deemed to have
ended earlier already.
Therefore, squash the distinction between "opcode" and operands, i.e.
fold state 10 back into state 3. Also drop most of the distinction
between "symbol chars" and "relatively normal" ones. Not entirely
unexpectedly this results in the need to skip whitespace in a few more
places in arch-specific code (and quite likely more changes are needed
for insn forms not covered by the testsuite).
As a result the D10V special case is no longer necessary.
In config/tc-sparc.c also move a comment to be next to the code being
commented.
In opcodes/cgen-asm.in some further cleanup is done, following the local
var adjustments.
While apparently intended to be only externally controlled (e.g. via
specifying CFLAGS at make invocation), we should still keep scrubber and
lexer in sync in this regard. There's one place which imo was previously
wrong already, but would go further wrong and hence is being adjusted
right here: An .mri directive can be terminated by any kind of "line"
(really: statement) separators.
For the handling of ':' elsewhere in the scrubber to be correct with
regard to labels, the state after parsing a string found at the start of
a line must match that after finding a symbol character at the start of
a line. (Things are largely okay when there's whitespace ahead of the
label: Whitespace after the colon then is retained rather than dropped
for typical targets like x86, but read.c will know to deal with that.)
Instead re-use code handling LEX_IS_TWOCHAR_COMMENT_1ST, thus ensuring
that we wouldn't get bogus state transitions: For example, when we're in
states 0 or 1, a comment should be no different from whitespace
encountered in those states. Plus for e.g. x86 this results in such
comments now truly being converted to a blank, as mandated by
documentation. Both aspects apparently were a result of blindly (and
wrongly) moving to state 3 _before_ consuming the "ungot" blank.
Also amend a related comment elsewhere.
In the new testcase the .irp is to make visible in the listing all the
whitespace that the scrubber inserts / leaves in place.
State 1 is uniformly handled further up. And it is highly questionable
that in state 10 (i.e. after having seen not only a possible label, but
also an opcode), which is about to go away anyway, a line comment char
could still be meant to take effect. With the state checking dropped,
the immediately preceding logic can then also be simplified.
From especially the checks for the two separator forms it appears to
follow that the construct being touched is about trailing whitespace. In
such a case, considering that for many targets ordinary and line comment
chars overlap, take into account that line comment chars override
ordinary ones in lex[] (logic elsewhere in do_scrub_chars() actually
depends on that ordering, and also accounts for this overriding).
Plus of course IS_NEWLINE() would better also be consulted. Note also
that the DOUBLESLASH_LINE_COMMENTS change should generally have no
effect just yet; it's a prereq for a later change but better fits here.
Leave respective comments as well, and update documentation to correct
which comment form is actually replaced by a single blank (i.e. neither
the ones starting with what {,tc_}comment_chars[] has nor the ones
starting with what line_comment_chars[] has).
Two successive PUT() without a state change in between can't be right:
The first PUT() may take the "goto tofull" path, leading to the
subsequent character being processed later in the previously set state
(1 in this case), rather than the state we were in upon entry to the
switch() (13 in this case).
However, the original purpose of that logic appears to be to not mistake
"|| ^" for "||^". This effect, sadly, looks to not have been achieved.
Therefore drop the special case altogether; something that actually
achieves the (presumably) intended effect may then be introduced down
the road.
While we can't - unlike an old comment suggests - do this fully, we can
certainly do part of this at compile time.
Since it's adjacent, also drop the unnecessary forward declaration of
process_escape().
First input_scrub_next_buffer()'s invocation was wrong, leading to input
only being checked from the last newline till the end of the current
buffer. Correcting the invocation, however, leads to duplicate checking
unless -f (or the #NO_APP equivalent thereof) is in effect. Move the
invocation to input_file_give_next_buffer(), to restrict it accordingly.
Then, when macros contain multi-byte characters, warning about them
again in every expansion isn't useful. Suppress such warnings from
sb_scrub_and_add_sb().
Apparently (beyond what's [easily] visible in git history) when this was
added there was confusion about scrubber states vs lex[] contents. For
the purposes here LEX_IS_DOUBLEDASH_1ST (which happens to also resolve
to 12) alone is sufficient. "state" is never set to 12, and it being 12
also isn't handled anywhere.
Adds two new external authors to etc/update-copyright.py to cover
bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then
updates copyright messages as follows:
1) Update cgen/utils.scm emitted copyrights.
2) Run "etc/update-copyright.py --this-year" with an extra external
author I haven't committed, 'Kalray SA.', to cover gas testsuite
files (which should have their copyright message removed).
3) Build with --enable-maintainer-mode --enable-cgen-maint=yes.
4) Check out */po/*.pot which we don't update frequently.
The newer update-copyright.py fixes file encoding too, removing cr/lf
on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and
embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
Commit 53f2b36a54 exposed a bug in sb_scrub_and_add_sb that could
result in losing input. If scrubbing results in expansion past the
holding capacity of do_scrub_chars output buffer, then do_scrub_chars
stashes the extra input for the next call. That call never came
because sb_scrub_and_add_sb wrongly decided it was done. Fix that by
allowing sb_scrub_and_add_sb to see whether there is pending input.
Also allow a little extra space so that in most cases we won't need
to resize the output buffer.
sb_scrub_and_add_sb also limited output to the size of the input,
rather than the actual output buffer size. Fixing that resulted in a
fail of gas/testsuite/macros/dot with an extra warning: "end of file
not at end of a line; newline inserted". OK, so the macro in dot.s
really does finish without end-of-line. Apparently the macro
expansion code relied on do_scrub_chars returning early. So fix that
too by adding a newline if needed in macro_expand_body.
PR 29466
* app.c (do_scrub_pending): New function.
* as.h: Declare it.
* input-scrub.c (input_scrub_include_sb): Add extra space for
two .linefile directives.
* sb.c (sb_scrub_and_add_sb): Take into account pending input.
Allow output to max.
* macro.c (macro_expand_body): Add terminating newline.
* testsuite/config/default.exp (SIZE, SIZEFLAGS): Define.
* testsuite/gas/macros/app5.d,
* testsuite/gas/macros/app5.s: New test.
* testsuite/gas/macros/macros.exp: Run it.
Macro arguments may be separated by commas or just whitespace. Macro
arguments may also be quoted (where one level of quotes is removed in
the course of determining the values for the respective formal
parameters). Furthermore this quote removal knows _two_ somewhat odd
escaping mechanisms: One, apparently in existence forever, is that a
pair of quotes counts as the escaping of a quote, with the pair being
transformed to a single quote in the course of quote removal. The other
(introduced by c06ae4f232) looks more usual on the surface in that it
deals with \" sequences, but it _retains_ the escaping \. Hence only the
former mechanism is suitable when the value to be used by the macro body
is to contain a quote. Yet this results in ambiguity of what "a""b" is
intended to mean; elsewhere (e.g. for .ascii) it represents two
successive string literals. However, in any event is the above different
from "a" "b": I don't think this can be viewed the same as "a""b" when
processing macro arguments.
Change the scrubber to retain such whitespace, by making the processing
of strings more similar to that of symbols. And indeed this appears to
make sense when taking into account that for quite a while gas has been
supporting quoted symbol names.
Taking a more general view, however, the change doesn't go quite far
enough. There are further cases where significant whitespace is removed
by the scrubber. The new testcase enumerates a few in its ".if 0"
section. I'm afraid the only way that I see to deal with this would be
to significantly simplify the scrubber, such that it wouldn't do much
more than collapse sequences of unquoted whitespace into a single blank.
To be honest problems in this area aren't really surprising when seeing
that there's hardly any checking of .macro use throughout the testsuite
(and in particular in the [relatively] generic tests under all/).
The result of running etc/update-copyright.py --this-year, fixing all
the files whose mode is changed by the script, plus a build with
--enable-maintainer-mode --enable-cgen-maint=yes, then checking
out */po/*.pot which we don't update frequently.
The copy of cgen was with commit d1dd5fcc38ead reverted as that commit
breaks building of bfp opcodes files.
* as.c (parse_args): Add support for --multibyte-handling.
* as.h (multibyte_handling): Declare.
* app.c (scan_for_multibyte_characters): New function.
(do_scrub_chars): Call the new function if multibyte warning is
enabled.
* input-scrub,c (input_scrub_next_buffer): Call the multibyte
scanning function if multibyte warnings are enabled.
* symbols.c (struct symbol_flags): Add multibyte_warned bit.
(symbol_init): Call the multibyte scanning function if multibyte
symbol warnings are enabled.
(S_SET_SEGMENT): Likewise.
* NEWS: Mention the new feature.
* doc/as.texi: Document the new feature.
* testsuite/gas/all/multibyte.s: New test source file.
* testsuite/gas/all/multibyte1.d: New test driver file.
* testsuite/gas/all/multibyte1.l: New test expected output.
* testsuite/gas/all/multibyte2.d: New test driver file.
* testsuite/gas/all/multibyte2.l: New test expected output.
* testsuite/gas/all/gas.exp: Run the new tests.
I don't know really why we should lose a space before a '/'. Possibly
it would make sense if '/' started a comment, but otherwise no.
* app.c (do_scrub_chars): Don't lose spaces before a slash.
PR gas/4572
Generalize what ab1fadc6b2 ("PR22714, Assembler preprocessor loses
track of \@") did to always honor escaped comment chars. Use this then
to support escaped /, %, and * operators on x86, when / is a comment
char (to match the Sun assembler's behavior).
The PR22714 testcase is such that the input buffer processed by
do_scrub_chars ends on this line
1: bug "Returning to usermode but unexpected PSR bits set?", \@
right at the backslash. (The line is part of a macro definition.)
The next input buffer then starts with '@' which starts a comment on
ARM, and the check for \@ fails due to to == tostart. Now it would be
possible to simply access to[-1] in this particular case, but that's
ugly, and to be absolutely safe from people deliberately trying to
crash gas we'd need the read.c:read_a_source_file buffer passed to
do_scrub_chars to have a single byte pad at the start.
PR 22714
* app.c (last_char): New static var.
(struct app_save): Add last_char field.
(app_push, app_pop): Handle it.
(do_scrub_chars): Use last_char in test for "\@". Set last_char.
The comment logically belongs inside the preprocessor conditional,
but gcc's -Wimplicit-fallthrough loses track of it. Revert when/if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77817 is fixed.
* app.c (do_scrub_chars): Move fall through comment.
* expr.c (operand): Likewise.
* symbols.c (decode_local_label_name): Make type a const char *.
* listing.c (print_source): Make type of p const char *.
(print_line): Make type of string const char *.
(buffer_line): Return const char *.
(title): Make type const char *.
(subtitle): Likewise.
(listing_listing): Make type of p const char *.
* messages.c (as_internal_value_out_of_range): Make type of prefix
const char *.
* stabs.c (s_stab_generic): make type of stab_secname, stabstr_secname
and string const char *.
* read.c (_bfd_rel): Make type of name const char *.
* app.c (out_string): Change type to const char *.
(struct app_save::out_string): Likewise.
* config/tc-arm.c (codecomposer_syntax): New flag that states whether the
CCS syntax compatibility mode is on or off.
(asmfunc_states): New enum to represent the asmfunc directive state.
(asmfunc_state): New variable holding the asmfunc directive state.
(comment_chars): Rename to arm_comment_chars.
(line_separator_chars): Rename to arm_line_separator_chars.
(s_ccs_ref): New function that handles the .ref directive.
(asmfunc_debug): New function.
(s_ccs_asmfunc): New function that handles the .asmfunc directive.
(s_ccs_endasmfunc): New function that handles the .endasmfunc directive.
(s_ccs_def): New function that handles the .def directive.
(tc_start_label_without_colon): New function.
(md_pseudo_table): Added new CCS directives.
(arm_ccs_mode): New function that handles the -mccs command line option.
(arm_long_opts): Added new -mccs command line option.
* config/tc-arm.h (LABELS_WITHOUT_COLONS): New macro.
(TC_START_LABEL_WITHOUT_COLON): New macro.
(tc_start_label_without_colon): Added extern function declaration.
(tc_comment_chars): Define.
(tc_line_separator_chars): Define.
* app.c (do_scrub_begin): Use tc_line_separator_chars, if defined.
* read.c (read_begin): Likewise.
* doc/as.texinfo: Add documentation for the -mccs command line
option.
* doc/c-arm.texi: Likewise.
* doc/internals.texi: Document tc_line_separator_chars.
* NEWS: Mention the new feature.
* gas/arm/ccs.s: New test case.
* gas/arm/ccs.d: New expected disassembly.