Add type annotations to ada-unicode.py, just enough to make pyright
happy:
$ pyright --version
pyright 1.1.359
$ pyright ada-unicode.py
0 errors, 0 warnings, 0 informations
Introduce a `Range` class instead of using separate variables and
tuples, to make the code and type annotations a bit cleaner.
When running ada-unicode.py, I get a diff for ada-casefold.h, but I get
the same diff before and after this patch, so that is a separate issue.
Change-Id: I0d8975a57f9fb115703178ae197dc6b6b8b4eb7a
Approved-By: Tom Tromey <tom@tromey.com>
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh it's copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
In a review [1], I pointed out that applying the patch, git would say:
.git/rebase-apply/patch:147: new blank line at EOF.
However, since the empty line is in target-delegates.c (a generated
file), there's nothing the author can do about it. To avoid this
comment coming up again in the future, change make-target-delegates.py
to avoid the trailing empty line. Do this by making it output empty
lines before each entity, not after.
Since this needs removing a newline output in gdbcopyright, adjust
ada-unicode.py and gdbarch.py to avoid changes in the files they
generate.
[1] https://inbox.sourceware.org/gdb-patches/20230427210113.45380-1-jhb@FreeBSD.org/T/#m083598405bef19157f67c9d97846d3dd90dc7d1c
Change-Id: Ic4c648f06443b432168cb76603402c918aa6e5d2
Approved-By: Tom Tromey <tom@tromey.com>
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
Ada allows non-ASCII identifiers, and GNAT supports several such
encodings. This patch adds the corresponding support to gdb.
GNAT encodes non-ASCII characters using special symbol names.
For character sets like Latin-1, where all characters are a single
byte, it uses a "U" followed by the hex for the character. So, for
example, thorn would be encoded as "Ufe" (0xFE being lower case
thorn).
For wider characters, despite what the manual says (it claims
Shift-JIS and EUC can be used), in practice recent versions only
support Unicode. Here, characters in the base plane are represented
using "Wxxxx" and characters outside the base plane using
"WWxxxxxxxx".
GNAT has some further quirks here. Ada is case-insensitive, and GNAT
emits symbols that have been case-folded. For characters in ASCII,
and for all characters in non-Unicode character sets, lower case is
used. For Unicode, however, characters that fit in a single byte are
converted to lower case, but all others are converted to upper case.
Furthermore, there is a bug in GNAT where two symbols that differ only
in the case of "Y WITH DIAERESIS" (and potentially others, I did not
check exhaustively) can be used in one program. I chose to omit
handling this case from gdb, on the theory that it is hard to figure
out the logic, and anyway if the bug is ever fixed, we'll regret
having a heuristic.
This patch introduces a new "ada source-charset" setting. It defaults
to Latin-1, as that is GNAT's default. This setting controls how "U"
characters are decoded -- W/WW are always handled as UTF-32.
The ada_tag_name_from_tsd change is needed because this function will
read memory from the inferior and interpret it -- and this caused an
encoding failure on PPC when running a test that tries to read
uninitialized memory.
This patch implements its own UTF-32-based case folder. This avoids
host platform quirks, and is relatively simple. A short Python
program to generate the case-folding table is included. It simply
relies on whatever version of Unicode is used by the host Python,
which seems basically acceptable.
Test cases for UTF-8, Latin-1, and Latin-3 are included. This
exercises most of the new code paths, aside from Y WITH DIAERESIS as
noted above.