binutils-gdb/gdb/guile
Andrew Burgess baab375361 gdb: building inferior strings from within GDB
History Of This Patch
=====================

This commit aims to address PR gdb/21699.  There have now been a
couple of attempts to fix this issue.  Simon originally posted two
patches back in 2021:

  https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
  https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html

Pedro then posted a version of his own:

  https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html

After this the conversation halted.  Then in 2023 I (Andrew) also took
a look at this bug and posted two versions:

  https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
  https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html

The approach taken in my first patch was pretty similar to what Simon
originally posted back in 2021.  My second attempt was only a slight
variation on the first.

Pedro then pointed out his older patch, and so we arrive at this
patch.  The GDB changes here are mostly Pedro's work, but updated by
me (Andrew); any mistakes are mine.

The tests here are a combination of everyone's work, and the commit
message is new, but copies bits from everyone's earlier work.

Problem Description
===================

Bug PR gdb/21699 makes the observation that using $_as_string with
GDB's printf can cause GDB to print unexpected data from the
inferior.  The reproducer is pretty simple:

  #include <stddef.h>
  #include <string.h>
  static char arena[100];

  /* Override malloc() so value_coerce_to_target() gets a known
     pointer, and we know we'll see an error if $_as_string() gives
     a string that isn't null terminated. */
  void
  *malloc (size_t size)
  {
      memset (arena, 'x', sizeof (arena));
      if (size > sizeof (arena))
          return NULL;
      return arena;
  }

  int
  main ()
  {
    return 0;
  }

And then in a GDB session:

  $ gdb -q test
  Reading symbols from /tmp/test...
  (gdb) start
  Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
  Starting program: /tmp/test

  Temporary breakpoint 1, main () at test.c:17
  17        return 0;
  (gdb) printf "%s\n", $_as_string("hello")
  "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  (gdb) quit

The problem above is caused by how value_cstring is used within
py-value.c, but once we understand the issue then it turns out that
value_cstring is used in an unexpected way in many places within GDB.

Within py-value.c we have a null-terminated C-style string.  We then
pass a pointer to this string, along with the length of this
string (so not including the null-character) to value_cstring.

In value_cstring GDB allocates an array value of the given character
type, and copies in the requested number of characters.  However,
value_cstring does not add a null-character of its own.  This means
that the value created by calling value_cstring is only
null-terminated if the null-character is included in the passed in
length.  In py-value.c this is not the case, and indeed, in most uses
of value_cstring, this is not the case.

When GDB tries to print one of these strings the value contents are
pushed to the inferior, and then read back as a C-style string, that
is, GDB reads inferior memory until it finds a null-terminator.  For
the py-value.c case, no null-terminator is pushed into the inferior,
so GDB will continue reading inferior memory until a null-terminator
is found, with unpredictable results.
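
The effect can be reproduced outside GDB.  The following is a minimal
sketch in plain C, not GDB code: the buffer mem stands in for inferior
memory, and reset_mem, push_bytes, push_bytes_terminated and
read_c_string_len are hypothetical helpers that mimic pushing value
contents and reading them back as a C-style string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for inferior memory (hypothetical, not a GDB structure).  */
static char mem[32];

/* Fill MEM with 'x', keeping a final null so that reads stay in bounds.  */
static void
reset_mem (void)
{
  memset (mem, 'x', sizeof (mem) - 1);
  mem[sizeof (mem) - 1] = '\0';
}

/* Copy exactly LEN bytes of DATA into MEM.  No terminator is added,
   which models the problematic case described above.  */
static void
push_bytes (const char *data, size_t len)
{
  assert (len < sizeof (mem));
  memcpy (mem, data, len);
}

/* As push_bytes, but also append a null character after the data.  */
static void
push_bytes_terminated (const char *data, size_t len)
{
  assert (len + 1 < sizeof (mem));
  memcpy (mem, data, len);
  mem[len] = '\0';
}

/* Read MEM back as a C-style string and return its length.  */
static size_t
read_c_string_len (void)
{
  return strlen (mem);
}
```

After reset_mem, pushing "hello" with push_bytes makes
read_c_string_len run on into the 'x' filler, while
push_bytes_terminated yields a length of 5.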

Patch Description
=================

The first thing this patch does is better define what the arguments
for the two functions value_cstring and value_string should represent.
The comments in the header file are updated to describe whether the
length argument should, or should not, include a null-character.
Also, the data argument is changed to type gdb_byte.  The functions as
they currently exist will handle wide-characters, in which case more
than one 'char' would be needed for each character.  As such using
gdb_byte seems to make more sense.

To avoid adding casts throughout GDB, I've also added an overload that
still takes a 'char *', but asserts that the character type being used
is of size '1'.

The value_cstring function is now responsible for adding a null
character at the end of the string value it creates.

However, once we start looking at how value_cstring is used, we
realise there's another, related, problem.  Not every language's
strings are null terminated.  Fortran and Ada strings, for example,
are just arrays of characters.  GDB already has the function
value_string, which can be used to create such values.

Consider this example using current GDB:

  (gdb) set language ada
  (gdb) p $_gdb_setting("arch")
  $1 = (97, 117, 116, 111)
  (gdb) ptype $
  type = array (1 .. 4) of char
  (gdb) p $_gdb_maint_setting("test-settings string")
  $2 = (0)
  (gdb) ptype $
  type = array (1 .. 1) of char

This shows two problems.  First, the $_gdb_setting and
$_gdb_maint_setting functions call value_cstring using the
builtin_char character type, rather than a language appropriate type.
Second, the two calls are inconsistent about the null-character.  In
the first call, the 'arch' case, the value_cstring call doesn't
include the null character, so the returned array only contains the
expected characters.  But in the $_gdb_maint_setting example we do
end up including the null-character, even though this is not expected
for Ada strings.

This commit adds a new language method, language_defn::value_string.
This function takes a pointer and length and creates a language
appropriate value that represents the string.  For C, C++, etc this
will be a null-terminated string (by calling value_cstring), and for
Fortran and Ada this can be a bounded array of characters with no null
terminator.  Additionally, this new language_defn::value_string
function is responsible for selecting a language appropriate character
type.
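
As a rough illustration of the idea (plain C, not the GDB
implementation), the sketch below builds a string value whose layout
depends on the language.  The names toy_language, toy_value and
toy_value_string are hypothetical; GDB's real language_defn is a C++
class, and the selection of a character type is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical language kinds; GDB's real language_defn is a C++ class.  */
enum toy_language { TOY_LANG_C, TOY_LANG_ADA };

/* A toy string value: an owned array of bytes plus its element count.  */
struct toy_value
{
  char *contents;
  size_t n_elements;
};

/* Build a language appropriate string value from the LEN bytes at DATA.
   LEN never includes a null character.  On allocation failure the
   returned value has contents == NULL.  */
static struct toy_value
toy_value_string (enum toy_language lang, const char *data, size_t len)
{
  struct toy_value v = { NULL, 0 };
  /* C-like languages store one extra element for the terminator.  */
  size_t n = (lang == TOY_LANG_C) ? len + 1 : len;

  assert (data != NULL);

  /* calloc zero-fills, so the extra element (if any) is already '\0'.
     Allocate at least one byte so that an empty Ada string is not
     mistaken for an allocation failure.  */
  v.contents = calloc (n > 0 ? n : 1, 1);
  if (v.contents == NULL)
    return v;
  memcpy (v.contents, data, len);
  v.n_elements = n;
  return v;
}
```

With the 4 bytes of "auto" as input, the C flavour has 5 elements
ending in a null character, while the Ada flavour has exactly 4, which
matches the array (1 .. 4) shape in the session above.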

After this commit the only calls to value_cstring are from the C
expression evaluator and from the default language_defn::value_string.

And the only calls to value_string are from Fortran, Ada, and Objective-C
related code.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699

Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Co-Authored-By: Pedro Alves <pedro@palves.net>
Approved-By: Simon Marchi <simon.marchi@efficios.com>
2023-06-05 13:25:08 +01:00
lib
guile-internal.h
guile.c
guile.h
README
scm-arch.c
scm-auto-load.c
scm-block.c
scm-breakpoint.c
scm-cmd.c
scm-disasm.c
scm-exception.c
scm-frame.c
scm-gsmob.c
scm-iterator.c
scm-lazy-string.c
scm-math.c gdb: building inferior strings from within GDB 2023-06-05 13:25:08 +01:00
scm-objfile.c
scm-param.c
scm-ports.c
scm-pretty-print.c
scm-progspace.c
scm-safe-call.c
scm-string.c
scm-symbol.c
scm-symtab.c
scm-type.c
scm-utils.c
scm-value.c

README for gdb/guile
====================

This file contains important notes for gdb/guile developers.
["gdb/guile" refers to the directory you found this file in]

Nomenclature:

  In the implementation we use "Scheme" or "Guile" depending on context.
  And sometimes it doesn't matter.
  Guile is Scheme, and for the most part this is what we present to the user
  as well.  However, to highlight the fact that it is Guile, the GDB commands
  that invoke Scheme functions are named "guile" and "guile-repl",
  abbreviated "gu" and "gr" respectively.

Co-existence with Python:

  Keep the user interfaces reasonably consistent, but don't shy away from
  providing a clearer (or more Scheme-friendly/consistent) user interface
  where appropriate.

  Additions to Python support or Scheme support don't require corresponding
  changes in the other scripting language.

  Scheme-wrapped breakpoints are created lazily so that if the user
  doesn't use Scheme s/he doesn't pay any cost.

Importing the gdb module into Scheme:

  To import the gdb module:
  (gdb) guile (use-modules (gdb))

  If you want to add a prefix to gdb module symbols:
  (gdb) guile (use-modules ((gdb) #:renamer (symbol-prefix-proc 'gdb:)))
  This gives every symbol a "gdb:" prefix which is a common convention.
  OTOH it's more to type.

Implementation/Hacking notes:

  Don't use scm_is_false.
  For this C function, () == #f (a la Lisp) and it's not clear how treating
  them as equivalent for truth values will affect the GDB interface.
  Until the effect is clear, avoid it, along with scm_is_true and scm_is_bool.
  Instead use gdbscm_is_false, gdbscm_is_true, gdbscm_is_bool.
  There are macros in guile-internal.h to enforce this.
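
  The distinction can be sketched with a toy model in plain C.  Nothing
  here is libguile or GDB code and every name is hypothetical; it only
  contrasts the Lisp-style test described in this note with a strict
  comparison against #f.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object tags (hypothetical; real Scheme objects are SCM values).  */
enum toy_obj { TOY_FALSE, TOY_TRUE, TOY_EMPTY_LIST, TOY_OTHER };

/* Lisp-style test: the empty list also counts as false.  */
static bool
toy_lisp_is_false (enum toy_obj obj)
{
  return obj == TOY_FALSE || obj == TOY_EMPTY_LIST;
}

/* Strict test: only #f is false.  */
static bool
toy_strict_is_false (enum toy_obj obj)
{
  return obj == TOY_FALSE;
}

/* Strict boolean test: only #f and #t are booleans.  */
static bool
toy_strict_is_bool (enum toy_obj obj)
{
  return obj == TOY_FALSE || obj == TOY_TRUE;
}

/* Self-check showing where the two tests disagree.  */
static void
toy_check (void)
{
  assert (toy_lisp_is_false (TOY_EMPTY_LIST));
  assert (!toy_strict_is_false (TOY_EMPTY_LIST));
  assert (toy_strict_is_false (TOY_FALSE));
}
```

  The two tests agree on #f and differ only on the empty list, which is
  exactly the case whose effect on the GDB interface is unclear.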

  Use gdbscm_foo as the name of functions that implement Scheme procedures
  to provide consistent naming in error messages.  The user can see "gdbscm"
  in the name and immediately know where the function came from.

  All smobs contain gdb_smob or chained_gdb_smob as the first member.
  This provides a mechanism for extending them in the Scheme side without
  tying GDB to the details.
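
  The "first member" rule relies on a standard C idiom.  The sketch
  below is a minimal illustration with hypothetical types
  (toy_base_smob, toy_frame_smob); the field names are made up for the
  example and do not reflect the real layout of gdb_smob.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common header; stands in for gdb_smob.  */
struct toy_base_smob
{
  int aux;	/* Illustrative data that generic code can reach.  */
};

/* Hypothetical concrete smob.  The base must be the first member.  */
struct toy_frame_smob
{
  struct toy_base_smob base;
  int frame_level;
};

/* Generic code: works on any smob through its base header.  */
static void
toy_set_aux (struct toy_base_smob *base, int aux)
{
  base->aux = aux;
}

/* View a concrete smob as its base.  This is valid C because a pointer
   to a struct, suitably converted, points to its first member.  */
static struct toy_base_smob *
toy_frame_as_base (struct toy_frame_smob *f_smob)
{
  assert (offsetof (struct toy_frame_smob, base) == 0);
  return (struct toy_base_smob *) f_smob;
}
```

  Generic code such as toy_set_aux never needs to know which concrete
  smob it was handed, so new fields can be added to the base without
  touching each smob type.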

  The lifetime of a smob, AIUI, is decided by the containing SCM.
  When there is no longer a reference to the containing SCM then the
  smob can be GC'd.  Objects that have references from outside of Scheme,
  e.g., breakpoints, need to be protected from GC.

  Don't do something that can cause a Scheme exception inside a TRY_CATCH,
  and, in code that can be called from Scheme, don't do something that can
  cause a GDB exception outside a TRY_CATCH.
  This makes the code a little tricky to write sometimes, but it is a
  rule imposed by the programming environment.  Bugs often happen because
  this rule is broken.  Learn it, follow it.

Coding style notes:

  - If you find violations of these rules, let's fix the code.
    Some attempt has been made to be consistent, but it's early.
    Over time we want things to be more consistent, not less.

  - None of this really needs to be read.  Instead, do not be creative:
    Monkey-See-Monkey-Do hacking should generally Just Work.

  - Absence of the word "typically" means the rule is reasonably strict.

  - The gdbscm_initialize_foo function (e.g., gdbscm_initialize_values)
    is the last thing to appear in the file, immediately preceded by any
    tables of exported variables and functions.

  - In addition to these of course, follow GDB coding conventions.

General naming rules:

  - The word "object" absent any modifier (like "GOOPS object") means a
    Scheme object (of any type), and is never used otherwise.
    If you want to refer to, e.g., a GOOPS object, say "GOOPS object".

  - Do not begin any function, global variable, etc. name with scm_.
    That's what the Guile implementation uses.
    (kinda obvious, just being complete).

  - The word "invalid" carries a specific connotation.  Try not to use it
    in a different way.  It means the underlying GDB object has disappeared.
    For example, a <gdb:objfile> smob becomes "invalid" when the underlying
    objfile is removed from GDB.

  - We typically use the word "exception" to mean Scheme exceptions,
    and we typically use the word "error" to mean GDB errors.

Comments:

  - function comments for functions implementing Scheme procedures begin with
    a description of the Scheme usage.  Example:
    /* (gsmob-aux gsmob) -> object */

  - the following comment appears after the copyright header:
    /* See README file in this directory for implementation notes, coding
       conventions, et.al.  */

Smob naming:

  - gdb smobs are named, internally, "gdb:foo"
  - in Guile they become <gdb:foo>, following the convention for naming
    classes; smobs have rudimentary GOOPS support (they can't be inherited
    from, but generics can work with them)
  - in comments use the Guile naming for smobs,
    i.e., <gdb:foo> instead of gdb:foo.
    Note: This only applies to smobs.  Exceptions are also named gdb:foo,
    but since they are not "classes" they are not wrapped in <>.
  - smob names are stored in a global, and for simplicity we pass this
    global as the "expected type" parameter to SCM_ASSERT_TYPE, thus in
    this instance smob types are printed without the <>.
    [Hmmm, this rule seems dated now.  Plus I18N rules in GDB are not always
    clear, sometimes we pass the smob name through _(), however it's not
    clear that's actually a good idea.]

Type naming:

  - smob structs are typedefs named foo_smob

Variable naming:

  - "scm" by itself is reserved for arbitrary Scheme objects

  - variables that are pointers to smob structs are named <char>_smob or
    <char><char>_smob, e.g., f_smob for a pointer to a frame smob

  - variables that are gdb smob objects are typically named <char>_scm or
    <char><char>_scm, e.g., f_scm for a <gdb:frame> object

  - the name of the first argument for method-like functions is "self"

Function naming:

  General:

  - all non-static functions have a prefix,
    either gdbscm_ or <char><char>scm_ [or <char><char><char>scm_]

  - all functions that implement Scheme procedures have a gdbscm_ prefix,
    this is for consistency and readability of Scheme exception text

  - static functions typically have a prefix
    - the prefix is typically <char><char>scm_ where the first two letters
      are unique to the file or class the function works with.
      E.g., the scm-arch.c prefix is arscm_.
      This follows something used in gdb/python in some places,
      we make it formal.

  - if the function is of a general nature, or no other prefix works,
    use gdbscm_

  Conversion functions:

  - the from/to in function names follows from libguile's existing style
  - conversions from/to Scheme objects are named:
      prefix_scm_from_foo: converts from foo to scm
      prefix_scm_to_foo: converts from scm to foo

  Exception handling:

  - functions that may throw a Scheme exception have an _unsafe suffix
    - This does not apply to functions that implement Scheme procedures.
    - This does not apply to functions whose explicit job is to throw
      an exception.  Adding _unsafe to gdbscm_throw is kinda superfluous. :-)
  - functions that can throw a GDB error aren't adorned with _unsafe

  - "_safe" in a function name means it will never throw an exception
    - Generally unnecessary, since the convention is to mark the ones that
      *can* throw an exception.  But sometimes it's useful to highlight the
      fact that the function is safe to call without worrying about exception
      handling.

  - except for functions that implement Scheme procedures, all functions
    that can throw exceptions (GDB or Scheme) say so in their function comment

  - functions that don't throw an exception, but still need to indicate to
    the caller that one happened (i.e., "safe" functions), either return
    a <gdb:exception> smob as a result or pass it back via a parameter.
    For this reason don't pass back <gdb:exception> smobs for any other
    reason.  There are functions that explicitly construct <gdb:exception>
    smobs.  They're obviously the, umm, exception.
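
  The "safe" pattern can be sketched in plain C as follows.  This is an
  illustration only: toy_exception stands in for a <gdb:exception> smob,
  and toy_parse_int_safe is a hypothetical function that reports failure
  by returning an exception object instead of throwing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a <gdb:exception> smob.  */
struct toy_exception
{
  const char *message;
};

/* Parse TEXT as a decimal integer without ever "throwing".
   On success store the value in *RESULT and return NULL.
   On failure return an exception object describing the problem;
   *RESULT is left untouched.  */
static const struct toy_exception *
toy_parse_int_safe (const char *text, long *result)
{
  static const struct toy_exception bad_number = { "not a number" };
  static const struct toy_exception out_of_range = { "out of range" };
  char *end;
  long value;

  assert (text != NULL && result != NULL);

  errno = 0;
  value = strtol (text, &end, 10);
  if (end == text || *end != '\0')
    return &bad_number;
  if (errno == ERANGE)
    return &out_of_range;
  *result = value;
  return NULL;
}
```

  A caller checks the return value first and only then uses the result,
  so the exception object is never confused with ordinary data.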

  Internal functions:

  - internal Scheme functions begin with "%" and are intentionally undocumented
    in the manual

  Standard Guile/Scheme conventions:

  - predicates that return Scheme values have the suffix _p and have suffix "?"
    in the Scheme procedure's name
  - functions that implement Scheme procedures that modify state have the
    suffix _x and have suffix "!" in the Scheme procedure's name
  - object predicates that return a C truth value are named prefix_is_foo
  - functions that set something have "set" at the front (except for a prefix):
    write this: gdbscm_set_gsmob_aux_x implements (set-gsmob-aux! ...)
    not this: gdbscm_gsmob_set_aux_x implements (gsmob-set-aux! ...)
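
  The relationship between C names and Scheme names can be expressed as
  a small string transformation.  The helper below, toy_scheme_name, is
  hypothetical and is not a tool that exists in GDB; it assumes the only
  rules are the ones listed above (gdbscm_ prefix, _p and _x suffixes,
  and '-' in place of '_').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Derive the Scheme procedure name implied by the conventions above from
   the C function name C_NAME, writing it to OUT (OUT_SIZE bytes).
   Returns 0 on success, -1 if OUT is too small.  */
static int
toy_scheme_name (const char *c_name, char *out, size_t out_size)
{
  static const char prefix[] = "gdbscm_";
  size_t prefix_len = sizeof (prefix) - 1;
  size_t len, i;
  char suffix = '\0';

  assert (c_name != NULL && out != NULL);

  /* Drop the gdbscm_ prefix if present.  */
  if (strncmp (c_name, prefix, prefix_len) == 0)
    c_name += prefix_len;
  len = strlen (c_name);

  /* A trailing _p marks a predicate ("?"), _x a mutator ("!").  */
  if (len > 2 && c_name[len - 2] == '_'
      && (c_name[len - 1] == 'p' || c_name[len - 1] == 'x'))
    {
      suffix = (c_name[len - 1] == 'p') ? '?' : '!';
      len -= 2;
    }

  if (len + (suffix != '\0') + 1 > out_size)
    return -1;

  /* Scheme names use '-' where C names use '_'.  */
  for (i = 0; i < len; i++)
    out[i] = (c_name[i] == '_') ? '-' : c_name[i];
  if (suffix != '\0')
    out[len++] = suffix;
  out[len] = '\0';
  return 0;
}
```

  Given gdbscm_set_gsmob_aux_x this produces set-gsmob-aux!, which
  matches the "write this" form above.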

Doc strings:

  - there are lots of existing examples, they should be pretty consistent,
    use them as boilerplate/examples
  - begin with a one line summary (can be multiple lines if necessary)
  - if the arguments need description:
    - blank line
    - "  Arguments: arg1 arg2"
      "    arg1: blah ..."
      "    arg2: blah ..."
  - if the result requires more description:
    - blank line
    - "  Returns:"
      "    Blah ..."
  - if it's important to list exceptions that can be thrown:
    - blank line
    - "  Throws:"
      "    exception-name: blah ..."