RFC: problems with minimal symbols (without a type)

Message ID 20171109012540.ds5ixw4pq6rclhgc@adacore.com
State New, archived
Headers

Commit Message

Joel Brobecker Nov. 9, 2017, 1:25 a.m. UTC
  Hi Pedro,


You recently made a change to require that minimal symbols be cast
in order to be printed. This is causing some unexpected side-effects
in Ada. The most visible is with exception catchpoints.

Consider any Ada program, trying to insert a catchpoint on a specific
exception now fails:

    $ (gdb) catch exception constraint_error
    warning: failed to reevaluate internal exception condition for catchpoint 0: 'constraint_error' has unknown type; cast it to its declared type

I suspect people using the version of GNAT produced by distribution
makes such as RedHat probably cannot reproduce the issue, as I understand
the compiler is built with a runtime with full debugging information,
whereas the version AdaCore distributes only provides debugging info
for the few units where we need it.

Another way to show the same issue is to just print constraint_error:

    (gdb) p constraint_error
    'constraint_error' has unknown type; cast it to its declared type

And unfortunately, the Ada equivalent of casting does not work either:

    (gdb) print integer(constraint_error)
    'constraint_error' has unknown type; cast it to its declared type

But even if casting were working, should we really also require
a case in a situation like this?

    (gdb) p constraint_error'address
    $2 = (system.address) 0x62c960 <constraint_error>

Unlike C, 'address returns an object of type system.address,
which is the equivalent of "void *" in C. For minimal symbols
other that integers, the way people typically dump them is by
doing something like that:

    (gdb) print {my_type} minsym'address

I looked at the patch series that introduced that change, and
I see where you are coming from, but I am wondering if we should
be doing the same for Ada or not. I think Ada users are a lot
less used to casting (case in point, it took quite a while, even
for myself, to remember what the syntax was), and I fear this is
not going to be well received by the users.

What do you think? Attached is a prototype patch which implements
the original behavior for Ada expressions. It fixes the failures
in exception catchpoints without introducing any regresssion
(x86_64-linux).

Thanks!
  

Comments

Pedro Alves Nov. 9, 2017, 10:47 a.m. UTC | #1
Hi Joel,

On 11/09/2017 01:25 AM, Joel Brobecker wrote:

> You recently made a change to require that minimal symbols be cast
> in order to be printed. This is causing some unexpected side-effects
> in Ada. The most visible is with exception catchpoints.
> 
> Consider any Ada program, trying to insert a catchpoint on a specific
> exception now fails:
> 
>     $ (gdb) catch exception constraint_error
>     warning: failed to reevaluate internal exception condition for catchpoint 0: 'constraint_error' has unknown type; cast it to its declared type
> 
> I suspect people using the version of GNAT produced by distribution
> makes such as RedHat probably cannot reproduce the issue, as I understand
> the compiler is built with a runtime with full debugging information,
> whereas the version AdaCore distributes only provides debugging info
> for the few units where we need it.
> 
> Another way to show the same issue is to just print constraint_error:
> 
>     (gdb) p constraint_error
>     'constraint_error' has unknown type; cast it to its declared type
> 
> And unfortunately, the Ada equivalent of casting does not work either:
> 
>     (gdb) print integer(constraint_error)
>     'constraint_error' has unknown type; cast it to its declared type

Shouldn't we just make that work, like it works for C?

> 
> But even if casting were working, should we really also require
> a case in a situation like this?

I'm not exactly sure which situation you're referring to here.

For "catch exception constraint_error", or in general?

For "catch exception", what is the expression that GDB is
trying to evaluate?

> 
>     (gdb) p constraint_error'address
>     $2 = (system.address) 0x62c960 <constraint_error>
> 
> Unlike C, 'address returns an object of type system.address,
> which is the equivalent of "void *" in C. For minimal symbols
> other that integers, the way people typically dump them is by
> doing something like that:
> 
>     (gdb) print {my_type} minsym'address
> 

The problem here is even knowing that you have to do that,
if GDB just prints it as an int.  For some symbols, it's
somewhat obvious GDB is printing something odd, but for others
not so much.

> I looked at the patch series that introduced that change, and
> I see where you are coming from, but I am wondering if we should
> be doing the same for Ada or not.

IMHO, we should.

Let me quote a conversation that happened on IRC just last week
(user name anonymized):

 <user>		Does anybody know what's going on: I do 1) `p malloc(128)` (now it's stored at $1), 2) `x/1x $1`. The last command results in error "0xfffffffff7f921a0:     Cannot access memory at address 0xfffffffff7f921a0"
 <user>		Why can't I access the memory I just malloc'ed?
 <palves>	what does ptype malloc say?
 <user>		type = int ()
 <palves>	ok, you don't have debug info for malloc, and so gdb assumes that it return int.
 <user>		Aaah
 <user>		Thank you!
 <user>		Ok, I'll just type-cast it, thank you very much!
 <user>		That said, I still don't understand. Even if gdb thinks that malloc returned int — but the "x" command accepts address. So it should just cast the argument to address, shouldn't it?
 <user>		And indeed, using `x/1x (void*)$1` results in the same error.
 <palves>	malloc probably returned a 64-bit address, while int is 32-bit.  so the upper 0xffffffff in "0xfffffffff7f921a0" are probably a bogus sign extension.
 <user>		Ah
 <user>		I see, thank you
 <palves>	you have to cast malloc to the right function pointer type, and then call that.
 <palves>	something like: p ((void * (*) (size_t )) malloc) (128)
 <palves>	in master, the simpler 'p (void*)malloc (128)' does the right thing.
 <palves>	(it infers the prototype from the type of the passed in arguments + the cast-to type)

> I think Ada users are a lot
> less used to casting (case in point, it took quite a while, even
> for myself, to remember what the syntax was), and I fear this is
> not going to be well received by the users.

You won't know without trying, of course.  :-)  

At least an Ada conversion/cast is Ada syntax.
The "print {my_type} minsym'address" syntax above looks like
the GDB syntax/extension that works with C too?
How do users discover that that gdb syntax extension is available
and that they need to use it?
For non-integer types, users must already do some casting to get at
the real data.  Why treat integers differently?

IMO, this could be tackled as well by better documentation.

Or even by improving the error message.  E.g., we could give a suggestion
of what the cast syntax looks like for the current language.

     (gdb) p constraint_error
     'constraint_error' has unknown type; cast it to its declared type:
          declared_type (symbol)

GDB knows the name of the variable/symbol that is missing type info,
so we could even include it in the suggestion:

     'constraint_error' has unknown type; cast it to its declared type:
          declared_type (constraint_error)

Or we could suggest the {} syntax, if you'd prefer.

Ultimately it's your call, but my vote would be to fix the casting
instead of going back to assuming integer, often incorrectly.

> What do you think? Attached is a prototype patch which implements
> the original behavior for Ada expressions. It fixes the failures
> in exception catchpoints without introducing any regresssion
> (x86_64-linux).

Thanks,
Pedro Alves
  
Joel Brobecker Nov. 15, 2017, 1:35 a.m. UTC | #2
> > And unfortunately, the Ada equivalent of casting does not work either:
> > 
> >     (gdb) print integer(constraint_error)
> >     'constraint_error' has unknown type; cast it to its declared type
> 
> Shouldn't we just make that work, like it works for C?

Of course. It's at the top of my list, and independent in my mind
of the discussion about forcing the cast. It was a convenient
work-around, however, allowing me to reduce the urgency to fix
this issue with casting.

That being said...

> Let me quote a conversation that happened on IRC just last week
> (user name anonymized):
[...]
>  <palves>	you have to cast malloc to the right function pointer type, and then call that.
>  <palves>	something like: p ((void * (*) (size_t )) malloc) (128)
>  <palves>	in master, the simpler 'p (void*)malloc (128)' does the right thing.
>  <palves>	(it infers the prototype from the type of the passed in arguments + the cast-to type)

I understand why GDB now behaves more naturally when using the casting
information to infer the symbol's type and how this is an improvement.
But wouldn't that part be orthogonal to the question whether GDB should
make this case mandatory or not?

This is probably a judgement call. And since I didn't comment on
the patch when it was proposed, I suppose it's fair that I try it
for Ada users as well, and see how they react. It'll keep the languages
consistent too, which is an advantage.

Just to answer some of your questions (therefore not in the spirit
of continuing a debate):

> At least an Ada conversion/cast is Ada syntax.
> The "print {my_type} minsym'address" syntax above looks like
> the GDB syntax/extension that works with C too?

Yes, this part is documented in the Ada section of the GDB manual.

> How do users discover that that gdb syntax extension is available
> and that they need to use it?
> For non-integer types, users must already do some casting to get at
> the real data.  Why treat integers differently?

For me, this is because those minimal symbols are typically integral
types. So, for a non-trivial part of the times I use this feature,
it actually gives me the right answer. Therefore, for me, GDB lost
a small feature that didn't always work, but still did work for a good
chunk of the scenarios I was involved in.

I'll work on fixing the casting whenever I find some time...

That got me to one piece of code in evaluate_subexp_for_cast:

      /* Don't allow e.g. '&(int)var_with_no_debug_info'.  */
      if (VALUE_LVAL (val) == lval_memory)
        {
          if (value_lazy (val))
            value_fetch_lazy (val);
          VALUE_LVAL (val) = not_lval;
        }

I was wondering why do we not want to allow someone get its
address? I checked the commit that introduced this change,
and it doesn't say.

Thanks for the feedback,
  
Pedro Alves Nov. 15, 2017, 12:17 p.m. UTC | #3
On 11/15/2017 01:35 AM, Joel Brobecker wrote:
>>> And unfortunately, the Ada equivalent of casting does not work either:
>>>
>>>     (gdb) print integer(constraint_error)
>>>     'constraint_error' has unknown type; cast it to its declared type
>>
>> Shouldn't we just make that work, like it works for C?
> 
> Of course. It's at the top of my list, and independent in my mind
> of the discussion about forcing the cast. It was a convenient
> work-around, however, allowing me to reduce the urgency to fix
> this issue with casting.

Sorry, it wasn't clear to me that this was intended as a
workaround as opposed to a proposal to make Ada always
assume int.

I'm not seeing this as independent -- if GDB assumes unknown
types are integers, then we won't need to support casting from
unknown types to anything else in the first place, it just
falls out of supporting conversion from integer to anything else.

Unless, gdb assumed int if there's no cast, and used the
cast-to type if there's a cast.  I don't think that's a good
idea, because it'd still lead to surprising results.
You could even get very hard-to-explain-by-users results,
like:

 int64_t global = 0xffffffff00000000;

 (gdb) p global
 -1                   

-1 looks reasonable, the program sets global to -1 some times.
I.e., there's no reason for a user to suspect gdb here, even
though -1 was actually the incorrect value.

And then with a cast, you'd get these conflicting results:

 (gdb) p /x global
 0xffffffff
 (gdb) p /x (int64_t) (int) global
 0xffffffffffffffff
 (gdb) p /x (int64_t) global
 0xffffffff00000000

Notice that this isn't just about printing the values
directly.  You can get hard-to-diagnose problems if
if you pass the value directly to some function.

  extern void function (unsigned long long);

 (gdb) p function (global) 

Above old GDB passes 0xffffffffffffffff, incorrectly.

> That being said...
> 
>> Let me quote a conversation that happened on IRC just last week
>> (user name anonymized):
> [...]
>>  <palves>	you have to cast malloc to the right function pointer type, and then call that.
>>  <palves>	something like: p ((void * (*) (size_t )) malloc) (128)
>>  <palves>	in master, the simpler 'p (void*)malloc (128)' does the right thing.
>>  <palves>	(it infers the prototype from the type of the passed in arguments + the cast-to type)
> 
> I understand why GDB now behaves more naturally when using the casting
> information to infer the symbol's type and how this is an improvement.
> But wouldn't that part be orthogonal to the question whether GDB should
> make this case mandatory or not?

I actually pasted that part just for completeness.  The point I was
really trying to convey is in the unquoted part.  I.e, that this is
really a FAQ, that frequently users show up on IRC confused
by GDB showing incorrect results they can't explain.

IMHO, it's also very common to have globals that are pointers,
which tend to be 64-bit nowadays, while integers are 32-bit,
leading to problems with incorrect slicing and sign extending,
similar to that user's issue with the return type of malloc.

> 
> This is probably a judgement call. And since I didn't comment on
> the patch when it was proposed, I suppose it's fair that I try it
> for Ada users as well, and see how they react. It'll keep the languages
> consistent too, which is an advantage.
> 
> Just to answer some of your questions (therefore not in the spirit
> of continuing a debate):
> 
>> At least an Ada conversion/cast is Ada syntax.
>> The "print {my_type} minsym'address" syntax above looks like
>> the GDB syntax/extension that works with C too?
> 
> Yes, this part is documented in the Ada section of the GDB manual.
> 

OK.  When involving "minsym" in a more complicated expression,
it seems to me that the Ada conversion syntax would be a bit
more convenient / natural:

(gdb) print foo + long_integer (minsym) + bar
(gdb) print foo + {long_integer} minsym'address + bar

But then again, I'm clueless on most things Ada.  :-)

>> How do users discover that that gdb syntax extension is available
>> and that they need to use it?
>> For non-integer types, users must already do some casting to get at
>> the real data.  Why treat integers differently?
> 
> For me, this is because those minimal symbols are typically integral
> types. So, for a non-trivial part of the times I use this feature,
> it actually gives me the right answer. Therefore, for me, GDB lost
> a small feature that didn't always work, but still did work for a good
> chunk of the scenarios I was involved in.
> 
> I'll work on fixing the casting whenever I find some time...

Thanks!  Hopefully it'll be a simple change to ada_evaluate_subexp's
UNOP_CAST handling, mirroring the evaluate_subexp_for_cast in
eval.c.  Let me know if you run into something odd.

> 
> That got me to one piece of code in evaluate_subexp_for_cast:
> 
>       /* Don't allow e.g. '&(int)var_with_no_debug_info'.  */
>       if (VALUE_LVAL (val) == lval_memory)
>         {
>           if (value_lazy (val))
>             value_fetch_lazy (val);
>           VALUE_LVAL (val) = not_lval;
>         }
> 
> I was wondering why do we not want to allow someone get its
> address? I checked the commit that introduced this change,
> and it doesn't say.

This is to follow usual language rules.  A cast expression isn't
an lvalue, so you can't take its address:

 int global;

 void foo ()
 {
   int *p = & (int) global;
 }

 $ gcc cast-address.c -o cast-address -g3 -O0
 cast-address.c: In function ‘foo’:
 cast-address.c:5:12: error: lvalue required as unary ‘&’ operand
    int *p = & (int) global;
             ^

So accordingly, taking the address of the lvalue in GDB
does work:

 (gdb) p &dataglobal
 $1 = (<data variable, no debug info> *) 0x601040 <dataglobal>

But taking the address of the rvalue doesn't:

 (gdb) p &(int) dataglobal
 Attempt to take address of value not located in memory.
 (gdb) p *& (int) dataglobal
 Attempt to take address of value not located in memory.

Just like what you'd get if you had debug info
for "dataglobal".

The correct way is to cast the resulting pointer instead:

 (gdb) p (int *) &dataglobal
 $2 = (int *) 0x601040 <dataglobal>
 (gdb) p *(int *) &dataglobal
 $3 = 3

which works the same whether you have debug info for
dataglobal or not.

This is covered by the tests in gdb.base/nodebug.exp, in the loop
right after "We can't rely on uintXX_t".

About the exception catchpoints, to answer my own question
about which expression is GDB trying to evaluate, I was able to 
reproduce the issue with catchpoints using the gdb.ada/mi_exc_info.exp 
testcase, btw, using 'strip -g' on the testcase's binary.

 (gdb) set language ada
 (gdb) catch exception const__aint_global_gdb_e
 warning: failed to reevaluate internal exception condition for catchpoint 0: 'const.aint_global_gdb_e' has unknown type; cast it to its declared type
 Catchpoint 2: `const__aint_global_gdb_e' Ada exception

Setting a breakpoint on 'warning' reveals:

 (top-gdb) p s
 $1 = 0x1ec6d00 "long_integer (e) = long_integer (&const__aint_global_gdb_e)"

Curious, I didn't know that "&" worked in Ada too.

Thanks,
Pedro Alves
  
Joel Brobecker Nov. 15, 2017, 6:52 p.m. UTC | #4
> I'm not seeing this as independent -- if GDB assumes unknown
> types are integers, then we won't need to support casting from
> unknown types to anything else in the first place, it just
> falls out of supporting conversion from integer to anything else.

Right. I forgot about those variables having a TYPE_CODE_ERROR.

> I actually pasted that part just for completeness.  The point I was
> really trying to convey is in the unquoted part.  I.e, that this is
> really a FAQ, that frequently users show up on IRC confused
> by GDB showing incorrect results they can't explain.

Ah, that's because that was the part that I was less convinced about ;-).
I can accept that less experienced users might be confused indeed.

> Thanks!  Hopefully it'll be a simple change to ada_evaluate_subexp's
> UNOP_CAST handling, mirroring the evaluate_subexp_for_cast in
> eval.c.  Let me know if you run into something odd.

That's pretty much it. There is still an unknown inside the "resolve"
part, but I spent most of yesterday trying to see if I could reuse
the standard code more. Couldn't find a better way, but let's take
it one bird at a time...

> > That got me to one piece of code in evaluate_subexp_for_cast:
> > 
> >       /* Don't allow e.g. '&(int)var_with_no_debug_info'.  */
> >       if (VALUE_LVAL (val) == lval_memory)
> >         {
> >           if (value_lazy (val))
> >             value_fetch_lazy (val);
> >           VALUE_LVAL (val) = not_lval;
> >         }
> > 
> > I was wondering why do we not want to allow someone get its
> > address? I checked the commit that introduced this change,
> > and it doesn't say.
> 
> This is to follow usual language rules.  A cast expression isn't
> an lvalue, so you can't take its address:

Ah, ok.

My philosophy is been that it's OK, and sometimes good, to allow
in the debugger something that's not allowed in the language, if
it makes it easier for the user to do his debugging. Would allowing
this as an extension be introducing possible confusion?

>  (top-gdb) p s
>  $1 = 0x1ec6d00 "long_integer (e) = long_integer (&const__aint_global_gdb_e)"
> 
> Curious, I didn't know that "&" worked in Ada too.

Yes. We found that we could extend the expression interpreter to accept
some of the same syntaxes that are familiar to a C developer. It's
very convenient, because it's quite short :). That was before my time,
but I think that's also why we chose the "{TYPE}" syntax for Ada as
well, to mimic what is done in C.
  
Joel Brobecker Nov. 17, 2017, 12:51 a.m. UTC | #5
For easier cross-references, I sent a patch fixing casting (and it turns
out also taking a minsym's address) for review at:

https://www.sourceware.org/ml/gdb-patches/2017-11/msg00331.html
  
Pedro Alves Nov. 17, 2017, 11:32 a.m. UTC | #6
On 11/15/2017 06:52 PM, Joel Brobecker wrote:

>> This is to follow usual language rules.  A cast expression isn't
>> an lvalue, so you can't take its address:
> 
> Ah, ok.
> 
> My philosophy is been that it's OK, and sometimes good, to allow
> in the debugger something that's not allowed in the language, if
> it makes it easier for the user to do his debugging. Would allowing
> this as an extension be introducing possible confusion?

Sorry, somehow I missed this question.  I agree that sometimes
extensions are OK, but IMO, they need to have some clear advantage.
Since there's a just-as-easy way to do the same thing within the
language, IMO, we shouldn't add such an extension.  I think that yes,
it can introduce confusion, and I could see someone reporting a bug
if they notice "&(int)global" works.

Extending the language always has risk of conflicting with future
revisions of the language, or running into cases that we can't make
work, kind of painting ourselves into an odd corner.  For example, 
consider the case of the "compile" command, when we pass the 
expression to a real compiler to parse.  Ideally we'll teach the
gcc C/C++ frontends about all our syntax extensions somehow (e.g, behind
some special "#pragma GCC gdb_extensions" or something in the code we
hand over to the compiler), so that "compile print" can become a strict
superset of "print".  IMO we'd even ditch our internal
C/C++ parser and always go via the compiler (though we're still far
from being able to do that).  This extension sounds like the kind that
would run into implementation difficulties and maybe
"language-lawer-ish"-resistance in GCC.

In this case, taking the address of an rvalue expression never worked,
even with debug info for the symbol at hand, so IMO adding such
a feature would/should be considered as an orthogonal enhancement.

Thanks,
Pedro Alves
  
Joel Brobecker Nov. 17, 2017, 5:19 p.m. UTC | #7
> > My philosophy is been that it's OK, and sometimes good, to allow
> > in the debugger something that's not allowed in the language, if
> > it makes it easier for the user to do his debugging. Would allowing
> > this as an extension be introducing possible confusion?
> 
> Sorry, somehow I missed this question.  I agree that sometimes
> extensions are OK, but IMO, they need to have some clear advantage.
> Since there's a just-as-easy way to do the same thing within the
> language, IMO, we shouldn't add such an extension.  I think that yes,
> it can introduce confusion, and I could see someone reporting a bug
> if they notice "&(int)global" works.

FTR, sounds good to me. Thanks for taking the time to explain
your thinking as well.
  

Patch

From acf78c27f8652e2bb2c35fc488766312bccc3487 Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Wed, 8 Nov 2017 19:32:14 -0500
Subject: [PATCH] WIP: Treat minimal symbol as integers when in Ada

---
 gdb/ada-lang.c | 15 +++++++++++++++
 gdb/eval.c     |  2 +-
 gdb/value.h    |  4 ++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 40213b5..8d987a1 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -10355,6 +10355,21 @@  ada_evaluate_subexp (struct type *expect_type, struct expression *exp,
 
       return arg1;
 
+    case OP_VAR_MSYM_VALUE:
+      {
+	(*pos) += 3;
+
+	minimal_symbol *msymbol = exp->elts[pc + 2].msymbol;
+	arg1 = evaluate_var_msym_value (noside, exp->elts[pc + 1].objfile,
+					msymbol);
+	if (noside != EVAL_AVOID_SIDE_EFFECTS
+	    && TYPE_CODE (value_type (arg1)) == TYPE_CODE_ERROR)
+	  arg1 = ada_value_cast (builtin_type (exp->gdbarch)->builtin_int,
+				 arg1, noside);
+
+	return arg1;
+      }
+
     case OP_STRING:
       {
         struct value *result;
diff --git a/gdb/eval.c b/gdb/eval.c
index 94ddfdb..eca4643 100644
--- a/gdb/eval.c
+++ b/gdb/eval.c
@@ -738,7 +738,7 @@  evaluate_var_value (enum noside noside, const block *blk, symbol *var)
 
 /* Helper for evaluating an OP_VAR_MSYM_VALUE.  */
 
-static value *
+value *
 evaluate_var_msym_value (enum noside noside,
 			 struct objfile *objfile, minimal_symbol *msymbol)
 {
diff --git a/gdb/value.h b/gdb/value.h
index b1b8c6d..64556c5 100644
--- a/gdb/value.h
+++ b/gdb/value.h
@@ -885,6 +885,10 @@  extern char *extract_field_op (struct expression *exp, int *subexp);
 extern struct value *evaluate_subexp_with_coercion (struct expression *,
 						    int *, enum noside);
 
+extern struct value *evaluate_var_msym_value (enum noside noside,
+					      struct objfile *objfile,
+					      minimal_symbol *msymbol);
+
 extern struct value *parse_and_eval (const char *exp);
 
 extern struct value *parse_to_comma_and_eval (const char **expp);
-- 
2.1.4