ctermid: return string literal, document MT-Safety pitfall

Message ID ortx2b8l2k.fsf@free.home
State New, archived

Commit Message

Alexandre Oliva Nov. 7, 2014, 8:35 a.m. UTC
  The ctermid implementation, like cuserid, uses a static buffer.  I
noticed this one, but I reasoned that, since the buffer was initialized
with the same short string in every thread that called the function
without passing it a buffer, the value would remain unchanged, and so no
harmful effects would be caused by what is technically a data race.

This was based on an interpretation that strcpy (and memcpy, and
compiler-inlined versions thereof) could not write garbage in the
destination before writing the intended values, because this would be a
deviation from the specification, and it could be observed by an
asynchronous signal handler.

Whether or not this reading of POSIX is correct is not so important:
ctermid can be implemented so as to return a pre-initialized static
buffer, instead of initializing it every time.  Callers are not allowed
by POSIX to modify this buffer, so we can even make it read-only.  This
patch does this, to sidestep the debate.  It might even be the case that
it makes ctermid more efficient, since it avoids reinitializing a static
buffer every time.  GCC is still smart enough to notice that, when a
buffer is passed in, the string copied to it is a known constant, so it
optimizes the strcpy to the same sequence of stores used before this
patch.
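
A minimal sketch of the approach described above, not the actual patch: the
names sketch_ctermid and SKETCH_L_ctermid are hypothetical, and the device
name "/dev/tty" is an assumption about what the short string is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for L_ctermid: strlen ("/dev/tty") + 1.  */
#define SKETCH_L_ctermid 9

/* Hypothetical stand-in for ctermid; not glibc's actual code.  */
char *
sketch_ctermid (char *s)
{
  /* No caller buffer: return a string literal.  Nothing is written at
     run time, so concurrent callers have nothing to race on.  */
  if (s == NULL)
    return (char *) "/dev/tty";

  /* Caller buffer: copy the constant name into it.  The compiler can
     see the source is a known constant and may inline the stores.  */
  return strcpy (s, "/dev/tty");
}
```

Callers that pass NULL must treat the result as read-only, which matches
the POSIX restriction mentioned above.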

As for the MT-Safety documentation, I update the comments next to the
annotations to reflect this change in the implementation, add a note
indicating we diverge from POSIX in the static buffer case (MT-Safety is
not required), and suggest that, when we drop the note that indicates
this is preliminary documentation about the current implementation,
rather than a commitment to remain within these safety boundaries in the
future, we may want to add a note indicating the possibility of a race
condition.

Ok to install?


From: Alexandre Oliva <aoliva@redhat.com>

for  ChangeLog

	* sysdeps/posix/ctermid.c (ctermid): Return a pointer to a
	string literal if not passed a buffer.
	* manual/job.texi (ctermid): Update reasoning, note deviation
	from posix, suggest mtasurace when not passed a buffer, for
	future non-preliminary safety notes.
---
 manual/job.texi         |    8 +++++---
 sysdeps/posix/ctermid.c |   13 +++++++------
 2 files changed, 12 insertions(+), 9 deletions(-)
  

Comments

Richard Henderson Nov. 7, 2014, 10:29 a.m. UTC | #1
On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
>  char *
>  ctermid (s)
>       char *s;

Can you please fix the K&R at the same time?


r~
  
Florian Weimer Nov. 11, 2014, 1:30 p.m. UTC | #2
On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
> This was based on an interpretation that strcpy (and memcpy, and
> compiler-inlined versions thereof) could not write garbage in the
> destination before writing the intended values, because this would be a
> deviation from the specification, and it could be observed by an
> asynchronous signal handler.

Which specification do you mean?  glibc or the C standard?
  
Alexandre Oliva Nov. 13, 2014, 9:03 p.m. UTC | #3
On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote:

> On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
>> This was based on an interpretation that strcpy (and memcpy, and
>> compiler-inlined versions thereof) could not write garbage in the
>> destination before writing the intended values, because this would be a
>> deviation from the specification, and it could be observed by an
>> asynchronous signal handler.

> Which specification do you mean?  glibc or the C standard?

I meant standard C.
  
Florian Weimer Nov. 14, 2014, 12:01 p.m. UTC | #4
On 11/13/2014 10:03 PM, Alexandre Oliva wrote:
> On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote:
>
>> On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
>>> This was based on an interpretation that strcpy (and memcpy, and
>>> compiler-inlined versions thereof) could not write garbage in the
>>> destination before writing the intended values, because this would be a
>>> deviation from the specification, and it could be observed by an
>>> asynchronous signal handler.
>
>> Which specification do you mean?  glibc or the C standard?
>
> I meant standard C.

I've been staring at the standard for a while.  The standard explicitly 
refuses to deal with the interaction of signal handlers and threads 
(7.14.1.1/7, “Use of this function in a multi-threaded program results 
in undefined behavior.”).

However, the standard still required that lock-free atomic objects have 
values which are not unspecified.  But as far as I can tell, the 
standard does not explicitly sequence operations on atomic objects, so 
the normal sequencing rules apply, and they fail to specify a value, so 
the value is still effectively unspecified, and library functions such 
as memcpy and memset can write ghost values, or can be implemented with 
one-char-at-a-time loops, and there is no way to observe that.

This (the “not unspecified but not specified either” state) seems to be 
a defect in the standard.  I very much doubt the intent was to invalidate 
existing implementations which write ghost values, such as the 
Solaris/SPARC memset implementation:

   <https://bugs.openjdk.java.net/browse/JDK-6948537>
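
A rough sketch of what "writing ghost values" can look like, assuming a
hypothetical ghost_memset that clears the destination before filling it.
It only loosely imitates a cache-line-initializing implementation and is
not the Solaris/SPARC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memset variant that produces transient values.  */
void *
ghost_memset (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  size_t i;

  /* Pass 1: transient "ghost" zeros that the caller never asked for.
     An optimizing compiler may delete this pass entirely, since a
     sequential caller cannot tell the difference.  */
  for (i = 0; i < n; i++)
    p[i] = 0;

  /* Pass 2: the values the specification actually requires.  */
  for (i = 0; i < n; i++)
    p[i] = (unsigned char) c;

  return dst;
}
```

A sequential caller sees only the state after the function returns, which
is identical to what memset would leave behind.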
  
Torvald Riegel Nov. 14, 2014, 1:28 p.m. UTC | #5
On Fri, 2014-11-14 at 13:01 +0100, Florian Weimer wrote:
> On 11/13/2014 10:03 PM, Alexandre Oliva wrote:
> > On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote:
> >
> >> On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
> >>> This was based on an interpretation that strcpy (and memcpy, and
> >>> compiler-inlined versions thereof) could not write garbage in the
> >>> destination before writing the intended values, because this would be a
> >>> deviation from the specification, and it could be observed by an
> >>> asynchronous signal handler.
> >
> >> Which specification do you mean?  glibc or the C standard?
> >
> > I meant standard C.
> 
> I've been staring at the standard for a while.  The standard explicitly 
> refuses to deal with the interaction of signal handlers and threads 
> (7.14.1.1/7, “Use of this function in a multi-threaded program results 
> in undefined behavior.”).

At least in ISO C++, what a signal handler can do is still being
discussed.  There's sig_atomic_t and the lock-free atomic ops that are
safe, and at least C++ wants some ordering guarantee wrt. the
installation of the signal handler and an executing signal handler.
AFAIU from the ISO C++ discussions, the same discussion is happening
among ISO C committee members.

> However, the standard still required that lock-free atomic objects have 
> values which are not unspecified.  But as far as I can tell, the 
> standard does not explicitly sequence operations on atomic objects,

What do you mean by "to sequence"?  The sequenced-before relation can
include atomic operations, and atomic operations will be part of
happens-before.

> so 
> the normal sequencing rules apply, and they fail to specify a value, so 
> the value is still effectively unspecified, and library functions such 
> as memcpy and memset can write ghost values, or can be implemented with 
> one-char-at-a-time loops, and there is no way to observe that.
> 
> This (the “not unspecified but not specified either” state) seems to be 
> a defect in the standard.  I very much doubt the intent was to invalidate 
> existing implementations which write ghost values, such as the 
> Solaris/SPARC memset implementation:
> 
>    <https://bugs.openjdk.java.net/browse/JDK-6948537>
> 

I agree that what happens during the execution of non-concurrent
functions is unspecified, and that the as-if rule applies.  For a
sequential specification of a function, one has a precondition and a
postcondition -- but how to reach a state satisfying the postcondition
is left to the implementation.  memset is allowed to change one bit at a
time, in any order.  Wanting anything else would require specifying the
actual implementation, which the standard doesn't do; it might be easy
to assume that many implementations of a very simple function like
memset would behave in a certain way -- but this already breaks down
with more complex functions such as qsort (which intermediate states are
actually allowed? can it use the to-be-sorted array as scratch space?).
Also, making assumptions about intermediate states kills the as-if rule,
hampering compiler optimizations.

If we want to reason about states during the execution of a function,
there must be some way to observe that, and the observation will be
concurrent with the execution of the function.  Thus, we need a
concurrent specification not just a sequential one, which describes the
possible outcomes when combining the function and something concurrently
running.
  
Florian Weimer Nov. 14, 2014, 1:47 p.m. UTC | #6
On 11/14/2014 02:28 PM, Torvald Riegel wrote:
>> However, the standard still required that lock-free atomic objects have
>> values which are not unspecified.  But as far as I can tell, the
>> standard does not explicitly sequence operations on atomic objects,
>
> What do you mean by "to sequence"?  The sequenced-before relation can
> include atomic operations, and atomic operations will be part of
> happens-before.

Unlike volatile accesses, accesses to atomic objects do not contribute to 
the sequenced-before relation directly, only their corresponding full 
expressions do.

> Wanting anything else would require specifying the
> actual implementation, which the standard doesn't do; it might be easy
> to assume that many implementations of a very simple function like
> memset would behave in a certain way -- but this already breaks down
> with more complex functions such as qsort (which intermediate states are
> actually allowed? can it use the to-be-sorted array as scratch space?).
> Also, making assumptions about intermediate states kills the as-if rule,
> hampering compiler optimizations.

It tries to do that for memset_s, but I doubt it succeeds at this (we 
touched on this issue briefly before).  I still think the language in the 
standard allows the compiler to elide dead memset_s calls, despite the 
intent.
  
Torvald Riegel Nov. 14, 2014, 2:06 p.m. UTC | #7
On Fri, 2014-11-14 at 14:47 +0100, Florian Weimer wrote:
> On 11/14/2014 02:28 PM, Torvald Riegel wrote:
> > Wanting anything else would require specifying the
> > actual implementation, which the standard doesn't do; it might be easy
> > to assume that many implementations of a very simple function like
> > memset would behave in a certain way -- but this already breaks down
> > with more complex functions such as qsort (which intermediate states are
> > actually allowed? can it use the to-be-sorted array as scratch space?).
> > Also, making assumptions about intermediate states kills the as-if rule,
> > hampering compiler optimizations.
> 
> It tries to do that for memset_s, but I doubt it succeeds at this (we 
> touched on this issue briefly before).  I still think the language in the 
> standard allows the compiler to elide dead memset_s calls, despite the 
> intent.

AFAICT memset_s is still a sequentially-specified function.  Even though
it states that the memory will be modified strictly according to the
rules of the abstract machine, it doesn't state that the stores don't
contribute to data races -- thus, data-race freedom would still be
required.  And it doesn't make the stores atomic.  Also, for a
concurrent observer to actually see the stores in the way they were
issued by the memset_s, it would have to synchronize with the stores;
this would require a statement how that needs to happen.

It could be used to have a "volatile" memset I guess.
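
One possible reading of a "volatile" memset, as a sketch only: the name
volatile_memset is hypothetical, and the code assumes the compiler treats
stores through a volatile-qualified pointer as volatile accesses, as
common compilers do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper; not a standard or glibc function.  */
void *
volatile_memset (void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;

  /* Each store goes through a volatile lvalue, so the compiler is not
     expected to elide it even if the buffer is dead afterwards.  This
     gives no atomicity and no inter-thread ordering.  */
  while (n-- > 0)
    *p++ = (unsigned char) c;

  return dst;
}
```

This keeps the stores from being optimized away, but a concurrent reader
would still race with them.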
  
Alexandre Oliva Nov. 14, 2014, 4:45 p.m. UTC | #8
On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote:

> I've been staring at the standard for a while.  The standard
> explicitly refuses to deal with the interaction of signal handlers and
> threads

The argument doesn't require threads, they'd be a distraction at best.

The issue is whether a signal handler that interrupts strcpy (or memcpy,
or any other standard function) could observe effects in the destination
string (or whatever else they modify) that are not specified in the
definition of the corresponding function.

Say, given:

char foo[5] = "12";

int main() {
  signal (SIGUSR1, checkme);
  strcpy (&foo[1], "23");
}

what standard-compliant values can checkme legitimately expect to find
in foo[0], foo[1], foo[2], foo[3], and foo[4]?

Under my reading, foo[0] and foo[4] could only hold '1' and '\0',
respectively, since nothing allows strcpy to modify them from their
initial values.  foo[1] and foo[3] could only hold '2' and '\0',
respectively, since they already held the values that the standard says
strcpy should store in them.  foo[2] could hold '\0' or '3' (*),
depending on whether the signal interrupts strcpy before or after it
gets to it.  I reason that temporarily storing alternate values in the
destination would be as much of a deviation from the specification as
writing to foo[0] or foo[4].

(*) or perhaps other intermediate values, if chars are too big to copy
as a single memory transaction, so that narrower memory blocks, such as
individual bits, had to be copied one at a time.
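
A compilable variant of the test case, as a sketch under stated
assumptions: run_checkme_demo and snapshot are hypothetical names, and the
signal is delivered synchronously with raise after strcpy has returned, so
the handler sees only the final state.  It cannot show what an
asynchronous interruption in the middle of strcpy would observe.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static char foo[5] = "12";
static char snapshot[5];   /* hypothetical: what checkme observed */

static void
checkme (int sig)
{
  size_t i;

  (void) sig;
  /* Reading foo here is defined only because the signal comes from
     raise; an asynchronous signal would make this access undefined
     under C11.  */
  for (i = 0; i < sizeof foo; i++)
    snapshot[i] = foo[i];
}

/* Hypothetical driver; returns 0 on success, -1 on failure.  */
int
run_checkme_demo (void)
{
  if (signal (SIGUSR1, checkme) == SIG_ERR)
    return -1;
  strcpy (&foo[1], "23");
  /* raise does not return until the handler has finished.  */
  return raise (SIGUSR1) == 0 ? 0 : -1;
}
```

After the call, snapshot holds '1', '2', '3', '\0', '\0', the final values
listed in the reading above.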

I realize this reading would rule out not only the potentially useful
practice of resetting full cache lines instead of loading them from
memory before overwriting them, but also other possibilities that would
arguably comply with the current specification if it was limited to pre-
and post-condition only, without any observable intermediate results,
such as repeatedly generating random strings and comparing them with the
source until they were identical, or incrementing the destination as a
multi-byte unsigned number, represented by the concatenation of all the
bits in the destination string, until it compares equal to the source.
These other possibilities would not only fail the efficiency
expectations, but also produce visible intermediate results that IMHO
are not allowed by the current wording of the standard.  But see below.

> I very much doubt the intent was to invalidate existing implementations
> which write ghost values

This was the sort of argument that made me revisit my understanding that
strcpy et al couldn't “write garbage” in the destination before writing
the final value.  I still don't see how the current wording would allow
for that, but now I agree it would make perfect sense to have wording in
standards that would allow this sort of behavior.

Maybe not arbitrary garbage, certainly not temporarily writing garbage
to other user-visible portions of the address space, but something that
can be construed as executing an algorithm that, starting with the
destination assumed to contain random garbage, makes progress towards
the goal of having the destination hold a copy of the source.

Similar wording could apply to qsort too, although to inspect
intermediate qsort results you don't even need signal handlers: it calls
back the compare function synchronously, and the compare function is not
prohibited from accessing the array being sorted; plus, I believe
numerous qsort implementations have historically exchanged array entries
as sorting progresses, so attempting to rule that out would be unlikely
to fly.
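
A small sketch of a compare function that inspects the whole array while
qsort is running.  All names here are hypothetical, and what the callback
sees depends on the qsort implementation; an implementation that sorts in
a temporary buffer may never show a changed array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_VALUES 6

static int values[N_VALUES] = { 5, 1, 4, 2, 6, 3 };
static const int initial[N_VALUES] = { 5, 1, 4, 2, 6, 3 };
static unsigned long calls;         /* number of comparisons made */
static unsigned long changed_seen;  /* calls that saw a modified array */

static int
peeking_compare (const void *a, const void *b)
{
  int x = *(const int *) a;
  int y = *(const int *) b;
  size_t i;

  calls++;
  /* Peek at the array being sorted; this only reads it.  */
  for (i = 0; i < N_VALUES; i++)
    if (values[i] != initial[i])
      {
        changed_seen++;
        break;
      }
  return (x > y) - (x < y);
}

void
sort_and_peek (void)
{
  qsort (values, N_VALUES, sizeof values[0], peeking_compare);
}
```

Only the sorted final state is guaranteed; changed_seen may be zero or
nonzero depending on how the library reorders elements.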
  
Alexandre Oliva Nov. 14, 2014, 4:53 p.m. UTC | #9
On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:

> AFAICT memset_s is still a sequentially-specified function.

How can you tell?  It's not like the standard explicitly says so, is it?
It can't be the as-if rule if intermediate results can be observed in
ways that are not ruled out by the standard.
  
Florian Weimer Nov. 14, 2014, 9:43 p.m. UTC | #10
On 11/14/2014 05:45 PM, Alexandre Oliva wrote:

> Say, given:
>
> char foo[5] = "12";
>
> int main() {
>    signal (SIGUSR1, checkme);
>    strcpy (&foo[1], "23");
> }
>
> what standard-compliant values can checkme legitimately expect to find
> in foo[0], foo[1], foo[2], foo[3], and foo[4]?

foo is not an atomic object, so this is undefined.

As I tried to explain, things turn out rather messy if you add the 
_Atomic qualifier.  I still think the values are unspecified (despite 
the standard saying they are not) because the accesses from strcpy and 
the signal handler are not sequenced.
  
Alexandre Oliva Nov. 14, 2014, 11:58 p.m. UTC | #11
On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote:

> On 11/14/2014 05:45 PM, Alexandre Oliva wrote:
>> Say, given:
>> 
>> char foo[5] = "12";
>> 
>> int main() {
>> signal (SIGUSR1, checkme);
>> strcpy (&foo[1], "23");
>> }
>> 
>> what standard-compliant values can checkme legitimately expect to find
>> in foo[0], foo[1], foo[2], foo[3], and foo[4]?

> foo is not an atomic object, so this is undefined.

Yeah, I goofed in the testcase, and I failed to mention the reasoning
was supposed to apply to earlier C standards, that we still intend to
comply with, so our implementation of strcpy shouldn't gratuitously
break.

In order to avoid the undefinedness under e.g. C90, the actual string
couldn't be in static storage.  Getting ahold of a pointer to the string
storage could be messy, though; maybe C90 and C99 were written so as to
imply it couldn't be done at all, even if POSIX introduced means that
would make it possible, such as writing the pointer to a pipe, or to a
file using POSIX functions not defined in standard C.  If their intent
was to make access impossible, then my argument would indeed fall apart.

> As I tried to explain, things turn out rather messy if you add the
> _Atomic qualifier.  I still think the values are unspecified (despite
> the standard saying they are not) because the accesses from strcpy and
> the signal handler are not sequenced.

Once we make C11 the focus, getting the pointer to the signal handler is
easier, through an _Atomic intptr_t with static or per-thread storage,
but the string storage would have to be _Atomic as well, and then, as
you say, what might happen within strcpy, memcpy et al is not entirely
clear.  However, making it unspecified might be pushing it too far.

The standard could specify, for example, that it is unspecified, within
an interrupting signal handler, whether observed values would be those
originally held in the atomic storage, or those that should be put in
there by the copy, without permitting any other values.  That would be
in line with my understanding, and I'll dare now put forth the idea that
the apparent contradiction you point out might be an indication that
this was the intent.

But it could also say any value whatsoever would be permitted, which
might make writing temporary garbage or invalidating entire cache lines
more defensible.  I suppose we'll only know if we ask and get a
clarification...  It no longer matters for the situation that initiated
the debate, but it might matter for other future decisions.
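
To make the "only old or new values" option concrete, here is a sketch of
an element-wise atomic copy.  The names foo_atomic and atomic_str_store
are hypothetical, and this is not how strcpy works; it only shows a copy
in which each element changes in one indivisible step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FOO_LEN 5

/* Static atomic objects start out zero-initialized.  */
static atomic_char foo_atomic[FOO_LEN];

/* Hypothetical helper: copy src, including its terminator, into dst
   with one atomic store per element.  A reader using atomic loads
   then sees, for each element, either the old or the new value.  */
void
atomic_str_store (atomic_char *dst, const char *src)
{
  size_t i;

  for (i = 0; ; i++)
    {
      atomic_store_explicit (&dst[i], src[i], memory_order_relaxed);
      if (src[i] == '\0')
        break;
    }
}
```

Whether a signal handler may rely on this also depends on atomic_char
being lock-free (ATOMIC_CHAR_LOCK_FREE) on the target.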
  
Florian Weimer Nov. 17, 2014, 7:53 a.m. UTC | #12
On 11/15/2014 12:58 AM, Alexandre Oliva wrote:
> The standard could specify, for example, that it is unspecified, within
> an interrupting signal handler, whether observed values would be those
> originally held in the atomic storage, or those that should be put in
> there by the copy, without permitting any other values.  That would be
> in line with my understanding, and I'll dare now put forth the idea that
> the apparent contradiction you point out might be an indication that
> this was the intent.

It would mean that memset and memcpy would align the passed-in pointer 
to the largest possible atomic object size and update the target using 
atomic instructions of at least this size.  (This might also apply to 
the string functions.)  Head and tail may not be a multiple of the word 
size, so we'd need a compare-and-swap loop to cover this case, with 
quite a bit of performance overhead.
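
A sketch of the compare-and-swap loop such a head or tail update would
need, assuming a 32-bit word and a hypothetical helper named
atomic_store_masked; the mask selects the bytes that belong to the copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: replace only the bits selected by mask, leaving
   the neighbouring bytes of the word as they were.  */
void
atomic_store_masked (_Atomic uint32_t *word, uint32_t value, uint32_t mask)
{
  uint32_t old = atomic_load_explicit (word, memory_order_relaxed);
  uint32_t desired;

  do
    /* On failure the CAS refreshes old, so desired is recomputed from
       the latest contents of the word.  */
    desired = (old & ~mask) | (value & mask);
  while (!atomic_compare_exchange_weak_explicit (word, &old, desired,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed));
}
```

Each partial word costs at least one load and one CAS instead of a plain
store, which is the overhead referred to above.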

Personally, I find it rather attractive to leave this unspecified.

(Note that C11 is a bit ambiguous whether there is a “no values out of 
thin air” requirement in the memory model.  Java has this even in the 
presence of data races, but I don't think GCC provides this for C11.  If 
all data races are indeed undefined behavior, the fact that the standard 
makes a contrary claim about how the memory model works (see the 
previous discussion with Torvald for a quote from the standard) does not 
matter.)
  
Torvald Riegel Nov. 17, 2014, 9:44 a.m. UTC | #13
On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote:
> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > AFAICT memset_s is still a sequentially-specified function.
> 
> How can you tell?  It's not like the standard explicitly says so, is it?
> It can't be the as-if rule if intermediate results can be observed in
> ways that are not ruled out by the standard.

If we're talking about C11, which Florian cited, then the by-default
data-race freedom requirement applies, and memset_s doesn't say anything
about atomicity or ordering, so if you would observe intermediate
states, you'd have a race condition.  You wouldn't have a race condition
if you'd have an observer that happens-before the memset_s or have the
memset_s happens-before the observer.  IOW, you're not allowed to look
at the intermediate states.

If we disregard data-race freedom for a second, memset_s is, in
comparison to memset, a little special in that it says that the function
has to be executed strictly according to the rules of the abstract
machine.  That may look like it could be useful for concurrent settings,
but then you still have the issue that observers need to be constrained
as well, execution under racing accesses from multiple threads is still
undefined, and there's no memory ordering (which matters less in the related
ctermid case of concurrent memset_s to the same memory locations because
you just store store store).  memset_s doesn't specify any of that, so,
by absence of defined semantics, it's still a sequential function to me.

The way I read the special memset_s requirements is that if the
function's execution is terminated prematurely because of violating the
runtime constraints, then an observer gets as-if equivalence to the
abstract machine.  Not that you can just observe the results without it
being terminated.
Also, C11 states in K.3.7.4.1p4: "Unlike memset, any call to
the memset_s function shall be evaluated strictly according to the rules
of the abstract machine as described in (5.1.2.3)."  This indicates that
memset can write intermediate states; otherwise, the standard wouldn't
need to state the deviation from the default for memset_s.

If the standard doesn't define semantics of multi-threaded executions, I
disagree that you can assume some semantics for it; it's undefined, so
like undefined behavior, you can get anything.
  
Torvald Riegel Nov. 17, 2014, 10:05 a.m. UTC | #14
On Fri, 2014-11-14 at 21:58 -0200, Alexandre Oliva wrote:
> On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote:
> 
> > On 11/14/2014 05:45 PM, Alexandre Oliva wrote:
> >> Say, given:
> >> 
> >> char foo[5] = "12";
> >> 
> >> int main() {
> >> signal (SIGUSR1, checkme);
> >> strcpy (&foo[1], "23");
> >> }
> >> 
> >> what standard-compliant values can checkme legitimately expect to find
> >> in foo[0], foo[1], foo[2], foo[3], and foo[4]?
> 
> > foo is not an atomic object, so this is undefined.
> 
> Yeah, I goofed in the testcase, and I failed to mention the reasoning
> was supposed to apply to earlier C standards, that we still intend to
> comply with, so our implementation of strcpy shouldn't gratuitously
> break.
> 
> In order to avoid the undefinedness under e.g. C90, the actual string
> couldn't be in static storage.  Getting ahold of a pointer to the string
> storage could be messy, though; maybe C90 and C99 were written so as to
> imply it couldn't be done at all, even if POSIX introduced means that
> would make it possible, such as writing the pointer to a pipe, or to a
> file using POSIX functions not defined in standard C.  If their intent
> was to make access impossible, then my argument would indeed fall apart.
> 
> > As I tried to explain, things turn out rather messy if you add the
> > _Atomic qualifier.  I still think the values are unspecified (despite
> > the standard saying they are not) because the accesses from strcpy and
> > the signal handler are not sequenced.
> 
> Once we make C11 the focus, getting the pointer to the signal handler is
> easier, through an _Atomic intptr_t with static or per-thread storage,
> but the string storage would have to be _Atomic as well, and then, as
> you say, what might happen within strcpy, memcpy et al is not entirely
> clear.  However, making it unspecified might be pushing it too far.

For example, strcpy doesn't take atomic types as arguments, so it won't
access the memory atomically, so you get data races and undefined
behavior unless you make sure that the signal handler happens after or
before the strcpy (via happens-before).

Please have a look at C11 (N1570) 5.1.2.3p10, which gives an example of
an allowed implementation that just guarantees equality of non-volatiles
to the abstract machine at function boundaries.

> The standard could specify, for example, that it is unspecified, within
> an interrupting signal handler, whether observed values would be those
> originally held in the atomic storage, or those that should be put in
> there by the copy, without permitting any other values.

If the signal handler's atomic accesses are concurrent with other atomic
accesses, then there is no data race because atomics don't create data
races.  But strcpy doesn't write with atomics.
  
Torvald Riegel Nov. 17, 2014, 10:21 a.m. UTC | #15
On Mon, 2014-11-17 at 08:53 +0100, Florian Weimer wrote:
> On 11/15/2014 12:58 AM, Alexandre Oliva wrote:
> > The standard could specify, for example, that it is unspecified, within
> > an interrupting signal handler, whether observed values would be those
> > originally held in the atomic storage, or those that should be put in
> > there by the copy, without permitting any other values.  That would be
> > in line with my understanding, and I'll dare now put forth the idea that
> > the apparent contradiction you point out might be an indication that
> > this was the intent.
> 
> It would mean that memset and memcpy would align the passed-in pointer 
> to the largest possible atomic object size and update the target using 
> atomic instructions of at least this size.  (This might also apply to 
> the string functions.)  Head and tail may not be a multiple of the word 
> size, so we'd need a compare-and-swap loop to cover this case, with 
> quite a bit of performance overhead.
> 
> Personally, I find it rather attractive to leave this unspecified.
> 
> (Note that C11 is a bit ambiguous whether there is a “no values out of 
> thin air” requirement in the memory model.

I've heard nobody in the C++ committee say that they want to allow
out-of-thin-air values -- and the C and C++ models are supposed to be
equivalent.
For C++, there is such a requirement, even though in a non-normative
note -- but just because it's hard to specify precisely.

See this paper for details (e.g., Section 4):
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4136.pdf

But, I don't think this specification problem is critical for us; the
paper also states: "This is a high-level-language specification problem:
there is no suggestion that thin-air executions occur in practice with
current compilers and hardware; the problem is rather how to exclude
them without preventing desired compiler optimisations."

> Java has this even in the 
> presence of data races,

... and that makes it hard for them.

> but I don't think GCC provides this for C11.

Agreed.  Data races are undefined behavior as stated by C11.  That
doesn't mean that everything that would be a data race under C11 also
leads to a program crash, for example, when compiled by GCC.
  
Alexandre Oliva Nov. 18, 2014, 10:23 p.m. UTC | #16
On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote:

> On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote:
>> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:
>> 
>> > AFAICT memset_s is still a sequentially-specified function.
>> 
>> How can you tell?  It's not like the standard explicitly says so, is it?
>> It can't be the as-if rule if intermediate results can be observed in
>> ways that are not ruled out by the standard.

> If we're talking about C11, which Florian cited, then the by-default
> data-race freedom requirement applies, and memset_s doesn't say anything
> about atomicity or ordering, so if you would observe intermediate
> states, you'd have a race condition.  You wouldn't have a race condition
> if you'd have an observer that happens-before the memset_s or have the
> memset_s happens-before the observer.  IOW, you're not allowed to look
> at the intermediate states.

I'm not asking specifically about memset or strcpy, I'm asking how do
you tell in general.  You've long ago, and again recently, claimed that
such functions as qsort and bsearch have sequential specifications, even
though they have callbacks that must necessarily observe and compute
based on intermediate states.  I'm just trying to figure out what the
heck you mean by “sequential function”, and by “sequential
specification”.  I had understood the latter had to do with
specifications limited to pre- and post-conditions, but the standards
we've been talking about do not limit function specifications to that.
So, something is clearly amiss.

As for observing intermediate results, we seem to have ruled out as
undefined accesses from other threads, and from interrupting signal
handlers.  This covers almost all possibilities, but how about
cancelling the thread that's running memcpy or strcpy, if it has
asynchronous cancellation enabled?  If you do that, and then
pthread_join completes, you have set a clear happens-before
relationship.  Sure enough, POSIX doesn't require such functions as
memcpy or strcpy to be AC-Safe, but our manual claims our current
implementations are.  Does this mean it is safe to access the variables
that were partially modified by the interrupted memcpy/strcpy/whatever,
and that this provides means to safely inspect intermediate states?  Or
does it mean our manual should not claim these functions to be AC-Safe,
just so that we can claim a program that attempts to inspect
intermediate states of strcpy is undefined behavior?  Or could we resort
to any other argument to make it undefined?
  
Torvald Riegel Nov. 19, 2014, 10:11 p.m. UTC | #17
On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote:
> On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote:
> >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> >> 
> >> > AFAICT memset_s is still a sequentially-specified function.
> >> 
> >> How can you tell?  It's not like the standard explicitly says so, is it?
> >> It can't be the as-if rule if intermediate results can be observed in
> >> ways that are not ruled out by the standard.
> 
> > If we're talking about C11, which Florian cited, then the by-default
> > data-race freedom requirement applies, and memset_s doesn't say anything
> > about atomicity or ordering, so if you would observe intermediate
> > states, you'd have a race condition.  You wouldn't have a race condition
> > if you'd have an observer that happens-before the memset_s or have the
> > memset_s happens-before the observer.  IOW, you're not allowed to look
> > at the intermediate states.
> 
> I'm not asking specifically about memset or strcpy, I'm asking how do
> you tell in general.

For C11 specifically, what I wrote above applies. If the function is
doing something, for example a store, and the function does not specify
what it does internally and which inter-thread happens-before relations
this creates, then there's nothing specified that makes this not a data
race if you try to look at an intermediate state from another thread.
So if you actually look at an intermediate state, there's a data race.

Data-race-freedom is the default.  5.1.2.3p10 gives an example of an
allowed implementation that just guarantees equality of non-volatiles to
the abstract machine at function boundaries; IOW, the implementation is
allowed to just satisfy a post-condition (as long as it doesn't violate
other invariants, introduce data races on its own, etc.).
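
To make the happens-before point concrete, here is a minimal sketch, assuming POSIX threads. The names shared, writer and copy_then_inspect are made up for illustration and are not taken from glibc. The reader looks at the buffer only after pthread_join, so the read cannot race with the memcpy.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Buffer written by the worker thread (hypothetical name for this sketch). */
static char shared[16];

/* Worker: memcpy only promises that the bytes are in place once it returns. */
static void *writer(void *arg)
{
    (void)arg;
    memcpy(shared, "/dev/tty", sizeof "/dev/tty");
    return NULL;
}

/* Returns 1 if the buffer holds the expected string after the join. */
int copy_then_inspect(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);

    /* Reading `shared` at this point, before the join, would be a data race. */

    rc = pthread_join(t, NULL);
    assert(rc == 0);

    /* pthread_join orders the worker's stores before this read. */
    return strcmp(shared, "/dev/tty") == 0;
}
```

Moving the strcmp above the pthread_join is the case discussed above: nothing would order it against the stores done inside memcpy.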

> You've long ago, and again recently, claimed that
> such functions as qsort and bsearch have sequential specifications, even
> though they have callbacks that must necessarily observe and compute
> based on intermediate states.

Well, the comparison callbacks can't just look at will at every piece of
intermediate state.  They get called with specific arguments, and the
memory locations that they have to compare are exactly specified.  So, I
agree that these *specific* memory locations are intermediate states,
but the comparison functions are not guaranteed to be able to look at
other elements of the arrays and find sensible information in those.

The promise that a function such as qsort makes is still that after it
has finished, the array will be sorted.  Yes it can call other functions
while doing that, and it will do those calls in a way that satisfies the
preconditions of those other functions (e.g., don't have garbage in the
elements that a comparison function needs to compare); but that doesn't
mean that it guarantees anything beyond that in terms of its promise.
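
As an illustration only (cmp_int and sort_ints are hypothetical names), a comparison callback that stays within what it is handed reads just the two elements passed to it, and the caller relies only on the state after qsort has returned:

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback: reads only the two elements it was handed and
   never looks at the rest of the array while the sort is in progress. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts in place; what is relied on is the state after qsort returns. */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);

    /* Check the post-condition: the array is in non-decreasing order. */
    for (size_t i = 1; i < n; i++)
        assert(v[i - 1] <= v[i]);
}
```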

> I'm just trying to figure out what the
> heck you mean by “sequential function”, and by “sequential
> specification”.

What I mean is that they are not concurrent specifications that make
guarantees about states of an unfinished execution as visible to
concurrent observers.  They only make guarantees about the state after a
function has finished executing.  (Sorry if I'm using shared-memory
synchronization terminology here, but given that we want to distinguish
between concurrent and non-concurrent, that seems to make sense.)

> I had understood the latter had to do with
> specifications limited to pre- and post-conditions, but the standards
> we've been talking about do not limit function specifications to that.

Why do you think that is the case?  The callback, or composition of
functions in general, is one thing you mentioned, and I hope I was able to
convince you that this doesn't give guarantees about the caller (e.g.,
qsort) to the callee (e.g., comparison function), except when those
guarantees overlap with preconditions for the callee.

> So, something is clearly amiss.
> 
> As for observing intermediate results, we seem to have ruled out as
> undefined accesses from other threads, and from interrupting signal
> handlers.

Good.

> This covers almost all possibilities, but how about
> cancelling the thread that's running memcpy or strcpy, if it has
> asynchronous cancellation enabled?  If you do that, and then
> pthread_join completes, you have set a clear happens-before
> relationship.

Well, that's what I would guess too.  I definitely agree that the
cancellation happens-before the return of pthread_join, but which
effects then actually happen-before depends on the definition of
AC-Safe.  Which you seem to point out next:

> Sure enough, POSIX doesn't require such functions as
> memcpy or strcpy to be AC-Safe, but our manual claims our current
> implementations are.

That is a good question, and really is up to the definition of AC-Safe.
The one I see is (please correct me or cite further parts of the spec
that may apply):
"A function that may be safely invoked by an application while the
asynchronous form of cancellation is enabled."

That doesn't really tell me a lot :)  I can interpret "safely invoked"
to at least mean that the mere act of cancellation will not break
anything.  But it doesn't tell me which state one can expect after
cancellation.
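
For reference, the scenario under discussion could look roughly like the sketch below, assuming POSIX threads and C11 atomics. The names scratch, worker_ready, spin_memset and cancel_and_join are invented for this example. It shows only the part that is uncontroversial: the cancellation happens-before the return of pthread_join. It deliberately does not read the buffer afterwards, since what may be expected there is exactly the open question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

char scratch[64];                 /* hypothetical buffer the worker keeps rewriting */
static atomic_int worker_ready;   /* set once asynchronous cancellation is enabled */

static void *spin_memset(void *arg)
{
    int old;
    (void)arg;
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
    atomic_store(&worker_ready, 1);
    for (unsigned char c = 0;; c++)
        memset(scratch, c, sizeof scratch);   /* may be cancelled mid-call */
    return NULL;                              /* not reached */
}

/* Returns 1 if the worker was cancelled and then joined. */
int cancel_and_join(void)
{
    pthread_t t;
    void *res = NULL;
    int rc = pthread_create(&t, NULL, spin_memset, NULL);
    assert(rc == 0);

    while (!atomic_load(&worker_ready))
        ;   /* wait until the worker has switched to asynchronous mode */

    rc = pthread_cancel(t);
    assert(rc == 0);
    rc = pthread_join(t, &res);
    assert(rc == 0);

    /* The cancellation happens-before this point.  This sketch makes no
       claim about the contents of `scratch` and does not read it. */
    return res == PTHREAD_CANCELED;
}
```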

I think we can distinguish between three kinds of functions here:
1) Functions like memset that (IMO) don't specify intermediate states.
2) Functions like memset_s, that to some extent do specify intermediate
states (e.g., in this case, its equivalence to steps the abstract
machine would do when storing one character at a time, starting at the
beginning).
3) Those with an already concurrent specification, which clearly
designate atomic parts (i.e., indivisible steps, even wrt.
cancellation).

One way to define safety would be to say that a cancelled function
should either take effect or not, but never partially take effect.  IOW,
it's either just the precondition or the postcondition that holds.

Another option would be to allow specified intermediate steps to take
effect.  For 2), we could say that cancellation happens anywhere between
the steps the abstract machine would do, but not within a step.  This
would be satisfied under the requirement you assumed for memset and
strcpy implementations, I believe.
For 3), it could be cancellation between any of the atomic steps, unless
otherwise specified.  For condvar wait, for example, this could be one
of the three parts: lock release, wakeup, lock acquisition.  However,
that may not be really useful, so more needs to be specified (as
condvars do, IIRC).  If a concurrent function is supposed to be just one
atomic step, safety could mean either pre- or postcondition.

For 3), we can probably assume safety to be that the function was
cancelled somewhere between the atomic steps it does make.

> Does this mean it is safe to access the variables
> that were partially modified by the interrupted memcpy/strcpy/whatever,
> and that this provides means to safely inspect intermediate states?

For normal memcpy, strcpy, and other functions in group 1), the
intermediate states aren't defined, so unless we want to define safety
as just being cancellable (and leaving the affected memory in an
unspecified state), we can't do much.

> Or
> does it mean our manual should not claim these functions to be AC-Safe,
> just so that we can claim a program that attempts to inspect
> intermediate states of strcpy is undefined behavior?

I guess not claiming AC-Safety makes most sense for group 1).
Cancellation could be useful in scenarios where you actually don't need
to look at the state at all -- but then AC Safety must clarify the
definition that you shouldn't look at state.  I'm not sure whether we
want this as default though.

> Or could we resort
> to any other argument to make it undefined?

I'm not aware of one.  I believe clarifying (our interpretation of) the
definition of AC-Safety is a good way forward.  Or checking back with
POSIX.  If there is indeed an agreed upon, clear definition, we should
just adapt to it I suppose.
  
Alexandre Oliva Nov. 21, 2014, 9:30 a.m. UTC | #18
On Nov 19, 2014, Torvald Riegel <triegel@redhat.com> wrote:

> On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote:
>> On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote:
>> 
>> > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote:
>> >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:
>> >> 
>> >> > AFAICT memset_s is still a sequentially-specified function.
>> >> 
>> >> How can you tell?  It's not like the standard explicitly says so, is it?

>> I'm asking how do you tell in general.

> If the function is doing something, for example a store, and the
> function does not specify what it does internally and which
> inter-thread happens-before relations this creates, then there's
> nothing specified that makes this not a data race if you try to look
> at an intermediate state from another thread.

Which is why I've resorted to non-threaded means of inspection of
intermediate states.  I think we differ in whether “the function does
not specify what it does internally”.  If the definition of the function
said “copy n chars from src[0..n-1] to dest[0..n-1] respectively”,
besides any pre- and post- conditions, then it *does* specify what it
does internally.  Not the order in which the chars are copied, for sure,
but still, it says the function should copy each and every one of those
chars.  It doesn't state how to copy a char, but anything other than
load from src[i] and store the loaded value in dest[i] is hardly a copy.

So while this makes room for an interrupted copy to leave dest[i] in an
unspecified state that could be its earlier value or the newly-copied
one, it would be hard to argue that anything else complies with the
behavior specification enclosed in quotes above.

I can see value in making simplifying assumptions to reason about
behavior in the presence of multiple threads, and I realize that the no
data race requirements can enable *reasoning* about sequential functions
in such contexts as if only the pre- and post-conditions mattered, but I do
not agree that applying similar reasoning to go backwards is logically
sound.

I mean, “I perceive this as a sequential function, which enables
simplifying assumptions about internal behavior in multi-threaded
contexts, therefore I can disregard the explicit behavior specification
and only look at explicit or inferred (pre- and?) post-conditions to
reason in any context whatsoever, or to implement the function however I
like, even deviating from the specification, as long as it still
satisfies the post-conditions when given the pre-conditions” doesn't
hold, because there are issues that arise besides those that come up in
multi-threaded contexts, to which the simplifying assumptions for
reasoning about multi-threaded contexts do not apply.

> Well, the comparison callbacks can't just look at will at every piece
> of intermediate state.

Why is that?  I mean, what, if any, part of the relevant standards says
so?

> So, I agree that these *specific* memory locations are intermediate
> states, but the comparison functions are not guaranteed to be able to
> look at other elements of the arrays and find sensible information in
> those.

The important question here IMHO is whether looking at them invokes
undefined behavior, or just yields unspecified values, possibly narrowed
to a subset of all values that might be held by the types of the objects
in those locations, if there can even be valid assumptions about the
types of those memory locations.


>> I'm just trying to figure out what the
>> heck you mean by “sequential function”, and by “sequential
>> specification”.

> What I mean is that they are not concurrent specifications that make
> guarantees about states of an unfinished execution as visible to
> concurrent observers.  They only make guarantees about the state after a
> function has finished executing.  (Sorry if I'm using shared-memory
> synchronization terminology here, but given that we want to distinguish
> between concurrent and non-concurrent, that seems to make sense.)

Thanks.  That definitely makes sense, when the goal is to reason about
shared-memory multi-threaded (henceforth SMMT) issues.  But there are
other issues for which this distinction, or the simplifications in SMMT
reasoning that follow from it, don't apply, and may even contradict
other standard-imposed requirements.  So please take the “sequential
function” claims with a grain of salt, and don't use them to discard
parts of the specification you don't generally have to worry about when
you're thinking of SMMT, when the context is not limited to SMMT.

>> I had understood the latter had to do with
>> specifications limited to pre- and post-conditions, but the standards
>> we've been talking about do not limit function specifications to that.

> Why do you think that is the case?

What does “that” mean?  That I had understood it in a certain way?  Or
that the standards do not limit specs to pre- and post-conditions?

> The callback, or composition of functions in general, is one thing you
> mentioned, and I hope I was able to convince you that this doesn't give
> guarantees about the caller (e.g., qsort) to the callee (e.g.,
> comparison function), except when those guarantees overlap with
> preconditions for the callee.

I'm afraid you haven't, but you've helped me understand our differences
in reasoning, because I won't turn specifications of behavior into pre-
and post-conditions and label a function as sequential to then pretend
the original specifications did not exist and did not impose any other
requirements that are not necessarily relevant for SMMT contexts, but
that might be in other contexts.


> "A function that may be safely invoked by an application while the
> asynchronous form of cancellation is enabled."

> That doesn't really tell me a lot :)  I can interpret "safely invoked"
> to at least mean that the mere act of cancellation will not break
> anything.  But it doesn't tell me which state one can expect after
> cancellation.

Yup.  Again, the important question is: is it undefined or unspecified?

> One way to define safety would be to say that a cancelled function
> should either take effect or not, but never partially take effect.  IOW,
> it's either just the precondition or the postcondition that holds.

This would be a way to extend the simplifying assumptions of sequential
functions to some other contexts.  Sequential functions would
essentially be regarded as, and required to behave as, atomic.

> Another option would be to allow specified intermediate steps to take
> effect.  For 2), we could say that cancellation happens anywhere between
> the steps the abstract machine would do, but not within a step.  This
> would be satisfied under the requirement you assumed for memset and
> strcpy implementations, I believe.

Yeah, with the caveat that the order of steps of the abstract machine
that may be used to carry out the required behavior is not specified.
So, interrupting memset, you might observe that dest[i+1] is modified
while dest[i] wasn't yet, or vice-versa.

> For 3), it could be cancellation between any of the atomic steps, unless
> otherwise specified.  For condvar wait, for example, this could be one
> of the three parts: lock release, wakeup, lock acquisition.

Eeek, it would be Really Bad (TM) IMHO if a condvar wait could be
canceled while the lock is not held: this could mess with enclosing
cleanup handlers that, among other things, release the lock.

What states can cancellation cleanup handlers reliably inspect, anyway?
Are they to be regarded as running in async signal context, so that they
can't reliably access local state and are very limited in global state?
Or are they allowed to access local state, plus any global state that
could be accessed after pthread_join()ing the canceled thread?

>> Does this mean it is safe to access the variables
>> that were partially modified by the interrupted memcpy/strcpy/whatever,
>> and that this provides means to safely inspect intermediate states?

> For normal memcpy, strcpy, and other functions in group 1), the
> intermediate states aren't defined

Again, not defined or not specified?
  
Joseph Myers Nov. 21, 2014, 5:17 p.m. UTC | #19
On Fri, 21 Nov 2014, Alexandre Oliva wrote:

> > So, I agree that these *specific* memory locations are intermediate
> > states, but the comparison functions are not guaranteed to be able to
> > look at other elements of the arrays and find sensible information in
> > those.
> 
> The important question here IMHO is whether looking at them invokes
> undefined behavior, or just yields unspecified values, possibly narrowed
> to a subset of all values that might be held by the types of the objects
> in those locations, if there can even be valid assumptions about the
> types of those memory locations.

If a location is modified by a function, and the semantics of that 
function do not specify it to be modified as an atomic operation with a 
particular memory order, I think asynchronous accesses result in undefined 
behavior.  At least, they behave like accessing uninitialized automatic 
storage or struct padding (i.e., a variable copied from the possibly 
modified location has a wobbly value, as in DR#451, that need not behave 
consistently like any particular value of its type for subsequent 
operations on it).

I don't think memset_s is any different - it acts on memory as if it were 
volatile, but not atomic.

Similarly, it is valid for functions to read their inputs multiple times 
unless otherwise specified; memcpy has undefined behavior if its inputs 
change concurrently with the call to memcpy (rather than it simply being 
unspecified which value gets copied).
  
Torvald Riegel Nov. 21, 2014, 11:43 p.m. UTC | #20
On Fri, 2014-11-21 at 07:30 -0200, Alexandre Oliva wrote:
> On Nov 19, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> 
> > On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote:
> >> On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> >> 
> >> > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote:
> >> >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote:
> >> >> 
> >> >> > AFAICT memset_s is still a sequentially-specified function.
> >> >> 
> >> >> How can you tell?  It's not like the standard explicitly says so, is it?
> 
> >> I'm asking how do you tell in general.
> 
> > If the function is doing something, for example a store, and the
> > function does not specify what it does internally and which
> > inter-thread happens-before relations this creates, then there's
> > nothing specified that makes this not a data race if you try to look
> > at an intermediate state from another thread.
> 
> Which is why I've resorted to non-threaded means of inspection of
> intermediate states.  I think we differ in whether “the function does
> not specify what it does internally”.

Yes, that seems to be the case.

> If the definition of the function
> said “copy n chars from src[0..n-1] to dest[0..n-1] respectively”,
> besides any pre- and post- conditions, then it *does* specify what it
> does internally.

I think we differ on whether that specifies the details of an
implementation, or what condition will hold after the function returns.

The way I read it, and the way I think this needs to be understood to be
actually generally applicable, is that it effectively says:  "When the
function has returned (and thus finished execution), it will have copied
n chars from src[0..n-1] to dest[0..n-1] respectively”.  That's the
post-condition.

One reason for why I think that this is indeed the intended verbose
specification is that it's clear how to inspect the state after the
function has returned; it's not executing anymore, a copy can be easily
understood, and there's no restriction on how you actually look at the
state (e.g., when you rely on the post-condition, or want to check
whether it holds).

In contrast, while the function is still running, there are several
other things that would have to be specified for this to make sense and
to prevent that it is interpreted differently (which a standard wouldn't
want).  For example, which steps does copying a char take?  It sounds
trivial in this example, but what disallows this to be a bit-wise copy?
If we look at other functions like qsort, which are specified to sort
the array, then there are various ways to do sorting; do you think it
says anything except that the array is sorted when the function has
returned? (Ignoring the comparison callbacks, which also don't reveal which
sorting algorithm is used.)

Thus, if we consider these more complex functions, and you agree that
they only guarantee the effects that are in place when they have
returned, wouldn't it then make most sense to take this as the default
way to understand the specification?  Why would it be worthwhile to
special-case simpler functions such as memcpy?

> Not the order in which the chars are copied, for sure,
> but still, it says the function should copy each and every one of those
> chars.  It doesn't state how to copy a char, but anything other than
> load from src[i] and store the loaded value in dest[i] is hardly a copy.
> 
> So while this makes room for an interrupted copy to leave dest[i] in an
> unspecified state that could be its earlier value or the newly-copied
> one, it would be hard to argue that anything else complies with the
> behavior specification enclosed in quotes above.

Why couldn't it be a partially copied value?  Where does the standard
disallow bit-wise copy, or require atomic operations for every access to
char?

> I can see value in making simplifying assumptions to reason about
> behavior in the presence of multiple threads, and I realize that the no
> data race requirements can enable *reasoning* about sequential functions
> in such contexts as if only the pre- and post-conditions mattered, but I do
> not agree that applying similar reasoning to go backwards is logically
> sound.
> 
> I mean, “I perceive this as a sequential function, which enables
> simplifying assumptions about internal behavior in multi-threaded
> contexts, therefore I can disregard the explicit behavior specification
> and only look at explicit or inferred (pre- and?) post-conditions to
> reason in any context whatsoever, or to implement the function however I
> like, even deviating from the specification, as long as it still
> satisfies the post-conditions when given the pre-conditions” doesn't
> hold, because there are issues that arise besides those that come up in
> multi-threaded contexts, to which the simplifying assumptions for
> reasoning about multi-threaded contexts do not apply.

I'm not sure I understand what you're saying.

First, I think what is precisely the behavioral specification is
something we still disagree about.  From my perspective, guaranteeing
the effects of a (sequential, non-synchronizing, non-volatile, ...)
function when it returns is perfectly in line with the specifications.
So, from my perspective, nothing is disregarded.

Compilers rely on the as-if rule to make optimizations.  For example,
they may store speculative values if they can prove that some value will
be written to a variable in all executions of the function.  That
speculative store might never have a right value.  Unless I
misunderstand you, such a compiler optimization would be incorrect in
your opinion because it stores a value that the abstract machine might
never store.

C11 (N1570) 5.1.2.3p9-10 start with the following sentences:
"An implementation might define a one-to-one correspondence between
abstract and actual semantics: at every sequence point, the values of
the actual objects would agree with those specified by the abstract
semantics. The keyword volatile would then be redundant.
Alternatively, an implementation might perform various optimizations
within each translation unit, such that the actual semantics would agree
with the abstract semantics only when making function calls across
translation unit boundaries."

Wouldn't the alternative implementation be unable to provide what you
argue the standard requires?

> > Well, the comparison callbacks can't just look at will at every piece
> > of intermediate state.
> 
> Why is that?  I mean, what, if any, part of the relevant standards says
> so?
> 
> > So, I agree that these *specific* memory locations are intermediate
> > states, but the comparison functions are not guaranteed to be able to
> > look at other elements of the arrays and find sensible information in
> > those.
> 
> The important question here IMHO is whether looking at them invokes
> undefined behavior, or just yields unspecified values, possibly narrowed
> to a subset of all values that might be held by the types of the objects
> in those locations, if there can even be valid assumptions about the
> types of those memory locations.

I'm not sure about the comparison functions.  But even if there should
be a stronger requirement for the comparison functions, this wouldn't
imply that accesses from other threads wouldn't be a data race.

> 
> >> I'm just trying to figure out what the
> >> heck you mean by “sequential function”, and by “sequential
> >> specification”.
> 
> > What I mean is that they are not concurrent specifications that make
> > guarantees about states of an unfinished execution as visible to
> > concurrent observers.  They only make guarantees about the state after a
> > function has finished executing.  (Sorry if I'm using shared-memory
> > synchronization terminology here, but given that we want to distinguish
> > between concurrent and non-concurrent, that seems to make sense.)
> 
> Thanks.  That definitely makes sense, when the goal is to reason about
> shared-memory multi-threaded (henceforth SMMT) issues.  But there are
> other issues for which this distinction, or the simplifications in SMMT
> reasoning that follow from it, don't apply, and may even contradict
> other standard-imposed requirements.  So please take the “sequential
> function” claims with a grain of salt, and don't use them to discard
> parts of the specification you don't generally have to worry about when
> you're thinking of SMMT, when the context is not limited to SMMT.

So which issues are you thinking about, and how do they affect
MT-Safety?

> >> I had understood the latter had to do with
> >> specifications limited to pre- and post-conditions, but the standards
> >> we've been talking about do not limit function specifications to that.
> 
> > Why do you think that is the case?
> 
> What does “that” mean?  That I had understood it in a certain way?  Or
> that the standards do not limit specs to pre- and post-conditions?

The latter.

> > The callback, or composition of functions in general, is one thing you
> > mentioned, and I hope I was able to convince you that this doesn't give
> > guarantees about the caller (e.g., qsort) to the callee (e.g.,
> > comparison function), except when those guarantees overlap with
> > preconditions for the callee.
> 
> I'm afraid you haven't, but you've helped me understand our differences
> in reasoning, because I won't turn specifications of behavior into pre-
> and post-conditions and label a function as sequential to then pretend
> the original specifications did not exist and did not impose any other
> requirements that are not necessarily relevant for SMMT contexts, but
> that might be in other contexts.
> 
> 
> > "A function that may be safely invoked by an application while the
> > asynchronous form of cancellation is enabled."
> 
> > That doesn't really tell me a lot :)  I can interpret "safely invoked"
> > to at least mean that the mere act of cancellation will not break
> > anything.  But it doesn't tell me which state one can expect after
> > cancellation.
> 
> Yup.  Again, the important question is: is it undefined or unspecified?

Do you think that unspecified would really help you a lot?  What do you
think it means?  Is it all values allowed by a type?  Or something else?

> > One way to define safety would be to say that a cancelled function
> > should either take effect or not, but never partially take effect.  IOW,
> > it's either just the precondition or the postcondition that holds.
> 
> This would be a way to extend the simplifying assumptions of sequential
> functions to some other contexts.  Sequential functions would
> essentially be regarded as, and required to behave as, atomic.

Atomic if cancelled, yes.

> > Another option would be to allow specified intermediate steps to take
> > effect.  For 2), we could say that cancellation happens anywhere between
> > the steps the abstract machine would do, but not within a step.  This
> > would be satisfied under the requirement you assumed for memset and
> > strcpy implementations, I believe.
> 
> Yeah, with the caveat that the order of steps of the abstract machine
> that may be used to carry out the required behavior is not specified.
> So, interrupting memset, you might observe that dest[i+1] is modified
> while dest[i] wasn't yet, or vice-versa.

The problem with that is that it still would need to be specified what
the actual steps are.  This is done explicitly for memset_s, but not for
memset, IMO.  It's not only the order that the standard mentions
additionally for memset_s, though -- it also explicitly requires that
the implementation is strictly equivalent to the abstract machine
(K.3.7.4.1p4):
"Unlike memset, any call to the memset_s function shall be evaluated
strictly according to the rules of the abstract machine as described in
(5.1.2.3)."

Why would the standard add these requirements for memset_s if memset
already had them?
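
A rough way to picture the difference is sketched below. glibc does not provide memset_s, so wipe_volatile is a hypothetical stand-in rather than the real function, and it relies on the common assumption that compilers treat stores through a volatile-qualified lvalue as volatile accesses. wipe_plain is likewise just an illustrative wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain memset: only the state after return is promised, and a compiler
   may elide the call entirely if the buffer is never read again. */
void wipe_plain(void *p, size_t n)
{
    memset(p, 0, n);
}

/* Hypothetical stand-in for the memset_s requirement: every byte store is
   a volatile access, so it is carried out as the abstract machine would.
   Volatile is not atomic; a concurrent reader would still race. */
void wipe_volatile(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *q = p;
    for (size_t i = 0; i < n; i++)
        q[i] = 0;
}
```

Both functions satisfy the same post-condition; only the second says anything about the individual stores.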

> > For 3), it could be cancellation between any of the atomic steps, unless
> > otherwise specified.  For condvar wait, for example, this could be one
> > of the three parts: lock release, wakeup, lock acquisition.
> 
> Eeek, it would be Really Bad (TM) IMHO if a condvar wait could be
> canceled while the lock is not held: this could mess with enclosing
> cleanup handlers that, among other things, release the lock.

condvar wait was just an example.  (It's a bad one, because it's not a
cancellation point.)

> What states can cancellation cleanup handlers reliably inspect, anyway?
> Are they to be regarded as running in async signal context, so that they
> can't reliably access local state and are very limited in global state?
> Or are they allowed to access local state, plus any global state that
> could be accessed after pthread_join()ing the canceled thread?

I don't know.  I'd focus on what you call the global state first.

> >> Does this mean it is safe to access the variables
> >> that were partially modified by the interrupted memcpy/strcpy/whatever,
> >> and that this provides means to safely inspect intermediate states?
> 
> > For normal memcpy, strcpy, and other functions in group 1), the
> > intermediate states aren't defined
> 
> Again, not defined or not specified?

Again, what's your definition of "unspecified"?
  

Patch

diff --git a/manual/job.texi b/manual/job.texi
index 4f9bd81..095c26d 100644
--- a/manual/job.texi
+++ b/manual/job.texi
@@ -1039,10 +1039,12 @@  The function @code{ctermid} is declared in the header file
 @comment stdio.h
 @comment POSIX.1
 @deftypefun {char *} ctermid (char *@var{string})
-@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+@safety{@prelim{}@mtsafe{@mtsposix{/!string}}@assafe{}@acsafe{}}
 @c This function is a stub by default; the actual implementation, for
-@c posix systems, returns an internal buffer if passed a NULL string,
-@c but the internal buffer is always set to /dev/tty.
+@c posix systems, returns a pointer to a string literal if passed a NULL
+@c string.  It's not clear we want to commit to being MT-Safe in the
+@c !string case, so maybe add mtasurace{:ctermid/!string} when we take
+@c prelim out, to make room for using a static buffer in the future.
 The @code{ctermid} function returns a string containing the file name of
 the controlling terminal for the current process.  If @var{string} is
 not a null pointer, it should be an array that can hold at least
diff --git a/sysdeps/posix/ctermid.c b/sysdeps/posix/ctermid.c
index 0ef9a3f..ca81d42 100644
--- a/sysdeps/posix/ctermid.c
+++ b/sysdeps/posix/ctermid.c
@@ -19,17 +19,18 @@ 
 #include <string.h>
 
 
-/* Return the name of the controlling terminal.
-   If S is not NULL, the name is copied into it (it should be at
-   least L_ctermid bytes long), otherwise a static buffer is used.  */
+/* Return the name of the controlling terminal.  If S is not NULL, the
+   name is copied into it (it should be at least L_ctermid bytes
+   long), otherwise we return a pointer to a non-const but read-only
+   string literal, that POSIX states the caller must not modify.  */
 char *
 ctermid (s)
      char *s;
 {
-  static char name[L_ctermid];
+  char *name = (char /*drop const*/ *) "/dev/tty";
 
   if (s == NULL)
-    s = name;
+    return name;
 
-  return strcpy (s, "/dev/tty");
+  return strcpy (s, name);
 }
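
For readers who want to try the patched logic outside the tree, here is a stand-alone sketch. It is renamed ctermid_sketch (a hypothetical name) so it does not clash with the libc symbol, and it uses an ISO C prototype instead of the K&R definition above. It assumes a POSIX <stdio.h> that defines L_ctermid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>    /* L_ctermid */
#include <string.h>

/* The terminal name, including its NUL, must fit in a caller buffer. */
static_assert(sizeof "/dev/tty" <= L_ctermid, "name fits in L_ctermid bytes");

/* Stand-alone copy of the patched logic, for illustration only. */
char *ctermid_sketch(char *s)
{
    /* String literal: shared and never written, so there is no static
       buffer for concurrent callers to race on. */
    char *name = (char *) "/dev/tty";

    if (s == NULL)
        return name;           /* caller must not modify the result */

    return strcpy(s, name);    /* s must hold at least L_ctermid bytes */
}
```

A caller that wants a private, writable copy passes its own buffer, e.g. char buf[L_ctermid]; ctermid_sketch (buf);. A caller that passes NULL gets a pointer it must treat as read-only.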