Discussion:
Proposal for new memory_order_consume definition
Paul E. McKenney
2016-02-18 01:11:03 UTC
Permalink
Hello!

A proposal (quaintly identified as P0190R0) for a new memory_order_consume
definition may be found here:

http://www2.rdrop.com/users/paulmck/submission/consume.2016.02.10b.pdf

As requested at the October C++ Standards Committee meeting, this
is a follow-on to P0098R1 that picks one alternative and describes
it in detail. This approach focuses on existing practice, with the
goal of supporting existing code with existing compilers. In the last
clang/LLVM patch I saw for basic support of this change, you could count
the changed lines and still have lots of fingers and toes left over.
Those who have been following this story will recognize that this is
a very happy contrast to the work that would be required to implement
the definition in the current standard.

I expect that P0190R0 will be discussed at the upcoming C++ Standards
Committee meeting taking place the week of February 29th. Points of
discussion are likely to include:

o May memory_order_consume dependency ordering be used in
unannotated code? I believe that this must be the case,
especially given that this is our experience base. P0190R0
therefore recommends this approach.

o If memory_order_consume dependency ordering can be used in
unannotated code, must implementations support annotation?
I believe that annotation support should be required, at the very
least for formal verification, which can be quite difficult to
carry out on unannotated code. In addition, it seems likely
that annotations can enable much better diagnostics. P0190R0
therefore recommends this approach.

o If implementations must support annotation, what form should that
annotation take? P0190R0 recommends the [[carries_dependency]]
attribute, but I am not picky as long as it can be (1) applied
to all relevant pointer-like objects and (2) used in C as well
as C++. ;-)

o If memory_order_consume dependency ordering can be used in
unannotated code, how best to define the situations where
the compiler can determine the exact value of the pointer in
question? (In current de facto implementations, this can
defeat dependency ordering. Interestingly enough, this case
is not present in the Linux kernel, but still needs to be
defined. See the first sketch following this list.)

Options include:

o Provide new intrinsics that carry out the
comparisons, but guarantee to preserve dependencies,
as recommended by P0190R0 (std::pointer_cmp_eq_dep(),
std::pointer_cmp_ne_dep(), std::pointer_cmp_gt_dep(),
std::pointer_cmp_ge_dep(), std::pointer_cmp_lt_dep(),
and std::pointer_cmp_le_dep()).

o State that -any- comparison involving an unannotated
pointer loses the dependency.

o How is the common idiom of marking pointers by setting low-order
bits to be supported when those pointers carry dependencies?
At the moment, I believe that setting bits in pointers results in
undefined behavior even without dependency ordering, so P0190R0
kicks this particular can down the road. One option that
has been suggested is to provide intrinsics for this purpose.
(Sorry, but I forget who suggested this. See the second sketch
following this list.)
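
To make the comparison hazard concrete, here is a rough reader-side
sketch. It is illustrative only: 'node', 'head', and 'cached_node' are
made-up names, the updater and memory reclamation are omitted, and
std::pointer_cmp_eq_dep() is the intrinsic proposed in P0190R0, not an
existing library facility:

    #include <atomic>

    struct node { int data; };

    std::atomic<node*> head;
    node *cached_node;  // some pointer not loaded with dependency ordering

    int reader() {
        // Dependency-ordered load: on weakly ordered hardware (ARM,
        // PowerPC), the dereference below needs no fence, because the
        // CPU orders a load after the load that produced its address.
        node *p = head.load(std::memory_order_consume);
        return p->data;              // dependency carried from the load
    }

    int broken_reader() {
        node *p = head.load(std::memory_order_consume);
        if (p == cached_node) {
            // Here the compiler knows p == cached_node and may compile
            // *p as *cached_node, dereferencing a pointer that carries
            // no dependency -- ordering is lost.
            return p->data;
        }
        return 0;
    }

    int fixed_reader() {
        node *p = head.load(std::memory_order_consume);
        // Proposed P0190R0 intrinsic: compares without licensing the
        // substitution above, so the dependency is still carried.
        if (std::pointer_cmp_eq_dep(p, cached_node))
            return p->data;
        return 0;
    }

And a rough sketch of the low-order-bit marking idiom from the last
item, as one might open-code it today. The helper names are made up,
and the uintptr_t round trip is at best implementation-defined, which
is exactly why intrinsics have been suggested:

    #include <cstdint>

    struct node2 { int data; };

    // Set, clear, and test the low-order tag bit of a pointer.
    inline node2 *tag_pointer(node2 *p) {
        return reinterpret_cast<node2 *>(
            reinterpret_cast<uintptr_t>(p) | 1u);
    }
    inline node2 *untag_pointer(node2 *p) {
        return reinterpret_cast<node2 *>(
            reinterpret_cast<uintptr_t>(p) & ~uintptr_t(1));
    }
    inline bool pointer_is_tagged(node2 *p) {
        return (reinterpret_cast<uintptr_t>(p) & 1u) != 0;
    }

Whether a dependency survives such a round trip is precisely the open
question above.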

Thoughts?

Thanx, Paul
Tony V E
2016-02-20 02:15:29 UTC
Permalink
If implementations must support annotation, what form should that
annotation take?  P0190R0 recommends the [[carries_dependency]]
attribute, but I am not picky as long as it can be (1) applied
to all relevant pointer-like objects and (2) used in C as well
as C++.  ;-)

If an implementation must support it, then it is not an annotation but a keyword. So no [[]].



Sent from my BlackBerry portable Babbage Device
Paul E. McKenney
2016-02-20 19:53:29 UTC
Permalink
Post by Tony V E
If implementations must support annotation, what form should that
annotation take?  P0190R0 recommends the [[carries_dependency]]
attribute, but I am not picky as long as it can be (1) applied
to all relevant pointer-like objects and (2) used in C as well
as C++.  ;-)
If an implementation must support it, then it is not an annotation but a keyword. So no [[]].
I would be good with that approach, especially if WG14 continues
to stay away from annotations.

For whatever it is worth, the introduction of intrinsics for comparisons
that avoid breaking dependencies enables the annotation to remain
optional.

Thanx, Paul
Lawrence Crowl
2016-02-26 23:57:16 UTC
Permalink
If carries_dependency affects semantics, then it should not be an
attribute.
The original design, or at least my understanding of it, was that it not
have semantics; it was only a suggestion to the compiler that it should
preserve dependencies instead of inserting a fence at the call site.
Dependency-based ordering would be preserved in either case.
Yes, but there is a performance penalty, though I do not know how severe.
When do the pragmatics become sufficiently severe that they become
semantics?
But I think we're moving away from that view towards something that doesn't
quietly add fences.
I do not think we can quite get away with defining a dependency in a way
that is unconditionally preserved by existing compilers, and thus I think
that we do probably need annotations along the dependency path. I just
don't see a way to otherwise deal with the case in which a compiler infers
an equivalent pointer and dereferences that instead of the original. This
can happen under so many (unlikely but) hard-to-define conditions that it
seems undefinable in an implementation-independent manner. "If the
implementation is able then <the semantics change>" is, in my opinion, not
acceptable standards text.
Thus I see no way to both avoid adding syntax to functions that preserve
dependencies and continue to allow existing transformations that remove
dependencies we care about, e.g. due to equality comparisons. We can
hopefully ensure that without annotations compilers break things with very
low probability, so that there is a reasonable path forward for existing
code relying on dependency ordering (which currently also breaks with very
low probability unless you understand what the compiler is doing). But I
don't see a way for the standard to guarantee correctness without the added
syntax (or added optimization constraints that effectively assume all
functions were annotated).
--
Lawrence Crowl
Paul E. McKenney
2016-02-27 17:06:27 UTC
Permalink
If carries_dependency affects semantics, then it should not be an attribute.
I am not picky about the form of the marking.
The original design, or at least my understanding of it, was that it not
have semantics; it was only a suggestion to the compiler that it should
preserve dependencies instead of inserting a fence at the call site.
Dependency-based ordering would be preserved in either case. But I think
we're moving away from that view towards something that doesn't quietly add
fences.
Yes, we do need to allow typical implementations to avoid quiet fence
addition.
I do not think we can quite get away with defining a dependency in a way
that is unconditionally preserved by existing compilers, and thus I think
that we do probably need annotations along the dependency path. I just
don't see a way to otherwise deal with the case in which a compiler infers
an equivalent pointer and dereferences that instead of the original. This
can happen under so many (unlikely but) hard-to-define conditions that it
seems undefinable in an implementation-independent manner. "If the
implementation is able then <the semantics change>" is, in my opinion, not
acceptable standards text.
Hmmm...

But we do already have something very similar with signed integer
overflow. If the compiler can see a way to generate faster code that
does not handle the overflow case, then the semantics suddenly change
from two's-complement arithmetic to something very strange. The standard
does not specify all the ways that the implementation might deduce that
faster code can be generated by ignoring the overflow case; it instead
simply says that signed integer overflow invokes undefined behavior.

And if that is a problem, you use unsigned integers instead of signed
integers.
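
For example (standard fare, nothing specific to P0190R0; a compiler is
entitled to fold the first function to 'return true'):

    bool never_false(int x) {
        // Signed overflow is undefined, so the compiler may assume
        // x + 1 does not wrap and fold this to 'true', even though
        // x == INT_MAX would wrap at runtime.
        return x + 1 > x;
    }

    bool sometimes_false(unsigned x) {
        // Unsigned arithmetic is defined modulo 2^N, so the wraparound
        // must be preserved: this is false when x == UINT_MAX.
        return x + 1 > x;
    }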

So it seems that we should be able to do something very similar here.
If you don't use marking, and the compiler deduces that a given pointer
that carries a given dependency is equal to some other pointer not
carrying that same dependency, there is no dependency ordering. And,
just as with the signed-integer-overflow case, if that is a problem for
you, you can mark the pointers that you intend to carry dependencies.

In both the signed-integer-overflow and pointer-value-deduction cases,
most use cases don't need to care. In the integer case, this is because
most use cases have small integer values that don't overflow. In the
pointer case, this is because when the data structure is composed of
lots of heap-allocated data items, the compiler really cannot deduce
anything.

Other safe pointer use cases involve statically allocated data items
whose contents are compile-time constants (thus avoiding the need for
any sort of ordering) and sentinel data items (as in the Linux kernel's
circular linked lists) where there is no dereferencing.
Thus I see no way to both avoid adding syntax to functions that preserve
dependencies and continue to allow existing transformations that remove
dependencies we care about, e.g. due to equality comparisons. We can
hopefully ensure that without annotations compilers break things with very
low probability, so that there is a reasonable path forward for existing
code relying on dependency ordering (which currently also breaks with very
low probability unless you understand what the compiler is doing). But I
don't see a way for the standard to guarantee correctness without the added
syntax (or added optimization constraints that effectively assume all
functions were annotated).
Your second sentence ("We can hopefully ensure...") does give me hope
that we might be able to reach agreement. The intent of P0190R0 is
to define a subset of operations where dependencies will be carried.
Note that P0190R0 does call out comparisons as potentially unsafe.

Thanx, Paul
Paul E. McKenney
2016-02-27 23:10:47 UTC
Permalink
Post by Paul E. McKenney
But we do already have something very similar with signed integer
overflow. If the compiler can see a way to generate faster code that
does not handle the overflow case, then the semantics suddenly change
from twos-complement arithmetic to something very strange. The standard
does not specify all the ways that the implementation might deduce that
faster code can be generated by ignoring the overflow case; it instead
simply says that signed integer overflow invokes undefined behavior.
And if that is a problem, you use unsigned integers instead of signed
integers.
Post by Linus Torvalds
Actually, in the case of the Linux kernel we just tell the compiler to
not be an ass. We use
-fno-strict-overflow
That is the one!
Post by Linus Torvalds
or something. I forget the exact compiler flag needed for "the standard is
a broken piece of shit and made things undefined for very bad reasons".
See also the idiotic standard C alias rules. Same deal.
For which we use -fno-strict-aliasing.
Post by Linus Torvalds
So no, standards aren't that important. When the standards screw up, the
right answer is not to turn the other cheek.
Agreed, hence my current (perhaps quixotic and insane) attempt to get
the standard to do something useful for dependency ordering. But if
that doesn't work, yes, a fallback position is to get the relevant
compilers to provide flags to avoid problematic behavior, similar to
-fno-strict-overflow.

Thanx, Paul
Post by Linus Torvalds
And undefined behavior is pretty much *always* a sign of "the standard is
wrong".
Linus
Markus Trippelsdorf
2016-02-28 08:27:20 UTC
Permalink
Post by Paul E. McKenney
-fno-strict-overflow
-fno-strict-aliasing.
Do not forget -fno-delete-null-pointer-checks.

So the kernel obviously is already using its own C dialect, that is
pretty far from standard C.
All these options also have a negative impact on the performance of the
generated code.
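
The canonical illustration of what that last flag prevents (a sketch;
the struct and function names are made up):

    struct device { int flags; };

    int get_flags(struct device *d) {
        int f = d->flags;  /* dereference happens first ...            */
        if (!d)            /* ... so the compiler may conclude that d  */
            return -1;     /* cannot be null here and delete the check */
                           /* -- unless told otherwise by the flag     */
        return f;
    }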
--
Markus
Linus Torvalds
2016-02-28 16:14:00 UTC
Permalink
On Sun, Feb 28, 2016 at 12:27 AM, Markus Trippelsdorf wrote:
Post by Markus Trippelsdorf
Post by Paul E. McKenney
-fno-strict-overflow
-fno-strict-aliasing.
Do not forget -fno-delete-null-pointer-checks.
So the kernel obviously is already using its own C dialect, that is
pretty far from standard C.
All these options also have a negative impact on the performance of the
generated code.
They really don't.

Have you ever seen code that cared about signed integer overflow?
Yeah, getting it right can make the compiler generate an extra ALU
instruction once in a blue moon, but trust me - you'll never notice.
You *will* notice when you suddenly have a crash or a security issue
due to bad code generation, though.

The idiotic C alias rules aren't even worth discussing. They were a
mistake. The kernel doesn't use some "C dialect pretty far from
standard C". Yeah, let's just say that the original C designers were
better at their job than a gaggle of standards people who were making
bad crap up to make some Fortran-style programs go faster.

They don't speed up normal code either, they just introduce undefined
behavior in a lot of code.

And deleting NULL pointer checks because somebody made a mistake, and
then turning that small mistake into a real and exploitable security
hole? Not so smart either.

The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.

Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually
look at generated code and do profiling etc. None of those three
options have *ever* shown up as issues. But the incorrect code they
generate? It has.

Linus
c***@pathscale.com
2016-02-28 16:51:18 UTC
Permalink
Sometimes Linus says some really flippant and funny things, but gosh, I couldn't agree more... with one tiny nit.

Properly written Fortran with a good compiler is potentially as fast as or faster than the typical C version in HPC codes. (Yes, you may be able to get the C version faster, but it would take some effort.)

Michael Matz
2016-02-29 17:37:13 UTC
Permalink
Hi,
Post by Linus Torvalds
Post by Markus Trippelsdorf
So the kernel obviously is already using its own C dialect, that is
pretty far from standard C. All these options also have a negative
impact on the performance of the generated code.
They really don't.
They do.
Post by Linus Torvalds
Have you ever seen code that cared about signed integer overflow?
Yeah, getting it right can make the compiler generate an extra ALU
instruction once in a blue moon, but trust me - you'll never notice.
You *will* notice when you suddenly have a crash or a security issue
due to bad code generation, though.
No, that's not at all the important piece of making signed overflow
undefined. The important part is with induction variables controlling
loops:

    short i;          for (i = start; i < end; i++)
vs.
    unsigned short u; for (u = start; u < end; u++)

For the former you're allowed to assume that the loop will terminate, and
that its iteration count is easily computable. For the latter you get
modulo arithmetic and (if start/end are of larger type than u, say 'int')
it might not even terminate at all. That has direct consequences for the
vectorizability of such loops (or the profitability of such transformations)
and hence quite important performance implications in practice. Not for
the kernel of course. Now we can endlessly debate how (non)practical it
is to write HPC code in C or C++, but there we are.
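
A rough sketch of that difference (function names are illustrative; the
stride of 2 is what makes the wraparound case reachable):

    void scale_signed(float *a, int start, int end) {
        // If i + 2 would overflow, behavior is undefined, so the
        // compiler may assume it never does: the trip count is exactly
        // computable, which is what unrolling and vectorization want.
        for (int i = start; i < end; i += 2)
            a[i] *= 2.0f;
    }

    void scale_unsigned(float *a, unsigned start, unsigned end) {
        // Modulo arithmetic: if i steps over end near UINT_MAX, it
        // wraps to a small value and the loop keeps going.  The
        // compiler must preserve that possibility.
        for (unsigned i = start; i < end; i += 2)
            a[i] *= 2.0f;
    }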
Post by Linus Torvalds
The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.
Perhaps if these undefinednesses wouldn't have been put into the standard,
people wouldn't have written HPC code, and if that were so the world would
be a nicer place sometimes (certainly for the compiler). Alas, it isn't.


Ciao,
Michael.
Linus Torvalds
2016-02-29 17:57:24 UTC
Permalink
Post by Michael Matz
The important part is with induction variables controlling
short i; for (i = start; i < end; i++)
vs.
unsigned short u; for (u = start; u < end; u++)
For the former you're allowed to assume that the loop will terminate, and
that its iteration count is easily computable. For the latter you get
modulo arithmetic and (if start/end are of larger type than u, say 'int')
it might not even terminate at all. That has direct consequences for the
vectorizability of such loops (or the profitability of such transformations)
and hence quite important performance implications in practice.
Stop bullshitting me.

It would generally force the compiler to add a few extra checks when
you do vectorize (or, more generally, do any kind of loop unrolling),
and yes, it would make things slightly more painful. You might, for
example, need to add code to handle the wraparound and have a more
complex non-unrolled head/tail version for that case.

In theory you could do a whole "restart the unrolled loop around the
index wraparound" if you actually cared about the performance of such
a case - but since nobody would ever care about that, it's more likely
that you'd just do it with a non-unrolled fallback (which would likely
be identical to the tail fixup).

It would be painful, yes.

But it wouldn't be fundamentally hard, or hurt actual performance fundamentally.
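
A hand-written sketch of that head/tail scheme (hypothetical; the point
is that a compiler could generate this, not that programmers should):

    #include <climits>

    void scale(float *a, unsigned short start, int end) {
        unsigned short u = start;
        if (end <= USHRT_MAX) {
            // No wraparound possible in this range, so the trip count
            // is exact and the body is safe to unroll or vectorize.
            for (; u + 4 <= end; u += 4) {
                a[u]     *= 2.0f;
                a[u + 1] *= 2.0f;
                a[u + 2] *= 2.0f;
                a[u + 3] *= 2.0f;
            }
        }
        // Non-unrolled fallback: finishes the tail, and preserves the
        // modulo (possibly non-terminating) semantics when
        // end > USHRT_MAX.
        for (; u < end; u++)
            a[u] *= 2.0f;
    }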

It would be _inconvenient_ for compiler writers, and the bad ones
would argue vehemently against it.

. and it's how a "go fast" mode would be implemented by a compiler
writer initially as a compiler option, for those HPC people. Then you
have a use case and implementation example, and can go to the
standards body and say "look, we have people who use this already, it
breaks almost no code, and it makes our compiler able to generate much
faster code".

Which is why the standard was written to be good for compiler writers,
not actual users.

Of course, in real life HPC performance is often more about doing the
cache blocking etc, and I've seen people move to more parameterized
languages rather than C to get best performance. Generate the code
from a much higher-level description, and be able to do a much better
job, and leave C to do the low-level job, and let people do the
important part.

But no. Instead the C compiler people still argue for bad features
that were a misdesign and a wart on the language.

At the very least it should have been left as a "go unsafe, go fast"
option, and standardize *that*, instead of screwing everybody else
over.

The HPC people end up often using those anyway, because it turns out
that they'll happily get rid of proper rounding etc if it buys them a
couple of percent on their workload. Things like "I really want you
to generate multiply-accumulate instructions because I don't mind
having intermediates with higher precision" etc.

Linus
Lawrence Crowl
2016-02-29 19:38:40 UTC
Permalink
Post by Linus Torvalds
The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.
Actually, undefined behavior is essential for serious projects, but
not for the reasons mentioned.

If the language has no undefined behavior, then from the compiler's view,
there is no such thing as a bad program. All programs will compile and
enter functional debug (possibly after shipping to customer). On the
other hand, a language with undefined behavior makes it possible for
compilers (and their run-time support) to identify a program as wrong.

The problem with the latest spate of compiler optimizations was not the
optimization, but the lack of warnings about exploiting undefined behavior.
--
Lawrence Crowl
James Y Knight
2016-02-29 21:12:54 UTC
Permalink
No, you really don't need undefined behavior in the standard in order
to enable bug-finding.

The standard could've (and still could...) make signed integer
overflow "implementation-defined" rather than "undefined". Compilers
would thus be required to have *some documented meaning* for it (e.g.
wrap 2's-complement, wrap 1's-complement, saturate to min/max, trap,
or whatever...), but must not have the current "Anything goes! I can
set your cat on fire if the optimizer feels like it today!" behavior.

Such a change to the standard would not reduce any ability to do error
checking, as compilers that want to be helpful could perfectly-well
define it to trap at runtime when given certain compiler flags, and
perfectly well warn you of your dependence upon unportable
implementation-defined behavior (or, that your program is going to
trap), at build-time.
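
For what it is worth, compilers already expose both halves of this
today: -fsanitize=signed-integer-overflow (and the older -ftrapv) give
you the trapping behavior, and the GCC/Clang overflow-checking builtins
let code opt into defined behavior explicitly. A sketch using the real
__builtin_add_overflow builtin (the helper name is made up):

    #include <cstdio>

    // __builtin_add_overflow performs the addition as if in infinite
    // precision and returns true if the result did not fit, instead
    // of invoking undefined behavior.
    int checked_add(int a, int b, int *sum) {
        if (__builtin_add_overflow(a, b, sum)) {
            std::fprintf(stderr, "overflow adding %d and %d\n", a, b);
            return -1;  // caller chooses: saturate, trap, or report
        }
        return 0;
    }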

[Sending again as a plain-text email, since a bunch of mailing lists
apparently hate on multipart messages that even contain a text/html
part...]

Toon Moene
2016-02-29 21:30:36 UTC
Permalink
Post by Linus Torvalds
Yeah, let's just say that the original C designers were
better at their job than a gaggle of standards people who were making
bad crap up to make some Fortran-style programs go faster.
The original C designers were defining a language that would make it
easy to write operating systems in (and not having to rely on assembler).

I mislaid the quote where they said they first tried Fortran (and
concluded it didn't fit their purpose).

BTW, Fortran was designed around floating point arithmetic (and its
non-relation to the mathematical concept of the field of the reals).

It used integers only for counting and indexing arrays, so it had no
purpose for "signed integers that overflowed". Therefore, to the Fortran
standard, this was "undefined". It was literally "undefined" - as it was
not described by the standard's text.
--
Toon Moene - e-mail: ***@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Michael Matz
2016-02-29 18:18:06 UTC
Permalink
Hi,
Post by Paul E. McKenney
But we do already have something very similar with signed integer
overflow. If the compiler can see a way to generate faster code that
does not handle the overflow case, then the semantics suddenly change
from two's-complement arithmetic to something very strange. The standard
does not specify all the ways that the implementation might deduce that
faster code can be generated by ignoring the overflow case; it instead
simply says that signed integer overflow invokes undefined behavior.
And if that is a problem, you use unsigned integers instead of signed
integers.
So it seems that we should be able to do something very similar here.
For this case the important piece of information to convey one or the other
meaning in source code is the _type_ of the involved entities, not annotations
on the operations: signed type -> undefined overflow, unsigned type ->
modulo arithmetic. Easy, and it nicely carries automatically through
operation chains (and pointers) without any annotations.

I feel much of the complexity in the memory order specifications, also
with your recent (much better) wording to explain dependency chains, would
be much reduced if 'carries-dependency' were encoded into the types
of operands. For purposes of example, let's call the marker "blaeh" (not
"atomic", to avoid confusion with existing use :) ):

int foo;
blaeh int global;
int *somep;
blaeh int *blaehp;

void f () {
  blaehp = &foo;    // might be okay; adds restrictions on accesses through
                    // blaehp, but not through 'foo' directly
  blaehp = &global;
  if (somep == blaehp)
    {
      /* Even though the value is equal ... */
      ... *blaehp ... /* ... a compiler can't rewrite this into *somep */
    }
}

A "carries-dependency" on some operation (e.g. a call) would be added by
using a properly typed pointer at those arguments (or return type) where
it matters. You can't give a blaeh pointer to something only accepting
non-blaeh pointers (without cast).

Pointer addition and similar transformations involving a blaeh pointer and
some integer would still give a blaeh pointer, and hence by default also
solve the problem of cancellations.

Such marking via types would not solve all problems in an optimal way if
you had two overlapping but independent dependency chains (all of them
would collapse to one chain and hence be made dependent, which is still
conservatively correct).

OTOH, introducing new type qualifiers is a much larger undertaking, so I
can understand that one wants to avoid this. I think it'd ultimately be
clearer, though.


Ciao,
Michael.
Paul E. McKenney
2016-03-01 01:28:42 UTC
Permalink
Post by Michael Matz
I feel much of the complexity in the memory order specifications, also
with your recent (much better) wording to explain dependency chains, would
be much reduced if 'carries-dependency' were encoded into the types
of operands.
Post by Michael Matz
OTOH, introducing new type qualifiers is a much larger undertaking, so I
can understand that one wants to avoid this. I think it'd ultimately be
clearer, though.
As has been stated in this thread, we do need the unmarked variant.

For the marked variant, there are quite a few possible solutions with
varying advantages and disadvantages:

o Attribute already exists, but is not carried by the type system.
Could be enforced by external tools.

o Storage class could be added with fewer effects on the type
system, but the reaction to this suggestion in October was
not all that positive.

o Non-type keywords for objects have been suggested; this might be
worth revisiting.

o Adding to the type system allows type enforcement on the one
hand, but makes it harder to write code that can be used for
both RCU-protected and not-RCU-protected data structures.
(This sort of thing is not uncommon in the Linux kernel.)

There are probably others, but those are the ones I recall at the
moment.
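
For reference, a minimal sketch of the first option, which C++11
already ships (reader side only; 'node' and the function names are
illustrative):

    #include <atomic>

    struct node { int data; };
    std::atomic<node*> head;

    // The attribute says the parameter carries a dependency in, so the
    // implementation need not emit a fence at the call boundary to
    // preserve consume ordering across the call.
    int read_data([[carries_dependency]] node *p) {
        return p->data;
    }

    int reader() {
        node *p = head.load(std::memory_order_consume);
        return read_data(p);  // dependency intended to flow into callee
    }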

Thanx, Paul
