CONTROL DEPENDENCIES
====================

A major difficulty with control dependencies is that current compilers
do not support them. One purpose of this document is therefore to
help you prevent your compiler from breaking your code. However,
control dependencies also pose other challenges, which leads to the
second purpose of this document, namely to help you to avoid breaking
your own code, even in the absence of help from your compiler.

One such challenge is that control dependencies order only later stores.
Therefore, a load-load control dependency will not preserve ordering
unless a read memory barrier is provided. Consider the following code:

	q = READ_ONCE(a);
	if (q)
		p = READ_ONCE(b);

This is not guaranteed to provide any ordering because some types of CPUs
are permitted to predict the result of the load from "b". This prediction
can cause other CPUs to see this load as having happened before the load
from "a". This means that an explicit read barrier is required, for example
as follows:

	q = READ_ONCE(a);
	if (q) {
		smp_rmb();
		p = READ_ONCE(b);
	}

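For readers experimenting outside the kernel, the same shape can be
approximated with C11 atomics. The following is only a sketch under
stated assumptions: atomic_thread_fence(memory_order_acquire) stands in
for smp_rmb(), relaxed atomic loads stand in for READ_ONCE(), and the
function name read_b_if_a_set() is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical shared variables; kernel code would use plain
 * variables accessed with READ_ONCE(). */
static atomic_int a;
static atomic_int b;

/* Returns the value of "b" if "a" was observed non-zero, else -1. */
static int read_b_if_a_set(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		/* Assumed userspace stand-in for smp_rmb(): orders the
		 * load from "a" before the load from "b". */
		atomic_thread_fence(memory_order_acquire);
		return atomic_load_explicit(&b, memory_order_relaxed);
	}
	return -1;
}
```

The fence sits inside the "if" so that its cost is paid only on the
path that actually performs the second load.
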
However, stores are not speculated. This means that ordering is
(usually) guaranteed for load-store control dependencies, as in the
following example:

	q = READ_ONCE(a);
	if (q)
		WRITE_ONCE(b, 1);

Control dependencies can pair with each other and with other types
of ordering. But please note that neither the READ_ONCE() nor the
WRITE_ONCE() is optional. Without the READ_ONCE(), the compiler might
fuse the load from "a" with other loads. Without the WRITE_ONCE(),
the compiler might fuse the store to "b" with other stores. Worse yet,
the compiler might convert the store into a load and a check followed
by a store, and this compiler-generated load would not be ordered by
the control dependency.

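To make the role of the volatile accesses concrete, here is a minimal
userspace sketch. The READ_ONCE() and WRITE_ONCE() macros below are
simplified stand-ins written for this example, not the kernel's actual
definitions, and they assume the GCC/Clang __typeof__ extension. The
function name store_b_if_a_set() is hypothetical.

```c
/* Simplified stand-ins, NOT the kernel's definitions: the volatile
 * cast forces exactly one access and forbids fusing it with others. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static int a;
static int b;

/* Load-store control dependency: the store to "b" is conditional on
 * the value loaded from "a".  Returns the value that was loaded. */
static int store_b_if_a_set(void)
{
	int q = READ_ONCE(a);

	if (q)
		WRITE_ONCE(b, 1);
	return q;
}
```

Note that the macros only constrain the compiler. Any ordering seen by
other CPUs still comes from the conditional branch that the hardware
executes.
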
Furthermore, if the compiler is able to prove that the value of variable
"a" is always non-zero, it would be well within its rights to optimize
the original example by eliminating the "if" statement as follows:

	q = a;
	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

So don't leave out either the READ_ONCE() or the WRITE_ONCE().
In particular, although READ_ONCE() does force the compiler to emit a
load, it does *not* force the compiler to actually use the loaded value.

It is tempting to try to use control dependencies to enforce ordering on
identical stores on both branches of the "if" statement as follows:

	q = READ_ONCE(a);
	if (q) {
		barrier();
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		barrier();
		WRITE_ONCE(b, 1);
		do_something_else();
	}

Unfortunately, current compilers will transform this as follows at high
optimization levels:

	q = READ_ONCE(a);
	barrier();
	WRITE_ONCE(b, 1);  /* BUG: No ordering vs. load from a!!! */
	if (q) {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something();
	} else {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something_else();
	}

Now there is no conditional between the load from "a" and the store to
"b", which means that the CPU is within its rights to reorder them. The
conditional is absolutely required, and must be present in the final
assembly code, after all of the compiler and link-time optimizations
have been applied. Therefore, if you need ordering in this example,
you must use explicit memory ordering, for example, smp_store_release():

	q = READ_ONCE(a);
	if (q) {
		smp_store_release(&b, 1);
		do_something();
	} else {
		smp_store_release(&b, 1);
		do_something_else();
	}

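A rough userspace analogue of the release-store version can be written
with C11 atomics. This is a sketch under assumptions, not kernel code:
atomic_store_explicit() with memory_order_release is used in place of
smp_store_release(), and the function publish_b() and the variable
path_taken (standing in for do_something() and do_something_else())
are hypothetical.

```c
#include <stdatomic.h>

static atomic_int a;
static atomic_int b;
static int path_taken;	/* hypothetical stand-in for do_something*() */

static void publish_b(void)
{
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		/* Release store: ordered after the load from "a" even
		 * if the compiler hoists it out of the "if". */
		atomic_store_explicit(&b, 1, memory_order_release);
		path_taken = 1;
	} else {
		atomic_store_explicit(&b, 1, memory_order_release);
		path_taken = 2;
	}
}
```

Because the ordering now comes from the store itself rather than from
the branch, merging the two identical stores no longer loses it.
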
Without explicit memory ordering, control-dependency-based ordering is
guaranteed only when the stores differ, for example:

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

The initial READ_ONCE() is still required to prevent the compiler from
knowing too much about the value of "a".

But please note that you need to be careful what you do with the local
variable "q"; otherwise the compiler might be able to guess the value
and again remove the conditional branch that is absolutely required to
preserve ordering. For example:

	q = READ_ONCE(a);
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

If MAX is compile-time defined to be 1, then the compiler knows that
(q % MAX) must be equal to zero, regardless of the value of "q".
The compiler is therefore within its rights to transform the above code
into the following:

	q = READ_ONCE(a);
	WRITE_ONCE(b, 2);
	do_something_else();

Given this transformation, the CPU is not required to respect the ordering
between the load from variable "a" and the store to variable "b". It is
tempting to add a barrier(), but this does not help. The conditional
is gone, and the barrier won't bring it back. Therefore, if you need
to rely on control dependencies to produce this ordering, you should
make sure that MAX is greater than one, perhaps as follows:

	q = READ_ONCE(a);
	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

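The compile-time check can be tried outside the kernel as well. In the
sketch below, C11 _Static_assert is an assumed stand-in for
BUILD_BUG_ON(), the READ_ONCE()/WRITE_ONCE() macros are simplified
volatile-cast stand-ins rather than the kernel's definitions (they rely
on the GCC/Clang __typeof__ extension), and the value of MAX and the
function name store_by_remainder() are hypothetical.

```c
#define MAX 4	/* hypothetical value; must be greater than one */

/* Assumed stand-in for BUILD_BUG_ON(MAX <= 1): the build fails if the
 * condition (q % MAX) could be folded to a constant zero. */
_Static_assert(MAX > 1, "MAX must exceed 1 to keep the conditional");

/* Simplified stand-ins, NOT the kernel's definitions. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static int a;
static int b;

static void store_by_remainder(void)
{
	int q = READ_ONCE(a);

	if (q % MAX)
		WRITE_ONCE(b, 1);	/* the two legs store different values */
	else
		WRITE_ONCE(b, 2);
}
```

Changing MAX to 1 in this sketch makes compilation fail at the
assertion instead of silently dropping the branch.
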
Please note once again that each leg of the "if" statement absolutely
must store different values to "b". As in previous examples, if the two
values were identical, the compiler could pull this store outside of the
"if" statement, destroying the control dependency's ordering properties.

You must also be careful to avoid relying too much on boolean
short-circuit evaluation. Consider this example:

	q = READ_ONCE(a);
	if (q || 1 > 0)
		WRITE_ONCE(b, 1);

Because the first condition cannot fault and the second condition is
always true, the compiler can transform this example as follows, again
destroying the control dependency's ordering:

	q = READ_ONCE(a);
	WRITE_ONCE(b, 1);

This is yet another example showing the importance of preventing the
compiler from out-guessing your code. Again, although READ_ONCE() really
does force the compiler to emit code for a given load, the compiler is
within its rights to discard the loaded value.

In addition, control dependencies apply only to the then-clause and
else-clause of the "if" statement in question. In particular, they do
not necessarily order the code following the entire "if" statement:

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
	} else {
		WRITE_ONCE(b, 2);
	}
	WRITE_ONCE(c, 1);  /* BUG: No ordering against the read from "a". */

It is tempting to argue that there in fact is ordering because the
compiler cannot reorder volatile accesses and also cannot reorder
the writes to "b" with the condition. Unfortunately for this line
of reasoning, the compiler might compile the two writes to "b" as
conditional-move instructions, as in this fanciful pseudo-assembly
language:

	ld r1,a
	cmp r1,$0
	cmov,ne r4,$1
	cmov,eq r4,$2
	st r4,b
	st $1,c

The control dependencies would then extend only to the pair of cmov
instructions and the store depending on them. This means that a weakly
ordered CPU would have no dependency of any sort between the load from
"a" and the store to "c". In short, control dependencies provide ordering
only to the stores in the then-clause and else-clause of the "if" statement
in question (including functions invoked by those two clauses), and not
to code following that "if" statement.


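If the store following the "if" statement must also be ordered after
the load from "a", one option is to strengthen the load itself. The
following userspace sketch assumes that a C11 acquire load is an
acceptable stand-in for smp_load_acquire(). The function name
store_after_if() is hypothetical.

```c
#include <stdatomic.h>

static atomic_int a;
static atomic_int b;
static atomic_int c;

static void store_after_if(void)
{
	/* Acquire load: every later access in program order, including
	 * the store to "c" after the "if", is ordered after it. */
	int q = atomic_load_explicit(&a, memory_order_acquire);

	if (q)
		atomic_store_explicit(&b, 1, memory_order_relaxed);
	else
		atomic_store_explicit(&b, 2, memory_order_relaxed);

	/* Ordered by the acquire load, not by the control dependency. */
	atomic_store_explicit(&c, 1, memory_order_relaxed);
}
```

With the acquire load in place, the ordering no longer depends on how
the compiler chooses to implement the branch.
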
In summary:

  (*) Control dependencies can order prior loads against later stores.
      However, they do *not* guarantee any other sort of ordering:
      Not prior loads against later loads, nor prior stores against
      later anything. If you need these other forms of ordering, use
      smp_load_acquire(), smp_store_release(), or, in the case of prior
      stores and later loads, smp_mb().

  (*) If both legs of the "if" statement contain identical stores to
      the same variable, then you must explicitly order those stores,
      either by preceding both of them with smp_mb() or by using
      smp_store_release(). Please note that it is *not* sufficient to use
      barrier() at beginning and end of each leg of the "if" statement
      because, as shown by the example above, optimizing compilers can
      destroy the control dependency while respecting the letter of the
      barrier() law.

  (*) Control dependencies require at least one run-time conditional
      between the prior load and the subsequent store, and this
      conditional must involve the prior load. If the compiler is able
      to optimize the conditional away, it will have also optimized
      away the ordering. Careful use of READ_ONCE() and WRITE_ONCE()
      can help to preserve the needed conditional.

  (*) Control dependencies require that the compiler avoid reordering the
      dependency into nonexistence. Careful use of READ_ONCE() or
      atomic{,64}_read() can help to preserve your control dependency.

  (*) Control dependencies apply only to the then-clause and else-clause
      of the "if" statement containing the control dependency, including
      any functions that these two clauses call. Control dependencies
      do *not* apply to code beyond the end of that "if" statement.

  (*) Control dependencies pair normally with other types of barriers.

  (*) Control dependencies do *not* provide multicopy atomicity. If you
      need all the CPUs to agree on the ordering of a given store against
      all other accesses, use smp_mb().

  (*) Compilers do not understand control dependencies. It is therefore
      your job to ensure that they do not break your code.