Add the atomics report from kib
parent
912615d064
commit
efc92d8a44
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=47579
1 changed files with 143 additions and 0 deletions
@ -1207,4 +1207,147 @@
    </help>
  </project>

  <project cat='arch'>
    <title>Atomics</title>

    <contact>
      <person>
        <name>
          <given>Konstantin</given>
          <common>Belousov</common>
        </name>
        <email>kib@FreeBSD.org</email>
      </person>

      <person>
        <name>
          <given>Alan</given>
          <common>Cox</common>
        </name>
        <email>alc@FreeBSD.org</email>
      </person>

      <person>
        <name>
          <given>Bruce</given>
          <common>Evans</common>
        </name>
        <email>bde@FreeBSD.org</email>
      </person>
    </contact>

    <body>
      <p>Atomic operations serve two fundamental purposes.  First, they
        are the building blocks for expressing synchronization algorithms
        in a single, machine-independent way using high-level languages.
        In essence, atomics abstract the different building blocks
        supported by the various architectures on which &os; runs,
        making it easier to develop and reason about lockless code by
        hiding hardware-level details.</p>

      <p>Second, atomics provide the barrier operations that allow
        software to control the effects on memory of out-of-order and
        speculative execution in modern processors, as well as of
        optimizations by compilers.  This capability is especially
        important to multithreaded software, such as the &os; kernel,
        when running on systems where multiple processors communicate
        through a shared main memory.</p>

      <p>Each machine architecture defines a memory model, which
        specifies the possible effects on memory of out-of-order and
        speculative execution.  More precisely, it specifies the extent to
        which the machine may visibly reorder memory accesses in order to
        optimize performance.  Unfortunately, there are almost as many
        models as architectures, and they differ widely in strength:
        some architectures, for instance IA32 or Sparcv9 TSO, are
        relatively strongly ordered, while others, like PowerPC or ARM,
        are very relaxed.  In effect, atomics define a very relaxed
        abstract memory model for &os;'s machine-independent code that
        can be efficiently realized on any of these architectures.</p>

      <p>However, most &os; development and testing still happens on
        x86 machines.  Combined with x86's strongly ordered memory
        model, this leads to errors in the use of atomics, specifically
        of barriers: the code is not properly written to &os;'s abstract
        memory model, but the strong ordering of the x86 architecture
        hides this fact.  The architectures affected by code that uses
        atomics incorrectly are less popular or have limited
        availability, and the resulting bugs are hard to diagnose.</p>

      <p>The goal of this project is to audit and upgrade the usage of
        lockless facilities, hopefully fixing bugs before they are
        observed in the wild.</p>

      <p>&os; defines its own set of atomic operations, like many
        other operating systems.  But unlike other operating systems, &os;
        models its atomics and barriers on the release consistency model,
        which is also known as the acquire/release model.  This is the
        same model that is used by the C11 and C++11 language standards
        as well as the new 64-bit ARM architecture.  Despite their
        syntactical differences, C11 and &os; atomics share essentially
        the same semantics.  Consequently, the ample tutorials about the
        C11 memory model and algorithms expressed with C11 atomics can
        be readily reused under &os;.</p>
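      <p>As an illustration of the acquire/release model, the following
        is a minimal message-passing sketch written with C11 atomics
        and POSIX threads.  The names <tt>payload</tt>, <tt>ready</tt>,
        <tt>producer</tt> and <tt>run_message_passing</tt> are
        hypothetical and do not come from the &os; sources.  Under
        &os;, the release store and acquire load would correspond to
        the atomic_store_rel() and atomic_load_acq() families described
        in atomic(9).</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;		/* plain data, published through 'ready' */
static atomic_int ready;	/* 0 = not published yet, 1 = published */

/* Producer: write the data, then publish it with a release store. */
static void *
producer(void *arg)
{
	(void)arg;
	payload = 42;
	atomic_store_explicit(&ready, 1, memory_order_release);
	return (NULL);
}

/*
 * Consumer: poll the flag with acquire loads.  Once the flag is seen
 * as set, the earlier write to 'payload' is guaranteed to be visible.
 * Returns the payload value that was observed.
 */
int
run_message_passing(void)
{
	pthread_t td;
	int error, seen;

	payload = 0;
	atomic_store_explicit(&ready, 0, memory_order_relaxed);

	error = pthread_create(&td, NULL, producer, NULL);
	assert(error == 0);

	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;	/* spin until the release store becomes visible */
	seen = payload;

	error = pthread_join(td, NULL);
	assert(error == 0);
	(void)error;
	return (seen);
}
```

      <p>If both accesses to <tt>ready</tt> were relaxed instead, a
        weakly ordered machine would be allowed to show the consumer a
        set flag together with a stale <tt>payload</tt>.</p>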

      <p>One facility of C11 that was missing from &os; atomics was
        fences.  Fences are bidirectional barrier operations
        which could not be expressed by the existing atomic+barrier
        accesses.  They were added in r285283.</p>
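      <p>To show what a fence adds over an atomic access that carries
        its own barrier, here is a small sketch using the C11
        atomic_thread_fence() call; the &os; counterparts are the
        atomic_thread_fence_rel() and atomic_thread_fence_acq()
        functions.  The <tt>struct mailbox</tt> type and both
        functions are hypothetical names chosen for this example.  All
        accesses are relaxed, and ordering comes only from the
        fences.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox used only for this illustration. */
struct mailbox {
	atomic_int	data;
	atomic_int	flag;	/* 0 = empty, 1 = data is valid */
};

/* Writer side: the release fence orders the data store before the flag. */
void
mailbox_publish(struct mailbox *mb, int value)
{
	assert(mb != NULL);
	atomic_store_explicit(&mb->data, value, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&mb->flag, 1, memory_order_relaxed);
}

/*
 * Reader side: the acquire fence orders the flag load before the data
 * load.  Returns true and fills *out if a value was available.
 */
bool
mailbox_consume(struct mailbox *mb, int *out)
{
	assert(mb != NULL && out != NULL);
	if (atomic_load_explicit(&mb->flag, memory_order_relaxed) == 0)
		return (false);
	atomic_thread_fence(memory_order_acquire);
	*out = atomic_load_explicit(&mb->data, memory_order_relaxed);
	return (true);
}
```

      <p>Because a fence is not tied to a single memory location, one
        fence can order several surrounding relaxed accesses at
        once.</p>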

      <p>Due to the strong memory model implemented by x86 processors,
        atomic_load_acq() and atomic_store_rel() can be implemented by
        plain load and store instructions with only a compiler barrier; no
        additional ordering constraints are required.  This simplification
        of atomic_store_rel() was done some time ago in r236456.  The
        atomic_load_acq() change was done in r285934, after careful review
        of all its uses in the kernel and user-space to ensure that no
        hidden dependency on a stronger implementation was left.</p>
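      <p>The idea can be sketched as follows.  This is not the code
        from the &os; headers: the <tt>sketch_</tt> names and the
        <tt>compiler_barrier()</tt> macro are made up for this example,
        and the empty asm statement is a GCC/Clang extension.  The
        functions give acquire and release ordering only on a machine
        with x86-like ordering; on a weakly ordered architecture they
        would be insufficient.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compiler-only barrier: emits no instruction, but forbids the compiler
 * from moving memory accesses across it.
 */
#define	compiler_barrier()	__asm__ __volatile__("" : : : "memory")

/* Acquire load: plain load, then keep later accesses after it. */
static inline unsigned int
sketch_load_acq(volatile unsigned int *p)
{
	unsigned int v;

	assert(p != NULL);
	v = *p;
	compiler_barrier();
	return (v);
}

/* Release store: keep earlier accesses before it, then plain store. */
static inline void
sketch_store_rel(volatile unsigned int *p, unsigned int v)
{
	assert(p != NULL);
	compiler_barrier();
	*p = v;
}

/* Tiny single-threaded exercise of the two helpers. */
unsigned int
sketch_roundtrip(unsigned int v)
{
	volatile unsigned int word = 0;

	sketch_store_rel(&word, v);
	return (sketch_load_acq(&word));
}
```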

      <p>The only reordering of memory accesses that is allowed on
        x86 is that loads may be reordered with older stores to different
        locations.  This results from the use of store buffers at the
        micro-architectural level.  So, to ensure sequentially consistent
        behavior on x86, a store/load barrier needs to be issued, which
        can be done with an MFENCE instruction or by any locked RMW
        operation.  The latter approach is recommended by the optimization
        guides from Intel and AMD.  It was noted that careful selection of
        the scratch memory location, which is modified by the locked RMW
        operation, can reduce the cost of the barrier by avoiding false
        data dependencies.  The corresponding optimization was committed
        in r284901.</p>
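      <p>A store/load barrier built from a locked RMW operation might
        look like the sketch below.  The function names are
        hypothetical, and the scratch location (a word just below the
        stack pointer) is an assumption made for this illustration; it
        is not claimed to be the location selected in r284901.  Adding
        zero leaves the scratch word unchanged.  On anything other than
        x86-64 with GCC or Clang, the sketch falls back to the C11
        sequentially consistent fence.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Store/load barrier: earlier stores become visible before later loads. */
static inline void
sketch_storeload_barrier(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/* Locked add of zero to a scratch word (assumed location). */
	__asm__ __volatile__("lock; addl $0,-8(%%rsp)" : : : "memory", "cc");
#else
	atomic_thread_fence(memory_order_seq_cst);
#endif
}

/*
 * Dekker-style step: announce ourselves, then look at the other side.
 * Without the barrier, x86 could satisfy the load before the store has
 * left the store buffer, so two CPUs could both read 0.
 */
int
sketch_announce_and_check(volatile int *mine, volatile int *other)
{
	assert(mine != NULL && other != NULL);
	*mine = 1;
	sketch_storeload_barrier();
	return (*other);
}
```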

      <p>The atomic(9) man page was often a cause of confusion due to
        both erroneous and ambiguous statements.  The most significant of
        these issues were addressed in changes r286513 and r286784.</p>

      <p>Some examples of our preemptive fixes to the misuse of atomics
        that would only become evident on weakly ordered machines
        are:</p>

      <ul>
        <li>A very important lockless algorithm, used in both the
          kernel and libc, is the timekeeping functionality implemented in
          <tt>kern/kern_tc.c</tt> and the userspace
          <tt>__vdso_gettimeofday</tt>.  This algorithm relied on x86 TSO
          behavior.  It was fixed in r284178 and r285286.</li>

        <li>The <tt>kern/kern_intr.c</tt> lockless updates to the
          <tt>it_need</tt> indicator were corrected in r285607.</li>

        <li>An issue with
          <tt>kern/subr_smp.c:smp_rendezvous_cpus()</tt> not guaranteeing
          the visibility of updates done on other CPUs to the caller was
          fixed in r285771.</li>

        <li>The <tt>pthread_once()</tt> implementation was fixed to
          include missing barriers in r287556.</li>
      </ul>
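      <p>To give a feel for the kind of lockless algorithm involved in
        the timekeeping item, the sketch below shows a generation-count
        protected snapshot written with C11 atomics.  It is a
        simplified illustration under stated assumptions, not the code
        from <tt>kern/kern_tc.c</tt>: the structure, its fields and the
        function names are hypothetical, a single writer is assumed,
        and a generation of zero is taken to mean that an update is in
        progress.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical time snapshot guarded by a generation counter. */
struct sketch_timehands {
	atomic_uint	gen;	/* 0 = update in progress */
	atomic_uint	sec;
	atomic_uint	frac;
};

/* Single writer: invalidate, update the fields, publish a new generation. */
void
sketch_th_update(struct sketch_timehands *th, unsigned sec, unsigned frac)
{
	unsigned ogen;

	assert(th != NULL);
	ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* The zero generation must be visible before the new field values. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->frac, frac, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* skip the reserved value on wrap-around */
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/* Reader: retry until the generation is non-zero and unchanged. */
void
sketch_th_read(struct sketch_timehands *th, unsigned *sec, unsigned *frac)
{
	unsigned gen;

	assert(th != NULL && sec != NULL && frac != NULL);
	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&th->sec, memory_order_relaxed);
		*frac = atomic_load_explicit(&th->frac, memory_order_relaxed);
		/* The field loads must complete before the re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}
```

      <p>On x86 TSO, an algorithm of this shape behaves correctly even
        when the fences and acquire/release annotations are left out,
        as long as the compiler preserves the access order, which is
        how this class of bug stays hidden there.</p>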
    </body>

    <sponsor>
      The FreeBSD Foundation (Konstantin Belousov's work)
    </sponsor>
  </project>

</report>