diff --git a/en_US.ISO8859-1/articles/smp/Makefile b/en_US.ISO8859-1/articles/smp/Makefile new file mode 100644 index 0000000000..85675c8e15 --- /dev/null +++ b/en_US.ISO8859-1/articles/smp/Makefile @@ -0,0 +1,18 @@ +# $FreeBSD$ + +MAINTAINER=jhb@FreeBSD.org + +DOC?= article + +FORMATS?= html + +INSTALL_COMPRESSED?=gz +INSTALL_ONLY_COMPRESSED?= + +JADEFLAGS+= -V %generate-article-toc% + +SRCS= article.sgml + +DOC_PREFIX?= ${.CURDIR}/../../.. + +.include "${DOC_PREFIX}/share/mk/doc.project.mk" diff --git a/en_US.ISO8859-1/articles/smp/article.sgml b/en_US.ISO8859-1/articles/smp/article.sgml new file mode 100644 index 0000000000..3f6b233f60 --- /dev/null +++ b/en_US.ISO8859-1/articles/smp/article.sgml @@ -0,0 +1,934 @@ + +%man; + + +%authors; + + + + +]> + +
+ + SMPng Design Document + + + + John + Baldwin + + + Robert + Watson + + + + $FreeBSD$ + + + 2002 + John Baldwin + Robert Watson + + + + This document presents the current design and implementation of + the SMPng Architecture. First, the basic primitives and tools are + introduced. Next, a general architecture for the FreeBSD kernel's + synchronization and execution model is laid out. Then, locking + strategies for specific subsystems are discussed, documenting the + approaches taken to introduce fine-grained synchronization and + parallelism for each subsystem. Finally, detailed implementation + notes are provided to motivate design choices, and make the reader + aware of important implications involving the use of specific + primitives. + + + + + Introduction + + This document is a work-in-progress, and will be updated to + reflect on-going design and implementation activities associated + with the SMPng Project. Many sections currently exist only in + outline form, but will be fleshed out as work proceeds. Updates or + suggestions regarding the document may be directed to the document + editors. + + The goal of SMPng is to allow concurrency in the kernel. + The kernel is basically one rather large and complex program. To + make the kernel multithreaded we use some of the same tools used + to make other programs multithreaded. These include mutexes, + reader/writer locks, semaphores, and condition variables. For + definitions of many of the terms, please see + . + + + + Basic Tools and Locking Fundamentals + + + Atomic Instructions and Memory Barriers + + There are several existing treatments of memory barriers + and atomic instructions, so this section will not include a + lot of detail. To put it simply, one cannot go around reading + variables without a lock if a lock is used to protect writes + to that variable. This becomes obvious when you consider that + memory barriers simply determine relative order of memory + operations; they do not make any guarantee about timing of + memory operations. That is, a memory barrier does not force + the contents of a CPU's local cache or store buffer to flush. + Instead, the memory barrier at lock release simply ensures + that all writes to the protected data will be visible to other + CPU's or devices if the write to release the lock is visible. + The CPU is free to keep that data in its cache or store buffer + as long as it wants. However, if another CPU performs an + atomic instruction on the same datum, the first CPU must + guarantee that the updated value is made visible to the second + CPU along with any other operations that memory barriers may + require. + + For example, assuming a simple model where data is + considered visible when it is in main memory (or a global + cache), when an atomic instruction is triggered on one CPU, + other CPU's store buffers and caches must flush any writes to + that same cache line along with any pending operations behind + a memory barrier. + + This requires one to take special care when using an item + protected by atomic instructions. For example, in the sleep + mutex implementation, we have to use an + atomic_cmpset rather than an + atomic_set to turn on the + MTX_CONTESTED bit. The reason is that we + read the value of mtx_lock into a + variable and then make a decision based on that read. + However, the value we read may be stale, or it may change + while we are making our decision. 
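+ The window looks roughly like the following sketch. This is purely illustrative; it is not the actual sleep mutex code, the lock word layout and flag value are simplified, and only the atomic_set_int and atomic_cmpset_int operations from the atomic(9) interface are assumed.
+
+ <programlisting>
+ /*
+  * Illustrative sketch only: why the contested bit is set with a
+  * compare-and-set.  Requires sys/types.h and machine/atomic.h.
+  */
+ #define MTX_CONTESTED   0x02            /* illustrative value only */
+
+ static void
+ mark_contested(volatile u_int *lockw)
+ {
+         u_int old;
+
+         for (;;) {
+                 old = *lockw;           /* snapshot of the lock word */
+                 /*
+                  * A plain atomic_set_int(lockw, MTX_CONTESTED) here
+                  * would be unsafe: the word may have changed since the
+                  * read above, so the bit could be set on a value that
+                  * was never examined.  atomic_cmpset_int() only stores
+                  * the new value if the word still equals the snapshot.
+                  */
+                 if (atomic_cmpset_int(lockw, old, old | MTX_CONTESTED))
+                         break;
+         }
+ }
+ </programlisting>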
+ In other words, if a plain atomic_set were executed at that point, it could end up setting the bit on a value other than the one the decision was based on. Thus, we have to use an atomic_cmpset to set the value only if the value we made the decision on is still up-to-date and valid.
+
+ Finally, atomic instructions only allow one item to be updated or read. If one needs to atomically update several items, then a lock must be used instead. For example, if two counters must be read and have values that are consistent relative to each other, then those counters must be protected by a lock rather than by separate atomic instructions.
+
+ Read Locks versus Write Locks
+
+ Read locks do not need to be as strong as write locks. Both types of locks need to ensure that the data they are accessing is not stale. However, only write access requires exclusive access. Multiple threads can safely read a value. Using different types of locks for reads and writes can be implemented in a number of ways.
+
+ First, sx locks can be used in this manner by using an exclusive lock when writing and a shared lock when reading. This method is quite straightforward.
+
+ A second method is a bit more obscure. You can protect a datum with multiple locks. Then for reading that data you simply need to have a read lock of one of the locks. However, to write to the data, you need to have a write lock of all of the locks. This can make writing rather expensive but can be useful when data is accessed in various ways. For example, the parent process pointer is protected by both the proctree_lock sx lock and the per-process mutex. Sometimes the proc lock is easier, as we are just checking the parent of a process that we already have locked. However, other places such as inferior need to walk the tree of processes via parent pointers; there, locking each process would be prohibitively expensive, and it would also be painful to guarantee that the condition being checked remains valid for both the check and the actions taken as a result of the check.
+
+ Locking Conditions and Results
+
+ If you need a lock to check the state of a variable so that you can take an action based on the state you read, you can't just hold the lock while reading the variable and then drop the lock before you act on the value you read. Once you drop the lock, the variable can change, rendering your decision invalid. Thus, you must hold the lock both while reading the variable and while performing the action as a result of the test.
+
+ General Architecture and Design
+
+ Interrupt Handling
+
+ Following the pattern of several other multithreaded Unix kernels, FreeBSD deals with interrupt handlers by giving them their own thread context. Providing a context for interrupt handlers allows them to block on locks. To help avoid latency, however, interrupt threads run at real-time kernel priority. Thus, interrupt handlers should not execute for very long, to avoid starving other kernel threads. In addition, since multiple handlers may share an interrupt thread, interrupt handlers should not sleep or use a sleepable lock, to avoid starving another interrupt handler.
+
+ The interrupt threads currently in FreeBSD are referred to as heavyweight interrupt threads. They are called this because switching to an interrupt thread involves a full context switch.
In the initial implementation, the kernel was + not preemptive and thus interrupts that interrupted a kernel + thread would have to wait until the kernel thread blocked or + returned to userland before they would have an opportunity to + run. + + To deal with the latency problems, the kernel in FreeBSD + has been made preemptive. Currently, we only preempt a kernel + thread when we release a sleep mutex or when an interrupt + comes in. However, the plan is to make the FreeBSD kernel + fully preemptive as described below. + + Not all interrupt handlers execute in a thread context. + Instead, some handlers execute directly in primary interrupt + context. These interrupt handlers are currently misnamed + fast interrupt handlers since the + INTR_FAST flag used in earlier versions + of the kernel is used to mark these handlers. The only + interrupts which currently use these types of interrupt + handlers are clock interrupts and serial I/O device + interrupts. Since these handlers do not have their own + context, they may not acquire blocking locks and thus may only + use spin mutexes. + + Finally, there is one optional optimization that can be + added in MD code called lightweight context switches. Since + an interrupt thread executes in a kernel context, it can + borrow the vmspace of any process. Thus, in a lightweight + context switch, the switch to the interrupt thread does not + switch vmspaces but borrows the vmspace of the interrupted + thread. In order to ensure that the vmspace of the + interrupted thread doesn't disappear out from under us, the + interrupted thread is not allowed to execute until the + interrupt thread is no longer borrowing its vmspace. This can + happen when the interrupt thread either blocks or finishes. + If an interrupt thread blocks, then it will use its own + context when it is made runnable again. Thus, it can release + the interrupted thread. + + The cons of this optimization are that they are very + machine specific and complex and thus only worth the effor if + their is a large performance improvement. At this point it is + probably too early to tell, and in fact, will probably hurt + performance as almost all interrupt handlers will immediately + block on Giant and require a thread fixup when they block. + Also, an alternative method of interrupt handling has been + proposed by Mike Smith that works like so: + + + + Each interrupt handler has two parts: a predicate + which runs in primary interrupt context and a handler + which runs in its own thread context. + + + + If an interrupt handler has a predicate, then when an + interrupt is triggered, the predicate is run. If the + predicate returns true then the interrupt is assumed to be + fully handled and the kernel returns from the interrupt. + If the predicate returns false or there is no predicate, + then the threaded handler is scheduled to run. + + + + Fitting light weight context switches into this scheme + might prove rather complicated. Since we may want to change + to this scheme at some point in the future, it is probably + best to defer work on light weight context switches until we + have settled on the final interrupt handling architecture and + determined how light weight context switches might or might + not fit into it. + + + + Kernel Preemption and Critical Sections + + + Kernel Preemption in a Nutshell + + Kernel preemption is fairly simple. The basic idea is + that a CPU should always be doing the highest priority work + available. Well, that is the ideal at least. 
+ There are a couple of cases where the expense of achieving the ideal is not worth being perfect.
+
+ Implementing full kernel preemption is very straightforward: when you schedule a thread to be executed by putting it on a runqueue, you check to see if its priority is higher than that of the currently executing thread. If so, you initiate a context switch to that thread.
+
+ While locks can protect most data in the case of a preemption, not all of the kernel is preemption safe. For example, if a thread holding a spin mutex is preempted and the new thread attempts to grab the same spin mutex, the new thread may spin forever, as the interrupted thread may never get a chance to execute. Also, some code, such as the code to assign an address space number for a process during exec() on the Alpha, must not be preempted, as it supports the actual context switch code. Preemption is disabled for these code sections by using a critical section.
+
+ Critical Sections
+
+ The responsibility of the critical section API is to prevent context switches inside of a critical section. With a fully preemptive kernel, every setrunqueue of a thread other than the current thread is a preemption point. One implementation is for critical_enter to set a per-thread flag that is cleared by its counterpart. If setrunqueue is called with this flag set, it does not preempt regardless of the priority of the new thread relative to the current thread. However, since critical sections are used in spin mutexes to prevent context switches and multiple spin mutexes can be acquired, the critical section API must support nesting. For this reason the current implementation uses a nesting count instead of a single per-thread flag.
+
+ In order to minimize latency, preemptions inside of a critical section are deferred rather than dropped. If a thread that would normally trigger a preemption is made runnable while the current thread is inside a critical section, then a per-thread flag is set to indicate that a preemption is pending. When the outermost critical section is exited, the flag is checked. If the flag is set, then the current thread is preempted to allow the higher priority thread to run.
+
+ Interrupts pose a problem with regard to spin mutexes. If a low-level interrupt handler needs a lock, it must not interrupt any code that might hold that lock, to avoid possible data structure corruption. Currently, providing this mechanism is piggybacked onto the critical section API by means of the cpu_critical_enter and cpu_critical_exit functions. At present this API disables and re-enables interrupts on all of FreeBSD's platforms. This approach may not be purely optimal, but it is simple to understand and simple to get right. Theoretically, this second API need only be used for spin mutexes that are used in primary interrupt context. However, to make the code simpler, it is used for all spin mutexes and even all critical sections. It may be desirable to split the MD API out from the MI API and only use it in conjunction with the MI API in the spin mutex implementation. If this approach is taken, then the MD API would likely need a rename to show that it is a separate API.
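+ To summarize the mechanism described in this section, the following is a minimal, hypothetical sketch of nested critical sections with deferred preemption. The structure and function names are made up for illustration and do not claim to match the real implementation; the MD interrupt-disable step appears only as comments.
+
+ <programlisting>
+ /*
+  * Hypothetical sketch of nested critical sections with deferred
+  * preemption; names are illustrative, not the real kernel fields.
+  */
+ struct thread_sketch {
+         int     ts_critnest;    /* critical section nesting depth */
+         int     ts_owepreempt;  /* preemption deferred while nested */
+ };
+
+ static void
+ crit_enter(struct thread_sketch *td)
+ {
+         if (td->ts_critnest == 0) {
+                 /* cpu_critical_enter(): MD part, e.g. disable interrupts */
+         }
+         td->ts_critnest++;      /* a count, not a flag, so sections nest */
+ }
+
+ static void
+ crit_exit(struct thread_sketch *td)
+ {
+         if (--td->ts_critnest == 0) {
+                 /* cpu_critical_exit(): re-enable interrupts */
+                 if (td->ts_owepreempt) {
+                         td->ts_owepreempt = 0;
+                         /* perform the deferred switch to the higher
+                            priority thread here */
+                 }
+         }
+ }
+
+ /* Called from setrunqueue() when a higher priority thread appears. */
+ static void
+ maybe_preempt(struct thread_sketch *td)
+ {
+         if (td->ts_critnest > 0)
+                 td->ts_owepreempt = 1;  /* defer, do not drop */
+         else {
+                 /* context switch to the new thread immediately */
+         }
+ }
+ </programlisting>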
+ Design Tradeoffs
+
+ As mentioned earlier, a couple of tradeoffs have been made that sacrifice perfect preemption in cases where it would not provide the best performance.
+
+ The first tradeoff is that the preemption code does not take other CPUs into account. Suppose we have two CPUs, A and B, where the priority of A's thread is 4 and the priority of B's thread is 2. If CPU B makes a thread with priority 1 runnable, then in theory we want CPU A to switch to the new thread so that we will be running the two highest priority runnable threads. However, the cost of determining which CPU to preempt, signaling that CPU via an IPI, and performing the synchronization that would be required would be enormous. Thus, the current code instead forces CPU B to switch to the higher priority thread. Note that this still puts the system in a better position, as CPU B is executing a thread of priority 1 rather than a thread of priority 2.
+
+ The second tradeoff limits immediate kernel preemption to real-time priority kernel threads. In the simple case of preemption defined above, a thread is always preempted immediately (or as soon as a critical section is exited) if a higher priority thread is made runnable. However, many threads executing in the kernel only execute in a kernel context for a short time before either blocking or returning to userland. Thus, if the kernel preempts these threads to run another non-realtime kernel thread, the kernel may switch out the executing thread just before it is about to block or return to userland. The cache on the CPU must then adjust to the new thread. When the kernel returns to the interrupted thread, it must refill all of the cache information that was lost. In addition, two extra context switches are performed that could be avoided if the kernel deferred the preemption until the first thread blocked or returned to userland. Thus, by default, the preemption code will only preempt immediately if the higher priority thread is a real-time priority thread.
+
+ Turning on full kernel preemption for all kernel threads has value as a debugging aid, since it exposes more race conditions. It is especially useful on UP systems, where many races are hard to simulate otherwise. Thus, there will be a kernel option to enable preemption for all kernel threads that can be used for debugging purposes.
+
+ Thread Migration
+
+ Simply put, a thread migrates when it moves from one CPU to another. In a non-preemptive kernel this can only happen at well-defined points such as when calling tsleep or returning to userland. However, in a preemptive kernel, an interrupt can force a preemption and possible migration at any time. This can have negative effects on per-CPU data, since with the exception of curthread and curpcb the data can change whenever you migrate. Since you can potentially migrate at any time, this renders per-CPU data rather useless. Thus it is desirable to be able to disable migration for sections of code that need per-CPU data to be stable.
+
+ Critical sections currently prevent migration since they do not allow context switches. However, this may be too strong a requirement to enforce in some cases, since a critical section also effectively blocks interrupt threads on the current processor. As a result, it may be desirable to provide an API whereby code may indicate that if the current thread is preempted, it should not migrate to another CPU.
+
+ One possible implementation is to use a per-thread nesting count td_pinnest along with a td_pincpu field which is updated to the current CPU on each context switch. Each CPU has its own run queue that holds threads pinned to that CPU; a rough sketch of the idea follows, and the details are spelled out below.
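+ A rough sketch of the proposed fields and functions (this is a proposal, not existing code; the structure name is hypothetical, while td_pinnest, td_pincpu, migrate_disable, and migrate_enable are the names used in this section):
+
+ <programlisting>
+ /*
+  * Sketch of the proposed per-thread pinning state.  The td_pinnest
+  * and td_pincpu names come from the text; everything else is
+  * hypothetical.
+  */
+ struct thread_sketch {
+         int     td_pinnest;     /* > 0 means pinned to td_pincpu */
+         int     td_pincpu;      /* updated to the current CPU at each switch */
+ };
+
+ static void
+ migrate_disable(struct thread_sketch *td)
+ {
+         td->td_pinnest++;       /* only ever written by the thread itself */
+ }
+
+ static void
+ migrate_enable(struct thread_sketch *td)
+ {
+         td->td_pinnest--;       /* thread may migrate again at zero */
+ }
+ </programlisting>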
A thread is pinned + when its nesting count is greater than zero and a thread + starts off unpinned with a nesting count of zero. When a + thread is put on a runqueue, we check to see if it is pinned. + If so, we put it on the per-CPU runqueue, otherwise we put it + on the global runqueue. When + choosethread is called to retrieve the + next thread, it could either always prefer bound threads to + unbound threads or use some sort of bias when comparing + priorities. If the nesting count is only ever written to by + the thread itself and is only read by other threads when the + owning thread is not executing but while holding the + sched_lock, then + td_pinnest will not need any other locks. + The migrate_disable function would + increment the nesting count and + migrate_enable would decrement the + nesting count. Due to the locking requirements specified + above, they will only operate on the current thread and thus + would not need to handle the case of making a thread + migratable that currently resides on a per-CPU run + queue. + + It is still debatable if this API is needed or if the + critical section API is sufficient by itself. Many of the + places that need to prevent migration also need to prevent + preemption as well, and in those places a critical section + must be used regardless. + + + + Callouts + + The timeout() kernel facility permits + kernel services to register funtions for execution as part + of the softclock() software interrupt. + Events are scheduled based on a desired number of clock + ticks, and callbacks to the consumer-provided function + will occur at approximately the right time. + + The global list of pending timeout events is protected + by a global spin mutex, callout_lock; + all access to the timeout list must be performed with this + mutex held. When softclock() is + woken up, it scans the list of pending timeouts for those + that should fire. In order to avoid lock order reversal, + the softclock thread will release the + callout_lock mutex when invoking the + provided timeout() callback function. + If the CALLOUT_MPSAFE flag was not set + during registration, then Giant will be grabbed before + invoking the callout, and then released afterwards. The + callout_lock mutex will be re-grabbed + before proceeding. The softclock() + code is careful to leave the list in a consistent state + while releasing the mutex. If DIAGNOSTIC + is enabled, then the time taken to execute each function is + measured, and a warning generated if it exceeds a + threshold. + + + + + Specific Locking Strategies + + + Credentials + + struct ucred is the system + internal credential structure, and is generally used as the + basis for process-driven access control. BSD-derived systems + use a "copy-on-write" model for credential data: multiple + references may exist for a credential structure, and when a + change needs to be made, the structure is duplicated, + modified, and then the reference replaced. Due to wide-spread + caching of the credential to implement access control on open, + this results in substantial memory savings. With a move to + fine-grained SMP, this model also saves substantially on + locking operations by requiring that modification only occur + on an unshared credential, avoiding the need for explicit + synchronization when consuming a known-shared + credential. + + Credential structures with a single reference are + considered mutable; shared credential structures must not be + modified or a race condition is risked. 
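+ The copy-on-write discipline can be illustrated with the hedged sketch below. The helper names and fields are hypothetical stand-ins; the real credential routines (crhold, crcopy, crfree and friends) are more involved.
+
+ <programlisting>
+ /*
+  * Illustrative copy-on-write update of a reference-counted
+  * credential.  All names here are made up for the example.
+  */
+ struct cred_sketch {
+         int     cs_ref;         /* reference count, mutex protected */
+         int     cs_uid;         /* example credential field */
+ };
+
+ static int  cred_shared(struct cred_sketch *);          /* cs_ref > 1? */
+ static struct cred_sketch *cred_dup(struct cred_sketch *);
+ static void cred_drop(struct cred_sketch *);
+
+ static void
+ change_uid(struct cred_sketch **credp, int uid)
+ {
+         struct cred_sketch *old = *credp, *new;
+
+         if (cred_shared(old)) {         /* more than one reference? */
+                 new = cred_dup(old);    /* private copy, one reference */
+                 new->cs_uid = uid;      /* safe: nobody else sees it yet */
+                 *credp = new;           /* replace the reference */
+                 cred_drop(old);         /* drop our old reference */
+         } else
+                 old->cs_uid = uid;      /* sole reference, modify in place */
+ }
+ </programlisting>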
A mutex, + cr_mtxp protects the reference + count of the struct ucred so as to + maintain consistency. Any use of the structure requires a + valid reference for the duration of the use, or the structure + may be released out from under the illegitimate + consumer. + + The struct ucred mutex is a leaf + mutex, and for performance reasons, is implemented via a mutex + pool. + + + + File Descriptors and File Descriptor Tables + + Details to follow. + + + + Jail Structures + + struct prison stores + administrative details pertinent to the maintenance of jails + created using the &man.jail.2; API. This includes the + per-jail hostname, IP address, and related settings. This + structure is reference-counted since pointers to instances of + the structure are shared by many credential structures. A + single mutex, pr_mtx protects read + and write access to the reference count and all mutable + variables inside the struct jail. Some variables are set only + when the jail is created, and a valid reference to the + struct prison is sufficient to read + these values. The precise locking of each entry is documented + via comments in jail.h. + + + + MAC Framework + + The TrustedBSD MAC Framework maintains data in a variety + of kernel objects, in the form of struct + label. In general, labels in kernel objects + are protected by the same lock as the remainder of the kernel + object. For example, the v_label + label in struct vnode is protected + by the vnode lock on the vnode. + + In addition to labels maintained in standard kernel objects, + the MAC Framework also maintains a list of registered and + active policies. The policy list is protected by a global + mutex (mac_policy_list_lock) and a busy + count (also protected by the mutex). Since many access + control checks may occur in parallel, entry to the framework + for a read-only access to the policy list requires holding the + mutex while incrementing (and later decrementing) the busy + count. The mutex need not be held for the duration of the + MAC entry operation--some operations, such as label operations + on file system objects--are long-lived. To modify the policy + list, such as during policy registration and deregistration, + the mutex must be held and the reference count must be zero, + to prevent modification of the list while it is in use. + + A condition variable, + mac_policy_list_not_busy, is available to + threads that need to wait for the list to become unbusy, but + this condition variable must only be waited on if the caller is + holding no other locks, or a lock order violation may be + possible. The busy count, in effect, acts as a form of + reader/writer lock over access to the framework: the difference + is that, unlike with an sxlock, consumers waiting for the list + to become unbusy may be starved, rather than permitting lock + order problems with regards to the busy count and other locks + that may be held on entry to (or inside) the MAC Framework. + + + + Modules + + For the module subsystem there exists a single lock that is + used to protect the shared data. This lock is a shared/exclusive + (SX) lock and has a good chance of needing to be acquired (shared + or exclusively), therefore there are a few macros that have been + added to make access to the lock more easy. These macros can be + located in sys/module.h and are quite basic + in terms of usage. The main structures protected under this lock + are the module_t structures (when shared) + and the global modulelist_t structure, + modules. 
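+ As a hedged illustration of that shared/exclusive pattern (the lock name and functions below are made up for the example and are not the actual sys/module.h macros), readers of such a list take the sx lock shared while writers take it exclusively:
+
+ <programlisting>
+ /*
+  * Illustrative sx(9) usage for readers and writers of a shared
+  * list; requires sys/lock.h and sys/sx.h.  Names are made up.
+  */
+ static struct sx modlist_lock;  /* sx_init(&modlist_lock, "modlist") at boot */
+
+ static void
+ modlist_read(void)
+ {
+         sx_slock(&modlist_lock);        /* shared: many readers at once */
+         /* ... walk the list ... */
+         sx_sunlock(&modlist_lock);
+ }
+
+ static void
+ modlist_write(void)
+ {
+         sx_xlock(&modlist_lock);        /* exclusive: writers serialize */
+         /* ... insert or remove an entry ... */
+         sx_xunlock(&modlist_lock);
+ }
+ </programlisting>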
+ One should review the related source code in kern/kern_module.c to further understand the locking strategy.
+
+ Newbus Device Tree
+
+ The newbus system will have one sx lock. Readers will lock it with &man.sx.slock.9; and writers will lock it with &man.sx.xlock.9;. Internal-only functions will not do locking at all. The externally visible ones will lock as needed. Items for which it does not matter whether the race is won or lost will not be locked, since they tend to be read all over the place (e.g., &man.device.get.softc.9;). There will be relatively few changes to the newbus data structures, so a single lock should be sufficient and not impose a performance penalty.
+
+ Pipes
+
+ ...
+
+ Processes and Threads
+
+ - process hierarchy
+ - proc locks, references
+ - thread-specific copies of proc entries to freeze during system calls, including td_ucred
+ - inter-process operations
+ - process groups and sessions
+
+ Scheduler
+
+ Lots of references to sched_lock and notes pointing at specific primitives and related magic elsewhere in the document.
+
+ Select and Poll
+
+ The select() and poll() functions permit threads to block waiting on events on file descriptors--most frequently, whether or not the file descriptors are readable or writable.
+
+ ...
+
+ SIGIO
+
+ The SIGIO service permits processes to request the delivery of a SIGIO signal to its process group when the read/write status of specified file descriptors changes. At most one process or process group is permitted to register for SIGIO from any given kernel object, and that process or group is referred to as the owner. Each object supporting SIGIO registration contains a pointer field that is NULL if the object is not registered, or points to a struct sigio describing the registration. This field is protected by a global mutex, sigio_lock. Callers to SIGIO maintenance functions must pass in this field "by reference" so that local register copies of the field are not made when unprotected by the lock.
+
+ One struct sigio is allocated for each registered object associated with any process or process group, and contains back-pointers to the object, owner, signal information, a credential, and the general disposition of the registration. Each process or process group contains a list of registered struct sigio structures, p_sigiolst for processes, and pg_sigiolst for process groups. These lists are protected by the process or process group locks, respectively. Most fields in each struct sigio are constant for the duration of the registration, with the exception of the sio_pgsigio field, which links the struct sigio into the process or process group list. Developers implementing new kernel objects supporting SIGIO will, in general, want to avoid holding structure locks while invoking SIGIO supporting functions, such as fsetown() or funsetown(), to avoid defining a lock order between structure locks and the global SIGIO lock. This is generally possible through use of an elevated reference count on the structure, such as reliance on a file descriptor reference to a pipe during a pipe operation.
+
+ sysctl
+
+ The sysctl() MIB service is invoked from both within the kernel and from userland applications using a system call.
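+ For example, a kernel variable is commonly exported with one of the SYSCTL_FOO macros, as in the illustrative (made-up) declaration below; it is exactly this kind of direct access to kernel variables that raises the locking questions discussed next.
+
+ <programlisting>
+ /*
+  * Illustrative SYSCTL_INT export of a made-up kernel variable;
+  * nothing here serializes access to example_threshold itself.
+  */
+ static int example_threshold = 16;
+ SYSCTL_INT(_kern, OID_AUTO, example_threshold, CTLFLAG_RW,
+     &example_threshold, 0, "Illustrative tunable exported via sysctl");
+ </programlisting>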
+ At least two issues are raised in locking: first, the protection of the structures maintaining the namespace, and second, interactions with kernel variables and functions that are accessed by the sysctl interface. Since sysctl permits the direct export (and modification) of kernel statistics and configuration parameters, the sysctl mechanism must become aware of appropriate locking semantics for those variables. Currently, sysctl makes use of a single global sxlock to serialize use of sysctl(); however, it is assumed to operate under Giant, and other protections are not provided. The remainder of this section speculates on locking and semantic changes to sysctl.
+
+ - The order of operations for sysctls that update values needs to change from read old value, copyin and copyout, write new value to copyin, lock, read old value and write new value, unlock, copyout. Normal sysctls that just copyout the old value and set a new value that they copyin may still be able to follow the old model. However, it may be cleaner to use the second model for all of the sysctl handlers to avoid lock operations.
+
+ - To allow for the common case, a sysctl could embed a pointer to a mutex in the SYSCTL_FOO macros and in the struct. This would work for most sysctls. For values protected by sx locks, spin mutexes, or other locking strategies besides a single sleep mutex, SYSCTL_PROC nodes could be used to get the locking right.
+
+ Taskqueue
+
+ The taskqueue interface has two basic locks associated with it in order to protect the related shared data. The taskqueue_queues_mutex serves as a lock to protect the taskqueue_queues TAILQ. The other mutex lock associated with this system is the one in the struct taskqueue data structure. The use of the synchronization primitive here is to protect the integrity of the data in the struct taskqueue. It should be noted that there are no separate macros to assist the user in locking down his/her own work, since these locks are most likely not going to be used outside of kern/subr_taskqueue.c.
+
+ Implementation Notes
+
+ Details of the Mutex Implementation
+
+ - Should we require mutexes to be owned for mtx_destroy() since we can't safely assert that they are unowned by anyone else otherwise?
+
+ Spin Mutexes
+
+ - Use a critical section...
+
+ Sleep Mutexes
+
+ - Describe the races with contested mutexes
+
+ - Why it's safe to read mtx_lock of a contested mutex when holding sched_lock.
+
+ - Priority propagation
+
+ Witness
+
+ - What does it do
+
+ - How does it work
+
+ Miscellaneous Topics
+
+ Interrupt Source and ICU Abstractions
+
+ - struct isrc
+
+ - pic drivers
+
+ Other Random Questions/Topics
+
+ - Should we pass an interlock into sema_wait?
+
+ - Generic turnstiles for sleep mutexes and sx locks.
+
+ - Should we have non-sleepable sx locks?
+
+ Definitions
+
+ atomic
+
+ An operation is atomic if all of its effects are visible to other CPUs together when the proper access protocol is followed. In the degenerate case, these are the atomic instructions provided directly by machine architectures. At a higher level, if several members of a structure are protected by a lock, then a set of operations is atomic if they are all performed while holding the lock, without releasing the lock in between any of the operations. (See also: operation.)
+
+ block
+
+ A thread is blocked when it is waiting on a lock, resource, or condition.
+ Unfortunately, this term is a bit overloaded as a result. (See also: sleep.)
+
+ critical section
+
+ A section of code that is not allowed to be preempted. A critical section is entered and exited using the &man.critical.enter.9; API.
+
+ MD
+
+ Machine dependent. (See also: MI.)
+
+ memory operation
+
+ A memory operation reads and/or writes to a memory location.
+
+ MI
+
+ Machine independent. (See also: MD.)
+
+ operation
+
+ See memory operation.
+
+ primary interrupt context
+
+ Primary interrupt context refers to the code that runs when an interrupt occurs. This code can either run an interrupt handler directly or schedule an asynchronous interrupt thread to execute the interrupt handlers for a given interrupt source.
+
+ realtime kernel thread
+
+ A high priority kernel thread. Currently, the only realtime priority kernel threads are interrupt threads. (See also: thread.)
+
+ sleep
+
+ A thread is asleep when it is blocked on a condition variable or a sleep queue via msleep or tsleep. (See also: block.)
+
+ sleepable lock
+
+ A sleepable lock is a lock that can be held by a thread which is asleep. Lockmgr locks and sx locks are currently the only sleepable locks in FreeBSD. Eventually, some sx locks such as the allproc and proctree locks may become non-sleepable locks. (See also: sleep.)
+
+ thread
+
+ A kernel thread represented by a struct thread. Threads own locks and hold a single execution context.
diff --git a/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml new file mode 100644 index 0000000000..3f6b233f60 --- /dev/null +++ b/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml @@ -0,0 +1,934 @@ + +%man; + + +%authors; + + + + +]> + +
+ + SMPng Design Document + + + + John + Baldwin + + + Robert + Watson + + + + $FreeBSD$ + + + 2002 + John Baldwin + Robert Watson + + + + This document presents the current design and implementation of + the SMPng Architecture. First, the basic primitives and tools are + introduced. Next, a general architecture for the FreeBSD kernel's + synchronization and execution model is laid out. Then, locking + strategies for specific subsystems are discussed, documenting the + approaches taken to introduce fine-grained synchronization and + parallelism for each subsystem. Finally, detailed implementation + notes are provided to motivate design choices, and make the reader + aware of important implications involving the use of specific + primitives. + + + + + Introduction + + This document is a work-in-progress, and will be updated to + reflect on-going design and implementation activities associated + with the SMPng Project. Many sections currently exist only in + outline form, but will be fleshed out as work proceeds. Updates or + suggestions regarding the document may be directed to the document + editors. + + The goal of SMPng is to allow concurrency in the kernel. + The kernel is basically one rather large and complex program. To + make the kernel multithreaded we use some of the same tools used + to make other programs multithreaded. These include mutexes, + reader/writer locks, semaphores, and condition variables. For + definitions of many of the terms, please see + . + + + + Basic Tools and Locking Fundamentals + + + Atomic Instructions and Memory Barriers + + There are several existing treatments of memory barriers + and atomic instructions, so this section will not include a + lot of detail. To put it simply, one cannot go around reading + variables without a lock if a lock is used to protect writes + to that variable. This becomes obvious when you consider that + memory barriers simply determine relative order of memory + operations; they do not make any guarantee about timing of + memory operations. That is, a memory barrier does not force + the contents of a CPU's local cache or store buffer to flush. + Instead, the memory barrier at lock release simply ensures + that all writes to the protected data will be visible to other + CPU's or devices if the write to release the lock is visible. + The CPU is free to keep that data in its cache or store buffer + as long as it wants. However, if another CPU performs an + atomic instruction on the same datum, the first CPU must + guarantee that the updated value is made visible to the second + CPU along with any other operations that memory barriers may + require. + + For example, assuming a simple model where data is + considered visible when it is in main memory (or a global + cache), when an atomic instruction is triggered on one CPU, + other CPU's store buffers and caches must flush any writes to + that same cache line along with any pending operations behind + a memory barrier. + + This requires one to take special care when using an item + protected by atomic instructions. For example, in the sleep + mutex implementation, we have to use an + atomic_cmpset rather than an + atomic_set to turn on the + MTX_CONTESTED bit. The reason is that we + read the value of mtx_lock into a + variable and then make a decision based on that read. + However, the value we read may be stale, or it may change + while we are making our decision. 
Thus, when the + atomic_set executed, it may end up + setting the bit on another value than the one we made the + decision on. Thus, we have to use an + atomic_cmpset to set the value only if + the value we made the decision on is up-to-date and + valid. + + Finally, atomic instructions only allow one item to be + updated or read. If one needs to atomically update several + items, then a lock must be used instad. For example, if two + counters must be read and have values that are consistent + relative to each other, then those counters must be protected + by a lock rather than by separate atomic instructions. + + + + Read Locks versus Write Locks + + Read locks do not need to be as strong as write locks. + Both types of locks need to ensure that the data they are + accessing is not stale. However, only write access requires + exclusive access. Multiple threads can safely read a value. + Using different types of locks for reads and writes can be + implemented in a number of ways. + + First, sx locks can be used in this manner by using an + exclusive lock when writing and a shared lock when reading. + This method is quite straightforward. + + A second method is a bit more obscure. You can protect a + datum with multiple locks. Then for reading that data you + simply need to have a read lock of one of the locks. However, + to write to the data, you need to have a write lock of all of + the locks. This can make writing rather expensive but can be + useful when data is accessed in various ways. For example, + the parent process pointer is proctected by both the + proctree_lock sx lock and the per-process mutex. Sometimes + the proc lock is easier as we are just checking to see who a + parent of a process is that we already have locked. However, + other places such as inferior need to + walk the tree of processes via parent pointers and locking + each process would be prohibitive as well as a pain to + guarantee that the condition you are checking remains valid + for both the check and the actions taken as a result of the + check. + + + + Locking Conditions and Results + + If you need a lock to check the state of a variable so + that you can take an action based on the state you read, you + can't just hold the lock while reading the variable and then + drop the lock before you act on the value you read. Once you + drop the lock, the variable can change rendering your decision + invalid. Thus, you must hold the lock both while reading the + variable and while performing the action as a result of the + test. + + + + + General Architecture and Design + + + Interrupt Handling + + Following the pattern of several other multithreaded Unix + kernels, FreeBSD deals with interrupt handlers by giving them + their own thread context. Providing a context for interrupt + handlers allows them to block on locks. To help avoid + latency, however, interrupt threads run at real-time kernel + priority. Thus, interrupt handlers should not execute for very + long to avoid starving other kernel threads. In addition, + since multiple handlers may share an interrupt thread, + interrupt handlers should not sleep or use a sleepable lock to + avoid starving another interrupt handler. + + The interrupt threads currently in FreeBSD are referred to + as heavyweight interrupt threads. They are called this + because switching to an interrupt thread involves a full + context switch. 
In the initial implementation, the kernel was + not preemptive and thus interrupts that interrupted a kernel + thread would have to wait until the kernel thread blocked or + returned to userland before they would have an opportunity to + run. + + To deal with the latency problems, the kernel in FreeBSD + has been made preemptive. Currently, we only preempt a kernel + thread when we release a sleep mutex or when an interrupt + comes in. However, the plan is to make the FreeBSD kernel + fully preemptive as described below. + + Not all interrupt handlers execute in a thread context. + Instead, some handlers execute directly in primary interrupt + context. These interrupt handlers are currently misnamed + fast interrupt handlers since the + INTR_FAST flag used in earlier versions + of the kernel is used to mark these handlers. The only + interrupts which currently use these types of interrupt + handlers are clock interrupts and serial I/O device + interrupts. Since these handlers do not have their own + context, they may not acquire blocking locks and thus may only + use spin mutexes. + + Finally, there is one optional optimization that can be + added in MD code called lightweight context switches. Since + an interrupt thread executes in a kernel context, it can + borrow the vmspace of any process. Thus, in a lightweight + context switch, the switch to the interrupt thread does not + switch vmspaces but borrows the vmspace of the interrupted + thread. In order to ensure that the vmspace of the + interrupted thread doesn't disappear out from under us, the + interrupted thread is not allowed to execute until the + interrupt thread is no longer borrowing its vmspace. This can + happen when the interrupt thread either blocks or finishes. + If an interrupt thread blocks, then it will use its own + context when it is made runnable again. Thus, it can release + the interrupted thread. + + The cons of this optimization are that they are very + machine specific and complex and thus only worth the effor if + their is a large performance improvement. At this point it is + probably too early to tell, and in fact, will probably hurt + performance as almost all interrupt handlers will immediately + block on Giant and require a thread fixup when they block. + Also, an alternative method of interrupt handling has been + proposed by Mike Smith that works like so: + + + + Each interrupt handler has two parts: a predicate + which runs in primary interrupt context and a handler + which runs in its own thread context. + + + + If an interrupt handler has a predicate, then when an + interrupt is triggered, the predicate is run. If the + predicate returns true then the interrupt is assumed to be + fully handled and the kernel returns from the interrupt. + If the predicate returns false or there is no predicate, + then the threaded handler is scheduled to run. + + + + Fitting light weight context switches into this scheme + might prove rather complicated. Since we may want to change + to this scheme at some point in the future, it is probably + best to defer work on light weight context switches until we + have settled on the final interrupt handling architecture and + determined how light weight context switches might or might + not fit into it. + + + + Kernel Preemption and Critical Sections + + + Kernel Preemption in a Nutshell + + Kernel preemption is fairly simple. The basic idea is + that a CPU should always be doing the highest priority work + available. Well, that is the ideal at least. 
There are a + couple of cases where the expense of achieving the ideal is + not worth being perfect. + + Implementing full kernel preemption is very + straightforward: when you schedule a thread to be executed + by putting it on a runqueue, you check to see if it's + priority is higher than the currently executing thread. If + so, you initiate a context switch to that thread. + + While locks can protect most data in the case of a + preemption, not all of the kernel is preemption safe. For + example, if a thread holding a spin mutex preempted and the + new thread attempts to grab the same spin mutex, the new + thread may spin forever as the interrupted thread may never + get a chance to execute. Also, some code such as the code + to assign an address space number for a process during + exec() on the Alpha needs to not be preempted as it supports + the actual context switch code. Preemption is disabled for + these code sections by using a critical section. + + + + Critical Sections + + The responsibility of the critical section API is to + prevent context switches inside of a critical section. With + a fully preemptive kernel, every + setrunqueue of a thread other than the + current thread is a preemption point. One implementation is + for critical_enter to set a per-thread + flag that is cleared by its counterpart. If + setrunqueue is called with this flag + set, it doesn't preempt regarless of the priority of the new + thread relative to the current thread. However, since + critical sections are used in spin mutexes to prevent + context switches and multiple spin mutexes can be acquired, + the critical section API must support nesting. For this + reason the current implementation uses a nesting count + instead of a single per-thread flag. + + In order to minimize latency, preemptions inside of a + critical section are deferred rather than dropped. If a + thread is made runnable that would normally be preempted to + outside of a critical section, then a per-thread flag is set + to indicate that there is a pending preemption. When the + outermost critical section is exited, the flag is checked. + If the flag is set, then the current thread is preempted to + allow the higher priority thread to run. + + Interrupts pose a problem with regards to spin mutexes. + If a low-level interrupt handler needs a lock, it needs to + not interrupt any code needing that lock to avoid possible + data structure corruption. Currently, providing this + mechanism is piggybacked onto critical section API by means + of the cpu_critical_enter and + cpu_critical_exit functions. Currently + this API disables and reenables interrupts on all of + FreeBSD's current platforms. This approach may not be + purely optimal, but it is simple to understand and simple to + get right. Theoretically, this second API need only be used + for spin mutexes that are used in primary interrupt context. + However, to make the code simpler, it is used for all spin + mutexes and even all critical sections. It may be desirable + to split out the MD API from the MI API and only use it in + conjunction with the MI API in the spin mutex + implementation. If this approach is taken, then the MD API + likely would need a rename to show that it is a separate API + now. + + + + Design Tradeoffs + + As mentioned earlier, a couple of tradeoffs have been + made to sacrafice cases where perfect preemption may not + always provide the best performance. + + The first tradeoff is that the preemption code does not + take other CPUs into account. 
Suppose we have a two CPU's A + and B with the priority of A's thread as 4 and the priority + of B's thread as 2. If CPU B makes a thread with priority 1 + runnable, then in theory, we want CPU A to switch to the new + thread so that we will be running the two highest priority + runnable threads. However, the cost of determining which + CPU to enforce a preemption on as well as actually signaling + that CPU via an IPI along with the synchronization that + would be required would be enormous. Thus, the current code + would instead force CPU B to switch to the higher priority + thread. Note that this still puts the system in a better + position as CPU B is executing a thread of priority 1 rather + than a thread of priority 2. + + The second tradeoff limits immediate kernel preemption + to real-time priority kernel threads. In the simple case of + preemption defined above, a thread is always preempted + immediately (or as soon as a critical section is exited) if + a higher priority thread is made runnable. However, many + threads executing in the kernel only execute in a kernel + context for a short time before either blocking or returning + to userland. Thus, if the kernel preempts these threads to + run another non-realtime kernel thread, the kernel may + switch out the executing thread just before it is about to + sleep or execute. The cache on the CPU must then adjust to + the new thread. When the kernel returns to the interrupted + CPU, it must refill all the cache informatino that was lost. + In addition, two extra context switches are performed that + could be avoided if the kernel deferred the preemption until + the first thread blocked or returned to userland. Thus, by + default, the preemption code will only preempt immediately + if the higher priority thread is a real-time priority + thread. + + Turning on full kernel preemption for all kernel threads + has value as a debugging aid since it exposes more race + conditions. It is especially useful on UP systems were many + races are hard to simulate otherwise. Thus, there will be a + kernel option to enable preemption for all kernel threads + that can be used for debugging purposes. + + + + + Thread Migration + + Simply put, a thread migrates when it moves from one CPU + to another. In a non-preemptive kernel this can only happen + at well-defined points such as when calling + tsleep or returning to userland. + However, in the preemptive kernel, an interrupt can force a + preemption and possible migration at any time. This can have + negative affects on per-CPU data since with the exception of + curthread and curpcb the + data can change whenever you migrate. Since you can + potentially migrate at any time this renders per-CPU data + rather useless. Thus it is desirable to be able to disable + migration for sections of code that need per-CPU data to be + stable. + + Critical sections currently prevent migration since they + don't allow context switches. However, this may be too strong + of a requirement to enforce in some cases since a critical + section also effectively blocks interrupt threads on the + current processor. As a result, it may be desirable to + provide an API whereby code may indicate that if the current + thread is preempted it should not migrate to another + CPU. + + One possible implementation is to use a per-thread nesting + count td_pinnest along with a + td_pincpu which is updated to the current + CPU on each context switch. Each CPU has its own run queue + that holds threads pinned to that CPU. 
A thread is pinned + when its nesting count is greater than zero and a thread + starts off unpinned with a nesting count of zero. When a + thread is put on a runqueue, we check to see if it is pinned. + If so, we put it on the per-CPU runqueue, otherwise we put it + on the global runqueue. When + choosethread is called to retrieve the + next thread, it could either always prefer bound threads to + unbound threads or use some sort of bias when comparing + priorities. If the nesting count is only ever written to by + the thread itself and is only read by other threads when the + owning thread is not executing but while holding the + sched_lock, then + td_pinnest will not need any other locks. + The migrate_disable function would + increment the nesting count and + migrate_enable would decrement the + nesting count. Due to the locking requirements specified + above, they will only operate on the current thread and thus + would not need to handle the case of making a thread + migratable that currently resides on a per-CPU run + queue. + + It is still debatable if this API is needed or if the + critical section API is sufficient by itself. Many of the + places that need to prevent migration also need to prevent + preemption as well, and in those places a critical section + must be used regardless. + + + + Callouts + + The timeout() kernel facility permits + kernel services to register funtions for execution as part + of the softclock() software interrupt. + Events are scheduled based on a desired number of clock + ticks, and callbacks to the consumer-provided function + will occur at approximately the right time. + + The global list of pending timeout events is protected + by a global spin mutex, callout_lock; + all access to the timeout list must be performed with this + mutex held. When softclock() is + woken up, it scans the list of pending timeouts for those + that should fire. In order to avoid lock order reversal, + the softclock thread will release the + callout_lock mutex when invoking the + provided timeout() callback function. + If the CALLOUT_MPSAFE flag was not set + during registration, then Giant will be grabbed before + invoking the callout, and then released afterwards. The + callout_lock mutex will be re-grabbed + before proceeding. The softclock() + code is careful to leave the list in a consistent state + while releasing the mutex. If DIAGNOSTIC + is enabled, then the time taken to execute each function is + measured, and a warning generated if it exceeds a + threshold. + + + + + Specific Locking Strategies + + + Credentials + + struct ucred is the system + internal credential structure, and is generally used as the + basis for process-driven access control. BSD-derived systems + use a "copy-on-write" model for credential data: multiple + references may exist for a credential structure, and when a + change needs to be made, the structure is duplicated, + modified, and then the reference replaced. Due to wide-spread + caching of the credential to implement access control on open, + this results in substantial memory savings. With a move to + fine-grained SMP, this model also saves substantially on + locking operations by requiring that modification only occur + on an unshared credential, avoiding the need for explicit + synchronization when consuming a known-shared + credential. + + Credential structures with a single reference are + considered mutable; shared credential structures must not be + modified or a race condition is risked. 
A mutex, + cr_mtxp protects the reference + count of the struct ucred so as to + maintain consistency. Any use of the structure requires a + valid reference for the duration of the use, or the structure + may be released out from under the illegitimate + consumer. + + The struct ucred mutex is a leaf + mutex, and for performance reasons, is implemented via a mutex + pool. + + + + File Descriptors and File Descriptor Tables + + Details to follow. + + + + Jail Structures + + struct prison stores + administrative details pertinent to the maintenance of jails + created using the &man.jail.2; API. This includes the + per-jail hostname, IP address, and related settings. This + structure is reference-counted since pointers to instances of + the structure are shared by many credential structures. A + single mutex, pr_mtx protects read + and write access to the reference count and all mutable + variables inside the struct jail. Some variables are set only + when the jail is created, and a valid reference to the + struct prison is sufficient to read + these values. The precise locking of each entry is documented + via comments in jail.h. + + + + MAC Framework + + The TrustedBSD MAC Framework maintains data in a variety + of kernel objects, in the form of struct + label. In general, labels in kernel objects + are protected by the same lock as the remainder of the kernel + object. For example, the v_label + label in struct vnode is protected + by the vnode lock on the vnode. + + In addition to labels maintained in standard kernel objects, + the MAC Framework also maintains a list of registered and + active policies. The policy list is protected by a global + mutex (mac_policy_list_lock) and a busy + count (also protected by the mutex). Since many access + control checks may occur in parallel, entry to the framework + for a read-only access to the policy list requires holding the + mutex while incrementing (and later decrementing) the busy + count. The mutex need not be held for the duration of the + MAC entry operation--some operations, such as label operations + on file system objects--are long-lived. To modify the policy + list, such as during policy registration and deregistration, + the mutex must be held and the reference count must be zero, + to prevent modification of the list while it is in use. + + A condition variable, + mac_policy_list_not_busy, is available to + threads that need to wait for the list to become unbusy, but + this condition variable must only be waited on if the caller is + holding no other locks, or a lock order violation may be + possible. The busy count, in effect, acts as a form of + reader/writer lock over access to the framework: the difference + is that, unlike with an sxlock, consumers waiting for the list + to become unbusy may be starved, rather than permitting lock + order problems with regards to the busy count and other locks + that may be held on entry to (or inside) the MAC Framework. + + + + Modules + + For the module subsystem there exists a single lock that is + used to protect the shared data. This lock is a shared/exclusive + (SX) lock and has a good chance of needing to be acquired (shared + or exclusively), therefore there are a few macros that have been + added to make access to the lock more easy. These macros can be + located in sys/module.h and are quite basic + in terms of usage. The main structures protected under this lock + are the module_t structures (when shared) + and the global modulelist_t structure, + modules. 
+ One should review the related source code in kern/kern_module.c to
+ further understand the locking strategy.
+
+
+ Newbus Device Tree
+
+ The newbus system will have one sx lock.  Readers will lock it
+ shared (&man.sx.slock.9;) and writers will lock it exclusive
+ (&man.sx.xlock.9;).  Internal-only functions will not do any
+ locking; the externally visible ones will lock as needed.  Items for
+ which it does not matter whether the race is won or lost will not be
+ locked, since they tend to be read all over the place
+ (e.g., &man.device.get.softc.9;).  There will be relatively few
+ changes to the newbus data structures, so a single lock should be
+ sufficient and not impose a performance penalty.
+
+
+ Pipes
+
+ ...
+
+
+ Processes and Threads
+
+ - process hierarchy
+
+ - proc locks, references
+
+ - thread-specific copies of proc entries to freeze during system
+ calls, including td_ucred
+
+ - inter-process operations
+
+ - process groups and sessions
+
+
+ Scheduler
+
+ Lots of references to sched_lock and notes pointing at specific
+ primitives and related magic elsewhere in the document.
+
+
+ Select and Poll
+
+ The select() and poll() functions permit threads to block waiting on
+ events on file descriptors, most frequently whether or not the file
+ descriptors are readable or writable.
+
+ ...
+
+
+ SIGIO
+
+ The SIGIO service permits processes to request the delivery of a
+ SIGIO signal to its process group when the read/write status of
+ specified file descriptors changes.  At most one process or process
+ group is permitted to register for SIGIO from any given kernel
+ object, and that process or group is referred to as the owner.  Each
+ object supporting SIGIO registration contains a pointer field that
+ is NULL if the object is not registered, or points to a struct sigio
+ describing the registration.  This field is protected by a global
+ mutex, sigio_lock.  Callers to SIGIO maintenance functions must pass
+ in this field "by reference" so that local register copies of the
+ field are not made when unprotected by the lock.
+
+ One struct sigio is allocated for each registered object associated
+ with any process or process group, and contains back-pointers to the
+ object, owner, signal information, a credential, and the general
+ disposition of the registration.  Each process or process group
+ contains a list of registered struct sigio structures,
+ p_sigiolst for processes and pg_sigiolst for process groups.  These
+ lists are protected by the process or process group locks,
+ respectively.  Most fields in each struct sigio are constant for the
+ duration of the registration, with the exception of the sio_pgsigio
+ field, which links the struct sigio into the process or process
+ group list.  Developers implementing new kernel objects supporting
+ SIGIO will, in general, want to avoid holding structure locks while
+ invoking SIGIO supporting functions, such as fsetown() or
+ funsetown(), in order to avoid defining a lock order between
+ structure locks and the global SIGIO lock.  This is generally
+ possible through use of an elevated reference count on the
+ structure, such as reliance on a file descriptor reference to a pipe
+ during a pipe operation.
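+
+ For example, a hypothetical driver ioctl routine (the foo_* names
+ are illustrative only, not an existing interface) might register an
+ owner without holding its own softc lock, relying instead on the
+ file descriptor reference to keep the object alive:
+
+	case FIOSETOWN:
+		/*
+		 * No foo_softc lock is held here, so no lock order is
+		 * created between the softc lock and the global
+		 * sigio_lock; the file descriptor reference keeps the
+		 * object from going away.
+		 */
+		return (fsetown(*(int *)data, &sc->foo_sigio));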
+
+
+ sysctl
+
+ The sysctl() MIB service is invoked from both within the kernel and
+ from userland applications using a system call.  At least two issues
+ are raised in locking: first, the protection of the structures
+ maintaining the namespace, and second, interactions with kernel
+ variables and functions that are accessed by the sysctl interface.
+ Since sysctl permits the direct export (and modification) of kernel
+ statistics and configuration parameters, the sysctl mechanism must
+ become aware of appropriate locking semantics for those variables.
+ Currently, sysctl makes use of a single global sxlock to serialize
+ use of sysctl(); however, it is assumed to operate under Giant and
+ other protections are not provided.  The remainder of this section
+ speculates on locking and semantic changes to sysctl.
+
+ - Need to change the order of operations for sysctls that update
+ values from "read old, copyin and copyout, write new" to "copyin,
+ lock, read old and write new, unlock, copyout".  Normal sysctls that
+ just copyout the old value and set a new value that they copyin may
+ still be able to follow the old model.  However, it may be cleaner
+ to use the second model for all of the sysctl handlers to avoid
+ lock operations.
+
+ - To allow for the common case, a sysctl could embed a pointer to a
+ mutex in the SYSCTL_FOO macros and in the struct.  This would work
+ for most sysctls.  For values protected by sx locks, spin mutexes,
+ or other locking strategies besides a single sleep mutex,
+ SYSCTL_PROC nodes could be used to get the locking right.
+
+
+ Taskqueue
+
+ The taskqueue interface has two basic locks associated with it in
+ order to protect the related shared data.  The
+ taskqueue_queues_mutex is meant to serve as a lock to protect the
+ taskqueue_queues TAILQ.  The other mutex lock associated with this
+ system is the one in the struct taskqueue data structure.  The use
+ of the synchronization primitive here is to protect the integrity of
+ the data in the struct taskqueue.  It should be noted that there are
+ no separate macros to assist the user in locking down his or her own
+ work, since these locks are most likely not going to be used outside
+ of kern/subr_taskqueue.c.
+
+
+ Implementation Notes
+
+
+ Details of the Mutex Implementation
+
+ - Should we require mutexes to be owned for mtx_destroy() since we
+ cannot safely assert that they are unowned by anyone else
+ otherwise?
+
+
+ Spin Mutexes
+
+ - Use a critical section...
+
+
+ Sleep Mutexes
+
+ - Describe the races with contested mutexes
+
+ - Why it's safe to read mtx_lock of a contested mutex when holding
+ sched_lock.
+
+ - Priority propagation
+
+
+ Witness
+
+ - What does it do
+
+ - How does it work
+
+
+ Miscellaneous Topics
+
+
+ Interrupt Source and ICU Abstractions
+
+ - struct isrc
+
+ - pic drivers
+
+
+ Other Random Questions/Topics
+
+ - Should we pass an interlock into sema_wait?
+
+ - Generic turnstiles for sleep mutexes and sx locks.
+
+ - Should we have non-sleepable sx locks?
+
+
+ Definitions
+
+
+ atomic
+
+ An operation is atomic if all of its effects are visible to other
+ CPUs together when the proper access protocol is followed.  The
+ degenerate case is the atomic instructions provided directly by
+ machine architectures.  At a higher level, if several members of a
+ structure are protected by a lock, then a set of operations are
+ atomic if they are all performed while holding the lock without
+ releasing the lock in between any of the operations.
+
+ operation
+
+
+ block
+
+ A thread is blocked when it is waiting on a lock, resource, or
+ condition.
+ Unfortunately, this term is a bit overloaded as a result.
+
+ sleep
+
+
+ critical section
+
+ A section of code that is not allowed to be preempted.  A critical
+ section is entered and exited using the &man.critical.enter.9; API.
+
+
+ MD
+
+ Machine dependent.
+
+ MI
+
+
+ memory operation
+
+ A memory operation reads and/or writes to a memory location.
+
+
+ MI
+
+ Machine independent.
+
+ MD
+
+
+ operation
+
+ memory operation
+
+
+ primary interrupt context
+
+ Primary interrupt context refers to the code that runs when an
+ interrupt occurs.  This code can either run an interrupt handler
+ directly or schedule an asynchronous interrupt thread to execute the
+ interrupt handlers for a given interrupt source.
+
+
+ realtime kernel thread
+
+ A high-priority kernel thread.  Currently, the only realtime
+ priority kernel threads are interrupt threads.
+
+ thread
+
+
+ sleep
+
+ A thread is asleep when it is blocked on a condition variable or a
+ sleep queue via msleep or tsleep.
+
+ block
+
+
+ sleepable lock
+
+ A sleepable lock is a lock that can be held by a thread that is
+ asleep.  Lockmgr locks and sx locks are currently the only sleepable
+ locks in FreeBSD.  Eventually, some sx locks, such as the allproc
+ and proctree locks, may become non-sleepable locks.
+
+ sleep
+
+
+ thread
+
+ A kernel thread represented by a struct thread.  Threads own locks
+ and hold a single execution context.
+
+
+