o Split the FreeBSD Developers Handbook into two books:
  - FreeBSD Architecture Handbook, which is a book about the FreeBSD architecture. The SMP article has also been moved into the new arch handbook as a separate chapter.
  - FreeBSD Developers Handbook, which is a book about developing on FreeBSD; basically what was left when the architecture parts were moved away.
o Hook up the new FreeBSD Architecture Handbook to the build.
o Remove the SMP article since it is now part of the FreeBSD Architecture Handbook.

The relevant files from the FreeBSD Developers Handbook have been repository copied to the new FreeBSD Architecture Handbook.

This is just step one in the split; both books need some work to be real separate books. E.g. the FreeBSD Architecture Handbook still needs an introduction.

Repository copy by: joe
Requested by: rwatson
Approved by: murray, ceri (mentor)
parent
5421dc303f
commit
541f5ec33d
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=17783
26 changed files with 55 additions and 19061 deletions
en_US.ISO8859-1
articles/smp
books
Makefile
arch-handbook
developers-handbook
Makefile
book.sgml
boot
chapters.ent
driverbasics
isa
jail
kobj
locking
mac.ent
mac
newbus
pccard
pci
scsi
sound
sysinit
usb
vm
|
@ -1,18 +0,0 @@
|
|||
# $FreeBSD$
|
||||
|
||||
MAINTAINER=jhb@FreeBSD.org
|
||||
|
||||
DOC?= article
|
||||
|
||||
FORMATS?= html
|
||||
|
||||
INSTALL_COMPRESSED?=gz
|
||||
INSTALL_ONLY_COMPRESSED?=
|
||||
|
||||
WITH_ARTICLE_TOC?=YES
|
||||
|
||||
SRCS= article.sgml
|
||||
|
||||
DOC_PREFIX?= ${.CURDIR}/../../..
|
||||
|
||||
.include "${DOC_PREFIX}/share/mk/doc.project.mk"
|
|
@ -1,959 +0,0 @@
|
|||
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
|
||||
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
|
||||
%man;
|
||||
|
||||
<!ENTITY % authors PUBLIC "-//FreeBSD//ENTITIES DocBook Author Entities//EN">
|
||||
%authors;
|
||||
<!ENTITY % misc PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
|
||||
%misc;
|
||||
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
|
||||
%freebsd;
|
||||
|
||||
<!--ENTITY % mailing-lists PUBLIC "-//FreeBSD//ENTITIES DocBook Mailing List Entities//EN"-->
|
||||
<!--
|
||||
%mailing-lists;
|
||||
-->
|
||||
|
||||
]>
|
||||
|
||||
<article>
|
||||
<articleinfo>
|
||||
<title>SMPng Design Document</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>John</firstname>
|
||||
<surname>Baldwin</surname>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Robert</firstname>
|
||||
<surname>Watson</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<pubdate>$FreeBSD$</pubdate>
|
||||
|
||||
<copyright>
|
||||
<year>2002</year>
|
||||
<year>2003</year>
|
||||
<holder>John Baldwin</holder>
|
||||
<holder>Robert Watson</holder>
|
||||
</copyright>
|
||||
|
||||
<abstract>
|
||||
<para>This document presents the current design and implementation of
|
||||
the SMPng Architecture. First, the basic primitives and tools are
|
||||
introduced. Next, a general architecture for the FreeBSD kernel's
|
||||
synchronization and execution model is laid out. Then, locking
|
||||
strategies for specific subsystems are discussed, documenting the
|
||||
approaches taken to introduce fine-grained synchronization and
|
||||
parallelism for each subsystem. Finally, detailed implementation
|
||||
notes are provided to motivate design choices, and make the reader
|
||||
aware of important implications involving the use of specific
|
||||
primitives. </para>
|
||||
</abstract>
|
||||
</articleinfo>
|
||||
|
||||
<sect1>
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>This document is a work-in-progress, and will be updated to
|
||||
reflect on-going design and implementation activities associated
|
||||
with the SMPng Project. Many sections currently exist only in
|
||||
outline form, but will be fleshed out as work proceeds. Updates or
|
||||
suggestions regarding the document may be directed to the document
|
||||
editors.</para>
|
||||
|
||||
<para>The goal of SMPng is to allow concurrency in the kernel.
|
||||
The kernel is basically one rather large and complex program. To
|
||||
make the kernel multi-threaded we use some of the same tools used
|
||||
to make other programs multi-threaded. These include mutexes,
|
||||
shared/exclusive locks, semaphores, and condition variables. For
|
||||
the definitions of these and other SMP-related terms, please see
|
||||
the <xref linkend="glossary"> section of this article.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Basic Tools and Locking Fundamentals</title>
|
||||
|
||||
<sect2>
|
||||
<title>Atomic Instructions and Memory Barriers</title>
|
||||
|
||||
<para>There are several existing treatments of memory barriers
|
||||
and atomic instructions, so this section will not include a
|
||||
lot of detail. To put it simply, one cannot go around reading
|
||||
variables without a lock if a lock is used to protect writes
|
||||
to that variable. This becomes obvious when you consider that
|
||||
memory barriers simply determine relative order of memory
|
||||
operations; they do not make any guarantee about timing of
|
||||
memory operations. That is, a memory barrier does not force
|
||||
the contents of a CPU's local cache or store buffer to flush.
|
||||
Instead, the memory barrier at lock release simply ensures
|
||||
that all writes to the protected data will be visible to other
|
||||
CPUs or devices if the write to release the lock is visible.
|
||||
The CPU is free to keep that data in its cache or store buffer
|
||||
as long as it wants. However, if another CPU performs an
|
||||
atomic instruction on the same datum, the first CPU must
|
||||
guarantee that the updated value is made visible to the second
|
||||
CPU along with any other operations that memory barriers may
|
||||
require.</para>
|
||||
|
||||
<para>For example, assuming a simple model where data is
|
||||
considered visible when it is in main memory (or a global
|
||||
cache), when an atomic instruction is triggered on one CPU,
|
||||
other CPUs' store buffers and caches must flush any writes to
|
||||
that same cache line along with any pending operations behind
|
||||
a memory barrier.</para>
|
||||
|
||||
<para>This requires one to take special care when using an item
|
||||
protected by atomic instructions. For example, in the sleep
|
||||
mutex implementation, we have to use an
|
||||
<function>atomic_cmpset</function> rather than an
|
||||
<function>atomic_set</function> to turn on the
|
||||
<constant>MTX_CONTESTED</constant> bit. The reason is that we
|
||||
read the value of <structfield>mtx_lock</structfield> into a
|
||||
variable and then make a decision based on that read.
|
||||
However, the value we read may be stale, or it may change
|
||||
while we are making our decision. Thus, when the
|
||||
<function>atomic_set</function> is executed, it may end up
|
||||
setting the bit on a different value than the one we made the
|
||||
decision on. Thus, we have to use an
|
||||
<function>atomic_cmpset</function> to set the value only if
|
||||
the value we made the decision on is up-to-date and
|
||||
valid.</para>
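<para>The compare-and-set pattern described above can be sketched in
userspace C11. This is only an illustration, not the kernel's mutex
code: the lock word layout, the <literal>LK_CONTESTED</literal> bit,
and the <literal>mark_contested</literal> helper are hypothetical
names, and <literal>atomic_compare_exchange_strong</literal> from
<literal>stdatomic.h</literal> stands in for
<function>atomic_cmpset</function>.</para>

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word layout: an owner value plus a low flag bit. */
#define LK_UNOWNED   ((uintptr_t)0)
#define LK_CONTESTED ((uintptr_t)1)

/*
 * Try to mark the lock as contested.  'seen' is the value the caller
 * read earlier and based its decision on.  The bit is set only if the
 * lock word still holds exactly that value; otherwise nothing is
 * written and the caller must re-read the lock word and decide again.
 */
static bool
mark_contested(_Atomic uintptr_t *lock, uintptr_t seen)
{
	uintptr_t expected = seen;

	return atomic_compare_exchange_strong(lock, &expected,
	    seen | LK_CONTESTED);
}
```

<para>A plain atomic OR would set the bit even if the owner had
released the lock in the meantime, whereas this helper reports
failure and leaves the lock word untouched in that case.</para>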
|
||||
|
||||
<para>Finally, atomic instructions only allow one item to be
|
||||
updated or read. If one needs to atomically update several
|
||||
items, then a lock must be used instead. For example, if two
|
||||
counters must be read and have values that are consistent
|
||||
relative to each other, then those counters must be protected
|
||||
by a lock rather than by separate atomic instructions.</para>
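<para>As a minimal userspace sketch of the two-counter case, assuming
a POSIX mutex in place of a kernel lock, both counters can be updated
and read under one lock so that a reader never sees one counter
advanced without the other. The structure and function names here are
made up for illustration.</para>

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics block; one lock covers both counters. */
struct io_stats {
	pthread_mutex_t lock;
	uint64_t requests;
	uint64_t bytes;
};

static void
stats_init(struct io_stats *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->requests = 0;
	s->bytes = 0;
}

/* Update both counters as one unit. */
static void
stats_record(struct io_stats *s, uint64_t nbytes)
{
	pthread_mutex_lock(&s->lock);
	s->requests++;
	s->bytes += nbytes;
	pthread_mutex_unlock(&s->lock);
}

/* Read both counters as one unit, so the pair is mutually consistent. */
static void
stats_snapshot(struct io_stats *s, uint64_t *requests, uint64_t *bytes)
{
	pthread_mutex_lock(&s->lock);
	*requests = s->requests;
	*bytes = s->bytes;
	pthread_mutex_unlock(&s->lock);
}
```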
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Read Locks versus Write Locks</title>
|
||||
|
||||
<para>Read locks do not need to be as strong as write locks.
|
||||
Both types of locks need to ensure that the data they are
|
||||
accessing is not stale. However, only write access requires
|
||||
exclusive access. Multiple threads can safely read a value.
|
||||
Using different types of locks for reads and writes can be
|
||||
implemented in a number of ways.</para>
|
||||
|
||||
<para>First, sx locks can be used in this manner by using an
|
||||
exclusive lock when writing and a shared lock when reading.
|
||||
This method is quite straightforward.</para>
|
||||
|
||||
<para>A second method is a bit more obscure. You can protect a
|
||||
datum with multiple locks. Then for reading that data you
|
||||
simply need to have a read lock of one of the locks. However,
|
||||
to write to the data, you need to have a write lock of all of
|
||||
the locks. This can make writing rather expensive but can be
|
||||
useful when data is accessed in various ways. For example,
|
||||
the parent process pointer is protected by both the
|
||||
proctree_lock sx lock and the per-process mutex. Sometimes
|
||||
the proc lock is easier, as we are just checking the parent
of a process that we already have locked. However, other
places such as <function>inferior</function> need to walk the
tree of processes via parent pointers. Locking each process
there would be prohibitively expensive, and it would be
difficult to guarantee that the condition being checked
remains valid for both the check and the actions taken as a
result of the check.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Locking Conditions and Results</title>
|
||||
|
||||
<para>If you need a lock to check the state of a variable so
|
||||
that you can take an action based on the state you read, you
|
||||
cannot just hold the lock while reading the variable and then
|
||||
drop the lock before you act on the value you read. Once you
|
||||
drop the lock, the variable can change, rendering your decision
|
||||
invalid. Thus, you must hold the lock both while reading the
|
||||
variable and while performing the action as a result of the
|
||||
test.</para>
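<para>A small userspace sketch of this rule, assuming a POSIX mutex
and a hypothetical pool of slots: the check of
<literal>free_slots</literal> and the decrement that depends on it
happen under a single hold of the lock.</para>

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical resource pool protected by one mutex. */
struct pool {
	pthread_mutex_t lock;
	int free_slots;
};

/*
 * Take one slot if any is available.  The lock is held across both
 * the test and the update; dropping it between the two would let
 * another thread take the last slot after we had decided it was free.
 */
static bool
pool_take(struct pool *p)
{
	bool taken = false;

	pthread_mutex_lock(&p->lock);
	if (p->free_slots > 0) {
		p->free_slots--;
		taken = true;
	}
	pthread_mutex_unlock(&p->lock);
	return taken;
}
```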
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>General Architecture and Design</title>
|
||||
|
||||
<sect2>
|
||||
<title>Interrupt Handling</title>
|
||||
|
||||
<para>Following the pattern of several other multi-threaded &unix;
|
||||
kernels, FreeBSD deals with interrupt handlers by giving them
|
||||
their own thread context. Providing a context for interrupt
|
||||
handlers allows them to block on locks. To help avoid
|
||||
latency, however, interrupt threads run at real-time kernel
|
||||
priority. Thus, interrupt handlers should not execute for very
|
||||
long to avoid starving other kernel threads. In addition,
|
||||
since multiple handlers may share an interrupt thread,
|
||||
interrupt handlers should not sleep or use a sleepable lock to
|
||||
avoid starving another interrupt handler.</para>
|
||||
|
||||
<para>The interrupt threads currently in FreeBSD are referred to
|
||||
as heavyweight interrupt threads. They are called this
|
||||
because switching to an interrupt thread involves a full
|
||||
context switch. In the initial implementation, the kernel was
|
||||
not preemptive and thus interrupts that interrupted a kernel
|
||||
thread would have to wait until the kernel thread blocked or
|
||||
returned to userland before they would have an opportunity to
|
||||
run.</para>
|
||||
|
||||
<para>To deal with the latency problems, the kernel in FreeBSD
|
||||
has been made preemptive. Currently, we only preempt a kernel
|
||||
thread when we release a sleep mutex or when an interrupt
|
||||
comes in. However, the plan is to make the FreeBSD kernel
|
||||
fully preemptive as described below.</para>
|
||||
|
||||
<para>Not all interrupt handlers execute in a thread context.
|
||||
Instead, some handlers execute directly in primary interrupt
|
||||
context. These interrupt handlers are currently misnamed
|
||||
<quote>fast</quote> interrupt handlers since the
|
||||
<constant>INTR_FAST</constant> flag used in earlier versions
|
||||
of the kernel is used to mark these handlers. The only
|
||||
interrupts which currently use these types of interrupt
|
||||
handlers are clock interrupts and serial I/O device
|
||||
interrupts. Since these handlers do not have their own
|
||||
context, they may not acquire blocking locks and thus may only
|
||||
use spin mutexes.</para>
|
||||
|
||||
<para>Finally, there is one optional optimization that can be
|
||||
added in MD code called lightweight context switches. Since
|
||||
an interrupt thread executes in a kernel context, it can
|
||||
borrow the vmspace of any process. Thus, in a lightweight
|
||||
context switch, the switch to the interrupt thread does not
|
||||
switch vmspaces but borrows the vmspace of the interrupted
|
||||
thread. In order to ensure that the vmspace of the
|
||||
interrupted thread does not disappear out from under us, the
|
||||
interrupted thread is not allowed to execute until the
|
||||
interrupt thread is no longer borrowing its vmspace. This can
|
||||
happen when the interrupt thread either blocks or finishes.
|
||||
If an interrupt thread blocks, then it will use its own
|
||||
context when it is made runnable again. Thus, it can release
|
||||
the interrupted thread.</para>
|
||||
|
||||
<para>The drawbacks of this optimization are that it is very
machine specific and complex, and thus only worth the effort if
there is a large performance improvement. At this point it is
probably too early to tell; in fact, it will probably hurt
performance, as almost all interrupt handlers will immediately
block on Giant and require a thread fix-up when they block.
Also, an alternative method of interrupt handling has been
proposed by Mike Smith that works like so:</para>
|
||||
|
||||
<orderedlist>
|
||||
<listitem>
|
||||
<para>Each interrupt handler has two parts: a predicate
|
||||
which runs in primary interrupt context and a handler
|
||||
which runs in its own thread context.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>If an interrupt handler has a predicate, then when an
|
||||
interrupt is triggered, the predicate is run. If the
|
||||
predicate returns true then the interrupt is assumed to be
|
||||
fully handled and the kernel returns from the interrupt.
|
||||
If the predicate returns false or there is no predicate,
|
||||
then the threaded handler is scheduled to run.</para>
|
||||
</listitem>
|
||||
</orderedlist>
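<para>The two steps above can be sketched as a small dispatch routine.
This is a simulation under stated assumptions, not an existing kernel
interface: <literal>struct intr_handler</literal>,
<literal>intr_dispatch</literal>, and the sample predicates are
hypothetical, and scheduling the threaded handler is represented by
incrementing a counter.</para>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-part interrupt handler. */
struct intr_handler {
	bool (*predicate)(void *arg);	/* primary context; may be NULL */
	void (*handler)(void *arg);	/* runs later in its own thread */
	void *arg;
	int scheduled;			/* times the thread was scheduled */
};

/* Sample predicates used only for illustration. */
static bool always_handled(void *arg) { (void)arg; return true; }
static bool never_handled(void *arg) { (void)arg; return false; }

/*
 * Returns true if the predicate fully handled the interrupt.
 * Otherwise (predicate returned false, or there is none) the threaded
 * handler is "scheduled" and false is returned.
 */
static bool
intr_dispatch(struct intr_handler *ih)
{
	if (ih->predicate != NULL && ih->predicate(ih->arg))
		return true;
	ih->scheduled++;	/* stand-in for making the thread runnable */
	return false;
}
```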
|
||||
|
||||
<para>Fitting lightweight context switches into this scheme
might prove rather complicated. Since we may want to change
to this scheme at some point in the future, it is probably
best to defer work on lightweight context switches until we
have settled on the final interrupt handling architecture and
determined how lightweight context switches might or might
not fit into it.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Kernel Preemption and Critical Sections</title>
|
||||
|
||||
<sect3>
|
||||
<title>Kernel Preemption in a Nutshell</title>
|
||||
|
||||
<para>Kernel preemption is fairly simple. The basic idea is
|
||||
that a CPU should always be doing the highest priority work
|
||||
available. Well, that is the ideal at least. There are a
|
||||
couple of cases where the expense of achieving the ideal is
|
||||
not worth being perfect.</para>
|
||||
|
||||
<para>Implementing full kernel preemption is very
|
||||
straightforward: when you schedule a thread to be executed
|
||||
by putting it on a runqueue, you check to see if its
|
||||
priority is higher than the currently executing thread. If
|
||||
so, you initiate a context switch to that thread.</para>
|
||||
|
||||
<para>While locks can protect most data in the case of a
|
||||
preemption, not all of the kernel is preemption safe. For
|
||||
example, if a thread holding a spin mutex is preempted and the
|
||||
new thread attempts to grab the same spin mutex, the new
|
||||
thread may spin forever as the interrupted thread may never
|
||||
get a chance to execute. Also, some code such as the code
|
||||
to assign an address space number for a process during
|
||||
exec() on the Alpha must not be preempted as it supports
|
||||
the actual context switch code. Preemption is disabled for
|
||||
these code sections by using a critical section.</para>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Critical Sections</title>
|
||||
|
||||
<para>The responsibility of the critical section API is to
|
||||
prevent context switches inside of a critical section. With
|
||||
a fully preemptive kernel, every
|
||||
<function>setrunqueue</function> of a thread other than the
|
||||
current thread is a preemption point. One implementation is
|
||||
for <function>critical_enter</function> to set a per-thread
|
||||
flag that is cleared by its counterpart. If
|
||||
<function>setrunqueue</function> is called with this flag
|
||||
set, it does not preempt regardless of the priority of the new
|
||||
thread relative to the current thread. However, since
|
||||
critical sections are used in spin mutexes to prevent
|
||||
context switches and multiple spin mutexes can be acquired,
|
||||
the critical section API must support nesting. For this
|
||||
reason the current implementation uses a nesting count
|
||||
instead of a single per-thread flag.</para>
|
||||
|
||||
<para>In order to minimize latency, preemptions inside of a
|
||||
critical section are deferred rather than dropped. If a
|
||||
thread is made runnable that would normally be preempted to
|
||||
outside of a critical section, then a per-thread flag is set
|
||||
to indicate that there is a pending preemption. When the
|
||||
outermost critical section is exited, the flag is checked.
|
||||
If the flag is set, then the current thread is preempted to
|
||||
allow the higher priority thread to run.</para>
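<para>The nesting count and the deferred preemption flag can be
modelled in a few lines of plain C. This is a single-threaded
simulation, assuming hypothetical names throughout: a real
implementation would perform a context switch where this sketch only
increments a counter.</para>

```c
#include <stdbool.h>

/* Simplified per-thread state for the simulation. */
struct crit_thread {
	int  crit_nest;		/* depth of nested critical sections */
	bool preempt_pending;	/* a preemption was deferred */
	int  preemptions;	/* context switches taken (simulated) */
};

static void
sim_critical_enter(struct crit_thread *td)
{
	td->crit_nest++;
}

/* Called when a higher priority thread becomes runnable. */
static void
sim_preempt_request(struct crit_thread *td)
{
	if (td->crit_nest > 0)
		td->preempt_pending = true;	/* defer, do not drop */
	else
		td->preemptions++;		/* preempt immediately */
}

/* Only leaving the outermost section acts on a deferred preemption. */
static void
sim_critical_exit(struct crit_thread *td)
{
	if (--td->crit_nest == 0 && td->preempt_pending) {
		td->preempt_pending = false;
		td->preemptions++;
	}
}
```

<para>With two nested sections, a request made inside the inner one is
remembered across the first exit and only takes effect at the second
exit.</para>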
|
||||
|
||||
<para>Interrupts pose a problem with regard to spin mutexes.
|
||||
If a low-level interrupt handler needs a lock, it needs to
|
||||
not interrupt any code needing that lock to avoid possible
|
||||
data structure corruption. Currently, providing this
|
||||
mechanism is piggybacked onto the critical section API by means
|
||||
of the <function>cpu_critical_enter</function> and
|
||||
<function>cpu_critical_exit</function> functions. Currently
|
||||
this API disables and re-enables interrupts on all of
|
||||
FreeBSD's current platforms. This approach may not be
|
||||
purely optimal, but it is simple to understand and simple to
|
||||
get right. Theoretically, this second API need only be used
|
||||
for spin mutexes that are used in primary interrupt context.
|
||||
However, to make the code simpler, it is used for all spin
|
||||
mutexes and even all critical sections. It may be desirable
|
||||
to split out the MD API from the MI API and only use it in
|
||||
conjunction with the MI API in the spin mutex
|
||||
implementation. If this approach is taken, then the MD API
|
||||
likely would need a rename to show that it is a separate API
|
||||
now.</para>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Design Tradeoffs</title>
|
||||
|
||||
<para>As mentioned earlier, a couple of trade-offs have been
|
||||
made to sacrifice cases where perfect preemption may not
|
||||
always provide the best performance.</para>
|
||||
|
||||
<para>The first trade-off is that the preemption code does not
|
||||
take other CPUs into account. Suppose we have two CPUs A
|
||||
and B with the priority of A's thread as 4 and the priority
|
||||
of B's thread as 2. If CPU B makes a thread with priority 1
|
||||
runnable, then in theory, we want CPU A to switch to the new
|
||||
thread so that we will be running the two highest priority
|
||||
runnable threads. However, the cost of determining which
|
||||
CPU to enforce a preemption on as well as actually signaling
|
||||
that CPU via an IPI along with the synchronization that
|
||||
would be required would be enormous. Thus, the current code
|
||||
would instead force CPU B to switch to the higher priority
|
||||
thread. Note that this still puts the system in a better
|
||||
position as CPU B is executing a thread of priority 1 rather
|
||||
than a thread of priority 2.</para>
|
||||
|
||||
<para>The second trade-off limits immediate kernel preemption
|
||||
to real-time priority kernel threads. In the simple case of
|
||||
preemption defined above, a thread is always preempted
|
||||
immediately (or as soon as a critical section is exited) if
|
||||
a higher priority thread is made runnable. However, many
|
||||
threads executing in the kernel only execute in a kernel
|
||||
context for a short time before either blocking or returning
|
||||
to userland. Thus, if the kernel preempts these threads to
|
||||
run another non-realtime kernel thread, the kernel may
|
||||
switch out the executing thread just before it is about to
|
||||
sleep or execute. The cache on the CPU must then adjust to
|
||||
the new thread. When the kernel returns to the interrupted
|
||||
CPU, it must refill all the cache information that was lost.
|
||||
In addition, two extra context switches are performed that
|
||||
could be avoided if the kernel deferred the preemption until
|
||||
the first thread blocked or returned to userland. Thus, by
|
||||
default, the preemption code will only preempt immediately
|
||||
if the higher priority thread is a real-time priority
|
||||
thread.</para>
|
||||
|
||||
<para>Turning on full kernel preemption for all kernel threads
|
||||
has value as a debugging aid since it exposes more race
|
||||
conditions. It is especially useful on UP systems where many
|
||||
races are hard to simulate otherwise. Thus, there will be a
|
||||
kernel option to enable preemption for all kernel threads
|
||||
that can be used for debugging purposes.</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Thread Migration</title>
|
||||
|
||||
<para>Simply put, a thread migrates when it moves from one CPU
|
||||
to another. In a non-preemptive kernel this can only happen
|
||||
at well-defined points such as when calling
|
||||
<function>tsleep</function> or returning to userland.
|
||||
However, in the preemptive kernel, an interrupt can force a
|
||||
preemption and possible migration at any time. This can have
|
||||
negative effects on per-CPU data since with the exception of
|
||||
<varname>curthread</varname> and <varname>curpcb</varname> the
|
||||
data can change whenever you migrate. Since you can
|
||||
potentially migrate at any time this renders per-CPU data
|
||||
rather useless. Thus it is desirable to be able to disable
|
||||
migration for sections of code that need per-CPU data to be
|
||||
stable.</para>
|
||||
|
||||
<para>Critical sections currently prevent migration since they
|
||||
do not allow context switches. However, this may be too strong
|
||||
of a requirement to enforce in some cases since a critical
|
||||
section also effectively blocks interrupt threads on the
|
||||
current processor. As a result, it may be desirable to
|
||||
provide an API whereby code may indicate that if the current
|
||||
thread is preempted it should not migrate to another
|
||||
CPU.</para>
|
||||
|
||||
<para>One possible implementation is to use a per-thread nesting
|
||||
count <varname>td_pinnest</varname> along with a
|
||||
<varname>td_pincpu</varname> which is updated to the current
|
||||
CPU on each context switch. Each CPU has its own run queue
|
||||
that holds threads pinned to that CPU. A thread is pinned
|
||||
when its nesting count is greater than zero and a thread
|
||||
starts off unpinned with a nesting count of zero. When a
|
||||
thread is put on a runqueue, we check to see if it is pinned.
|
||||
If so, we put it on the per-CPU runqueue, otherwise we put it
|
||||
on the global runqueue. When
|
||||
<function>choosethread</function> is called to retrieve the
|
||||
next thread, it could either always prefer bound threads to
|
||||
unbound threads or use some sort of bias when comparing
|
||||
priorities. If the nesting count is only ever written to by
|
||||
the thread itself and is only read by other threads when the
|
||||
owning thread is not executing but while holding the
|
||||
<varname>sched_lock</varname>, then
|
||||
<varname>td_pinnest</varname> will not need any other locks.
|
||||
The <function>migrate_disable</function> function would
|
||||
increment the nesting count and
|
||||
<function>migrate_enable</function> would decrement the
|
||||
nesting count. Due to the locking requirements specified
|
||||
above, they will only operate on the current thread and thus
|
||||
would not need to handle the case of making a thread
|
||||
migratable that currently resides on a per-CPU run
|
||||
queue.</para>
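<para>A rough model of the pinning scheme, assuming simplified
stand-ins for the thread structure and run queues: only the field and
function names taken from the text
(<varname>td_pinnest</varname>, <varname>td_pincpu</varname>,
<function>migrate_disable</function>,
<function>migrate_enable</function>) come from the design above;
everything else is hypothetical.</para>

```c
#include <stdbool.h>

/* Simplified stand-in for the per-thread fields named in the text. */
struct pin_thread {
	int td_pinnest;	/* pin nesting count; zero means unpinned */
	int td_pincpu;	/* CPU the thread last ran on */
};

enum runq_kind { RUNQ_GLOBAL, RUNQ_PERCPU };

/* Both functions are meant to be called only on the current thread. */
static void
migrate_disable(struct pin_thread *td)
{
	td->td_pinnest++;
}

static void
migrate_enable(struct pin_thread *td)
{
	td->td_pinnest--;
}

static bool
thread_is_pinned(const struct pin_thread *td)
{
	return td->td_pinnest > 0;
}

/* Pinned threads go on the per-CPU queue of td_pincpu. */
static enum runq_kind
choose_runq(const struct pin_thread *td)
{
	return thread_is_pinned(td) ? RUNQ_PERCPU : RUNQ_GLOBAL;
}
```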
|
||||
|
||||
<para>It is still debatable if this API is needed or if the
|
||||
critical section API is sufficient by itself. Many of the
|
||||
places that need to prevent migration also need to prevent
|
||||
preemption as well, and in those places a critical section
|
||||
must be used regardless.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Callouts</title>
|
||||
|
||||
<para>The <function>timeout()</function> kernel facility permits
|
||||
kernel services to register functions for execution as part
|
||||
of the <function>softclock()</function> software interrupt.
|
||||
Events are scheduled based on a desired number of clock
|
||||
ticks, and callbacks to the consumer-provided function
|
||||
will occur at approximately the right time.</para>
|
||||
|
||||
<para>The global list of pending timeout events is protected
|
||||
by a global spin mutex, <varname>callout_lock</varname>;
|
||||
all access to the timeout list must be performed with this
|
||||
mutex held. When <function>softclock()</function> is
|
||||
woken up, it scans the list of pending timeouts for those
|
||||
that should fire. In order to avoid lock order reversal,
|
||||
the <function>softclock</function> thread will release the
|
||||
<varname>callout_lock</varname> mutex when invoking the
|
||||
provided <function>timeout()</function> callback function.
|
||||
If the <constant>CALLOUT_MPSAFE</constant> flag was not set
|
||||
during registration, then Giant will be grabbed before
|
||||
invoking the callout, and then released afterwards. The
|
||||
<varname>callout_lock</varname> mutex will be re-grabbed
|
||||
before proceeding. The <function>softclock()</function>
|
||||
code is careful to leave the list in a consistent state
|
||||
while releasing the mutex. If <constant>DIAGNOSTIC</constant>
|
||||
is enabled, then the time taken to execute each function is
|
||||
measured, and a warning generated if it exceeds a
|
||||
threshold.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Specific Locking Strategies</title>
|
||||
|
||||
<sect2>
|
||||
<title>Credentials</title>
|
||||
|
||||
<para><structname>struct ucred</structname> is the kernel's
|
||||
internal credential structure, and is generally used as the
|
||||
basis for process-driven access control within the kernel.
|
||||
BSD-derived systems use a <quote>copy-on-write</quote> model for credential
|
||||
data: multiple references may exist for a credential structure,
|
||||
and when a change needs to be made, the structure is duplicated,
|
||||
modified, and then the reference replaced. Due to widespread
|
||||
caching of the credential to implement access control on open,
|
||||
this results in substantial memory savings. With a move to
|
||||
fine-grained SMP, this model also saves substantially on
|
||||
locking operations by requiring that modification only occur
|
||||
on an unshared credential, avoiding the need for explicit
|
||||
synchronization when consuming a known-shared
|
||||
credential.</para>
|
||||
|
||||
<para>Credential structures with a single reference are
|
||||
considered mutable; shared credential structures must not be
|
||||
modified or a race condition is risked. A mutex,
|
||||
<structfield>cr_mtxp</structfield>, protects the reference
|
||||
count of <structname>struct ucred</structname> so as to
|
||||
maintain consistency. Any use of the structure requires a
|
||||
valid reference for the duration of the use, or the structure
|
||||
may be released out from under the illegitimate
|
||||
consumer.</para>
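<para>The copy-on-write rule can be illustrated with a reduced,
single-threaded sketch. The <literal>struct cred</literal> type and
its helpers are hypothetical and much smaller than
<structname>struct ucred</structname>; in the kernel the reference
count would be protected by the mutex described above, which this
sketch omits.</para>

```c
#include <stdlib.h>

/* Hypothetical, reduced credential: a reference count and one field. */
struct cred {
	int refs;
	unsigned uid;
};

static struct cred *
cred_new(unsigned uid)
{
	struct cred *c = malloc(sizeof(*c));

	if (c != NULL) {
		c->refs = 1;
		c->uid = uid;
	}
	return c;
}

static struct cred *
cred_hold(struct cred *c)
{
	c->refs++;
	return c;
}

static void
cred_free(struct cred *c)
{
	if (--c->refs == 0)
		free(c);
}

/*
 * Change the uid through the caller's reference.  An unshared
 * credential is modified in place; a shared one is duplicated and the
 * caller's reference moves to the copy, so other holders never see
 * the change.  Returns NULL if the copy cannot be allocated.
 */
static struct cred *
cred_set_uid(struct cred *c, unsigned uid)
{
	struct cred *n;

	if (c->refs == 1) {
		c->uid = uid;
		return c;
	}
	n = cred_new(uid);
	if (n != NULL)
		c->refs--;	/* caller's reference now points at n */
	return n;
}
```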
|
||||
|
||||
<para>The <structname>struct ucred</structname> mutex is a leaf
|
||||
mutex, and for performance reasons, is implemented via a mutex
|
||||
pool.</para>
|
||||
|
||||
<para>Usually, credentials are used in a read-only manner for access
|
||||
control decisions, and in this case <structfield>td_ucred</structfield>
|
||||
is generally preferred because it requires no locking. When a
|
||||
process' credential is updated the <literal>proc</literal> lock
|
||||
must be held across the check and update operations, thus avoiding
|
||||
races. The process credential <structfield>p_ucred</structfield>
|
||||
must be used for check and update operations to prevent
|
||||
time-of-check, time-of-use races.</para>
|
||||
|
||||
<para>If system call invocations will perform access control after
|
||||
an update to the process credential, the value of
|
||||
<structfield>td_ucred</structfield> must also be refreshed to
|
||||
the current process value. This will prevent use of a stale
|
||||
credential following a change. The kernel automatically
|
||||
refreshes the <structfield>td_ucred</structfield> pointer in
|
||||
the thread structure from the process
|
||||
<structfield>p_ucred</structfield> whenever a process enters
|
||||
the kernel, permitting use of a fresh credential for kernel
|
||||
access control.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>File Descriptors and File Descriptor Tables</title>
|
||||
|
||||
<para>Details to follow.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Jail Structures</title>
|
||||
|
||||
<para><structname>struct prison</structname> stores
|
||||
administrative details pertinent to the maintenance of jails
|
||||
created using the &man.jail.2; API. This includes the
|
||||
per-jail hostname, IP address, and related settings. This
|
||||
structure is reference-counted since pointers to instances of
|
||||
the structure are shared by many credential structures. A
|
||||
single mutex, <structfield>pr_mtx</structfield>, protects read
|
||||
and write access to the reference count and all mutable
|
||||
variables inside the <structname>struct prison</structname>. Some variables are set only
|
||||
when the jail is created, and a valid reference to the
|
||||
<structname>struct prison</structname> is sufficient to read
|
||||
these values. The precise locking of each entry is documented
|
||||
via comments in <filename>sys/jail.h</filename>.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>MAC Framework</title>
|
||||
|
||||
<para>The TrustedBSD MAC Framework maintains data in a variety
|
||||
of kernel objects, in the form of <structname>struct
|
||||
label</structname>. In general, labels in kernel objects
|
||||
are protected by the same lock as the remainder of the kernel
|
||||
object. For example, the <structfield>v_label</structfield>
|
||||
label in <structname>struct vnode</structname> is protected
|
||||
by the vnode lock on the vnode.</para>
|
||||
|
||||
<para>In addition to labels maintained in standard kernel objects,
|
||||
the MAC Framework also maintains a list of registered and
|
||||
active policies. The policy list is protected by a global
|
||||
mutex (<varname>mac_policy_list_lock</varname>) and a busy
|
||||
count (also protected by the mutex). Since many access
|
||||
control checks may occur in parallel, entry to the framework
|
||||
for a read-only access to the policy list requires holding the
|
||||
mutex while incrementing (and later decrementing) the busy
|
||||
count. The mutex need not be held for the duration of the
|
||||
MAC entry operation--some operations, such as label operations
|
||||
on file system objects, are long-lived. To modify the policy
|
||||
list, such as during policy registration and de-registration,
|
||||
the mutex must be held and the busy count must be zero,
|
||||
to prevent modification of the list while it is in use.</para>
|
||||
|
||||
<para>A condition variable,
|
||||
<varname>mac_policy_list_not_busy</varname>, is available to
|
||||
threads that need to wait for the list to become unbusy, but
|
||||
this condition variable must only be waited on if the caller is
|
||||
holding no other locks, or a lock order violation may be
|
||||
possible. The busy count, in effect, acts as a form of
|
||||
shared/exclusive lock over access to the framework: the difference
|
||||
is that, unlike with an sx lock, consumers waiting for the list
|
||||
to become unbusy may be starved, rather than permitting lock
|
||||
order problems with regard to the busy count and other locks
|
||||
that may be held on entry to (or inside) the MAC Framework.</para>
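<para>The busy-count protocol described above can be sketched in
userland C as follows.  This is an illustration under stated
assumptions rather than the framework's code: all
<varname>demo_*</varname> names are hypothetical, a POSIX mutex and
condition variable stand in for the kernel primitives, and a plain
counter stands in for the policy list.  Readers hold the mutex only
long enough to adjust the busy count; a writer modifies the list only
while holding the mutex with the busy count at zero.</para>

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t demo_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_list_not_busy = PTHREAD_COND_INITIALIZER;
static int demo_busy_count;	/* protected by demo_list_lock */
static int demo_policy_count;	/* stand-in for the policy list */

/* Reader entry: bump the busy count, then drop the mutex. */
static void
demo_list_busy(void)
{
	pthread_mutex_lock(&demo_list_lock);
	demo_busy_count++;
	pthread_mutex_unlock(&demo_list_lock);
}

/* Reader exit: drop the busy count and wake writers when it hits zero. */
static void
demo_list_unbusy(void)
{
	pthread_mutex_lock(&demo_list_lock);
	assert(demo_busy_count > 0);
	if (--demo_busy_count == 0)
		pthread_cond_broadcast(&demo_list_not_busy);
	pthread_mutex_unlock(&demo_list_lock);
}

/*
 * Writer: wait until the list is unbusy, then modify it under the mutex.
 * The caller must hold no other locks while waiting here.
 */
static void
demo_policy_register(void)
{
	pthread_mutex_lock(&demo_list_lock);
	while (demo_busy_count != 0)
		pthread_cond_wait(&demo_list_not_busy, &demo_list_lock);
	demo_policy_count++;
	pthread_mutex_unlock(&demo_list_lock);
}

/* Non-blocking variant: returns 1 on success, 0 if the list is busy. */
static int
demo_policy_try_register(void)
{
	int ok;

	pthread_mutex_lock(&demo_list_lock);
	ok = (demo_busy_count == 0);
	if (ok)
		demo_policy_count++;
	pthread_mutex_unlock(&demo_list_lock);
	return (ok);
}
```

<para>Because new readers may keep arriving while a writer waits,
nothing in this sketch prevents the writer from being starved, which
matches the trade-off noted above.</para>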
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Modules</title>
|
||||
|
||||
<para>For the module subsystem there exists a single lock that is
|
||||
used to protect the shared data. This lock is a shared/exclusive
|
||||
(SX) lock and has a good chance of needing to be acquired (shared
|
||||
or exclusively), therefore there are a few macros that have been
|
||||
added to make access to the lock easier. These macros can be
|
||||
located in <filename>sys/module.h</filename> and are quite basic
|
||||
in terms of usage. The main structures protected under this lock
|
||||
are the <structname>module_t</structname> structures (when shared)
|
||||
and the global <structname>modulelist_t</structname> structure,
|
||||
modules. One should review the related source code in
|
||||
<filename>kern/kern_module.c</filename> to further understand the
|
||||
locking strategy.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Newbus Device Tree</title>
|
||||
|
||||
<para>The newbus system will have one sx lock. Readers will
|
||||
hold a shared (read) lock (&man.sx.slock.9;) and writers will hold
|
||||
an exclusive (write) lock (&man.sx.xlock.9;). Internal functions
|
||||
will not do locking at all. Externally visible ones will lock as
|
||||
needed.
|
||||
Items for which it does not matter whether a race is won or lost will
|
||||
not be locked, since they tend to be read all over the place
|
||||
(e.g. &man.device.get.softc.9;). There will be relatively few
|
||||
changes to the newbus data structures, so a single lock should
|
||||
be sufficient and not impose a performance penalty.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Pipes</title>
|
||||
|
||||
<para>...</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Processes and Threads</title>
|
||||
|
||||
<para>- process hierarchy</para>
|
||||
<para>- proc locks, references</para>
|
||||
<para>- thread-specific copies of proc entries to freeze during system
|
||||
calls, including td_ucred</para>
|
||||
<para>- inter-process operations</para>
|
||||
<para>- process groups and sessions</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Scheduler</title>
|
||||
|
||||
<para>Lots of references to <varname>sched_lock</varname> and notes
|
||||
pointing at specific primitives and related magic elsewhere in the
|
||||
document.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Select and Poll</title>
|
||||
|
||||
<para>The select() and poll() functions permit threads to block
|
||||
waiting on events on file descriptors--most frequently, whether
|
||||
or not the file descriptors are readable or writable.</para>
|
||||
|
||||
<para>...</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>SIGIO</title>
|
||||
|
||||
<para>The SIGIO service permits processes to request the delivery
|
||||
of a SIGIO signal to their process group when the read/write status
|
||||
of specified file descriptors changes. At most one process or
|
||||
process group is permitted to register for SIGIO from any given
|
||||
kernel object, and that process or group is referred to as
|
||||
the owner. Each object supporting SIGIO registration contains
|
||||
a pointer field that is NULL if the object is not registered, or
|
||||
points to a <structname>struct sigio</structname> describing
|
||||
the registration. This field is protected by a global mutex,
|
||||
<varname>sigio_lock</varname>. Callers to SIGIO maintenance
|
||||
functions must pass in this field <quote>by reference</quote> so that local
|
||||
register copies of the field are not made when unprotected by
|
||||
the lock.</para>
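<para>The <quote>by reference</quote> rule can be illustrated with a
small userland sketch.  The <function>demo_*</function> functions and
<structname>struct demo_sigio</structname> are hypothetical
simplifications, and a POSIX mutex stands in for
<varname>sigio_lock</varname>.  Each function receives the address of
the object's pointer field and dereferences it only while holding the
lock, so no stale copy of the field is made outside the lock.</para>

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified registration record. */
struct demo_sigio {
	int	sio_owner;	/* owning process or process group id */
};

/* Global lock protecting every object's registration pointer field. */
static pthread_mutex_t demo_sigio_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register an owner; *sigiop is read and written only under the lock. */
static int
demo_fsetown(int owner, struct demo_sigio **sigiop)
{
	struct demo_sigio *nsigio, *osigio;

	nsigio = malloc(sizeof(*nsigio));
	if (nsigio == NULL)
		return (-1);
	nsigio->sio_owner = owner;

	pthread_mutex_lock(&demo_sigio_lock);
	osigio = *sigiop;
	*sigiop = nsigio;
	pthread_mutex_unlock(&demo_sigio_lock);

	free(osigio);		/* free(NULL) is a no-op */
	return (0);
}

/* Return the registered owner, or 0 if the object is not registered. */
static int
demo_fgetown(struct demo_sigio **sigiop)
{
	int owner;

	pthread_mutex_lock(&demo_sigio_lock);
	owner = (*sigiop != NULL) ? (*sigiop)->sio_owner : 0;
	pthread_mutex_unlock(&demo_sigio_lock);
	return (owner);
}

/* Remove a registration; returns 1 if one existed, 0 otherwise. */
static int
demo_funsetown(struct demo_sigio **sigiop)
{
	struct demo_sigio *sigio;

	pthread_mutex_lock(&demo_sigio_lock);
	sigio = *sigiop;
	*sigiop = NULL;
	pthread_mutex_unlock(&demo_sigio_lock);

	if (sigio == NULL)
		return (0);
	free(sigio);
	return (1);
}
```

<para>Had these functions taken the pointer value itself, the caller
would have loaded the field before the lock was acquired, and a
concurrent registration change could leave it operating on a freed
structure.</para>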
|
||||
|
||||
<para>One <structname>struct sigio</structname> is allocated for
|
||||
each registered object associated with any process or process
|
||||
group, and contains back-pointers to the object, owner, signal
|
||||
information, a credential, and the general disposition of the
|
||||
registration. Each process or process group contains a list of
|
||||
registered <structname>struct sigio</structname> structures,
|
||||
<structfield>p_sigiolst</structfield> for processes, and
|
||||
<structfield>pg_sigiolst</structfield> for process groups.
|
||||
These lists are protected by the process or process group
|
||||
locks respectively. Most fields in each <structname>struct
|
||||
sigio</structname> are constant for the duration of the
|
||||
registration, with the exception of the
|
||||
<structfield>sio_pgsigio</structfield> field which links the
|
||||
<structname>struct sigio</structname> into the process or
|
||||
process group list. Developers implementing new kernel
|
||||
objects supporting SIGIO will, in general, want to avoid
|
||||
holding structure locks while invoking SIGIO supporting
|
||||
functions, such as <function>fsetown()</function>
|
||||
or <function>funsetown()</function>, to avoid
|
||||
defining a lock order between structure locks and the global
|
||||
SIGIO lock. This is generally possible through use of an
|
||||
elevated reference count on the structure, such as reliance
|
||||
on a file descriptor reference to a pipe during a pipe
|
||||
operation.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Sysctl</title>
|
||||
|
||||
<para>The <function>sysctl()</function> MIB service is invoked
|
||||
from both within the kernel and from userland applications
|
||||
using a system call. At least two issues are raised in locking:
|
||||
first, the protection of the structures maintaining the
|
||||
namespace, and second, interactions with kernel variables and
|
||||
functions that are accessed by the sysctl interface. Since
|
||||
sysctl permits the direct export (and modification) of
|
||||
kernel statistics and configuration parameters, the sysctl
|
||||
mechanism must become aware of appropriate locking semantics
|
||||
for those variables. Currently, sysctl makes use of a
|
||||
single global sx lock to serialize use of sysctl(); however, it
|
||||
is assumed to operate under Giant and other protections are not
|
||||
provided. The remainder of this section speculates on locking
|
||||
and semantic changes to sysctl.</para>
|
||||
|
||||
<para>- Need to change the order of operations for sysctl's that
|
||||
update values from read old, copyin and copyout, write new to
|
||||
copyin, lock, read old and write new, unlock, copyout. Normal
|
||||
sysctl's that just copyout the old value and set a new value
|
||||
that they copyin may still be able to follow the old model.
|
||||
However, it may be cleaner to use the second model for all of
|
||||
the sysctl handlers to avoid lock operations.</para>
|
||||
|
||||
<para>- To allow for the common case, a sysctl could embed a
|
||||
pointer to a mutex in the SYSCTL_FOO macros and in the struct.
|
||||
This would work for most sysctl's. For values protected by sx
|
||||
locks, spin mutexes, or other locking strategies besides a
|
||||
single sleep mutex, SYSCTL_PROC nodes could be used to get the
|
||||
locking right.</para>
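<para>The proposed ordering (copyin, lock, read old and write new,
unlock, copyout) can be sketched in userland C.  This is only an
illustration of the ordering: <function>demo_sysctl_int()</function>
and the variables are hypothetical, a POSIX mutex stands in for the
mutex protecting the variable, and <function>memcpy()</function>
stands in for copyin and copyout.  The sketch assumes the motivation
is to keep the user-space copies, which may fault and sleep in the
kernel, outside the locked region.</para>

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t demo_var_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_var = 10;	/* variable exported by the handler */

/*
 * Hypothetical handler for an integer value.  oldp and newp play the
 * role of user-space buffers; either may be NULL.
 */
static int
demo_sysctl_int(int *oldp, const int *newp)
{
	int newval = 0, oldval;

	/* 1. "copyin" the new value before taking the lock. */
	if (newp != NULL)
		memcpy(&newval, newp, sizeof(newval));

	/* 2-4. Lock, read the old value, write the new one, unlock. */
	pthread_mutex_lock(&demo_var_lock);
	oldval = demo_var;
	if (newp != NULL)
		demo_var = newval;
	pthread_mutex_unlock(&demo_var_lock);

	/* 5. "copyout" the old value after the lock is released. */
	if (oldp != NULL)
		memcpy(oldp, &oldval, sizeof(oldval));
	return (0);
}
```

<para>Since the old value is read and the new value stored within one
locked region, the update is atomic with respect to other callers,
and no copy to or from the caller's buffers happens while the lock is
held.</para>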
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Taskqueue</title>
|
||||
|
||||
<para>The taskqueue interface has two basic locks associated
|
||||
with it in order to protect the related shared data. The
|
||||
<varname>taskqueue_queues_mutex</varname> is meant to serve as a
|
||||
lock to protect the <varname>taskqueue_queues</varname> TAILQ.
|
||||
The other mutex lock associated with this system is the one in the
|
||||
<structname>struct taskqueue</structname> data structure. The
|
||||
use of the synchronization primitive here is to protect the
|
||||
integrity of the data in the <structname>struct
|
||||
taskqueue</structname>. It should be noted that there are no
|
||||
separate macros to assist users in locking down their own work
|
||||
since these locks are most likely not going to be used outside of
|
||||
<filename>kern/subr_taskqueue.c</filename>.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Implementation Notes</title>
|
||||
|
||||
<sect2>
|
||||
<title>Details of the Mutex Implementation</title>
|
||||
|
||||
<para>- Should we require mutexes to be owned for mtx_destroy()
|
||||
since we cannot safely assert that they are unowned by anyone
|
||||
else otherwise?</para>
|
||||
|
||||
<sect3>
|
||||
<title>Spin Mutexes</title>
|
||||
|
||||
<para>- Use a critical section...</para>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Sleep Mutexes</title>
|
||||
|
||||
<para>- Describe the races with contested mutexes</para>
|
||||
|
||||
<para>- Why it is safe to read mtx_lock of a contested mutex
|
||||
when holding sched_lock.</para>
|
||||
|
||||
<para>- Priority propagation</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Witness</title>
|
||||
|
||||
<para>- What does it do</para>
|
||||
|
||||
<para>- How does it work</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1>
|
||||
<title>Miscellaneous Topics</title>
|
||||
|
||||
<sect2>
|
||||
<title>Interrupt Source and ICU Abstractions</title>
|
||||
|
||||
<para>- struct isrc</para>
|
||||
|
||||
<para>- pic drivers</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Other Random Questions/Topics</title>
|
||||
|
||||
<para>Should we pass an interlock into
|
||||
<function>sema_wait</function>?</para>
|
||||
|
||||
<para>- Generic turnstiles for sleep mutexes and sx locks.</para>
|
||||
|
||||
<para>- Should we have non-sleepable sx locks?</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<glossary id="glossary">
|
||||
<title>Glossary</title>
|
||||
|
||||
<glossentry id="atomic">
|
||||
<glossterm>atomic</glossterm>
|
||||
<glossdef>
|
||||
<para>An operation is atomic if all of its effects are visible
|
||||
to other CPUs together when the proper access protocol is
|
||||
followed. The degenerate case consists of the atomic instructions
|
||||
provided directly by machine architectures. At a higher
|
||||
level, if several members of a structure are protected by a
|
||||
lock, then a set of operations is atomic if they are all
|
||||
performed while holding the lock without releasing the lock
|
||||
in between any of the operations.</para>
|
||||
|
||||
<glossseealso>operation</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="block">
|
||||
<glossterm>block</glossterm>
|
||||
<glossdef>
|
||||
<para>A thread is blocked when it is waiting on a lock,
|
||||
resource, or condition. Unfortunately this term is a bit
|
||||
overloaded as a result.</para>
|
||||
|
||||
<glossseealso>sleep</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="critical-section">
|
||||
<glossterm>critical section</glossterm>
|
||||
<glossdef>
|
||||
<para>A section of code that is not allowed to be preempted.
|
||||
A critical section is entered and exited using the
|
||||
&man.critical.enter.9; API.</para>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="MD">
|
||||
<glossterm>MD</glossterm>
|
||||
<glossdef>
|
||||
<para>Machine dependent.</para>
|
||||
|
||||
<glossseealso>MI</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="memory-operation">
|
||||
<glossterm>memory operation</glossterm>
|
||||
<glossdef>
|
||||
<para>A memory operation reads and/or writes to a memory
|
||||
location.</para>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="MI">
|
||||
<glossterm>MI</glossterm>
|
||||
<glossdef>
|
||||
<para>Machine independent.</para>
|
||||
|
||||
<glossseealso>MD</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="operation">
|
||||
<glossterm>operation</glossterm>
|
||||
<glosssee>memory operation</glosssee>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="primary-interrupt-context">
|
||||
<glossterm>primary interrupt context</glossterm>
|
||||
<glossdef>
|
||||
<para>Primary interrupt context refers to the code that runs
|
||||
when an interrupt occurs. This code can either run an
|
||||
interrupt handler directly or schedule an asynchronous
|
||||
interrupt thread to execute the interrupt handlers for a
|
||||
given interrupt source.</para>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry>
|
||||
<glossterm>realtime kernel thread</glossterm>
|
||||
<glossdef>
|
||||
<para>A high priority kernel thread. Currently, the only
|
||||
realtime priority kernel threads are interrupt threads.</para>
|
||||
|
||||
<glossseealso>thread</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="sleep">
|
||||
<glossterm>sleep</glossterm>
|
||||
<glossdef>
|
||||
<para>A thread is asleep when it is blocked on a condition
|
||||
variable or a sleep queue via <function>msleep</function> or
|
||||
<function>tsleep</function>.</para>
|
||||
|
||||
<glossseealso>block</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="sleepable-lock">
|
||||
<glossterm>sleepable lock</glossterm>
|
||||
<glossdef>
|
||||
<para>A sleepable lock is a lock that can be held by a thread
|
||||
which is asleep. Lockmgr locks and sx locks are currently
|
||||
the only sleepable locks in FreeBSD. Eventually, some sx
|
||||
locks such as the allproc and proctree locks may become
|
||||
non-sleepable locks.</para>
|
||||
|
||||
<glossseealso>sleep</glossseealso>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
|
||||
<glossentry id="thread">
|
||||
<glossterm>thread</glossterm>
|
||||
<glossdef>
|
||||
<para>A kernel thread represented by a struct thread. Threads own
|
||||
locks and hold a single execution context.</para>
|
||||
</glossdef>
|
||||
</glossentry>
|
||||
</glossary>
|
||||
</article>
|
|
@ -1,6 +1,7 @@
|
|||
# $FreeBSD$
|
||||
|
||||
SUBDIR = corp-net-guide
|
||||
SUBDIR = arch-handbook
|
||||
SUBDIR+= corp-net-guide
|
||||
SUBDIR+= design-44bsd
|
||||
SUBDIR+= developers-handbook
|
||||
SUBDIR+= faq
|
||||
|
|
|
@ -15,9 +15,6 @@ HAS_INDEX= true
|
|||
INSTALL_COMPRESSED?= gz
|
||||
INSTALL_ONLY_COMPRESSED?=
|
||||
|
||||
# Images
|
||||
IMAGES_EN= sockets/layers.eps sockets/sain.eps sockets/sainfill.eps sockets/sainlsb.eps sockets/sainmsb.eps sockets/sainserv.eps sockets/serv.eps sockets/serv2.eps sockets/slayers.eps
|
||||
|
||||
#
|
||||
# SRCS lists the individual SGML files that make up the document. Changes
|
||||
# to any of these files will force a rebuild
|
||||
|
@ -26,28 +23,20 @@ IMAGES_EN= sockets/layers.eps sockets/sain.eps sockets/sainfill.eps sockets/sain
|
|||
# SGML content
|
||||
SRCS= book.sgml
|
||||
SRCS+= boot/chapter.sgml
|
||||
SRCS+= dma/chapter.sgml
|
||||
SRCS+= driverbasics/chapter.sgml
|
||||
SRCS+= introduction/chapter.sgml
|
||||
SRCS+= ipv6/chapter.sgml
|
||||
SRCS+= isa/chapter.sgml
|
||||
SRCS+= jail/chapter.sgml
|
||||
SRCS+= kerneldebug/chapter.sgml
|
||||
SRCS+= kobj/chapter.sgml
|
||||
SRCS+= l10n/chapter.sgml
|
||||
SRCS+= locking/chapter.sgml
|
||||
SRCS+= mac/chapter.sgml
|
||||
SRCS+= newbus/chapter.sgml
|
||||
SRCS+= pci/chapter.sgml
|
||||
SRCS+= policies/chapter.sgml
|
||||
SRCS+= scsi/chapter.sgml
|
||||
SRCS+= secure/chapter.sgml
|
||||
SRCS+= sockets/chapter.sgml
|
||||
SRCS+= smp/chapter.sgml
|
||||
SRCS+= sound/chapter.sgml
|
||||
SRCS+= sysinit/chapter.sgml
|
||||
SRCS+= tools/chapter.sgml
|
||||
SRCS+= usb/chapter.sgml
|
||||
SRCS+= vm/chapter.sgml
|
||||
SRCS+= x86/chapter.sgml
|
||||
|
||||
# Entities
|
||||
|
||||
|
|
|
@ -20,7 +20,7 @@
|
|||
|
||||
<book>
|
||||
<bookinfo>
|
||||
<title>FreeBSD Developers' Handbook</title>
|
||||
<title>&os; Architecture Handbook</title>
|
||||
|
||||
<corpauthor>The FreeBSD Documentation Project</corpauthor>
|
||||
|
||||
|
@ -38,7 +38,7 @@
|
|||
&bookinfo.legalnotice;
|
||||
|
||||
<abstract>
|
||||
<para>Welcome to the Developers' Handbook. This manual is a
|
||||
<para>Welcome to the &os; Architecture Handbook. This manual is a
|
||||
<emphasis>work in progress</emphasis> and is the work of many
|
||||
individuals. Many sections do not yet exist and some of those
|
||||
that do exist need to be updated. If you are interested in
|
||||
|
@ -55,33 +55,6 @@
|
|||
</abstract>
|
||||
</bookinfo>
|
||||
|
||||
<part id="Basics">
|
||||
<title>Basics</title>
|
||||
|
||||
&chap.introduction;
|
||||
&chap.tools;
|
||||
&chap.secure;
|
||||
&chap.l10n;
|
||||
&chap.policies;
|
||||
|
||||
</part>
|
||||
|
||||
<part id="ipc">
|
||||
<title>Interprocess Communication</title>
|
||||
|
||||
<chapter id="signals">
|
||||
<title>* Signals</title>
|
||||
|
||||
<para>Signals, pipes, semaphores, message queues, shared memory,
|
||||
ports, sockets, doors</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
&chap.sockets;
|
||||
&chap.ipv6;
|
||||
|
||||
</part>
|
||||
|
||||
<part id="kernel">
|
||||
<title>Kernel</title>
|
||||
|
||||
|
@ -92,8 +65,7 @@
|
|||
&chap.sysinit;
|
||||
&chap.mac;
|
||||
&chap.vm;
|
||||
&chap.dma;
|
||||
&chap.kerneldebug;
|
||||
&chap.smp;
|
||||
|
||||
<chapter id="ufs">
|
||||
<title>* UFS</title>
|
||||
|
@ -145,21 +117,22 @@
|
|||
|
||||
</part>
|
||||
|
||||
<!-- XXX - finish me
|
||||
<part id="architectures">
|
||||
<title>Architectures</title>
|
||||
|
||||
&chap.x86;
|
||||
<chapter id="i386">
|
||||
<title>* I386</title>
|
||||
|
||||
<para>Talk about <literal>i386</literal> specific &os;
|
||||
architecture.</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="alpha">
|
||||
<title>* Alpha</title>
|
||||
|
||||
<para>Talk about the architectural specifics of
|
||||
FreeBSD/alpha.</para>
|
||||
|
||||
<para>Explanation of alignment errors, how to fix, how to
|
||||
ignore.</para>
|
||||
|
||||
<para>Example assembly language code for FreeBSD/alpha.</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="ia64">
|
||||
|
@ -169,56 +142,36 @@
|
|||
FreeBSD/ia64.</para>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="sparc64">
|
||||
<title>* SPARC64</title>
|
||||
|
||||
<para>Talk about <literal>SPARC64</literal> specific &os;
|
||||
architecture.</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="amd64">
|
||||
<title>* AMD64</title>
|
||||
|
||||
<para>Talk about <literal>AMD64</literal> specific &os;
|
||||
architecture.</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="powerpc">
|
||||
<title>* PowerPC</title>
|
||||
|
||||
<para>Talk about <literal>PowerPC</literal> specific &os;
|
||||
architecture.</para>
|
||||
</chapter>
|
||||
</part>
|
||||
|
||||
-->
|
||||
|
||||
<part id="appendices">
|
||||
<title>Appendices</title>
|
||||
|
||||
<bibliography>
|
||||
|
||||
<biblioentry id="COD" xreflabel="1">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Dave</firstname>
|
||||
<othername role="MI">A</othername>
|
||||
<surname>Patterson</surname>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>John</firstname>
|
||||
<othername role="MI">L</othername>
|
||||
<surname>Hennessy</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<copyright><year>1998</year><holder>Morgan Kaufmann Publishers,
|
||||
Inc.</holder></copyright>
|
||||
<isbn>1-55860-428-6</isbn>
|
||||
<publisher>
|
||||
<publishername>Morgan Kaufmann Publishers, Inc.</publishername>
|
||||
</publisher>
|
||||
<title>Computer Organization and Design</title>
|
||||
<subtitle>The Hardware / Software Interface</subtitle>
|
||||
<pagenums>1-2</pagenums>
|
||||
</biblioentry>
|
||||
|
||||
<biblioentry xreflabel="2">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>W.</firstname>
|
||||
<othername role="Middle">Richard</othername>
|
||||
<surname>Stevens</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<copyright><year>1993</year><holder>Addison Wesley Longman,
|
||||
Inc.</holder></copyright>
|
||||
<isbn>0-201-56317-7</isbn>
|
||||
<publisher>
|
||||
<publishername>Addison Wesley Longman, Inc.</publishername>
|
||||
</publisher>
|
||||
<title>Advanced Programming in the Unix Environment</title>
|
||||
<pagenums>1-2</pagenums>
|
||||
</biblioentry>
|
||||
|
||||
<biblioentry xreflabel="3">
|
||||
<biblioentry xreflabel="1">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Marshall</firstname>
|
||||
|
@ -250,50 +203,6 @@
|
|||
<pagenums>1-2</pagenums>
|
||||
</biblioentry>
|
||||
|
||||
<biblioentry id="Phrack" xreflabel="4">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Aleph</firstname>
|
||||
<surname>One</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<title>Phrack 49; "Smashing the Stack for Fun and Profit"</title>
|
||||
</biblioentry>
|
||||
|
||||
<biblioentry id="StackGuard" xreflabel="5">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Chrispin</firstname>
|
||||
<surname>Cowan</surname>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Calton</firstname>
|
||||
<surname>Pu</surname>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Dave</firstname>
|
||||
<surname>Maier</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<title>StackGuard; Automatic Adaptive Detection and Prevention of
|
||||
Buffer-Overflow Attacks</title>
|
||||
</biblioentry>
|
||||
|
||||
<biblioentry id="OpenBSD" xreflabel="6">
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Todd</firstname>
|
||||
<surname>Miller</surname>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Theo</firstname>
|
||||
<surname>de Raadt</surname>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<title>strlcpy and strlcat -- consistent, safe string copy and
|
||||
concatenation.</title>
|
||||
</biblioentry>
|
||||
|
||||
</bibliography>
|
||||
|
||||
<![ %chap.index; [ &chap.index; ]]>
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
<!--
|
||||
Creates entities for each chapter in the FreeBSD Developer's
|
||||
Creates entities for each chapter in the FreeBSD Architecture
|
||||
Handbook. Each entity is named chap.foo, where foo is the value
|
||||
of the id attribute on that chapter, and corresponds to the name of
|
||||
the directory in which that chapter's .sgml file is stored.
|
||||
|
@ -9,29 +9,17 @@
|
|||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<!-- Part one -->
|
||||
<!ENTITY chap.introduction SYSTEM "introduction/chapter.sgml">
|
||||
<!ENTITY chap.tools SYSTEM "tools/chapter.sgml">
|
||||
<!ENTITY chap.secure SYSTEM "secure/chapter.sgml">
|
||||
<!ENTITY chap.l10n SYSTEM "l10n/chapter.sgml">
|
||||
<!ENTITY chap.policies SYSTEM "policies/chapter.sgml">
|
||||
|
||||
<!-- Part two - IPC -->
|
||||
<!ENTITY chap.sockets SYSTEM "sockets/chapter.sgml">
|
||||
<!ENTITY chap.ipv6 SYSTEM "ipv6/chapter.sgml">
|
||||
|
||||
<!-- Part three - Kernel -->
|
||||
<!-- Part one - Kernel -->
|
||||
<!ENTITY chap.boot SYSTEM "boot/chapter.sgml">
|
||||
<!ENTITY chap.kobj SYSTEM "kobj/chapter.sgml">
|
||||
<!ENTITY chap.sysinit SYSTEM "sysinit/chapter.sgml">
|
||||
<!ENTITY chap.locking SYSTEM "locking/chapter.sgml">
|
||||
<!ENTITY chap.vm SYSTEM "vm/chapter.sgml">
|
||||
<!ENTITY chap.dma SYSTEM "dma/chapter.sgml">
|
||||
<!ENTITY chap.kerneldebug SYSTEM "kerneldebug/chapter.sgml">
|
||||
<!ENTITY chap.jail SYSTEM "jail/chapter.sgml">
|
||||
<!ENTITY chap.mac SYSTEM "mac/chapter.sgml">
|
||||
<!ENTITY chap.smp SYSTEM "smp/chapter.sgml">
|
||||
|
||||
<!-- Part four - Device Drivers -->
|
||||
<!-- Part Two - Device Drivers -->
|
||||
<!ENTITY chap.driverbasics SYSTEM "driverbasics/chapter.sgml">
|
||||
<!ENTITY chap.isa SYSTEM "isa/chapter.sgml">
|
||||
<!ENTITY chap.pci SYSTEM "pci/chapter.sgml">
|
||||
|
@ -40,9 +28,5 @@
|
|||
<!ENTITY chap.newbus SYSTEM "newbus/chapter.sgml">
|
||||
<!ENTITY chap.snd SYSTEM "sound/chapter.sgml">
|
||||
|
||||
<!-- Part five - Architectures -->
|
||||
<!ENTITY chap.x86 SYSTEM "x86/chapter.sgml">
|
||||
|
||||
<!-- Part six - Appendices -->
|
||||
<!ENTITY chap.bibliography SYSTEM "bibliography/chapter.sgml">
|
||||
<!-- Part three - Appendices -->
|
||||
<!ENTITY chap.index SYSTEM "index.sgml">
|
||||
|
|
|
@ -1,25 +1,11 @@
|
|||
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
|
||||
<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
|
||||
%man;
|
||||
|
||||
<!ENTITY % authors PUBLIC "-//FreeBSD//ENTITIES DocBook Author Entities//EN">
|
||||
%authors;
|
||||
<!ENTITY % misc PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
|
||||
%misc;
|
||||
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
|
||||
%freebsd;
|
||||
|
||||
<!--ENTITY % mailing-lists PUBLIC "-//FreeBSD//ENTITIES DocBook Mailing List Entities//EN"-->
|
||||
<!--
|
||||
%mailing-lists;
|
||||
The FreeBSD Documentation Project
|
||||
The FreeBSD SMP Next Generation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
]>
|
||||
|
||||
<article>
|
||||
<articleinfo>
|
||||
<title>SMPng Design Document</title>
|
||||
|
||||
<chapter id="smp">
|
||||
<chapterinfo>
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>John</firstname>
|
||||
|
@ -39,8 +25,12 @@
|
|||
<holder>John Baldwin</holder>
|
||||
<holder>Robert Watson</holder>
|
||||
</copyright>
|
||||
</chapterinfo>
|
||||
|
||||
<abstract>
|
||||
<title>SMPng Design Document</title>
|
||||
|
||||
<sect1>
|
||||
<title>Introduction</title>
|
||||
<para>This document presents the current design and implementation of
|
||||
the SMPng Architecture. First, the basic primitives and tools are
|
||||
introduced. Next, a general architecture for the FreeBSD kernel's
|
||||
|
@ -51,11 +41,6 @@
|
|||
notes are provided to motivate design choices, and make the reader
|
||||
aware of important implications involving the use of specific
|
||||
primitives. </para>
|
||||
</abstract>
|
||||
</articleinfo>
|
||||
|
||||
<sect1>
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>This document is a work-in-progress, and will be updated to
|
||||
reflect on-going design and implementation activities associated
|
||||
|
@ -956,4 +941,4 @@
|
|||
</glossdef>
|
||||
</glossentry>
|
||||
</glossary>
|
||||
</article>
|
||||
</chapter>
|
||||
|
|
|
@ -25,28 +25,15 @@ IMAGES_EN= sockets/layers.eps sockets/sain.eps sockets/sainfill.eps sockets/sain
|
|||
|
||||
# SGML content
|
||||
SRCS= book.sgml
|
||||
SRCS+= boot/chapter.sgml
|
||||
SRCS+= dma/chapter.sgml
|
||||
SRCS+= driverbasics/chapter.sgml
|
||||
SRCS+= introduction/chapter.sgml
|
||||
SRCS+= ipv6/chapter.sgml
|
||||
SRCS+= isa/chapter.sgml
|
||||
SRCS+= jail/chapter.sgml
|
||||
SRCS+= kerneldebug/chapter.sgml
|
||||
SRCS+= kobj/chapter.sgml
|
||||
SRCS+= l10n/chapter.sgml
|
||||
SRCS+= locking/chapter.sgml
|
||||
SRCS+= mac/chapter.sgml
|
||||
SRCS+= pci/chapter.sgml
|
||||
SRCS+= policies/chapter.sgml
|
||||
SRCS+= scsi/chapter.sgml
|
||||
SRCS+= secure/chapter.sgml
|
||||
SRCS+= sockets/chapter.sgml
|
||||
SRCS+= sound/chapter.sgml
|
||||
SRCS+= sysinit/chapter.sgml
|
||||
SRCS+= tools/chapter.sgml
|
||||
SRCS+= usb/chapter.sgml
|
||||
SRCS+= vm/chapter.sgml
|
||||
SRCS+= x86/chapter.sgml
|
||||
|
||||
# Entities
|
||||
|
|
|
@ -12,7 +12,6 @@
|
|||
<!ENTITY % freebsd PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
|
||||
%freebsd;
|
||||
<!ENTITY % chapters SYSTEM "chapters.ent"> %chapters;
|
||||
<!ENTITY % mac-entities SYSTEM "mac.ent"> %mac-entities;
|
||||
<!ENTITY % authors PUBLIC "-//FreeBSD//ENTITIES DocBook Author Entities//EN"> %authors
|
||||
<!ENTITY % mailing-lists PUBLIC "-//FreeBSD//ENTITIES DocBook Mailing List Entities//EN"> %mailing-lists;
|
||||
<!ENTITY % chap.index "IGNORE">
|
||||
|
@ -85,13 +84,6 @@
|
|||
<part id="kernel">
|
||||
<title>Kernel</title>
|
||||
|
||||
&chap.boot;
|
||||
&chap.locking;
|
||||
&chap.kobj;
|
||||
&chap.jail;
|
||||
&chap.sysinit;
|
||||
&chap.mac;
|
||||
&chap.vm;
|
||||
&chap.dma;
|
||||
&chap.kerneldebug;
|
||||
|
||||
|
@ -131,20 +123,6 @@
|
|||
</chapter>
|
||||
</part>
|
||||
|
||||
<part id="devicedrivers">
|
||||
<title>Device Drivers</title>
|
||||
|
||||
&chap.driverbasics;
|
||||
&chap.isa;
|
||||
&chap.pci;
|
||||
&chap.scsi;
|
||||
&chap.usb;
|
||||
&chap.newbus;
|
||||
|
||||
&chap.snd;
|
||||
|
||||
</part>
|
||||
|
||||
<part id="architectures">
|
||||
<title>Architectures</title>
|
||||
|
||||
|
@ -153,22 +131,11 @@
|
|||
<chapter id="alpha">
|
||||
<title>* Alpha</title>
|
||||
|
||||
<para>Talk about the architectural specifics of
|
||||
FreeBSD/alpha.</para>
|
||||
|
||||
<para>Explanation of alignment errors, how to fix, how to
|
||||
ignore.</para>
|
||||
|
||||
<para>Example assembly language code for FreeBSD/alpha.</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="ia64">
|
||||
<title>* IA-64</title>
|
||||
|
||||
<para>Talk about the architectural specifics of
|
||||
FreeBSD/ia64.</para>
|
||||
|
||||
</chapter>
|
||||
</part>
|
||||
|
||||
<part id="appendices">
|
||||
|
|
File diff suppressed because it is too large
|
@ -21,28 +21,11 @@
|
|||
<!ENTITY chap.ipv6 SYSTEM "ipv6/chapter.sgml">
|
||||
|
||||
<!-- Part three - Kernel -->
|
||||
<!ENTITY chap.boot SYSTEM "boot/chapter.sgml">
|
||||
<!ENTITY chap.kobj SYSTEM "kobj/chapter.sgml">
|
||||
<!ENTITY chap.sysinit SYSTEM "sysinit/chapter.sgml">
|
||||
<!ENTITY chap.locking SYSTEM "locking/chapter.sgml">
|
||||
<!ENTITY chap.vm SYSTEM "vm/chapter.sgml">
|
||||
<!ENTITY chap.dma SYSTEM "dma/chapter.sgml">
|
||||
<!ENTITY chap.kerneldebug SYSTEM "kerneldebug/chapter.sgml">
|
||||
<!ENTITY chap.jail SYSTEM "jail/chapter.sgml">
|
||||
<!ENTITY chap.mac SYSTEM "mac/chapter.sgml">
|
||||
|
||||
<!-- Part four - Device Drivers -->
|
||||
<!ENTITY chap.driverbasics SYSTEM "driverbasics/chapter.sgml">
|
||||
<!ENTITY chap.isa SYSTEM "isa/chapter.sgml">
|
||||
<!ENTITY chap.pci SYSTEM "pci/chapter.sgml">
|
||||
<!ENTITY chap.scsi SYSTEM "scsi/chapter.sgml">
|
||||
<!ENTITY chap.usb SYSTEM "usb/chapter.sgml">
|
||||
<!ENTITY chap.newbus SYSTEM "newbus/chapter.sgml">
|
||||
<!ENTITY chap.snd SYSTEM "sound/chapter.sgml">
|
||||
|
||||
<!-- Part five - Architectures -->
|
||||
<!ENTITY chap.x86 SYSTEM "x86/chapter.sgml">
|
||||
|
||||
<!-- Part six - Appendices -->
|
||||
<!ENTITY chap.bibliography SYSTEM "bibliography/chapter.sgml">
|
||||
<!ENTITY chap.index SYSTEM "index.sgml">
|
||||
|
|
|
@ -1,392 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="driverbasics">
|
||||
<title>Writing FreeBSD Device Drivers</title>
|
||||
|
||||
<para>This chapter was written by &a.murray; with selections from a
|
||||
variety of sources including the intro(4) manual page by
|
||||
&a.joerg;.</para>
|
||||
|
||||
<sect1 id="driverbasics-intro">
|
||||
<title>Introduction</title>
|
||||
<para>This chapter provides a brief introduction to writing device
|
||||
drivers for FreeBSD. A device in this context is a term used
|
||||
mostly for hardware-related stuff that belongs to the system,
|
||||
like disks, printers, or a graphics display with its keyboard.
|
||||
A device driver is the software component of the operating
|
||||
system that controls a specific device. There are also
|
||||
so-called pseudo-devices where a device driver emulates the
|
||||
behavior of a device in software without any particular
|
||||
underlying hardware. Device drivers can be compiled into the
|
||||
system statically or loaded on demand through the dynamic kernel
|
||||
linker facility `kld'.</para>
|
||||
|
||||
<para>Most devices in a Unix-like operating system are accessed
|
||||
through device-nodes, sometimes also called special files.
|
||||
These files are usually located under the directory
|
||||
<filename>/dev</filename> in the filesystem hierarchy.
|
||||
In releases of FreeBSD older than 5.0-RELEASE, where
|
||||
&man.devfs.5; support is not integrated into FreeBSD,
|
||||
each device node must be
|
||||
created statically and independent of the existence of the
|
||||
associated device driver. Most device nodes on the system are
|
||||
created by running <command>MAKEDEV</command>.</para>
|
||||
|
||||
<para>Device drivers can roughly be broken down into two
|
||||
categories: character and network device drivers.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="driverbasics-kld">
|
||||
<title>Dynamic Kernel Linker Facility - KLD</title>
|
||||
|
||||
<para>The kld interface allows system administrators to
|
||||
dynamically add and remove functionality from a running system.
|
||||
This allows device driver writers to load their new changes into
|
||||
a running kernel without constantly rebooting to test
|
||||
changes.</para>
|
||||
|
||||
<para>The kld interface is used through the following
|
||||
privileged commands:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><simpara><command>kldload</command> - loads a new kernel
|
||||
module</simpara></listitem>
|
||||
<listitem><simpara><command>kldunload</command> - unloads a kernel
|
||||
module</simpara></listitem>
|
||||
<listitem><simpara><command>kldstat</command> - lists the currently loaded
|
||||
modules</simpara></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para>Skeleton Layout of a kernel module</para>
|
||||
|
||||
<programlisting>/*
|
||||
* KLD Skeleton
|
||||
* Inspired by Andrew Reiter's Daemonnews article
|
||||
*/
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/module.h>
|
||||
#include <sys/systm.h> /* uprintf */
|
||||
#include <sys/errno.h>
|
||||
#include <sys/param.h> /* defines used in kernel.h */
|
||||
#include <sys/kernel.h> /* types used in module initialization */
|
||||
|
||||
/*
|
||||
* Load handler that deals with the loading and unloading of a KLD.
|
||||
*/
|
||||
|
||||
static int
|
||||
skel_loader(struct module *m, int what, void *arg)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
switch (what) {
|
||||
case MOD_LOAD: /* kldload */
|
||||
uprintf("Skeleton KLD loaded.\n");
|
||||
break;
|
||||
case MOD_UNLOAD:
|
||||
uprintf("Skeleton KLD unloaded.\n");
|
||||
break;
|
||||
default:
|
||||
err = EINVAL;
|
||||
break;
|
||||
}
|
||||
return(err);
|
||||
}
|
||||
|
||||
/* Declare this module to the rest of the kernel */
|
||||
|
||||
static moduledata_t skel_mod = {
|
||||
"skel",
|
||||
skel_loader,
|
||||
NULL
|
||||
};
|
||||
|
||||
DECLARE_MODULE(skeleton, skel_mod, SI_SUB_KLD, SI_ORDER_ANY);</programlisting>
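<para>The event handling in the skeleton above can be tried outside the kernel. The following is only a userland sketch: the <literal>sketch_event</literal> values and <literal>sketch_loader()</literal> are hypothetical stand-ins for the kernel's <literal>MOD_LOAD</literal>/<literal>MOD_UNLOAD</literal> events and for <literal>skel_loader()</literal>, not kernel interfaces.</para>

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's module events. */
enum sketch_event { SKETCH_LOAD, SKETCH_UNLOAD, SKETCH_OTHER };

/* Userland mirror of skel_loader(): 0 on success, EINVAL for unknown events. */
int
sketch_loader(enum sketch_event what)
{
	int err = 0;

	switch (what) {
	case SKETCH_LOAD:
		printf("Skeleton loaded.\n");
		break;
	case SKETCH_UNLOAD:
		printf("Skeleton unloaded.\n");
		break;
	default:
		err = EINVAL;	/* an event this module does not handle */
		break;
	}
	return (err);
}
```

<para>Returning a non-zero value for events the handler does not understand lets the caller notice that the request was not honored.</para>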
|
||||
|
||||
|
||||
<sect2>
|
||||
<title>Makefile</title>
|
||||
|
||||
<para>FreeBSD provides a makefile include that you can use to
|
||||
quickly compile your kernel addition.</para>
|
||||
|
||||
<programlisting>SRCS=skeleton.c
|
||||
KMOD=skeleton
|
||||
|
||||
.include <bsd.kmod.mk></programlisting>
|
||||
|
||||
<para>Simply running <command>make</command> with this makefile
|
||||
will create a file <filename>skeleton.ko</filename> that can
|
||||
be loaded into your system by typing:
|
||||
<screen>&prompt.root; <userinput>kldload -v ./skeleton.ko</userinput></screen>
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="driverbasics-access">
|
||||
<title>Accessing a device driver</title>
|
||||
|
||||
<para>Unix provides a common set of system calls for user
|
||||
applications to use. The upper layers of the kernel dispatch
|
||||
these calls to the corresponding device driver when a user
|
||||
accesses a device node. The <command>/dev/MAKEDEV</command>
|
||||
script makes most of the device nodes for your system but if you
|
||||
are doing your own driver development it may be necessary to
|
||||
create your own device nodes with <command>mknod</command>.
|
||||
</para>
|
||||
|
||||
<sect2>
|
||||
<title>Creating static device nodes</title>
|
||||
|
||||
<para>The <command>mknod</command> command requires four
|
||||
arguments to create a device node. You must specify the name
|
||||
of the device node, the type of device, the major number of
|
||||
the device, and the minor number of the device.</para>
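<para>To make the role of the major and minor numbers concrete, the sketch below packs both into one device number. The bit layout and the <literal>sketch_*</literal> names are assumptions chosen for illustration only; the real encoding is defined by the system headers and may differ.</para>

```c
#include <stdint.h>

/* Hypothetical layout: bits 8-15 hold the major number, bits 0-7 the minor. */
uint32_t
sketch_makedev(unsigned int major, unsigned int minor)
{
	return (((uint32_t)(major & 0xff) << 8) | (uint32_t)(minor & 0xff));
}

/* Extract the major number: selects the driver. */
unsigned int
sketch_major(uint32_t dev)
{
	return ((dev >> 8) & 0xff);
}

/* Extract the minor number: selects the unit handled by that driver. */
unsigned int
sketch_minor(uint32_t dev)
{
	return (dev & 0xff);
}
```

<para>With this layout, a character device with major number 33 and minor number 0 would be represented by the value <literal>0x2100</literal>.</para>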
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Dynamic device nodes</title>
|
||||
|
||||
<para>The device filesystem, or devfs, provides access to the
|
||||
kernel's device namespace in the global filesystem namespace.
|
||||
This eliminates the problems of potentially having a device
|
||||
driver without a static device node, or a device node without
|
||||
an installed device driver. Devfs is still a work in
|
||||
progress, but it is already working quite nicely.</para>
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="driverbasics-char">
|
||||
<title>Character Devices</title>
|
||||
|
||||
<para>A character device driver is one that transfers data
|
||||
directly to and from a user process. This is the most common
|
||||
type of device driver and there are plenty of simple examples in
|
||||
the source tree.</para>
|
||||
|
||||
<para>This simple example pseudo-device remembers whatever values
|
||||
you write to it and can then supply them back to you when you
|
||||
read from it.</para>
|
||||
|
||||
<programlisting>/*
|
||||
* Simple `echo' pseudo-device KLD
|
||||
*
|
||||
* Murray Stokely
|
||||
*/
|
||||
|
||||
#define MIN(a,b) (((a) < (b)) ? (a) : (b))
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/module.h>
|
||||
#include <sys/systm.h> /* uprintf */
|
||||
#include <sys/errno.h>
|
||||
#include <sys/param.h> /* defines used in kernel.h */
|
||||
#include <sys/kernel.h> /* types used in module initialization */
|
||||
#include <sys/conf.h> /* cdevsw struct */
|
||||
#include <sys/uio.h> /* uio struct */
|
||||
#include <sys/malloc.h>
|
||||
|
||||
#define BUFFERSIZE 256
|
||||
|
||||
/* Function prototypes */
|
||||
d_open_t echo_open;
|
||||
d_close_t echo_close;
|
||||
d_read_t echo_read;
|
||||
d_write_t echo_write;
|
||||
|
||||
/* Character device entry points */
|
||||
static struct cdevsw echo_cdevsw = {
|
||||
echo_open,
|
||||
echo_close,
|
||||
echo_read,
|
||||
echo_write,
|
||||
noioctl,
|
||||
nopoll,
|
||||
nommap,
|
||||
nostrategy,
|
||||
"echo",
|
||||
33, /* reserved for lkms - /usr/src/sys/conf/majors */
|
||||
nodump,
|
||||
nopsize,
|
||||
D_TTY,
|
||||
-1
|
||||
};
|
||||
|
||||
typedef struct s_echo {
|
||||
char msg[BUFFERSIZE];
|
||||
int len;
|
||||
} t_echo;
|
||||
|
||||
/* vars */
|
||||
static dev_t sdev;
|
||||
static int len;
|
||||
static int count;
|
||||
static t_echo *echomsg;
|
||||
|
||||
MALLOC_DECLARE(M_ECHOBUF);
|
||||
MALLOC_DEFINE(M_ECHOBUF, "echobuffer", "buffer for echo module");
|
||||
|
||||
/*
|
||||
* This function is called by the kld[un]load(2) system calls to
|
||||
* determine what actions to take when a module is loaded or unloaded.
|
||||
*/
|
||||
|
||||
static int
|
||||
echo_loader(struct module *m, int what, void *arg)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
switch (what) {
|
||||
case MOD_LOAD: /* kldload */
|
||||
sdev = make_dev(<literal>&</literal>echo_cdevsw,
|
||||
0,
|
||||
UID_ROOT,
|
||||
GID_WHEEL,
|
||||
0600,
|
||||
"echo");
|
||||
/* kmalloc memory for use by this driver */
|
||||
/* malloc(256,M_ECHOBUF,M_WAITOK); */
|
||||
MALLOC(echomsg, t_echo *, sizeof(t_echo), M_ECHOBUF, M_WAITOK);
|
||||
printf("Echo device loaded.\n");
|
||||
break;
|
||||
case MOD_UNLOAD:
|
||||
destroy_dev(sdev);
|
||||
FREE(echomsg,M_ECHOBUF);
|
||||
printf("Echo device unloaded.\n");
|
||||
break;
|
||||
default:
|
||||
err = EINVAL;
|
||||
break;
|
||||
}
|
||||
return(err);
|
||||
}
|
||||
|
||||
int
|
||||
echo_open(dev_t dev, int oflags, int devtype, struct proc *p)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
uprintf("Opened device \"echo\" successfully.\n");
|
||||
return(err);
|
||||
}
|
||||
|
||||
int
|
||||
echo_close(dev_t dev, int fflag, int devtype, struct proc *p)
|
||||
{
|
||||
uprintf("Closing device \"echo.\"\n");
|
||||
return(0);
|
||||
}
|
||||
|
||||
/*
|
||||
* The read function just takes the buf that was saved via
|
||||
* echo_write() and returns it to userland for accessing.
|
||||
* uio(9)
|
||||
*/
|
||||
|
||||
int
|
||||
echo_read(dev_t dev, struct uio *uio, int ioflag)
|
||||
{
|
||||
int err = 0;
|
||||
int amt;
|
||||
|
||||
/* How big is this read operation? Either as big as the user wants,
|
||||
or as big as the remaining data */
|
||||
amt = MIN(uio->uio_resid, (echomsg->len - uio->uio_offset > 0) ? echomsg->len - uio->uio_offset : 0);
|
||||
if ((err = uiomove(echomsg->msg + uio->uio_offset,amt,uio)) != 0) {
|
||||
uprintf("uiomove failed!\n");
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
/*
|
||||
* echo_write takes in a character string and saves it
|
||||
* to buf for later accessing.
|
||||
*/
|
||||
|
||||
int
|
||||
echo_write(dev_t dev, struct uio *uio, int ioflag)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
/* Copy the string in from user memory to kernel memory */
|
||||
err = copyin(uio->uio_iov->iov_base, echomsg->msg, MIN(uio->uio_iov->iov_len,BUFFERSIZE - 1));

/* Now we need to null terminate; BUFFERSIZE - 1 keeps the terminator inside msg */
*(echomsg->msg + MIN(uio->uio_iov->iov_len,BUFFERSIZE - 1)) = 0;
/* Record the length */
echomsg->len = MIN(uio->uio_iov->iov_len,BUFFERSIZE - 1);
|
||||
|
||||
if (err != 0) {
|
||||
uprintf("Write failed: bad address!\n");
|
||||
}
|
||||
|
||||
count++;
|
||||
return(err);
|
||||
}
|
||||
|
||||
DEV_MODULE(echo,echo_loader,NULL);</programlisting>
|
||||
|
||||
<para>To install this driver you will first need to make a node on
|
||||
your filesystem with a command such as:</para>
|
||||
|
||||
<screen>&prompt.root; <userinput>mknod /dev/echo c 33 0</userinput></screen>
|
||||
|
||||
<para>With this driver loaded you should now be able to type
|
||||
something like:</para>
|
||||
|
||||
<screen>&prompt.root; <userinput>echo -n "Test Data" > /dev/echo</userinput>
|
||||
&prompt.root; <userinput>cat /dev/echo</userinput>
|
||||
Test Data</screen>
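<para>The buffer bookkeeping of the echo device can also be exercised in userland. The following sketch models only the length and offset arithmetic; <literal>sketch_echo</literal>, <literal>sketch_echo_write()</literal> and <literal>sketch_echo_read()</literal> are hypothetical names, and plain <literal>memcpy()</literal> stands in for the kernel's copy routines.</para>

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BUFSZ 256

struct sketch_echo {
	char	msg[SKETCH_BUFSZ];
	size_t	len;
};

/* Store at most SKETCH_BUFSZ - 1 bytes so the terminator always fits. */
size_t
sketch_echo_write(struct sketch_echo *e, const char *src, size_t n)
{
	size_t amt = n < SKETCH_BUFSZ - 1 ? n : SKETCH_BUFSZ - 1;

	memcpy(e->msg, src, amt);
	e->msg[amt] = '\0';
	e->len = amt;
	return (amt);
}

/* Copy out what remains past offset, limited to the caller's buffer size. */
size_t
sketch_echo_read(const struct sketch_echo *e, size_t offset, char *dst, size_t n)
{
	size_t left = offset < e->len ? e->len - offset : 0;
	size_t amt = n < left ? n : left;

	if (amt == 0)
		return (0);	/* nothing left: the reader sees end of file */
	memcpy(dst, e->msg + offset, amt);
	return (amt);
}
```

<para>A read at an offset equal to the stored length returns zero bytes, which is what lets <command>cat</command> terminate after printing the message once.</para>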
|
||||
|
||||
<para>Real hardware devices are covered in the next chapter.</para>
|
||||
|
||||
<para>Additional Resources
|
||||
<itemizedlist>
|
||||
<listitem><simpara><ulink
|
||||
url="http://www.daemonnews.org/200010/blueprints.html">Dynamic
|
||||
Kernel Linker (KLD) Facility Programming Tutorial</ulink> -
|
||||
<ulink url="http://www.daemonnews.org/">Daemonnews</ulink> October 2000</simpara></listitem>
|
||||
<listitem><simpara><ulink
|
||||
url="http://www.daemonnews.org/200007/newbus-intro.html">How
|
||||
to Write Kernel Drivers with NEWBUS</ulink> - <ulink
|
||||
url="http://www.daemonnews.org/">Daemonnews</ulink> July
|
||||
2000</simpara></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="driverbasics-net">
|
||||
<title>Network Drivers</title>
|
||||
|
||||
<para>Drivers for network devices do not use device nodes in order
|
||||
to be accessed. Their selection is based on other decisions
|
||||
made inside the kernel and instead of calling open(), use of a
|
||||
network device is generally introduced by using the system call
|
||||
socket(2).</para>
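<para>From the application's point of view, this means that no file under <filename>/dev</filename> is opened. A minimal sketch, assuming an IPv4 datagram socket is wanted; the function name <literal>sketch_open_dgram()</literal> is hypothetical:</para>

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Obtain a descriptor for network I/O.  No device node is named here;
 * the kernel decides later which network interface carries the traffic.
 * Returns the descriptor, or -1 with errno set.
 */
int
sketch_open_dgram(void)
{
	return (socket(AF_INET, SOCK_DGRAM, 0));
}
```

<para>The descriptor is released with <literal>close()</literal>, just like one obtained from <literal>open()</literal>.</para>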
|
||||
|
||||
<para>See also the ifnet(9) manual page, the loopback device, and
Bill Paul's drivers.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
<!--
|
||||
Local Variables:
|
||||
mode: sgml
|
||||
sgml-declaration: "../chapter.decl"
|
||||
sgml-indent-data: t
|
||||
sgml-omittag: nil
|
||||
sgml-always-quote-attributes: t
|
||||
sgml-parent-document: ("../book.sgml" "part" "chapter")
|
||||
End:
|
||||
-->
|
File diff suppressed because it is too large
|
@ -1,597 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="jail">
|
||||
<chapterinfo>
|
||||
<author>
|
||||
<firstname>Evan</firstname>
|
||||
<surname>Sarmiento</surname>
|
||||
<affiliation>
|
||||
<address><email>evms@cs.bu.edu</email></address>
|
||||
</affiliation>
|
||||
</author>
|
||||
<copyright>
|
||||
<year>2001</year>
|
||||
<holder role="mailto:evms@cs.bu.edu">Evan Sarmiento</holder>
|
||||
</copyright>
|
||||
</chapterinfo>
|
||||
<title>The Jail Subsystem</title>
|
||||
|
||||
<para>On most UNIX systems, root has omnipotent power. This promotes
|
||||
insecurity. If an attacker were to gain root on a system, he would
|
||||
have every function at his fingertips. In FreeBSD there are
|
||||
sysctls which dilute the power of root, in order to minimize the
|
||||
damage caused by an attacker. Specifically, one of these functions
|
||||
is called secure levels. Similarly, another function which is
|
||||
present from FreeBSD 4.0 and onward, is a utility called
|
||||
&man.jail.8;. <application>Jail</application> chroots an
|
||||
environment and sets certain restrictions on processes which are
|
||||
forked from within. For example, a jailed process cannot affect
|
||||
processes outside of the jail, utilize certain system calls, or
|
||||
inflict any damage on the main computer.</para>
|
||||
|
||||
<para><application>Jail</application> is becoming the new security
|
||||
model. People are running potentially vulnerable servers such as
|
||||
Apache, BIND, and sendmail within jails, so that if an attacker
|
||||
gains root within the <application>Jail</application>, it is only
|
||||
an annoyance, and not a devastation. This article focuses on the
|
||||
internals (source code) of <application>Jail</application>.
|
||||
It will also suggest improvements upon the jail code base which
|
||||
are already being worked on. If you are looking for a how-to on
|
||||
setting up a <application>Jail</application>, I suggest you look
|
||||
at my other article in Sys Admin Magazine, May 2001, entitled
|
||||
"Securing FreeBSD using <application>Jail</application>."</para>
|
||||
|
||||
<sect1 id="jail-arch">
|
||||
<title>Architecture</title>
|
||||
|
||||
<para>
|
||||
<application>Jail</application> consists of two realms: the
|
||||
user-space program, jail, and the code implemented within the
|
||||
kernel: the <literal>jail()</literal> system call and associated
|
||||
restrictions. I will be discussing the user-space program and
|
||||
then how jail is implemented within the kernel.</para>
|
||||
|
||||
<sect2>
|
||||
<title>Userland code</title>
|
||||
|
||||
<para>The source for the user-land jail is located in
|
||||
<filename>/usr/src/usr.sbin/jail</filename>, consisting of
|
||||
one file, <filename>jail.c</filename>. The program takes these
|
||||
arguments: the path of the jail, hostname, ip address, and the
|
||||
command to be executed.</para>
|
||||
|
||||
<sect3>
|
||||
<title>Data Structures</title>
|
||||
|
||||
<para>In <filename>jail.c</filename>, the first thing I would
|
||||
note is the declaration of an important structure,
<literal>struct jail j;</literal>, whose definition is included from
|
||||
<filename>/usr/include/sys/jail.h</filename>.</para>
|
||||
|
||||
<para>The definition of the jail structure is:</para>
|
||||
|
||||
<programlisting><filename>/usr/include/sys/jail.h</filename>:
|
||||
|
||||
struct jail {
|
||||
u_int32_t version;
|
||||
char *path;
|
||||
char *hostname;
|
||||
u_int32_t ip_number;
|
||||
};</programlisting>
|
||||
|
||||
<para>As you can see, there is an entry for each of the
|
||||
arguments passed to the jail program, and indeed, they are
|
||||
set during its execution.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
|
||||
j.version = 0;
|
||||
j.path = argv[1];
|
||||
j.hostname = argv[2];</programlisting>
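<para>As a rough illustration of how the command line maps onto the structure, the sketch below fills a local copy of it. <literal>struct sketch_jail</literal> and <literal>sketch_fill_jail()</literal> are hypothetical names that mirror the definition shown above; the argument count check is an assumption, not a quote from <filename>jail.c</filename>.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct jail shown above. */
struct sketch_jail {
	uint32_t	 version;
	char		*path;
	char		*hostname;
	uint32_t	 ip_number;
};

/* Fill the structure from "jail path hostname ip command ...". */
int
sketch_fill_jail(struct sketch_jail *j, int argc, char **argv)
{
	if (argc < 5)
		return (-1);	/* path, hostname, ip and command are required */
	j->version = 0;
	j->path = argv[1];
	j->hostname = argv[2];
	j->ip_number = 0;	/* filled in later from argv[3] */
	return (0);
}
```

<para>Note that only pointers into <literal>argv</literal> are stored; the strings themselves are not copied.</para>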
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Networking</title>
|
||||
|
||||
<para>One of the arguments passed to the Jail program is an IP
|
||||
address with which the jail can be accessed over the
|
||||
network. Jail translates the IP address given into host
byte order and then stores it in j (the jail structure).</para>
|
||||
|
||||
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>:
|
||||
struct in_addr in;
...
i = inet_aton(argv[3], <![CDATA[&in]]>);
...
j.ip_number = ntohl(in.s_addr);</programlisting>
|
||||
|
||||
<para>The
|
||||
<citerefentry><refentrytitle>inet_aton</refentrytitle><manvolnum>3</manvolnum></citerefentry>
|
||||
function "interprets the specified character string as an
|
||||
Internet address, placing the address into the structure
|
||||
provided." The <literal>ip_number</literal> member of the jail structure is set
only after the IP address placed into the <literal>in</literal> structure by
<function>inet_aton()</function> has been translated into host byte order by
|
||||
<function>ntohl()</function>.</para>
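<para>The conversion can be tried in isolation. This is a minimal sketch, assuming a dotted-quad IPv4 string; <literal>sketch_parse_ip()</literal> is a hypothetical helper, while <function>inet_aton()</function> and <function>ntohl()</function> are the standard library functions.</para>

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Parse a dotted-quad string; on success store it in host byte order. */
int
sketch_parse_ip(const char *text, uint32_t *ip_number)
{
	struct in_addr in;

	if (inet_aton(text, &in) == 0)
		return (-1);		/* not a valid address */
	*ip_number = ntohl(in.s_addr);	/* network to host byte order */
	return (0);
}
```

<para>For the address 10.0.0.1 the stored value is <literal>0x0A000001</literal>, regardless of the byte order of the machine.</para>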
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Jailing The Process</title>
|
||||
|
||||
<para>Finally, the userland program jails the process and
executes the command specified. Jail now becomes an
imprisoned process itself and then executes the command
given using &man.execv.3;.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
|
||||
i = jail(<![CDATA[&j]]>);
|
||||
...
|
||||
i = execv(argv[4], argv + 4);</programlisting>
|
||||
|
||||
<para>As you can see, the jail function is being called, and
|
||||
its argument is the jail structure which has been filled
|
||||
with the arguments given to the program. Finally, the
|
||||
program you specify is executed. I will now discuss how Jail
|
||||
is implemented within the kernel.</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Kernel Space</title>
|
||||
|
||||
<para>We will now be looking at the file
|
||||
<filename>/usr/src/sys/kern/kern_jail.c</filename>. This is
|
||||
the file where the jail system call, appropriate sysctls, and
|
||||
networking functions are defined.</para>
|
||||
|
||||
<sect3>
|
||||
<title>sysctls</title>
|
||||
|
||||
<para>In <filename>kern_jail.c</filename>, the following
|
||||
sysctls are defined:</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
|
||||
|
||||
int jail_set_hostname_allowed = 1;
|
||||
SYSCTL_INT(_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW,
|
||||
<![CDATA[&jail]]>_set_hostname_allowed, 0,
|
||||
"Processes in jail can set their hostnames");
|
||||
|
||||
int jail_socket_unixiproute_only = 1;
|
||||
SYSCTL_INT(_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW,
|
||||
<![CDATA[&jail]]>_socket_unixiproute_only, 0,
"Processes in jail are limited to creating UNIX/IPv4/route sockets only");
|
||||
|
||||
int jail_sysvipc_allowed = 0;
|
||||
SYSCTL_INT(_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW,
|
||||
<![CDATA[&jail]]>_sysvipc_allowed, 0,
|
||||
"Processes in jail can use System V IPC primitives");</programlisting>
|
||||
|
||||
<para>Each of these sysctls can be accessed by the user
|
||||
through the sysctl program. Throughout the kernel, these
|
||||
specific sysctls are recognized by their name. For example,
|
||||
the name of the first sysctl is
|
||||
<literal>jail.set_hostname_allowed</literal>.</para>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>&man.jail.2; system call</title>
|
||||
|
||||
<para>Like all system calls, the &man.jail.2; system call takes
|
||||
two arguments, <literal>struct proc *p</literal> and
|
||||
<literal>struct jail_args
|
||||
*uap</literal>. <literal>p</literal> is a pointer to a proc
|
||||
structure which describes the calling process. In this
|
||||
context, uap is a pointer to a structure which specifies the
|
||||
arguments given to &man.jail.2; from the userland program
|
||||
<filename>jail.c</filename>. When I described the userland
|
||||
program before, you saw that the &man.jail.2; system call was
|
||||
given a jail structure as its own argument.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
|
||||
int
|
||||
jail(p, uap)
|
||||
struct proc *p;
|
||||
struct jail_args /* {
|
||||
syscallarg(struct jail *) jail;
|
||||
} */ *uap;</programlisting>
|
||||
|
||||
<para>Therefore, <literal>uap->jail</literal> would access the
|
||||
jail structure which was passed to the system call. Next,
|
||||
the system call copies the jail structure into kernel space
|
||||
using the <literal>copyin()</literal>
|
||||
function. <literal>copyin()</literal> takes three arguments:
|
||||
the address of the data to be copied into kernel space
(<literal>uap->jail</literal>), where to store it
(<literal>j</literal>), and the size of the storage. The jail
|
||||
structure <literal>uap->jail</literal> is copied into kernel
|
||||
space and stored in another jail structure,
|
||||
<literal>j</literal>.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c: </filename>
|
||||
error = copyin(uap->jail, <![CDATA[&j]]>, sizeof j);</programlisting>
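<para>The shape of this call can be imitated in userland. The sketch below is an assumption-laden model: <literal>sketch_copyin()</literal> is a hypothetical stand-in that only rejects null pointers, whereas the real <literal>copyin()</literal> validates the user address range.</para>

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland stand-in for copyin(): copy len bytes from a
 * "user" address to a "kernel" address.  Returns 0 on success and
 * EFAULT when an address is obviously bad.
 */
int
sketch_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL || kaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}
```

<para>Only the structure itself is copied. Pointer members such as the path and hostname still refer to user memory afterwards, so the strings they point to have to be copied in separately.</para>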
|
||||
|
||||
<para>There is another important structure defined in
|
||||
jail.h. It is the prison structure
|
||||
(<literal>pr</literal>). The prison structure is used
|
||||
exclusively within kernel space. The &man.jail.2; system call
|
||||
copies everything from the jail structure onto the prison
|
||||
structure. Here is the definition of the prison structure.</para>
|
||||
|
||||
<programlisting><filename>/usr/include/sys/jail.h</filename>:
|
||||
struct prison {
|
||||
int pr_ref;
|
||||
char pr_host[MAXHOSTNAMELEN];
|
||||
u_int32_t pr_ip;
|
||||
void *pr_linux;
|
||||
};</programlisting>
|
||||
|
||||
<para>The jail() system call then allocates memory for a
prison structure and copies data between the two
structures.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
|
||||
MALLOC(pr, struct prison *, sizeof *pr , M_PRISON, M_WAITOK);
|
||||
bzero((caddr_t)pr, sizeof *pr);
|
||||
error = copyinstr(j.hostname, <![CDATA[&pr->pr_host]]>, sizeof pr->pr_host, 0);
|
||||
if (error)
|
||||
goto bail;</programlisting>
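<para>The bounded string copy used for the hostname can be modelled as follows. <literal>sketch_copyinstr()</literal> is a hypothetical userland imitation of the behavior documented for <literal>copyinstr()</literal>: at most <literal>len</literal> bytes are copied, terminator included, and a string that does not fit yields <literal>ENAMETOOLONG</literal>.</para>

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for copyinstr(): copy a NUL-terminated string
 * of at most len bytes (terminator included) and report the number of
 * bytes copied through done when done is not NULL.
 */
int
sketch_copyinstr(const char *uaddr, char *kaddr, size_t len, size_t *done)
{
	size_t i;

	for (i = 0; i < len; i++) {
		kaddr[i] = uaddr[i];
		if (uaddr[i] == '\0') {
			if (done != NULL)
				*done = i + 1;
			return (0);
		}
	}
	if (done != NULL)
		*done = len;
	return (ENAMETOOLONG);	/* no terminator within len bytes */
}
```

<para>Checking the return value, as the listing above does with <literal>goto bail</literal>, prevents an over-long hostname from being silently truncated into the prison structure.</para>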
|
||||
|
||||
<para>Finally, the jail system call chroots the path
|
||||
specified. The chroot function is given two arguments. The
|
||||
first is p, which represents the calling process; the second
is a pointer to the structure <literal>chroot_args</literal>. The structure
<literal>chroot_args</literal> contains the path which is to be chrooted. As
you can see, the path specified in the jail structure is
copied to the <literal>chroot_args</literal> structure and used.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
|
||||
ca.path = j.path;
|
||||
error = chroot(p, <![CDATA[&ca]]>);</programlisting>
|
||||
|
||||
<para>These next three lines in the source are very important,
|
||||
as they specify how the kernel recognizes a process as
|
||||
jailed. Each process on a Unix system is described by its
|
||||
own proc structure. You can see the whole proc structure in
|
||||
<filename>/usr/include/sys/proc.h</filename>. For example,
|
||||
the p argument in any system call is actually a pointer to
|
||||
that process' proc structure, as stated before. The proc
|
||||
structure contains members which describe the owner's
identity (<literal>p_cred</literal>), the process resource
limits (<literal>p_limit</literal>), and so on. In the
definition of the process structure, there is a pointer to a
prison structure (<literal>p_prison</literal>).</para>
|
||||
|
||||
<programlisting><filename>/usr/include/sys/proc.h: </filename>
|
||||
struct proc {
|
||||
...
|
||||
struct prison *p_prison;
|
||||
...
|
||||
};</programlisting>
|
||||
|
||||
<para>In <filename>kern_jail.c</filename>, the function then
|
||||
copies the pr structure, which is filled with all the
|
||||
information from the original jail structure, over to the
|
||||
<literal>p->p_prison</literal> structure. It then does a
|
||||
bitwise OR of <literal>p->p_flag</literal> with the constant
|
||||
<literal>P_JAILED</literal>, meaning that the calling
|
||||
process is now recognized as jailed. The parent process of
|
||||
each process, forked within the jail, is the program jail
|
||||
itself, as it calls the &man.jail.2; system call. When the
|
||||
program is executed through execve, it inherits the
|
||||
properties of its parent's proc structure, therefore it has
|
||||
the <literal>p->p_flag</literal> set, and the
|
||||
<literal>p->p_prison</literal> structure is filled.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>
|
||||
p->p_prison = pr;
p->p_flag |= P_JAILED;</programlisting>
|
||||
|
||||
<para>When a process is forked from a parent process, the
|
||||
&man.fork.2; system call deals differently with imprisoned
|
||||
processes. In the fork system call, there are two pointers
|
||||
to a <literal>proc</literal> structure <literal>p1</literal>
|
||||
and <literal>p2</literal>. <literal>p1</literal> points to
|
||||
the parent's <literal>proc</literal> structure and p2 points
|
||||
to the child's unfilled <literal>proc</literal>
|
||||
structure. After copying all relevant data between the
|
||||
structures, &man.fork.2; checks if the structure
|
||||
<literal>p->p_prison</literal> is filled on
|
||||
<literal>p2</literal>. If it is, it increments the
|
||||
<literal>pr_ref</literal> by one, and sets the
<literal>P_JAILED</literal> bit in the child's <literal>p_flag</literal>.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_fork.c</filename>:
|
||||
if (p2->p_prison) {
|
||||
p2->p_prison->pr_ref++;
|
||||
p2->p_flag |= P_JAILED;
|
||||
}</programlisting>
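<para>The reference counting described above can be followed in a small userland model. All names here (<literal>sketch_proc</literal>, <literal>sketch_prison</literal>, <literal>SKETCH_P_JAILED</literal> and its value) are hypothetical, and the initial reference count of one is an assumption of the sketch.</para>

```c
#include <stddef.h>

#define SKETCH_P_JAILED	0x1	/* hypothetical flag value */

struct sketch_prison {
	int	pr_ref;		/* number of processes using this prison */
};

struct sketch_proc {
	int			 p_flag;
	struct sketch_prison	*p_prison;
};

/* Attach a process to a fresh prison, as the jail system call does. */
void
sketch_jail_attach(struct sketch_proc *p, struct sketch_prison *pr)
{
	pr->pr_ref = 1;		/* assumption: the creator holds one reference */
	p->p_prison = pr;
	p->p_flag |= SKETCH_P_JAILED;
}

/* Copy the parent's state into the child, then take a prison reference. */
void
sketch_fork(const struct sketch_proc *p1, struct sketch_proc *p2)
{
	*p2 = *p1;
	if (p2->p_prison != NULL) {
		p2->p_prison->pr_ref++;
		p2->p_flag |= SKETCH_P_JAILED;
	}
}
```

<para>Because the child inherits the prison pointer from its parent, a jailed process cannot create an unjailed child.</para>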
|
||||
|
||||
</sect3>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="jail-restrictions">
|
||||
<title>Restrictions</title>
|
||||
|
||||
<para>Throughout the kernel there are access restrictions relating
|
||||
to jailed processes. Usually, these restrictions only check whether
the process is jailed and, if so, return an error. For
|
||||
example:</para>
|
||||
|
||||
<programlisting>if (p->p_prison)
|
||||
return EPERM;</programlisting>
|
||||
|
||||
<sect2>
|
||||
<title>SysV IPC</title>
|
||||
|
||||
<para>System V IPC is based on messages. Processes can send each
|
||||
other these messages which tell them how to act. The functions
|
||||
which deal with messages are: <literal>msgsys</literal>,
|
||||
<literal>msgctl</literal>, <literal>msgget</literal>,
|
||||
<literal>msgsnd</literal> and <literal>msgrcv</literal>.
|
||||
Earlier, I mentioned that there were certain sysctls you could
|
||||
turn on or off in order to affect the behavior of Jail. One of
|
||||
these sysctls was <literal>jail_sysvipc_allowed</literal>. On
|
||||
most systems, this sysctl is set to 0. If it were set to 1, it
|
||||
would defeat the whole purpose of having a jail; privileged
|
||||
users from within the jail would be able to affect processes
|
||||
outside of the environment. The difference between a message
and a signal is that a signal consists only of the signal
number, whereas a message can also carry data.</para>
|
||||
|
||||
<para><filename>/usr/src/sys/kern/sysv_msg.c</filename>:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem> <para>&man.msgget.3;: msgget returns (and possibly
|
||||
creates) a message descriptor that designates a message queue
|
||||
for use in other system calls.</para></listitem>
|
||||
|
||||
<listitem> <para>&man.msgctl.3;: Using this function, a process
|
||||
can query the status of a message
|
||||
descriptor.</para></listitem>
|
||||
|
||||
<listitem> <para>&man.msgsnd.3;: msgsnd sends a message to a
|
||||
process.</para></listitem>
|
||||
|
||||
<listitem> <para>&man.msgrcv.3;: a process receives messages using
|
||||
this function.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>In each of these system calls, there is this
|
||||
conditional:</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/sysv_msg.c</filename>:
if (!jail_sysvipc_allowed && p->p_prison != NULL)
|
||||
return (ENOSYS);</programlisting>
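<para>Written as a separate function, the decision made by this conditional looks as follows. This is a userland sketch; <literal>sketch_sysvipc_check()</literal> and the opaque <literal>struct sketch_prison</literal> are hypothetical names.</para>

```c
#include <errno.h>
#include <stddef.h>

struct sketch_prison;	/* opaque: only the presence of the pointer matters */

/* Return 0 if the caller may use System V IPC, ENOSYS otherwise. */
int
sketch_sysvipc_check(int sysvipc_allowed, const struct sketch_prison *prison)
{
	if (!sysvipc_allowed && prison != NULL)
		return (ENOSYS);	/* jailed and the sysctl is off */
	return (0);
}
```

<para>Unjailed processes always pass the check; jailed processes pass it only when the sysctl has been turned on.</para>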
|
||||
|
||||
<para>Semaphore system calls allow processes to synchronize
|
||||
execution by doing a set of operations atomically on a set of
|
||||
semaphores. Basically semaphores provide another way for
processes to lock resources. However, a process waiting on a
semaphore that is in use will sleep until the resources
are relinquished. The following semaphore system calls are
|
||||
blocked inside a jail: <literal>semsys</literal>,
|
||||
<literal>semget</literal>, <literal>semctl</literal> and
|
||||
<literal>semop</literal>.</para>
|
||||
|
||||
<para><filename>/usr/src/sys/kern/sysv_sem.c</filename>:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>&man.semctl.2;<literal>(id, num, cmd, arg)</literal>:
|
||||
Semctl does the specified cmd on the semaphore queue
|
||||
indicated by id.</para></listitem>
|
||||
|
||||
<listitem>
|
||||
<para>&man.semget.2;<literal>(key, nsems, flag)</literal>:
|
||||
Semget creates an array of semaphores, corresponding to
|
||||
key.</para>
|
||||
|
||||
<para><literal>Key and flag take on the same meaning as they
|
||||
do in msgget.</literal></para></listitem>
|
||||
|
||||
<listitem><para>&man.semop.2;<literal>(id, ops, num)</literal>:
|
||||
Semop does the set of semaphore operations in the array of
|
||||
structures ops, to the set of semaphores identified by
|
||||
id.</para></listitem>
|
||||
</itemizedlist>
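<para>Seen from userland, the three calls listed above are typically combined as in the following sketch. It assumes a system where System V semaphores are available to the caller; <literal>sketch_semun</literal> and <literal>sketch_sem_roundtrip()</literal> are hypothetical names.</para>

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Argument type for semctl(); declared locally under a private name. */
union sketch_semun {
	int		 val;
	struct semid_ds	*buf;
	unsigned short	*array;
};

/* Create a private semaphore, acquire and release it once, then remove it. */
int
sketch_sem_roundtrip(void)
{
	union sketch_semun arg;
	struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
	struct sembuf up = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
	int id, error = -1;

	id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (id == -1)
		return (-1);
	arg.val = 1;			/* one holder at a time */
	if (semctl(id, 0, SETVAL, arg) == 0 &&
	    semop(id, &down, 1) == 0 &&	/* lock */
	    semop(id, &up, 1) == 0)	/* unlock */
		error = 0;
	(void)semctl(id, 0, IPC_RMID);	/* always remove the semaphore set */
	return (error);
}
```

<para>Inside a jail with System V IPC disallowed, these calls are refused, so the function would not succeed there.</para>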
|
||||
|
||||
<para>System V IPC allows for processes to share
|
||||
memory. Processes can communicate directly with each other by
|
||||
sharing parts of their virtual address space and then reading
|
||||
and writing data stored in the shared memory. These system
|
||||
calls are blocked within a jailed environment: <literal>shmdt,
|
||||
shmat, oshmctl, shmctl, shmget</literal>, and
|
||||
<literal>shmsys</literal>.</para>
|
||||
|
||||
<para><filename>/usr/src/sys/kern/sysv_shm.c</filename>:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>&man.shmctl.2;<literal>(id, cmd, buf)</literal>:
|
||||
shmctl does various control operations on the shared memory
|
||||
region identified by id.</para></listitem>
|
||||
|
||||
<listitem><para>&man.shmget.2;<literal>(key, size,
|
||||
flag)</literal>: shmget accesses or creates a shared memory
|
||||
region of size bytes.</para></listitem>
|
||||
|
||||
<listitem><para>&man.shmat.2;<literal>(id, addr, flag)</literal>:
|
||||
shmat attaches a shared memory region identified by id to the
|
||||
address space of a process.</para></listitem>
|
||||
|
||||
<listitem><para>&man.shmdt.2;<literal>(addr)</literal>: shmdt
|
||||
detaches the shared memory region previously attached at
|
||||
addr.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Sockets</title>
|
||||
|
||||
<para>Jail treats the &man.socket.2; system call and related
|
||||
lower-level socket functions in a special manner. In order to
|
||||
determine whether a certain socket is allowed to be created,
|
||||
it first checks to see if the sysctl
|
||||
<literal>jail.socket_unixiproute_only</literal> is set. If
|
||||
set, sockets are only allowed to be created if the family
|
||||
specified is either <literal>PF_LOCAL</literal>,
|
||||
<literal>PF_INET</literal> or
|
||||
<literal>PF_ROUTE</literal>. Otherwise, it returns an
|
||||
error.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/uipc_socket.c</filename>:
|
||||
int socreate(dom, aso, type, proto, p)
|
||||
...
|
||||
register struct protosw *prp;
|
||||
...
|
||||
{
|
||||
if (p->p_prison && jail_socket_unixiproute_only &&
|
||||
prp->pr_domain->dom_family != PF_LOCAL && prp->pr_domain->dom_family != PF_INET
|
||||
&& prp->pr_domain->dom_family != PF_ROUTE)
|
||||
return (EPROTONOSUPPORT);
|
||||
...
|
||||
}</programlisting>
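<para>The family check can be tried outside the kernel with a small
userland sketch. The function name and its boolean parameters are
hypothetical stand-ins for <literal>p->p_prison</literal> and
<literal>jail_socket_unixiproute_only</literal>; only the condition
itself is taken from the listing.</para>

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical userland stand-in for the check made in socreate().
 * "jailed" plays the role of p->p_prison != NULL and
 * "unixiproute_only" the role of jail_socket_unixiproute_only.
 * Returns 0 if the socket may be created, EPROTONOSUPPORT otherwise.
 */
int
jail_socket_check(bool jailed, bool unixiproute_only, int family)
{
	if (jailed && unixiproute_only &&
	    family != PF_LOCAL && family != PF_INET && family != PF_ROUTE)
		return (EPROTONOSUPPORT);
	return (0);
}
```

<para>With both conditions set, a request for a family such as
<literal>PF_INET6</literal> is refused, while an unjailed caller is
never restricted by this check.</para>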
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Berkeley Packet Filter</title>
|
||||
|
||||
<para>The Berkeley Packet Filter provides a raw interface to
|
||||
data link layers in a protocol independent fashion. The
|
||||
function <literal>bpfopen()</literal> opens an Ethernet
|
||||
device. There is a conditional which disallows any jailed
|
||||
processes from accessing this function.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/net/bpf.c</filename>:
|
||||
static int bpfopen(dev, flags, fmt, p)
|
||||
...
|
||||
{
|
||||
if (p->p_prison)
|
||||
return (EPERM);
|
||||
...
|
||||
}</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Protocols</title>
|
||||
|
||||
<para>Certain protocols are very common, such as TCP, UDP, IP and
ICMP. IP and ICMP are on the same level: the network layer 2.
Certain precautions are taken to restrict how a jailed process may
bind a protocol to a port; they apply only if the
<literal>nam</literal> parameter is set. <literal>nam</literal> is a
pointer to a <literal>sockaddr</literal> structure, which describes
the address on which to bind the service. A more exact definition is
that sockaddr "may be used as a template for referring to the
identifying tag and length of each address"[2]. In the function
<literal>in_pcbbind</literal>, <literal>sin</literal> is a pointer to
a <literal>sockaddr_in</literal> structure, which contains the port,
address, length and domain family of the socket which is to be
bound. Basically, this prevents processes in a jail from specifying
the domain family.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>:
|
||||
int in_pcbbind(inp, nam, p)
|
||||
...
|
||||
struct sockaddr *nam;
|
||||
struct proc *p;
|
||||
{
|
||||
...
|
||||
struct sockaddr_in *sin;
|
||||
...
|
||||
if (nam) {
|
||||
sin = (struct sockaddr_in *)nam;
|
||||
...
|
||||
if (sin->sin_addr.s_addr != INADDR_ANY)
|
||||
if (prison_ip(p, 0, <![CDATA[&sin]]>->sin_addr.s_addr))
|
||||
return (EINVAL);
|
||||
...
|
||||
}
|
||||
...
|
||||
}</programlisting>
|
||||
|
||||
<para>You might be wondering what function
<literal>prison_ip()</literal> does. <literal>prison_ip()</literal>
is given three arguments: the current process (represented by
<literal>p</literal>), any flags, and a pointer to an IP address. It
returns 1 if the process is jailed and the IP address does not
belong to its jail, or 0 otherwise. As you can see from the code, if
the IP address does not belong to the jail, the protocol is not
allowed to bind.</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
|
||||
int prison_ip(struct proc *p, int flag, u_int32_t *ip) {
|
||||
u_int32_t tmp;
|
||||
|
||||
if (!p->p_prison)
|
||||
return (0);
|
||||
if (flag)
|
||||
tmp = *ip;
|
||||
else tmp = ntohl (*ip);
|
||||
|
||||
if (tmp == INADDR_ANY) {
|
||||
if (flag)
|
||||
*ip = p->p_prison->pr_ip;
|
||||
else *ip = htonl(p->p_prison->pr_ip);
|
||||
return (0);
|
||||
}
|
||||
|
||||
if (p->p_prison->pr_ip != tmp)
|
||||
return (1);
|
||||
return (0);
|
||||
}</programlisting>
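<para>The return values of <literal>prison_ip()</literal> are easier
to follow in a userland sketch. The structures below are hypothetical,
simplified stand-ins for <literal>struct proc</literal> and
<literal>struct prison</literal>; the logic follows the listing
above and is not the kernel source itself.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct prison_sketch {
	uint32_t pr_ip;			/* jail address, host byte order */
};

struct proc_sketch {
	struct prison_sketch *p_prison;	/* NULL when not jailed */
};

/*
 * Follows the logic of prison_ip() shown above.  "flag" is nonzero
 * when *ip is already in host byte order.  Returns 1 when a jailed
 * process names an address other than its jail's, 0 otherwise.
 */
int
prison_ip_sketch(const struct proc_sketch *p, int flag, uint32_t *ip)
{
	uint32_t tmp;

	if (p->p_prison == NULL)
		return (0);
	tmp = flag ? *ip : ntohl(*ip);
	if (tmp == INADDR_ANY) {
		/* A wildcard address is rewritten to the jail's address. */
		*ip = flag ? p->p_prison->pr_ip : htonl(p->p_prison->pr_ip);
		return (0);
	}
	return (p->p_prison->pr_ip != tmp);
}
```

<para>Note the side effect: a jailed caller that passes
<literal>INADDR_ANY</literal> gets the jail's own address written
back through the pointer.</para>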
|
||||
|
||||
<para>Jailed users are not allowed to bind services to an IP address
|
||||
which does not belong to the jail. The restriction is also
|
||||
written within the function <literal>in_pcbbind</literal>:</para>
|
||||
|
||||
<programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>:
|
||||
if (nam) {
|
||||
...
|
||||
lport = sin->sin_port;
|
||||
... if (lport) {
|
||||
...
|
||||
if (p && p->p_prison)
|
||||
prison = 1;
|
||||
if (prison &&
|
||||
prison_ip(p, 0, <![CDATA[&sin]]>->sin_addr.s_addr))
|
||||
return (EADDRNOTAVAIL);</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Filesystem</title>
|
||||
|
||||
<para>Even root users within the jail are not allowed to set any
|
||||
file flags, such as immutable, append, and no unlink flags, if
|
||||
the securelevel is greater than 0.</para>
|
||||
|
||||
<programlisting>/usr/src/sys/ufs/ufs/ufs_vnops.c:
|
||||
int ufs_setattr(ap)
|
||||
...
|
||||
{
|
||||
if ((cred->cr_uid == 0) && (p->p_prison == NULL)) {
|
||||
if ((ip->i_flags
|
||||
& (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) &&
|
||||
securelevel > 0)
|
||||
return (EPERM);
|
||||
}</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
|
@ -1,298 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="kernel-objects">
|
||||
<title>Kernel Objects</title>
|
||||
|
||||
<para>Kernel Objects, or <firstterm>Kobj</firstterm>, provides an
|
||||
object-oriented C programming system for the kernel. As such the
|
||||
data being operated on carries the description of how to operate
|
||||
on it. This allows operations to be added and removed from an
|
||||
interface at run time and without breaking binary
|
||||
compatibility.</para>
|
||||
|
||||
<sect1 id="kernel-objects-term">
|
||||
<title>Terminology</title>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Object</term>
|
||||
<listitem><para>A set of data - data structure - data
|
||||
allocation.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Method</term>
|
||||
<listitem>
|
||||
<para>An operation - function.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Class</term>
|
||||
<listitem>
|
||||
<para>One or more methods.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>Interface</term>
|
||||
<listitem>
|
||||
<para>A standard set of one or more methods.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="kernel-objects-operation">
|
||||
<title>Kobj Operation</title>
|
||||
|
||||
<para>Kobj works by generating descriptions of methods. Each
|
||||
description holds a unique id as well as a default function. The
|
||||
description's address is used to uniquely identify the method
|
||||
within a class' method table.</para>
|
||||
|
||||
<para>A class is built by creating a method table associating one
|
||||
or more functions with method descriptions. Before use the class
|
||||
is compiled. The compilation allocates a cache and associates it
|
||||
with the class. A unique id is assigned to each method
|
||||
description within the method table of the class if not already
|
||||
done so by another referencing class compilation. For every
|
||||
method to be used a function is generated by script to qualify
|
||||
arguments and automatically reference the method description for
|
||||
a lookup. The generated function looks up the method by using
|
||||
the unique id associated with the method description as a hash
|
||||
into the cache associated with the object's class. If the method
|
||||
is not cached the generated function proceeds to use the class'
|
||||
table to find the method. If the method is found then the
|
||||
associated function within the class is used; otherwise, the
|
||||
default function associated with the method description is
|
||||
used.</para>
|
||||
|
||||
<para>These indirections can be visualized as the
|
||||
following:</para>
|
||||
|
||||
<programlisting>object->cache<->class</programlisting>
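<para>The lookup path described above (cache, then class table, then
default function) can be sketched in plain C. Every name in this
sketch is hypothetical and the cache is reduced to a small
direct-mapped array; it illustrates the idea only and is not the
code in <filename>sys/kern/subr_kobj.c</filename>.</para>

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SIZE 8	/* hypothetical cache size */

typedef int (*sketch_fn_t)(void *obj, int arg);

/* Method description: a unique id plus a default function. */
struct sketch_desc {
	unsigned id;		/* 0 until assigned by class compilation */
	sketch_fn_t deflt;
};

/* One row of a class method table; also used as a cache entry. */
struct sketch_method {
	struct sketch_desc *desc;
	sketch_fn_t func;
};

struct sketch_class {
	const struct sketch_method *methods;	/* ends with desc == NULL */
	struct sketch_method cache[SKETCH_CACHE_SIZE];
};

static unsigned sketch_next_id = 1;

/* "Compile" a class: give ids to descriptions lacking one, clear the cache. */
void
sketch_class_compile(struct sketch_class *cls)
{
	const struct sketch_method *m;
	size_t i;

	for (m = cls->methods; m->desc != NULL; m++)
		if (m->desc->id == 0)
			m->desc->id = sketch_next_id++;
	for (i = 0; i < SKETCH_CACHE_SIZE; i++)
		cls->cache[i].desc = NULL;
}

/* Look a method up: cache first, then the class table, then the default. */
sketch_fn_t
sketch_lookup(struct sketch_class *cls, struct sketch_desc *desc)
{
	struct sketch_method *ce = &cls->cache[desc->id % SKETCH_CACHE_SIZE];
	const struct sketch_method *m;

	if (ce->desc == desc)		/* cache hit */
		return (ce->func);
	ce->desc = desc;		/* cache miss: fill the slot */
	ce->func = desc->deflt;
	for (m = cls->methods; m->desc != NULL; m++)
		if (m->desc == desc) {
			ce->func = m->func;
			break;
		}
	return (ce->func);
}

/* Example: two method descriptions, one class implementing only "bar". */
int sketch_default(void *obj, int arg) { (void)obj; (void)arg; return (-1); }
int sketch_foo_bar(void *obj, int arg) { (void)obj; return (arg * 2); }

struct sketch_desc bar_desc = { 0, sketch_default };
struct sketch_desc baz_desc = { 0, sketch_default };

const struct sketch_method foo_methods[] = {
	{ &bar_desc, sketch_foo_bar },
	{ NULL, NULL }
};

struct sketch_class foo_class = { foo_methods };
```

<para>After <literal>sketch_class_compile(&amp;foo_class)</literal>,
looking up <literal>bar_desc</literal> yields the class's own
function, while <literal>baz_desc</literal>, which the class does not
implement, falls back to the default function stored in the
description.</para>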
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="kernel-objects-using">
|
||||
<title>Using Kobj</title>
|
||||
|
||||
<sect2>
|
||||
<title>Structures</title>
|
||||
|
||||
<programlisting>struct kobj_method</programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Functions</title>
|
||||
|
||||
<programlisting>void kobj_class_compile(kobj_class_t cls);
|
||||
void kobj_class_compile_static(kobj_class_t cls, kobj_ops_t ops);
|
||||
void kobj_class_free(kobj_class_t cls);
|
||||
kobj_t kobj_create(kobj_class_t cls, struct malloc_type *mtype, int mflags);
|
||||
void kobj_init(kobj_t obj, kobj_class_t cls);
|
||||
void kobj_delete(kobj_t obj, struct malloc_type *mtype);</programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Macros</title>
|
||||
|
||||
<programlisting>KOBJ_CLASS_FIELDS
|
||||
KOBJ_FIELDS
|
||||
DEFINE_CLASS(name, methods, size)
|
||||
KOBJMETHOD(NAME, FUNC)</programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Headers</title>
|
||||
|
||||
<programlisting><sys/param.h>
|
||||
<sys/kobj.h></programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Creating an interface template</title>
|
||||
|
||||
<para>The first step in using Kobj is to create an
|
||||
Interface. Creating the interface involves creating a template
|
||||
that the script
|
||||
<filename>src/sys/kern/makeobjops.pl</filename> can use to
|
||||
generate the header and code for the method declarations and
|
||||
method lookup functions.</para>
|
||||
|
||||
<para>Within this template the following keywords are used:
|
||||
<literal>#include</literal>, <literal>INTERFACE</literal>,
|
||||
<literal>CODE</literal>, <literal>METHOD</literal>,
|
||||
<literal>STATICMETHOD</literal>, and
|
||||
<literal>DEFAULT</literal>.</para>
|
||||
|
||||
<para>The <literal>#include</literal> statement and what follows
|
||||
it is copied verbatim to the head of the generated code
|
||||
file.</para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>#include <sys/foo.h></programlisting>
|
||||
|
||||
<para>The <literal>INTERFACE</literal> keyword is used to define
|
||||
the interface name. This name is concatenated with each method
|
||||
name as [interface name]_[method name]. Its syntax is
|
||||
<literal>INTERFACE [interface name];</literal></para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>INTERFACE foo;</programlisting>
|
||||
|
||||
<para>The <literal>CODE</literal> keyword copies its arguments
|
||||
verbatim into the code file. Its syntax is
|
||||
<literal>CODE { [whatever] };</literal></para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>CODE {
|
||||
struct foo * foo_alloc_null(struct bar *bar)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
};</programlisting>
|
||||
|
||||
<para>The <literal>METHOD</literal> keyword describes a method. Its syntax is
|
||||
<literal>METHOD [return type] [method name] { [object;
[other arguments]] };</literal></para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>METHOD int bar {
|
||||
struct object *;
|
||||
struct foo *;
|
||||
struct bar;
|
||||
};</programlisting>
|
||||
|
||||
<para>The <literal>DEFAULT</literal> keyword may follow the
<literal>METHOD</literal> keyword. It extends the
<literal>METHOD</literal> keyword to include the default
function for the method. The extended syntax is
<literal>METHOD [return type] [method name] {
[object; [other arguments]] } DEFAULT [default
function];</literal></para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>METHOD int bar {
|
||||
struct object *;
|
||||
struct foo *;
|
||||
int bar;
|
||||
} DEFAULT foo_hack;</programlisting>
|
||||
|
||||
<para>The <literal>STATICMETHOD</literal> keyword is used like
|
||||
the <literal>METHOD</literal> keyword except the kobj data is not
|
||||
at the head of the object structure so casting to kobj_t would
|
||||
be incorrect. Instead <literal>STATICMETHOD</literal> relies on the Kobj data being
|
||||
referenced as 'ops'. This is also useful for calling
|
||||
methods directly out of a class's method table.</para>
|
||||
|
||||
<para>Other complete examples:</para>
|
||||
|
||||
<programlisting>src/sys/kern/bus_if.m
|
||||
src/sys/kern/device_if.m</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Creating a Class</title>
|
||||
|
||||
<para>The second step in using Kobj is to create a class. A
|
||||
class consists of a name, a table of methods, and the size of
|
||||
objects if Kobj's object handling facilities are used. To
|
||||
create the class use the macro
|
||||
<function>DEFINE_CLASS()</function>. To create the method
|
||||
table create an array of kobj_method_t terminated by a NULL
|
||||
entry. Each non-NULL entry may be created using the macro
|
||||
<function>KOBJMETHOD()</function>.</para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>/* The method table is declared before DEFINE_CLASS() refers to it. */
kobj_method_t foomethods[] = {
	KOBJMETHOD(bar_doo, foo_doo),
	KOBJMETHOD(bar_foo, foo_foo),
	{ NULL, NULL }
};

DEFINE_CLASS(fooclass, foomethods, sizeof(struct foodata));</programlisting>
|
||||
|
||||
<para>The class must be <quote>compiled</quote>. Depending on
the state of the system at the time that the class is to be
initialized, a statically allocated cache, or <quote>ops
table</quote>, may have to be used. This can be accomplished by
declaring a <structname>struct kobj_ops</structname> and using
<function>kobj_class_compile_static()</function>; otherwise,
<function>kobj_class_compile()</function> should be used.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Creating an Object</title>
|
||||
|
||||
<para>The third step in using Kobj is to define the
object. Kobj object creation routines assume that Kobj data is
at the head of an object. If this is not appropriate you will
|
||||
have to allocate the object yourself and then use
|
||||
<function>kobj_init()</function> on the Kobj portion of it;
|
||||
otherwise, you may use <function>kobj_create()</function> to
|
||||
allocate and initialize the Kobj portion of the object
|
||||
automatically. <function>kobj_init()</function> may also be
|
||||
used to change the class that an object uses.</para>
|
||||
|
||||
<para>To integrate Kobj into the object you should use the macro
|
||||
KOBJ_FIELDS.</para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>struct foo_data {
|
||||
KOBJ_FIELDS;
|
||||
foo_foo;
|
||||
foo_bar;
|
||||
};</programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Calling Methods</title>
|
||||
|
||||
<para>The last step in using Kobj is to simply use the generated
|
||||
functions to use the desired method within the object's
|
||||
class. This is as simple as using the interface name and the
|
||||
method name with a few modifications. The interface name
|
||||
should be concatenated with the method name using a '_'
|
||||
between them, all in upper case.</para>
|
||||
|
||||
<para>For example, if the interface name was foo and the method
|
||||
was bar then the call would be:</para>
|
||||
|
||||
<programlisting>[return value = ] FOO_BAR(object [, other parameters]);</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Cleaning Up</title>
|
||||
|
||||
<para>When an object allocated through
|
||||
<function>kobj_create()</function> is no longer needed
|
||||
<function>kobj_delete()</function> may be called on it, and
|
||||
when a class is no longer being used
|
||||
<function>kobj_class_free()</function> may be called on it.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<!--
|
||||
Local Variables:
|
||||
mode: sgml
|
||||
sgml-declaration: "../chapter.decl"
|
||||
sgml-indent-data: t
|
||||
sgml-omittag: nil
|
||||
sgml-always-quote-attributes: t
|
||||
sgml-parent-document: ("../book.sgml" "part" "chapter")
|
||||
End:
|
||||
-->
|
|
@ -1,313 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
The FreeBSD SMP Next Generation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="locking">
|
||||
<title>Locking Notes</title>
|
||||
|
||||
<para><emphasis>This chapter is maintained by the FreeBSD SMP Next
|
||||
Generation Project. Please direct any comments or suggestions
|
||||
to its &a.smp;.</emphasis></para>
|
||||
|
||||
|
||||
<para>This document outlines the locking used in the FreeBSD kernel
|
||||
to permit effective multi-processing within the kernel. Locking
|
||||
can be achieved via several means. Data structures can be
|
||||
protected by mutexes or &man.lockmgr.9; locks. A few variables
|
||||
are protected simply by always using atomic operations to access
|
||||
them.</para>
|
||||
|
||||
<sect1 id="locking-mutexes">
|
||||
<title>Mutexes</title>
|
||||
|
||||
<para>A mutex is simply a lock used to guarantee mutual exclusion.
|
||||
Specifically, a mutex may only be owned by one entity at a time.
|
||||
If another entity wishes to obtain a mutex that is already
|
||||
owned, it must wait until the mutex is released. In the FreeBSD
|
||||
kernel, mutexes are owned by processes.</para>
|
||||
|
||||
<para>Mutexes may be recursively acquired, but they are intended
|
||||
to be held for a short period of time. Specifically, one may
|
||||
not sleep while holding a mutex. If you need to hold a lock
|
||||
across a sleep, use a &man.lockmgr.9; lock.</para>
|
||||
|
||||
<para>Each mutex has several properties of interest:</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Variable Name</term>
|
||||
<listitem>
|
||||
<para>The name of the <type>struct mtx</type> variable in
|
||||
the kernel source.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Logical Name</term>
|
||||
<listitem>
|
||||
<para>The name of the mutex assigned to it by
|
||||
<function>mtx_init</function>. This name is displayed in
|
||||
KTR trace messages and witness errors and warnings and is
|
||||
used to distinguish mutexes in the witness code.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Type</term>
|
||||
<listitem>
|
||||
<para>The type of the mutex in terms of the
|
||||
<constant>MTX_*</constant> flags. The meaning for each
|
||||
flag is related to its meaning as documented in
|
||||
&man.mutex.9;.</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term><constant>MTX_DEF</constant></term>
|
||||
<listitem>
|
||||
<para>A sleep mutex</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><constant>MTX_SPIN</constant></term>
|
||||
<listitem>
|
||||
<para>A spin mutex</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><constant>MTX_RECURSE</constant></term>
|
||||
<listitem>
|
||||
<para>This mutex is allowed to recurse.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Protectees</term>
|
||||
<listitem>
|
||||
<para>A list of data structures or data structure members
|
||||
that this entry protects. For data structure members, the
|
||||
name will be in the form of
|
||||
<structname/structure name/.<structfield/member name/.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Dependent Functions</term>
|
||||
<listitem>
|
||||
<para>Functions that can only be called if this mutex is
|
||||
held.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
|
||||
<table frame="all" colsep="1" rowsep="1" pgwide="1">
|
||||
<title>Mutex List</title>
|
||||
|
||||
<tgroup cols="5">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Variable Name</entry>
|
||||
<entry>Logical Name</entry>
|
||||
<entry>Type</entry>
|
||||
<entry>Protectees</entry>
|
||||
<entry>Dependent Functions</entry>
|
||||
</row>
|
||||
</thead>
|
||||
|
||||
<!-- The scheduler lock -->
|
||||
<tbody>
|
||||
<row>
|
||||
<entry>sched_lock</entry>
|
||||
<entry><quote>sched lock</quote></entry>
|
||||
<entry>
|
||||
<constant>MTX_SPIN</constant> |
|
||||
<constant>MTX_RECURSE</constant>
|
||||
</entry>
|
||||
<entry>
|
||||
<varname>_gmonparam</varname>,
|
||||
<varname>cnt.v_swtch</varname>,
|
||||
<varname>cp_time</varname>,
|
||||
<varname>curpriority</varname>,
|
||||
<structname/mtx/.<structfield/mtx_blocked/,
|
||||
<structname/mtx/.<structfield/mtx_contested/,
|
||||
<structname/proc/.<structfield/p_procq/,
|
||||
<structname/proc/.<structfield/p_slpq/,
|
||||
<structname/proc/.<structfield/p_sflag/,
|
||||
<structname/proc/.<structfield/p_stat/,
|
||||
<structname/proc/.<structfield/p_estcpu/,
|
||||
<structname/proc/.<structfield/p_cpticks/,
|
||||
<structname/proc/.<structfield/p_pctcpu/,
|
||||
<structname/proc/.<structfield/p_wchan/,
|
||||
<structname/proc/.<structfield/p_wmesg/,
|
||||
<structname/proc/.<structfield/p_swtime/,
|
||||
<structname/proc/.<structfield/p_slptime/,
|
||||
<structname/proc/.<structfield/p_runtime/,
|
||||
<structname/proc/.<structfield/p_uu/,
|
||||
<structname/proc/.<structfield/p_su/,
|
||||
<structname/proc/.<structfield/p_iu/,
|
||||
<structname/proc/.<structfield/p_uticks/,
|
||||
<structname/proc/.<structfield/p_sticks/,
|
||||
<structname/proc/.<structfield/p_iticks/,
|
||||
<structname/proc/.<structfield/p_oncpu/,
|
||||
<structname/proc/.<structfield/p_lastcpu/,
|
||||
<structname/proc/.<structfield/p_rqindex/,
|
||||
<structname/proc/.<structfield/p_heldmtx/,
|
||||
<structname/proc/.<structfield/p_blocked/,
|
||||
<structname/proc/.<structfield/p_mtxname/,
|
||||
<structname/proc/.<structfield/p_contested/,
|
||||
<structname/proc/.<structfield/p_priority/,
|
||||
<structname/proc/.<structfield/p_usrpri/,
|
||||
<structname/proc/.<structfield/p_nativepri/,
|
||||
<structname/proc/.<structfield/p_nice/,
|
||||
<structname/proc/.<structfield/p_rtprio/,
|
||||
<varname>pscnt</varname>,
|
||||
<varname>slpque</varname>,
|
||||
<varname>itqueuebits</varname>,
|
||||
<varname>itqueues</varname>,
|
||||
<varname>rtqueuebits</varname>,
|
||||
<varname>rtqueues</varname>,
|
||||
<varname>queuebits</varname>,
|
||||
<varname>queues</varname>,
|
||||
<varname>idqueuebits</varname>,
|
||||
<varname>idqueues</varname>,
|
||||
<varname>switchtime</varname>,
|
||||
<varname>switchticks</varname>
|
||||
</entry>
|
||||
<entry>
|
||||
<function>setrunqueue</function>,
|
||||
<function>remrunqueue</function>,
|
||||
<function>mi_switch</function>,
|
||||
<function>chooseproc</function>,
|
||||
<function>schedclock</function>,
|
||||
<function>resetpriority</function>,
|
||||
<function>updatepri</function>,
|
||||
<function>maybe_resched</function>,
|
||||
<function>cpu_switch</function>,
|
||||
<function>cpu_throw</function>,
|
||||
<function>need_resched</function>,
|
||||
<function>resched_wanted</function>,
|
||||
<function>clear_resched</function>,
|
||||
<function>aston</function>,
|
||||
<function>astoff</function>,
|
||||
<function>astpending</function>,
|
||||
<function>calcru</function>,
|
||||
<function>proc_compare</function>
|
||||
</entry>
|
||||
</row>
|
||||
|
||||
<!-- The vm86 pcb lock -->
|
||||
<row>
|
||||
<entry>vm86pcb_lock</entry>
|
||||
<entry><quote>vm86pcb lock</quote></entry>
|
||||
<entry>
|
||||
<constant>MTX_DEF</constant>
|
||||
</entry>
|
||||
<entry>
|
||||
<varname>vm86pcb</varname>
|
||||
</entry>
|
||||
<entry>
|
||||
<function>vm86_bioscall</function>
|
||||
</entry>
|
||||
</row>
|
||||
|
||||
<!-- Giant -->
|
||||
<row>
|
||||
<entry>Giant</entry>
|
||||
<entry><quote>Giant</quote></entry>
|
||||
<entry>
|
||||
<constant>MTX_DEF</constant> |
|
||||
<constant>MTX_RECURSE</constant>
|
||||
</entry>
|
||||
<entry>nearly everything</entry>
|
||||
<entry>lots</entry>
|
||||
</row>
|
||||
|
||||
<!-- The callout lock -->
|
||||
<row>
|
||||
<entry>callout_lock</entry>
|
||||
<entry><quote>callout lock</quote></entry>
|
||||
<entry>
|
||||
<constant>MTX_SPIN</constant> |
|
||||
<constant>MTX_RECURSE</constant>
|
||||
</entry>
|
||||
<entry>
|
||||
<varname>callfree</varname>,
|
||||
<varname>callwheel</varname>,
|
||||
<varname>nextsoftcheck</varname>,
|
||||
<structname/proc/.<structfield/p_itcallout/,
|
||||
<structname/proc/.<structfield/p_slpcallout/,
|
||||
<varname>softticks</varname>,
|
||||
<varname>ticks</varname>
|
||||
</entry>
|
||||
<entry>
|
||||
</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="locking-sx">
|
||||
<title>Shared Exclusive Locks</title>
|
||||
|
||||
<para>These locks provide basic reader-writer type functionality
|
||||
and may be held by a sleeping process. Currently they are
|
||||
backed by &man.lockmgr.9;.</para>
|
||||
|
||||
<table>
|
||||
<title>Shared Exclusive Lock List</title>
|
||||
|
||||
<tgroup cols="2">
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Variable Name</entry>
|
||||
<entry>Protectees</entry>
|
||||
</row>
|
||||
</thead>
|
||||
<tbody>
|
||||
<row>
|
||||
<entry><varname>allproc_lock</varname></entry>
|
||||
<entry>
|
||||
<varname>allproc</varname>,
<varname>zombproc</varname>,
<varname>pidhashtbl</varname>,
<structname/proc/.<structfield/p_list/,
<structname/proc/.<structfield/p_hash/,
<varname>nextpid</varname>
|
||||
</entry>
</row>

<row>
|
||||
<entry><varname>proctree_lock</varname></entry>
|
||||
<entry>
|
||||
<structname/proc/.<structfield/p_children/
|
||||
<structname/proc/.<structfield/p_sibling/
|
||||
</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="locking-atomic">
|
||||
<title>Atomically Protected Variables</title>
|
||||
|
||||
<para>An atomically protected variable is a special variable that
|
||||
is not protected by an explicit lock. Instead, all data
|
||||
accesses to the variables use special atomic operations as
|
||||
described in &man.atomic.9;. Very few variables are treated
|
||||
this way, although other synchronization primitives such as
|
||||
mutexes are implemented with atomically protected
|
||||
variables.</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para><structname/mtx/.<structfield/mtx_lock/</para>
|
||||
</listitem>
|
||||
</itemizedlist>
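<para>To give a feel for how a lock word such as
<structfield>mtx_lock</structfield> can be protected purely by atomic
operations, here is a minimal userland sketch. It uses the C11
<literal>stdatomic.h</literal> interface rather than the kernel's
&man.atomic.9; primitives, and all names in it are hypothetical; it
is not how the kernel mutex code is written.</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UNOWNED ((uintptr_t)0)	/* hypothetical "no owner" value */

struct sketch_mtx {
	_Atomic uintptr_t lock_word;	/* plays the role of mtx_lock */
};

/* Try to take the lock for "owner"; succeeds only if it is unowned. */
bool
sketch_try_lock(struct sketch_mtx *m, uintptr_t owner)
{
	uintptr_t expected = SKETCH_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(&m->lock_word,
	    &expected, owner, memory_order_acquire, memory_order_relaxed));
}

/* Release the lock; returns false if "owner" did not hold it. */
bool
sketch_unlock(struct sketch_mtx *m, uintptr_t owner)
{
	uintptr_t expected = owner;

	return (atomic_compare_exchange_strong_explicit(&m->lock_word,
	    &expected, SKETCH_UNOWNED, memory_order_release,
	    memory_order_relaxed));
}
```

<para>Every access to the lock word goes through a compare-and-swap,
so no other lock is needed to protect it. A real mutex would
additionally spin or block when the attempt fails.</para>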
|
||||
</sect1>
|
||||
</chapter>
|
|
@ -1,110 +0,0 @@
|
|||
<!-- $FreeBSD$ -->
|
||||
|
||||
<!ENTITY mac.mpo "mpo">
|
||||
<!ENTITY mac.thead '
|
||||
<colspec colname="first" colwidth="0">
|
||||
<colspec colwidth="0">
|
||||
<colspec colname="last" colwidth="0">
|
||||
|
||||
<thead>
|
||||
<row>
|
||||
<entry>Parameter</entry>
|
||||
<entry>Description</entry>
|
||||
<entry>Locking</entry>
|
||||
</row>
|
||||
</thead>
|
||||
'>
|
||||
|
||||
<!ENTITY mac.externalize.paramdefs '
|
||||
<paramdef>struct label *<parameter>label</parameter></paramdef>
|
||||
<paramdef>char *<parameter>element_name</parameter></paramdef>
|
||||
<paramdef>struct sbuf *<parameter>sb</parameter></paramdef>
|
||||
<paramdef>int *<parameter>claimed</parameter></paramdef>
|
||||
'>
|
||||
|
||||
<!ENTITY mac.externalize.tbody '
|
||||
<tbody>
|
||||
<row>
|
||||
<entry><parameter>label</parameter></entry>
|
||||
<entry>Label to be externalized</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>element_name</parameter></entry>
|
||||
<entry>Name of the policy whose label should be externalized</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>sb</parameter></entry>
|
||||
<entry>String buffer to be filled with a text representation of
|
||||
label</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>claimed</parameter></entry>
|
||||
<entry>Should be incremented when <parameter>sb</parameter>
|
||||
can be filled in.</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
'>
|
||||
|
||||
<!ENTITY mac.externalize.para "
|
||||
<para>Produce an externalized label based on the label structure passed.
|
||||
An externalized label consists of a text representation of the label
|
||||
contents that can be used with userland applications and read by the
|
||||
user. Currently, all policies' <function>externalize</function> entry
|
||||
points will be called, so the implementation should check the contents
|
||||
of <parameter>element_name</parameter> before attempting to fill in
|
||||
<parameter>sb</parameter>. If
|
||||
<parameter>element_name</parameter> does not match the name of your
|
||||
policy, simply return <returnvalue>0</returnvalue>. Only return nonzero
|
||||
if an error occurs while externalizing the label data. Once the policy
|
||||
fills in <parameter>sb</parameter>, <varname>*claimed</varname>
|
||||
should be incremented.</para>
|
||||
">
|
||||
|
||||
<!ENTITY mac.internalize.paramdefs '
|
||||
<paramdef>struct label *<parameter>label</parameter></paramdef>
|
||||
<paramdef>char *<parameter>element_name</parameter></paramdef>
|
||||
<paramdef>char *<parameter>element_data</parameter></paramdef>
|
||||
<paramdef>int *<parameter>claimed</parameter></paramdef>
|
||||
'>
|
||||
|
||||
<!ENTITY mac.internalize.tbody '
|
||||
<tbody>
|
||||
<row>
|
||||
<entry><parameter>label</parameter></entry>
|
||||
<entry>Label to be filled in</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>element_name</parameter></entry>
|
||||
<entry>Name of the policy whose label should be internalized</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>element_data</parameter></entry>
|
||||
<entry>Text data to be internalized</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><parameter>claimed</parameter></entry>
|
||||
<entry>Should be incremented when data can be successfully
|
||||
internalized.</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
'>
|
||||
|
||||
<!ENTITY mac.internalize.para "
|
||||
<para>Produce an internal label structure based on externalized label data
|
||||
in text format. Currently, all policies' <function>internalize</function>
|
||||
entry points are called when internalization is requested, so the
|
||||
implementation should compare the contents of
|
||||
<parameter>element_name</parameter> to its own name in order to be sure
|
||||
it should be internalizing the data in <parameter>element_data</parameter>.
|
||||
Just as in the <function>externalize</function> entry points, the entry
|
||||
point should return <returnvalue>0</returnvalue> if
|
||||
<parameter>element_name</parameter> does not match its own name, or when
|
||||
data can successfully be internalized, in which case
|
||||
<varname>*claimed</varname> should be incremented.</para>
|
||||
">
|
File diff suppressed because it is too large
|
@ -1,360 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
$FreeBSD$
|
||||
|
||||
Originally by: Jeroen Ruigrok van der Werven
|
||||
Date: newbus-draft.txt,v 1.8 2001/01/25 08:01:08
|
||||
Copyright (c) 2000 Jeroen Ruigrok van der Werven (asmodai@wxs.nl)
|
||||
Copyright (c) 2002 Hiten Mahesh Pandya (hiten@uk.FreeBSD.org)
|
||||
|
||||
Future Additions:
|
||||
|
||||
o Expand the information about device_t
|
||||
o Add information about the bus_* functions.
|
||||
o Add information about bus specific (e.g. PCI) functions.
|
||||
o Add a reference section for additional information.
|
||||
o Add more newbus related structures and typedefs.
|
||||
o Add a 'Terminology' section.
|
||||
o Add information on resource manager functions, busspace
|
||||
manager functions, newbus events related functions.
|
||||
o More cleanup ... !
|
||||
|
||||
Provided under the FreeBSD Documentation License.
|
||||
-->
|
||||
<chapter id="newbus">
|
||||
<chapterinfo>
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jeroen</firstname>
|
||||
<surname>Ruigrok van der Werven (asmodai)</surname>
|
||||
<affiliation><address><email>asmodai@FreeBSD.org</email></address>
|
||||
</affiliation>
|
||||
<contrib>Written by </contrib>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Hiten</firstname>
|
||||
<surname>Pandya</surname>
|
||||
<affiliation><address><email>hiten@uk.FreeBSD.org</email></address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
</chapterinfo>
|
||||
<title>Newbus</title>
|
||||
|
||||
<para><emphasis>Special thanks to Matthew N. Dodd, Warner Losh, Bill Paul,
|
||||
Doug Rabson, Mike Smith, Peter Wemm and Scott Long</emphasis>.</para>
|
||||
|
||||
<para>This chapter explains the Newbus device framework in detail.</para>
|
||||
<sect1 id="devdrivers">
|
||||
<title>Device Drivers</title>
|
||||
<sect2>
|
||||
<title>Purpose of a Device Driver</title>
|
||||
<para>A device driver is a software component which provides the
|
||||
interface between the kernel's generic view of a peripheral
|
||||
(e.g. disk, network adapter) and the actual implementation of the
|
||||
peripheral. The <emphasis>device driver interface (DDI)</emphasis> is
|
||||
the defined interface between the kernel and the device driver component.
|
||||
</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Types of Device Drivers</title>
|
||||
<para>There used to be days in &unix;, and thus FreeBSD, in which there
|
||||
were four types of devices defined:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>block device drivers</para></listitem>
|
||||
<listitem><para>character device drivers</para></listitem>
|
||||
<listitem><para>network device drivers</para></listitem>
|
||||
<listitem><para>pseudo-device drivers</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para><emphasis>Block devices</emphasis> performed in a way that used
|
||||
fixed size blocks [of data]. This type of driver depended on the
|
||||
so-called <emphasis>buffer cache</emphasis>, which had the purpose
|
||||
to cache accessed blocks of data in a dedicated part of the memory.
|
||||
Often this buffer cache was based on write-behind, which meant that when
|
||||
data was modified in memory it got synced to disk whenever the system
|
||||
did its periodical disk flushing, thus optimizing writes.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Character devices</title>
|
||||
<para>However, in FreeBSD 4.0 and onward the
|
||||
distinction between block and character devices became non-existent.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="newbus-overview">
|
||||
<!--
|
||||
Real title:
|
||||
Newbus, Busspace and the Resource Manager, an Explanation of the Possibilities
|
||||
-->
|
||||
<title>Overview of Newbus</title>
|
||||
<para><emphasis>Newbus</emphasis> is the implementation of a new bus
|
||||
architecture based on abstraction layers which saw its introduction in
|
||||
FreeBSD 3.0 when the Alpha port was imported into the source tree. It was
|
||||
not until 4.0 that it became the default system to use for device
|
||||
drivers. Its goal is to provide a more object-oriented means of
|
||||
interconnecting the various busses and devices which a host system
|
||||
provides to the <emphasis>Operating System</emphasis>.</para>
|
||||
|
||||
<para>Its main features include amongst others:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>dynamic attaching</para></listitem>
|
||||
<listitem><para>easy modularization of drivers</para></listitem>
|
||||
<listitem><para>pseudo-busses</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>One of the most prominent changes is the migration from the flat and
|
||||
ad-hoc system to a device tree lay-out.</para>
|
||||
|
||||
<para>At the top level resides the <emphasis><quote>root</quote></emphasis>
|
||||
device which is the parent to hang all other devices on. For each
|
||||
architecture, there is typically a single child of <quote>root</quote>
|
||||
which has such things as <emphasis>host-to-PCI bridges</emphasis>, etc.
|
||||
attached to it. For x86, this <quote>root</quote> device is the
|
||||
<emphasis><quote>nexus</quote></emphasis> device and for Alpha, various
|
||||
different models of Alpha have different top-level devices
|
||||
corresponding to the different hardware chipsets, including
|
||||
<emphasis>lca</emphasis>, <emphasis>apecs</emphasis>,
|
||||
<emphasis>cia</emphasis> and <emphasis>tsunami</emphasis>.</para>
|
||||
|
||||
<para>A device in the Newbus context represents a single hardware entity
|
||||
in the system. For instance each PCI device is represented by a Newbus
|
||||
device. Any device in the system can have children; a device which has
|
||||
children is often called a <emphasis><quote>bus</quote></emphasis>.
|
||||
Examples of common busses in the system are ISA and PCI which manage lists
|
||||
of devices attached to ISA and PCI busses respectively.</para>
|
||||
|
||||
<para>Often, a connection between different kinds of bus is represented by
|
||||
a <emphasis><quote>bridge</quote></emphasis> device which normally has one
|
||||
child for the attached bus. An example of this is a
|
||||
<emphasis>PCI-to-PCI bridge</emphasis> which is represented by a device
|
||||
<emphasis><devicename>pcibN</devicename></emphasis> on the parent PCI bus
|
||||
and has a child <emphasis><devicename>pciN</devicename></emphasis> for the
|
||||
attached bus. This layout simplifies the implementation of the PCI bus
|
||||
tree, allowing common code to be used for both top-level and bridged
|
||||
busses.</para>
|
||||
|
||||
<para>Each device in the Newbus architecture asks its parent to map its
|
||||
resources. The parent then asks its own parent until the nexus is
|
||||
reached. So, basically the nexus is the only part of the Newbus system
|
||||
which knows about all resources.</para>
|
||||
|
||||
<tip><para>An ISA device might want to map its IO port at
|
||||
<literal>0x230</literal>, so it asks its parent, in this case the ISA
|
||||
bus. The ISA bus hands it over to the PCI-to-ISA bridge which in its turn
|
||||
asks the PCI bus, which reaches the host-to-PCI bridge and finally the
|
||||
nexus. The beauty of this transition upwards is that there is room to
|
||||
translate the requests. For example, the <literal>0x230</literal> IO port
|
||||
request might become memory-mapped at <literal>0xb0000230</literal> on a
|
||||
<acronym>MIPS</acronym> box by the PCI bridge.</para></tip>
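<para>The upward walk described in the tip can be sketched in plain C.
This is only a userland model under stated assumptions: the type
<literal>struct sketch_node</literal>, its fields and the function
<function>sketch_map_resource()</function> are hypothetical names
invented for illustration and are not part of the kernel. It shows a
request climbing from a device to the top of the tree, with one bridge
translating the address on the way.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device node; NOT the kernel's struct device. */
struct sketch_node {
	const char *name;
	struct sketch_node *parent;	/* NULL for the top-level device */
	uint32_t translate_base;	/* offset this node adds; 0 for none */
};

/*
 * Hand a request for "addr" to each ancestor in turn until a node with
 * no parent has been consulted.  Every ancestor may translate the
 * address.  If "hops" is not NULL it receives the number of ancestors
 * that saw the request.
 */
static uint32_t
sketch_map_resource(const struct sketch_node *dev, uint32_t addr, int *hops)
{
	int n = 0;

	for (dev = dev->parent; dev != NULL; dev = dev->parent) {
		addr += dev->translate_base;
		n++;
	}
	if (hops != NULL)
		*hops = n;
	return (addr);
}
```

<para>With a chain of device, ISA bus, PCI-to-ISA bridge, PCI bus,
host-to-PCI bridge and nexus, and a host bridge whose
<literal>translate_base</literal> is <literal>0xb0000000</literal>, a
request for <literal>0x230</literal> comes out as
<literal>0xb0000230</literal>.</para>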
|
||||
|
||||
<para>Resource allocation can be controlled at any place in the device
|
||||
tree. For instance on many Alpha platforms, ISA interrupts are managed
|
||||
separately from PCI interrupts and resource allocations for ISA interrupts
|
||||
are managed by the Alpha's ISA bus device. On IA-32, ISA and PCI
|
||||
interrupts are both managed by the top-level nexus device. For both
|
||||
ports, memory and port address space is managed by a single entity - nexus
|
||||
for IA-32 and the relevant chipset driver on Alpha (e.g. CIA or tsunami).
|
||||
</para>
|
||||
|
||||
<para>In order to normalize access to memory and port mapped resources,
|
||||
Newbus integrates the <literal>bus_space</literal> APIs from NetBSD.
|
||||
These provide a single API to replace inb/outb and direct memory
|
||||
reads/writes. The advantage of this is that a single driver can easily
|
||||
use either memory-mapped registers or port-mapped registers
|
||||
(some hardware supports both).</para>
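<para>The idea of one access routine serving both kinds of register can
be modelled in userland as follows. This is a sketch, not the
<literal>bus_space</literal> implementation: every name beginning with
<literal>sketch_</literal> is hypothetical, and the port space is
simulated with an array because real port I/O is not available outside
the kernel. The point is that the driver-side function
<function>sketch_read_status()</function> does not care which kind of
mapping it was given.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag: says which address space a handle refers to. */
enum sketch_space { SKETCH_SPACE_IO, SKETCH_SPACE_MEM };
struct sketch_tag { enum sketch_space space; };
typedef uintptr_t sketch_handle_t;

/* Stand-in for the I/O port space; a real kernel would use inb(). */
static uint8_t sketch_io_ports[0x400];

/* Read one byte at "off" from the region described by tag and handle. */
static uint8_t
sketch_space_read_1(struct sketch_tag tag, sketch_handle_t h, size_t off)
{
	if (tag.space == SKETCH_SPACE_IO)
		return (sketch_io_ports[h + off]);
	return (*(volatile uint8_t *)(h + off));	/* memory-mapped */
}

/* Driver code: identical for port-mapped and memory-mapped hardware. */
static uint8_t
sketch_read_status(struct sketch_tag tag, sketch_handle_t h)
{
	return (sketch_space_read_1(tag, h, 1));	/* register at offset 1 */
}
```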
|
||||
|
||||
<para>This support is integrated into the resource allocation mechanism.
|
||||
When a resource is allocated, a driver can retrieve the associated
|
||||
<structfield>bus_space_tag_t</structfield> and
|
||||
<structfield>bus_space_handle_t</structfield> from the resource.</para>
|
||||
|
||||
<para>Newbus also allows for definitions of interface methods in files
|
||||
dedicated to this purpose. These are the <filename>.m</filename> files
|
||||
that are found under the <filename>src/sys</filename> hierarchy.</para>
|
||||
|
||||
<para>The core of the Newbus system is an extensible
|
||||
<quote>object-based programming</quote> model. Each device in the system
|
||||
has a table of methods which it supports. The system and other devices
|
||||
use those methods to control the device and request services. The
|
||||
different methods supported by a device are defined by a number of
|
||||
<quote>interfaces</quote>. An <quote>interface</quote> is simply a group
|
||||
of related methods which can be implemented by a device.</para>
|
||||
|
||||
<para>In the Newbus system, the methods for a device are provided by the
|
||||
various device drivers in the system. When a device is attached to a
|
||||
driver during <emphasis>auto-configuration</emphasis>, it uses the method
|
||||
table declared by the driver. A device can later
|
||||
<emphasis>detach</emphasis> from its driver and
|
||||
<emphasis>re-attach</emphasis> to a new driver with a new method table.
|
||||
This allows dynamic replacement of drivers which can be useful for driver
|
||||
development.</para>
|
||||
|
||||
<para>The interfaces are described by an interface definition language
|
||||
similar to the language used to define vnode operations for file systems.
|
||||
The interface is stored in a methods file (which would normally be named
|
||||
<filename>foo_if.m</filename>).</para>
|
||||
|
||||
<example>
|
||||
<title>Newbus Methods</title>
|
||||
<programlisting>
|
||||
# Foo subsystem/driver (a comment...)
|
||||
|
||||
INTERFACE foo
|
||||
|
||||
METHOD int doit {
|
||||
device_t dev;
|
||||
};
|
||||
|
||||
# DEFAULT is the method that will be used, if a method was not
|
||||
# provided via: DEVMETHOD()
|
||||
|
||||
METHOD void doit_to_child {
|
||||
device_t dev;
|
||||
device_t child;
|
||||
} DEFAULT doit_generic_to_child;
|
||||
</programlisting>
|
||||
</example>
|
||||
|
||||
<para>When this interface is compiled, it generates a header file
|
||||
<quote><filename>foo_if.h</filename></quote> which contains function
|
||||
declarations:</para>
|
||||
|
||||
<programlisting>
|
||||
int FOO_DOIT(device_t dev);
|
||||
void FOO_DOIT_TO_CHILD(device_t dev, device_t child);
|
||||
</programlisting>
|
||||
|
||||
<para>A source file, <quote><filename>foo_if.c</filename></quote> is
|
||||
also created to accompany the automatically generated header file; it
|
||||
contains implementations of those functions which look up the location
|
||||
of the relevant functions in the object's method table and call that
|
||||
function.</para>
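<para>The lookup-then-call behaviour of the generated functions can be
illustrated with a small userland model. This is a sketch under
assumptions, not the output of the interface compiler: the real method
table is managed by the kernel object system, whereas here it is a plain
structure of function pointers, and all <literal>sketch_</literal> and
<literal>SKETCH_</literal> names are hypothetical. For brevity the
fallback is attached to the <literal>doit</literal> method rather than
to <literal>doit_to_child</literal>.</para>

```c
#include <stddef.h>

struct sketch_device;
typedef int sketch_doit_t(struct sketch_device *dev);

/* One slot per method of the interface; NULL means "not provided". */
struct sketch_methods {
	sketch_doit_t *doit;
};

struct sketch_device {
	const struct sketch_methods *methods;	/* table of the attached driver */
	int unit;
};

/* Fallback used when the driver's table has no "doit" entry. */
static int
sketch_doit_default(struct sketch_device *dev)
{
	(void)dev;
	return (-1);
}

/* Plays the role of a generated FOO_DOIT(): look the method up, call it. */
static int
SKETCH_FOO_DOIT(struct sketch_device *dev)
{
	sketch_doit_t *fn = dev->methods->doit;

	if (fn == NULL)
		fn = sketch_doit_default;
	return (fn(dev));
}

/* A driver's own implementation of the method. */
static int
mydrv_doit(struct sketch_device *dev)
{
	return (dev->unit + 100);
}
```

<para>Callers only ever use <function>SKETCH_FOO_DOIT()</function>;
which function actually runs depends on the table the device currently
points at, which is what makes replacing a driver at run time
possible.</para>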
|
||||
|
||||
<para>The system defines two main interfaces. The first fundamental
|
||||
interface is called <emphasis><quote>device</quote></emphasis> and
|
||||
includes methods which are relevant to all devices. Methods in the
|
||||
<emphasis><quote>device</quote></emphasis> interface include
|
||||
<emphasis><quote>probe</quote></emphasis>,
|
||||
<emphasis><quote>attach</quote></emphasis> and
|
||||
<emphasis><quote>detach</quote></emphasis> to control detection of
|
||||
hardware and <emphasis><quote>shutdown</quote></emphasis>,
|
||||
<emphasis><quote>suspend</quote></emphasis> and
|
||||
<emphasis><quote>resume</quote></emphasis> for critical event
|
||||
notification.</para>
|
||||
|
||||
<para>The second, more complex interface is
|
||||
<emphasis><quote>bus</quote></emphasis>. This interface contains
|
||||
methods suitable for devices which have children, including methods to
|
||||
access bus specific per-device information
|
||||
<footnote><para>&man.bus.generic.read.ivar.9; and
|
||||
&man.bus.generic.write.ivar.9;</para></footnote>, event notification
|
||||
(<emphasis><literal>child_detached</literal></emphasis>,
|
||||
<emphasis><literal>driver_added</literal></emphasis>) and resource
|
||||
management (<emphasis><literal>alloc_resource</literal></emphasis>,
|
||||
<emphasis><literal>activate_resource</literal></emphasis>,
|
||||
<emphasis><literal>deactivate_resource</literal></emphasis>,
|
||||
<emphasis><literal>release_resource</literal></emphasis>).</para>
|
||||
|
||||
<para>Many methods in the <quote>bus</quote> interface are performing
|
||||
services for some child of the bus device. These methods would normally
|
||||
use the first two arguments to specify the bus providing the service
|
||||
and the child device which is requesting the service. To simplify
|
||||
driver code, many of these methods have accessor functions which
|
||||
lookup the parent and call a method on the parent. For instance the
|
||||
method
|
||||
<literal>BUS_TEARDOWN_INTR(device_t dev, device_t child, ...)</literal>
|
||||
can be called using the function
|
||||
<literal>bus_teardown_intr(device_t child, ...)</literal>.</para>
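<para>The relationship between a bus method and its accessor function
can be sketched like this. It is a userland model with hypothetical
names (<literal>struct sketch_dev</literal>,
<function>sketch_teardown_intr()</function> and so on); the real
accessors take further arguments and dispatch through the method table
instead of a single function pointer.</para>

```c
#include <stddef.h>

/* Hypothetical device model holding just what the sketch needs. */
struct sketch_dev {
	struct sketch_dev *parent;
	/* Bus-side method: first argument is the bus, second the child. */
	int (*teardown_intr)(struct sketch_dev *bus, struct sketch_dev *child);
	int intr_count;			/* handlers the bus still has set up */
};

/* Example bus implementation: drop one handler, return how many remain. */
static int
sketch_bus_teardown(struct sketch_dev *bus, struct sketch_dev *child)
{
	(void)child;
	return (--bus->intr_count);
}

/*
 * Accessor: the driver passes only its own device.  The parent is
 * looked up here and the method is called on it.  Returns -1 if there
 * is no parent or the parent lacks the method.
 */
static int
sketch_teardown_intr(struct sketch_dev *child)
{
	struct sketch_dev *bus = child->parent;

	if (bus == NULL || bus->teardown_intr == NULL)
		return (-1);
	return (bus->teardown_intr(bus, child));
}
```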
|
||||
|
||||
<para>Some bus types in the system define additional interfaces to
|
||||
provide access to bus-specific functionality. For instance, the PCI
|
||||
bus driver defines the <quote>pci</quote> interface which has two
|
||||
methods <emphasis><literal>read_config</literal></emphasis> and
|
||||
<emphasis><literal>write_config</literal></emphasis> for accessing the
|
||||
configuration registers of a PCI device.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="newbus-api">
|
||||
<title>Newbus API</title>
|
||||
<para>As the Newbus API is huge, this section makes some effort at
|
||||
documenting it. More information to come in the next revision of this
|
||||
document.</para>
|
||||
|
||||
<sect2>
|
||||
<title>Important locations in the source hierarchy</title>
|
||||
|
||||
<para><filename>src/sys/[arch]/[arch]</filename> - Kernel code for a
|
||||
specific machine architecture resides in this directory. For example,
|
||||
the <literal>i386</literal> architecture, or the
|
||||
<literal>SPARC64</literal> architecture.</para>
|
||||
|
||||
<para><filename>src/sys/dev/[bus]</filename> - device support for a
|
||||
specific <literal>[bus]</literal> resides in this directory.</para>
|
||||
|
||||
<para><filename>src/sys/dev/pci</filename> - PCI bus support code
|
||||
resides in this directory.</para>
|
||||
|
||||
<para><filename>src/sys/[isa|pci]</filename> - PCI/ISA device drivers
|
||||
reside in this directory. The PCI/ISA bus support code used to exist
|
||||
in this directory in FreeBSD version <literal>4.0</literal>.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Important structures and type definitions</title>
|
||||
<para><literal>devclass_t</literal> - This is a type definition of a
|
||||
pointer to a <literal>struct devclass</literal>.</para>
|
||||
|
||||
<para><literal>device_method_t</literal> - This is the same as
|
||||
<literal>kobj_method_t</literal> (see
|
||||
<filename>src/sys/sys/kobj.h</filename>).</para>
|
||||
|
||||
<para><literal>device_t</literal> - This is a type definition of a
|
||||
pointer to a <literal>struct device</literal>.
|
||||
<literal>device_t</literal> represents a device in the system. It is
|
||||
a kernel object. See <filename>src/sys/sys/bus_private.h</filename>
|
||||
for implementation details.</para>
|
||||
|
||||
<para><literal>driver_t</literal> - This is a type definition which
|
||||
references <literal>struct driver</literal>. The
|
||||
<literal>driver</literal> struct is a class of the
|
||||
<literal>device</literal> kernel object; it also holds data private
|
||||
to the driver.</para>
|
||||
|
||||
<figure>
|
||||
<title><emphasis>driver_t</emphasis> implementation</title>
|
||||
<programlisting>
|
||||
struct driver {
|
||||
KOBJ_CLASS_FIELDS;
|
||||
void *priv; /* driver private data */
|
||||
};
|
||||
</programlisting>
|
||||
</figure>
|
||||
|
||||
<para><literal>device_state_t</literal> - This is a type definition of
|
||||
an enumeration, <literal>device_state</literal>. It contains
|
||||
the possible states of a Newbus device before and after the
|
||||
autoconfiguration process.</para>
|
||||
|
||||
<figure>
|
||||
<title>Device states <emphasis>device_state_t</emphasis></title>
|
||||
<programlisting>
|
||||
/*
|
||||
* src/sys/sys/bus.h
|
||||
*/
|
||||
typedef enum device_state {
|
||||
DS_NOTPRESENT, /* not probed or probe failed */
|
||||
DS_ALIVE, /* probe succeeded */
|
||||
DS_ATTACHED, /* attach method called */
|
||||
DS_BUSY /* device is open */
|
||||
} device_state_t;
|
||||
</programlisting>
|
||||
</figure>
|
||||
</sect2>
|
||||
</sect1>
|
||||
</chapter>
|
|
@ -1,337 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="pccard">
|
||||
<title>PC Card</title>
|
||||
|
||||
<para>This chapter will talk about the FreeBSD mechanisms for
|
||||
writing a device driver for a PC Card or CardBus device. However,
|
||||
at the present time, it just documents how to add a driver to an
|
||||
existing pccard driver.</para>
|
||||
|
||||
<sect1 id="pccard-adddev">
|
||||
<title>Adding a device</title>
|
||||
|
||||
<para>Adding a new device to the list of supported devices for
|
||||
pccard devices has changed from the system used through FreeBSD
|
||||
4. In prior versions, editing a file in /etc to list the device
|
||||
was necessary. Starting in FreeBSD 5.0, device drivers know what
|
||||
devices they support. There is now a table of supported devices
|
||||
in the kernel that drivers use to attach to a device.</para>
|
||||
|
||||
<sect2 id="pccard-overview">
|
||||
<title>Overview</title>
|
||||
|
||||
<para>PC Cards are identified in one of two ways, both based on
|
||||
information in the CIS of the card. The first method is to use
|
||||
numeric manufacturer and product numbers. The second method is
|
||||
to use the human readable strings that are also contained in the
|
||||
CIS. The PC Card bus uses a centralized database and
|
||||
some macros to facilitate a design pattern to help the driver
|
||||
writer match devices to his driver.</para>
|
||||
|
||||
<para>There is a widespread practice of one company developing a
|
||||
reference design for a PC Card product and then selling this
|
||||
design to other companies to market. Those companies refine the
|
||||
design, market the product to their target audience or
|
||||
geographic area and put their own name plate onto the card.
|
||||
However, the refinements to the physical card typically are very
|
||||
minor, if any changes are made at all. Often, however, to
|
||||
strengthen their branding of their version of the card, these
|
||||
vendors will place their company name in the human strings in
|
||||
the CIS space, but leave the manufacturer and product ids
|
||||
unchanged.</para>
|
||||
|
||||
<para>Because of the above practice, it is a smaller work load
|
||||
for FreeBSD to use the numeric IDs. It also introduces some
|
||||
minor complications into the process of adding IDs to the
|
||||
system. One must carefully check to see who really made the
|
||||
card, especially when it appears that the vendor who made the
|
||||
card might already have a different manufacturer id listed
|
||||
in the central database. Linksys, D-Link and NetGear are a
|
||||
number of US manufacturers of LAN hardware that often sell the
|
||||
same design. These same designs can be sold in Japan under
|
||||
names such as Buffalo and Corega. Yet often, these devices will
|
||||
all have the same manufacturer and product id.</para>
|
||||
|
||||
<para>The PC Card bus keeps its central database of card
|
||||
information, but not which driver is associated with them, in
|
||||
/sys/dev/pccard/pccarddevs. It also provides a set of macros
|
||||
that allow one to easily construct simple entries in the table
|
||||
the driver uses to claim devices.</para>
|
||||
|
||||
<para>Finally, some really low end devices do not contain
|
||||
manufacturer identification at all. These devices require that
|
||||
one matches them using the human readable CIS strings. While it
|
||||
would be nice if we didn't need this method as a fallback, it is
|
||||
necessary for some very low end CD-ROM players that are quite
|
||||
popular. This method should generally be avoided, but a number
|
||||
of devices are listed in this section because they were added
|
||||
prior to the recognition of the OEM nature of the PC Card
|
||||
business. When adding new devices, prefer using the numeric
|
||||
method.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="pccard-pccarddevs">
|
||||
<title>Format of pccarddevs</title>
|
||||
|
||||
<para>There are four sections in the pccarddevs file. The
|
||||
first section lists the manufacturer numbers for those vendors
|
||||
that use them. This section is sorted in numerical order. The
|
||||
next section has all of the products that are used by these
|
||||
vendors, along with their product ID numbers and a description
|
||||
string. The description string typically isn't used (instead we
|
||||
set the device's description based on the human readable CIS,
|
||||
even if we match on the numeric version). These two sections
|
||||
are then repeated for those devices that use the string matching
|
||||
method. Finally, C-style comments are allowed anywhere in the
|
||||
file.</para>
|
||||
|
||||
<para>The first section of the file contains the vendor IDs.
|
||||
Please keep this list sorted in numeric order. Also, please
|
||||
coordinate changes to this file because we share it with
|
||||
NetBSD to help facilitate a common clearing house for this
|
||||
information. For example:
|
||||
<programlisting>vendor FUJITSU 0x0004 Fujitsu Corporation
|
||||
vendor NETGEAR_2 0x000b Netgear
|
||||
vendor PANASONIC 0x0032 Matsushita Electric Industrial Co.
|
||||
vendor SANDISK 0x0045 Sandisk Corporation
|
||||
</programlisting>
|
||||
shows the first few vendor ids. Chances are very good that the
|
||||
NETGEAR_2 entry is really an OEM that NETGEAR purchased cards
|
||||
from and the author of support for those cards was unaware at
|
||||
the time that Netgear was using someone else's id. These
|
||||
entries are fairly straight forward. There's the vendor keyword
|
||||
used to denote the kind of line that this is. There's the name
|
||||
of the vendor. This name will be repeated later in the
|
||||
pccarddevs file, as well as used in the driver's match tables,
|
||||
so keep it short and a valid C identifier. There's a numeric
|
||||
ID, in hex, for the manufacturer. Do not add IDs of the form
|
||||
0xffffffff or 0xffff because these are reserved ids (the former
|
||||
is 'no id set' while the latter is sometimes seen in extremely
|
||||
poor quality cards to try to indicate 'none'). Finally there's a
|
||||
string description of the company that makes the card. This
|
||||
string is not used in FreeBSD for anything but commentary
|
||||
purposes.</para>
|
||||
|
||||
<para>The second section of the file contains the products.
|
||||
As you can see in the following example:
|
||||
<programlisting>/* Allied Telesis K.K. */
|
||||
product ALLIEDTELESIS LA_PCM 0x0002 Allied Telesis LA-PCM
|
||||
|
||||
/* Archos */
|
||||
product ARCHOS ARC_ATAPI 0x0043 MiniCD
|
||||
</programlisting>
|
||||
the format is similar to the vendor lines. There is the product
|
||||
keyword. Then there is the vendor name, repeated from above.
|
||||
This is followed by the product name, which is used by the
|
||||
driver and should be a valid C identifier, but may also start
|
||||
with a number. There's then the product id for this card, in
|
||||
hex. As with the vendors, there's the same convention for
|
||||
0xffffffff and 0xffff. Finally, there's a string description of
|
||||
the device itself. This string typically is not used in
|
||||
FreeBSD, since FreeBSD's pccard bus driver will construct a
|
||||
string from the human readable CIS entries, but can be used in
|
||||
the rare cases where this is somehow insufficient. The products
|
||||
are in alphabetical order by manufacturer, then numerical order by
|
||||
product id. They have a C comment before each manufacturer's
|
||||
entries and there is a blank line between entries.</para>
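<para>As an illustration of the numeric product line layout described
above, the following userland sketch picks one such line apart. It is
not the tool that generates pccarddevs.h; the structure
<literal>struct sketch_entry</literal>, the 31-character name limit and
the function <function>sketch_parse_product()</function> are assumptions
made for this example only. It also rejects the two reserved ID
values.</para>

```c
#include <stdio.h>

/* Hypothetical holder for the fields of one numeric "product" line. */
struct sketch_entry {
	char vendor[32];	/* vendor name, repeated from the vendor section */
	char name[32];		/* product name used by the driver */
	unsigned long id;	/* product id, given in hex in the file */
};

/*
 * Parse "product VENDOR NAME 0xID description...".  The trailing
 * description is ignored.  Returns 0 on success and -1 if the line
 * does not have this shape or uses a reserved id.
 */
static int
sketch_parse_product(const char *line, struct sketch_entry *out)
{
	if (sscanf(line, "product %31s %31s %lx",
	    out->vendor, out->name, &out->id) != 3)
		return (-1);
	if (out->id == 0xffffUL || out->id == 0xffffffffUL)
		return (-1);	/* reserved values, see the text above */
	return (0);
}
```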
|
||||
|
||||
<para>The third section is like the previous vendor section, but
|
||||
with all of the manufacturer numeric ids as -1. -1 means 'match
|
||||
anything you find' in the FreeBSD pccard bus code. Since these
|
||||
are C identifiers, their names must be unique. Otherwise the
|
||||
format is identical to the first section of the file.</para>
|
||||
|
||||
<para>The final section contains the entries for those cards
|
||||
that we must match with string entries. This section's format
|
||||
is a little different than the generic section:
|
||||
<programlisting>product ADDTRON AWP100 { "Addtron", "AWP-100&spWireless&spPCMCIA", "Version&sp01.02", NULL }
|
||||
product ALLIEDTELESIS WR211PCM { "Allied&spTelesis&spK.K.", "WR211PCM", NULL, NULL } Allied Telesis WR211PCM
|
||||
</programlisting>
|
||||
We have the familiar product keyword, followed by the vendor
|
||||
name followed by the card name, just as in the second section of
|
||||
the file. However, then we deviate from that format. There is
|
||||
a {} grouping, followed by a number of strings. These strings
|
||||
correspond to the vendor, product and extra information that is
|
||||
defined in a CIS_INFO tuple. These strings are filtered by the
|
||||
program that generates pccarddevs.h to replace &sp with a
|
||||
real space. NULL entries mean that that part of the entry
|
||||
should be ignored. In the example I've picked, there's a bad
|
||||
entry. It shouldn't contain the version number in it unless
|
||||
that's critical for the operation of the card. Sometimes vendors
|
||||
will have many different versions of the card in the field that
|
||||
all work, in which case that information only makes it harder
|
||||
for someone with a similar card to use it with FreeBSD.
|
||||
Sometimes it is necessary when a vendor wishes to sell many
|
||||
different parts under the same brand due to market
|
||||
considerations (availability, price, and so forth). Then it can
|
||||
be critical to disambiguating the card in those rare cases where
|
||||
the vendor kept the same manufacturer/product pair. Regular
|
||||
expression matching is not available at this time.</para>
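<para>The &sp substitution mentioned above amounts to a simple string
rewrite. The following sketch shows one way to do it in C; it is not the
code of the program that generates pccarddevs.h, and the function name
<function>sketch_expand_sp()</function> is hypothetical.</para>

```c
#include <string.h>

/*
 * Replace every occurrence of the three characters "&sp" in "s" with a
 * single space.  The string is edited in place, which is safe because
 * the result is never longer than the input.  Returns "s".
 */
static char *
sketch_expand_sp(char *s)
{
	char *src = s, *dst = s;

	while (*src != '\0') {
		if (strncmp(src, "&sp", 3) == 0) {
			*dst++ = ' ';
			src += 3;
		} else {
			*dst++ = *src++;
		}
	}
	*dst = '\0';
	return (s);
}
```

<para>Applied to the product string of the first example entry, this
yields <literal>AWP-100 Wireless PCMCIA</literal>.</para>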
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="pccard-probe">
|
||||
<title>Sample probe routine</title>
|
||||
|
||||
<para>To understand how to add a device to the list of supported
|
||||
devices, one must understand the probe and/or match routines
|
||||
that many drivers have. It is complicated a little in FreeBSD
|
||||
5.x because there is a compatibility layer for OLDCARD present
|
||||
as well. Since only the window-dressing is different, I'll be
|
||||
presenting an idealized version.</para>
|
||||
|
||||
<programlisting>static const struct pccard_product wi_pccard_products[] = {
|
||||
PCMCIA_CARD(3COM, 3CRWE737A, 0),
|
||||
PCMCIA_CARD(BUFFALO, WLI_PCM_S11, 0),
|
||||
PCMCIA_CARD(BUFFALO, WLI_CF_S11G, 0),
|
||||
PCMCIA_CARD(TDK, LAK_CD011WL, 0),
|
||||
{ NULL }
|
||||
};
|
||||
|
||||
static int
|
||||
wi_pccard_probe(dev)
|
||||
device_t dev;
|
||||
{
|
||||
const struct pccard_product *pp;
|
||||
|
||||
if ((pp = pccard_product_lookup(dev, wi_pccard_products,
|
||||
sizeof(wi_pccard_products[0]), NULL)) != NULL) {
|
||||
if (pp->pp_name != NULL)
|
||||
device_set_desc(dev, pp->pp_name);
|
||||
return (0);
|
||||
}
|
||||
return (ENXIO);
|
||||
}
|
||||
</programlisting>
|
||||
|
||||
<para>Here we have a simple pccard probe routine that matches a
|
||||
few devices. As stated above, the name may vary (if it isn't
|
||||
<function>foo_pccard_probe()</function> it will be
|
||||
<function>foo_pccard_match()</function>). The function
|
||||
<function>pccard_product_lookup()</function> is a generalized
|
||||
function that walks the table and returns a pointer to the
|
||||
first entry that it matches. Some drivers may use this
|
||||
mechanism to convey additional information about some cards to
|
||||
the rest of the driver, so there may be some variance in the
|
||||
table. The only requirement is that, if you use a
|
||||
different table, the first member of each entry's structure
|
||||
must be a struct pccard_product.</para>
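<para>Why the product structure has to come first can be seen from a
small model of such a table walk. This is a sketch, not the kernel's
<function>pccard_product_lookup()</function>: the structure
<literal>struct sketch_product</literal>, its fields, the end-of-table
convention (a manufacturer of 0) and the driver structure
<literal>struct mydrv_product</literal> are all assumptions made for
illustration. The walk steps through the table by the entry size passed
in, so entries may carry extra driver data after the common part.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pccard_product. */
struct sketch_product {
	const char *name;
	uint32_t manufacturer;	/* 0 marks the end of the table (assumed) */
	uint32_t product;
};

/* Driver-specific entry: common part first, private data after it. */
struct mydrv_product {
	struct sketch_product prod;
	int quirks;
};

/*
 * Walk a table whose entries are "ent_size" bytes apart and return the
 * first entry matching both ids, or NULL if none does.
 */
static const struct sketch_product *
sketch_product_lookup(uint32_t manufacturer, uint32_t product,
    const struct sketch_product *tab, size_t ent_size)
{
	const struct sketch_product *ent;

	for (ent = tab; ent->manufacturer != 0;
	    ent = (const struct sketch_product *)
	    ((const char *)ent + ent_size)) {
		if (ent->manufacturer == manufacturer &&
		    ent->product == product)
			return (ent);
	}
	return (NULL);
}
```

<para>Because the common structure is the first member, the pointer that
comes back can be cast to the driver's own entry type to reach the extra
data.</para>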
|
||||
|
||||
<para>Looking at the table wi_pccard_products, one notices that
|
||||
all the entries are of the form PCMCIA_CARD(foo, bar, baz).
|
||||
The foo part is the manufacturer id from pccarddevs. The bar
|
||||
part is the product. The baz is the expected function number
|
||||
for this card. Many pccards can have multiple functions,
|
||||
and some way to disambiguate function 1 from function 0 is
|
||||
needed. You may see PCMCIA_CARD_D, which includes the device
|
||||
description from the pccarddevs file. You may also see
|
||||
PCMCIA_CARD2 and PCMCIA_CARD2_D which are used when you need
|
||||
to match both CIS strings and manufacturer numbers, in the
|
||||
'use the default description' and 'take the description from
|
||||
pccarddevs' flavors.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="pccard-add">
|
||||
<title>Putting it all together</title>
|
||||
|
||||
<para>So, to add a new device, one must do the following steps.
|
||||
First, one must obtain the identification information from the
|
||||
device. The easiest way to do this is to insert the device into
|
||||
a PC Card or CF slot and issue devinfo -v. You'll likely see
|
||||
something like:
|
||||
<programlisting> cbb1 pnpinfo vendor=0x104c device=0xac51 subvendor=0x1265 subdevice=0x0300 class=0x060700 at slot=10 function=1
|
||||
cardbus1
|
||||
pccard1
|
||||
unknown pnpinfo manufacturer=0x026f product=0x030c cisvendor="BUFFALO" cisproduct="WLI2-CF-S11" function_type=6 at function=0
|
||||
</programlisting>
|
||||
as part of the output. The manufacturer and product are the
|
||||
numeric IDs for this product, while the cisvendor and
|
||||
cisproduct are the strings that are present in the CIS that
|
||||
describe this product.</para>
|
||||
|
||||
<para>Since we want to prefer the
|
||||
numeric option, first try to construct an entry based on that.
|
||||
The above card has been slightly fictionalized for the purpose
|
||||
of this example. The vendor is BUFFALO, which we see already
|
||||
has an entry:
|
||||
<programlisting>vendor BUFFALO 0x026f BUFFALO (Melco Corporation)
|
||||
</programlisting>
|
||||
so we're good there. Looking for an entry for this card, we do
|
||||
not find one. Instead we find:
|
||||
<programlisting>/* BUFFALO */
|
||||
product BUFFALO WLI_PCM_S11 0x0305 BUFFALO AirStation 11Mbps WLAN
|
||||
product BUFFALO LPC_CF_CLT 0x0307 BUFFALO LPC-CF-CLT
|
||||
product BUFFALO LPC3_CLT 0x030a BUFFALO LPC3-CLT Ethernet Adapter
|
||||
product BUFFALO WLI_CF_S11G 0x030b BUFFALO AirStation 11Mbps CF WLAN
|
||||
</programlisting>
|
||||
we can just add
|
||||
<programlisting>product BUFFALO WLI2_CF_S11G 0x030c BUFFALO AirStation ultra 802.11b CF
|
||||
</programlisting>
|
||||
to pccarddevs. Presently, there is a manual step to regenerate
|
||||
the pccarddevs.h file used to convey these identifiers to the
|
||||
client driver. The following steps must be done before you
|
||||
can use them in the driver:
|
||||
<programlisting>cd src/sys/dev/pccard
|
||||
make -f Makefile.pccarddevs
|
||||
</programlisting>
|
||||
</para>
|
||||
|
||||
<para>Once these steps are complete, you can add the card to the
|
||||
driver. That is a simple operation of adding one line:
|
||||
<programlisting>static const struct pccard_product wi_pccard_products[] = {
|
||||
PCMCIA_CARD(3COM, 3CRWE737A, 0),
|
||||
PCMCIA_CARD(BUFFALO, WLI_PCM_S11, 0),
|
||||
PCMCIA_CARD(BUFFALO, WLI_CF_S11G, 0),
|
||||
+ PCMCIA_CARD(BUFFALO, WLI2_CF_S11G, 0),
|
||||
PCMCIA_CARD(TDK, LAK_CD011WL, 0),
|
||||
{ NULL }
|
||||
};
|
||||
</programlisting>
|
||||
Note that I've included a '+' at the start of the line that I
|
||||
added, but that is simply to highlight the line. Do not add it
|
||||
to the actual driver. Once you've added the line, you can
|
||||
recompile your kernel or module and try to see if it recognizes
|
||||
the device. If it does and works, please submit a patch. If it
|
||||
doesn't work, please figure out what is needed to make it work
|
||||
and submit a patch. If it didn't recognize it at all, you have
|
||||
done something wrong and should recheck each step.</para>
|
||||
|
||||
<para>If you are a FreeBSD src committer, and everything appears
|
||||
to be working, then you can commit the changes to the tree.
|
||||
However, there are some minor tricky things that you need to
|
||||
worry about. First, you must commit the pccarddevs file to the
|
||||
tree. After you have done that, you must regenerate
|
||||
pccarddevs.h after the commit of pccarddevs and commit that as a
|
||||
second commit (this is to make sure that the right $FreeBSD$ tag
|
||||
is in the latter file). Finally, you need to commit the
|
||||
additions to the driver.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="pccard-pr">
|
||||
<title>Submitting a new device</title>
|
||||
|
||||
<para>Many people send entries for new devices to the author
|
||||
directly. Please do not do this. Please submit them as a PR
|
||||
and send the author the PR number for his records. This makes
|
||||
sure that entries aren't lost. When submitting a PR, it is
|
||||
unnecessary to include the pccarddevs.h diffs in the patch, since
|
||||
those will be regenerated. It is necessary to include a
|
||||
description of the device, as well as the patches to the client
|
||||
driver. If you don't know the name, use OEM99 as the name, and
|
||||
the author will adjust OEM99 accordingly after investigation.
|
||||
Committers should not commit OEM99, but instead find the highest
|
||||
OEM entry and commit one more than that.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
|
@ -1,378 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="pci">
|
||||
<title>PCI Devices</title>
|
||||
|
||||
<para>This chapter will talk about the FreeBSD mechanisms for
|
||||
writing a device driver for a device on a PCI bus.</para>
|
||||
|
||||
<sect1 id="pci-probe">
|
||||
<title>Probe and Attach</title>
|
||||
|
||||
<para>Information here about how the PCI bus code iterates through
|
||||
the unattached devices and sees if a newly loaded kld will attach
|
||||
to any of them.</para>
|
||||
|
||||
<programlisting>/*
|
||||
* Simple KLD to play with the PCI functions.
|
||||
*
|
||||
* Murray Stokely
|
||||
*/
|
||||
|
||||
#define MIN(a,b) (((a) < (b)) ? (a) : (b))
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/module.h>
|
||||
#include <sys/systm.h> /* uprintf */
|
||||
#include <sys/errno.h>
|
||||
#include <sys/param.h> /* defines used in kernel.h */
|
||||
#include <sys/kernel.h> /* types used in module initialization */
|
||||
#include <sys/conf.h> /* cdevsw struct */
|
||||
#include <sys/uio.h> /* uio struct */
|
||||
#include <sys/malloc.h>
|
||||
#include <sys/bus.h> /* structs, prototypes for pci bus stuff */
|
||||
|
||||
#include <pci/pcivar.h> /* For pci_get macros! */
|
||||
|
||||
/* Function prototypes */
|
||||
d_open_t mypci_open;
|
||||
d_close_t mypci_close;
|
||||
d_read_t mypci_read;
|
||||
d_write_t mypci_write;
|
||||
|
||||
/* Character device entry points */
|
||||
|
||||
static struct cdevsw mypci_cdevsw = {
|
||||
.d_open = mypci_open,
|
||||
.d_close = mypci_close,
|
||||
.d_read = mypci_read,
|
||||
.d_write = mypci_write,
|
||||
.d_name = "mypci",
|
||||
};
|
||||
|
||||
/* vars */
|
||||
static dev_t sdev;
|
||||
|
||||
/* We're more interested in probe/attach than with
|
||||
open/close/read/write at this point */
|
||||
|
||||
int
|
||||
mypci_open(dev_t dev, int oflags, int devtype, struct proc *p)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
uprintf("Opened device \"mypci\" successfully.\n");
|
||||
return(err);
|
||||
}
|
||||
|
||||
int
|
||||
mypci_close(dev_t dev, int fflag, int devtype, struct proc *p)
|
||||
{
|
||||
int err=0;
|
||||
|
||||
uprintf("Closing device \"mypci.\"\n");
|
||||
return(err);
|
||||
}
|
||||
|
||||
int
|
||||
mypci_read(dev_t dev, struct uio *uio, int ioflag)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
uprintf("mypci read!\n");
|
||||
return err;
|
||||
}
|
||||
|
||||
int
|
||||
mypci_write(dev_t dev, struct uio *uio, int ioflag)
|
||||
{
|
||||
int err = 0;
|
||||
|
||||
uprintf("mypci write!\n");
|
||||
return(err);
|
||||
}
|
||||
|
||||
/* PCI Support Functions */
|
||||
|
||||
/*
|
||||
* Return identification string if this device is ours.
|
||||
*/
|
||||
static int
|
||||
mypci_probe(device_t dev)
|
||||
{
|
||||
uprintf("MyPCI Probe\n"
|
||||
"Vendor ID : 0x%x\n"
|
||||
"Device ID : 0x%x\n",pci_get_vendor(dev),pci_get_device(dev));
|
||||
|
||||
if (pci_get_vendor(dev) == 0x11c1) {
|
||||
uprintf("We've got the Winmodem, probe successful!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
return ENXIO;
|
||||
}
|
||||
|
||||
/* Attach function is only called if the probe is successful */
|
||||
|
||||
static int
|
||||
mypci_attach(device_t dev)
|
||||
{
|
||||
uprintf("MyPCI Attach for : deviceID : 0x%x\n",pci_get_device(dev));
|
||||
sdev = make_dev(<literal>&</literal>mypci_cdevsw,
|
||||
0,
|
||||
UID_ROOT,
|
||||
GID_WHEEL,
|
||||
0600,
|
||||
"mypci");
|
||||
uprintf("Mypci device loaded.\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Detach device. */
|
||||
|
||||
static int
|
||||
mypci_detach(device_t dev)
|
||||
{
|
||||
uprintf("Mypci detach!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Called during system shutdown after sync. */
|
||||
|
||||
static int
|
||||
mypci_shutdown(device_t dev)
|
||||
{
|
||||
uprintf("Mypci shutdown!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Device suspend routine.
|
||||
*/
|
||||
static int
|
||||
mypci_suspend(device_t dev)
|
||||
{
|
||||
uprintf("Mypci suspend!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Device resume routine.
|
||||
*/
|
||||
|
||||
static int
|
||||
mypci_resume(device_t dev)
|
||||
{
|
||||
uprintf("Mypci resume!\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
static device_method_t mypci_methods[] = {
|
||||
/* Device interface */
|
||||
DEVMETHOD(device_probe, mypci_probe),
|
||||
DEVMETHOD(device_attach, mypci_attach),
|
||||
DEVMETHOD(device_detach, mypci_detach),
|
||||
DEVMETHOD(device_shutdown, mypci_shutdown),
|
||||
DEVMETHOD(device_suspend, mypci_suspend),
|
||||
DEVMETHOD(device_resume, mypci_resume),
|
||||
|
||||
{ 0, 0 }
|
||||
};
|
||||
|
||||
static driver_t mypci_driver = {
|
||||
"mypci",
|
||||
mypci_methods,
|
||||
0,
|
||||
/* sizeof(struct mypci_softc), */
|
||||
};
|
||||
|
||||
static devclass_t mypci_devclass;
|
||||
|
||||
DRIVER_MODULE(mypci, pci, mypci_driver, mypci_devclass, 0, 0);</programlisting>
|
||||
|
||||
<para>Additional Resources
|
||||
<itemizedlist>
|
||||
<listitem><simpara><ulink url="http://www.pcisig.org/">PCI
|
||||
Special Interest Group</ulink></simpara></listitem>
|
||||
|
||||
<listitem><simpara>PCI System Architecture, Fourth Edition by
|
||||
Tom Shanley, et al.</simpara></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="pci-bus">
|
||||
<title>Bus Resources</title>
|
||||
|
||||
<para>FreeBSD provides an object-oriented mechanism for requesting
|
||||
resources from a parent bus. Almost all devices will be a child
|
||||
member of some sort of bus (PCI, ISA, USB, SCSI, etc.) and these
|
||||
devices need to acquire resources from their parent bus (such as
|
||||
memory segments, interrupt lines, or DMA channels).</para>
|
||||
|
||||
<sect2>
|
||||
<title>Base Address Registers</title>
|
||||
|
||||
<para>To do anything particularly useful with a PCI device you
|
||||
will need to obtain the <emphasis>Base Address
|
||||
Registers</emphasis> (BARs) from the PCI Configuration space.
|
||||
The PCI-specific details of obtaining the BAR are abstracted in
|
||||
the <function>bus_alloc_resource()</function> function.</para>
|
||||
|
||||
<para>For example, a typical driver might have something similar
|
||||
to this in the <function>attach()</function> function:</para>
|
||||
|
||||
<programlisting> sc->bar0id = 0x10;
|
||||
sc->bar0res = bus_alloc_resource(dev, SYS_RES_MEMORY, &(sc->bar0id),
|
||||
0, ~0, 1, RF_ACTIVE);
|
||||
if (sc->bar0res == NULL) {
|
||||
uprintf("Memory allocation of PCI base register 0 failed!\n");
|
||||
error = ENXIO;
|
||||
goto fail1;
|
||||
}
|
||||
|
||||
sc->bar1id = 0x14;
|
||||
sc->bar1res = bus_alloc_resource(dev, SYS_RES_MEMORY, &(sc->bar1id),
|
||||
0, ~0, 1, RF_ACTIVE);
|
||||
if (sc->bar1res == NULL) {
|
||||
uprintf("Memory allocation of PCI base register 1 failed!\n");
|
||||
error = ENXIO;
|
||||
goto fail2;
|
||||
}
|
||||
sc->bar0_bt = rman_get_bustag(sc->bar0res);
|
||||
sc->bar0_bh = rman_get_bushandle(sc->bar0res);
|
||||
sc->bar1_bt = rman_get_bustag(sc->bar1res);
|
||||
sc->bar1_bh = rman_get_bushandle(sc->bar1res);
|
||||
|
||||
</programlisting>
|
||||
|
||||
<para>Handles for each base address register are kept in the
|
||||
<structname>softc</structname> structure so that they can be
|
||||
used to write to the device later.</para>
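<para>The values 0x10 and 0x14 used above are the configuration-space
offsets of the first two BARs. In a standard (type 0) PCI header the
six BARs sit four bytes apart starting at 0x10, so the resource ID
can be computed from the BAR number. The helper below is a minimal
sketch of that arithmetic; <function>ex_bar_rid()</function> and the
<literal>EX_</literal> constants are hypothetical names used only for
illustration.</para>

```c
#define EX_BAR_BASE	0x10	/* config-space offset of BAR 0 */
#define EX_BAR_COUNT	6	/* a type 0 header has BARs 0 through 5 */

/* Return the config-space offset of BAR n, or -1 if n is out of range. */
int
ex_bar_rid(int n)
{
	if (n < 0 || n >= EX_BAR_COUNT)
		return (-1);
	return (EX_BAR_BASE + 4 * n);
}
```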
|
||||
|
||||
<para>These handles can then be used to read from or write to the
|
||||
device registers with the <function>bus_space_*</function>
|
||||
functions. For example, a driver might contain a shorthand
|
||||
function to read from a board specific register like this:</para>
|
||||
|
||||
<programlisting>uint16_t
|
||||
board_read(struct ni_softc *sc, uint16_t address) {
|
||||
return bus_space_read_2(sc->bar1_bt, sc->bar1_bh, address);
|
||||
}
|
||||
</programlisting>
|
||||
|
||||
<para>Similarly, one could write to the registers with:</para>
|
||||
|
||||
<programlisting>void
|
||||
board_write(struct ni_softc *sc, uint16_t address, uint16_t value) {
|
||||
bus_space_write_2(sc->bar1_bt, sc->bar1_bh, address, value);
|
||||
}
|
||||
</programlisting>
|
||||
|
||||
<para>These functions exist in 8-bit, 16-bit, and 32-bit versions
|
||||
and you should use
|
||||
<function>bus_space_{read|write}_{1|2|4}</function>
|
||||
accordingly.</para>
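<para>To make the effect of the access width concrete, the following
userland sketch models a register window as a plain byte array and
provides 1-, 2-, and 4-byte accessors. It only imitates the shape of
the <function>bus_space_*</function> calls: the
<structname>struct ex_window</structname> type and the
<function>ex_read_*()</function> and
<function>ex_write_*()</function> helpers are hypothetical, and real
device access must go through the tag and handle obtained from the
resource.</para>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register window standing in for a mapped BAR. */
struct ex_window {
	uint8_t regs[16];
};

/* The caller must keep off plus the access width inside the window. */
uint8_t
ex_read_1(const struct ex_window *w, size_t off)
{
	return (w->regs[off]);
}

uint16_t
ex_read_2(const struct ex_window *w, size_t off)
{
	uint16_t v;

	memcpy(&v, &w->regs[off], sizeof(v));	/* host byte order */
	return (v);
}

uint32_t
ex_read_4(const struct ex_window *w, size_t off)
{
	uint32_t v;

	memcpy(&v, &w->regs[off], sizeof(v));
	return (v);
}

void
ex_write_2(struct ex_window *w, size_t off, uint16_t v)
{
	memcpy(&w->regs[off], &v, sizeof(v));
}

void
ex_write_4(struct ex_window *w, size_t off, uint32_t v)
{
	memcpy(&w->regs[off], &v, sizeof(v));
}
```

<para>A 16-bit write touches exactly two bytes, so a neighbouring
32-bit register is left unchanged. Choosing the wrong width on real
hardware can clobber or only partially update a register.</para>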
|
||||
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Interrupts</title>
|
||||
|
||||
<para>Interrupts are allocated from the object-oriented bus code
|
||||
in a way similar to the memory resources. First an IRQ
|
||||
resource must be allocated from the parent bus, and then the
|
||||
interrupt handler must be set up to deal with this IRQ.</para>
|
||||
|
||||
<para>Again, a sample from a device
|
||||
<function>attach()</function> function says more than
|
||||
words.</para>
|
||||
|
||||
<programlisting>/* Get the IRQ resource */
|
||||
|
||||
sc->irqid = 0x0;
|
||||
sc->irqres = bus_alloc_resource(dev, SYS_RES_IRQ, &(sc->irqid),
|
||||
0, ~0, 1, RF_SHAREABLE | RF_ACTIVE);
|
||||
if (sc->irqres == NULL) {
|
||||
uprintf("IRQ allocation failed!\n");
|
||||
error = ENXIO;
|
||||
goto fail3;
|
||||
}
|
||||
|
||||
/* Now we should set up the interrupt handler */
|
||||
|
||||
error = bus_setup_intr(dev, sc->irqres, INTR_TYPE_MISC,
|
||||
my_handler, sc, &(sc->handler));
|
||||
if (error) {
|
||||
printf("Couldn't set up irq\n");
|
||||
goto fail4;
|
||||
}
|
||||
|
||||
sc->irq_bt = rman_get_bustag(sc->irqres);
|
||||
sc->irq_bh = rman_get_bushandle(sc->irqres);
|
||||
</programlisting>
|
||||
|
||||
<para>Some care must be taken in the detach routine of the
|
||||
driver. You must quiesce the device's interrupt stream, and
|
||||
remove the interrupt handler. Once
|
||||
<function>bus_teardown_intr()</function> has returned, you
|
||||
know that your interrupt handler will no longer be called, and
|
||||
that all threads that might have been executing this interrupt handler
|
||||
have returned. Depending on the locking strategy of your
|
||||
driver, you will also need to be careful with what locks you
|
||||
hold when you do this to avoid deadlock.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>DMA</title>
|
||||
<para>This section is obsolete, and present only for historical
|
||||
reasons. The proper method for dealing with these issues is to
|
||||
use the <function>bus_dma*()</function> functions instead.
|
||||
This paragraph can be removed when this section is updated to reflect
|
||||
that usage. However, at the moment, the API is in a bit of
|
||||
flux, so once that settles down, it would be good to update this
|
||||
section to reflect that.</para>
|
||||
|
||||
<para>On the PC, peripherals that want to do bus-mastering DMA
|
||||
must deal with physical addresses. This is a problem since
|
||||
FreeBSD uses virtual memory and deals almost exclusively with
|
||||
virtual addresses. Fortunately, there is a function,
|
||||
<function>vtophys()</function> to help.</para>
|
||||
|
||||
<programlisting>#include <vm/vm.h>
|
||||
#include <vm/pmap.h>
|
||||
|
||||
#define vtophys(virtual_address) (...)
|
||||
</programlisting>
|
||||
|
||||
<para>The solution is a bit different on the Alpha, however, and
|
||||
what we really want is a function called
|
||||
<function>vtobus()</function>.</para>
|
||||
|
||||
<programlisting>#if defined(__alpha__)
|
||||
#define vtobus(va) alpha_XXX_dmamap((vm_offset_t)va)
|
||||
#else
|
||||
#define vtobus(va) vtophys(va)
|
||||
#endif
|
||||
</programlisting>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Deallocating Resources</title>
|
||||
|
||||
<para>It is very important to deallocate all of the resources
|
||||
that were allocated during <function>attach()</function>.
|
||||
Care must be taken to deallocate the correct resources even on a
|
||||
failure condition so that the system will remain usable while
|
||||
your driver dies.</para>
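<para>The <literal>fail1</literal> through <literal>fail4</literal>
labels in the earlier fragments hint at the usual way to do this:
release resources in the reverse order of allocation, with one label
per step. The sketch below shows the pattern in userland with
pretend resources. <function>ex_alloc()</function>,
<function>ex_release()</function>, and
<function>ex_attach()</function> are hypothetical names; a real
driver would release each resource with the bus call that matches
its allocation.</para>

```c
#include <errno.h>

/*
 * Number of pretend resources currently held; a real driver would
 * keep struct resource pointers in its softc instead.
 */
int ex_live;

/* Pretend allocator: fails only for the step named by fail_at. */
static int
ex_alloc(int step, int fail_at)
{
	if (step == fail_at)
		return (0);
	ex_live++;
	return (1);
}

static void
ex_release(void)
{
	ex_live--;
}

/* Allocate three resources in order; on failure, unwind in reverse. */
int
ex_attach(int fail_at)
{
	if (!ex_alloc(0, fail_at))	/* e.g. first memory BAR */
		goto fail0;
	if (!ex_alloc(1, fail_at))	/* e.g. second memory BAR */
		goto fail1;
	if (!ex_alloc(2, fail_at))	/* e.g. IRQ */
		goto fail2;
	return (0);

fail2:
	ex_release();			/* undo step 1 */
fail1:
	ex_release();			/* undo step 0 */
fail0:
	return (ENXIO);
}
```

<para>Whichever step fails, nothing is left allocated when the
function returns an error, which is the property the paragraph above
asks for.</para>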
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
File diff suppressed because it is too large
|
@ -1,690 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="oss">
|
||||
<chapterinfo>
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jean-Francois</firstname>
|
||||
<surname>Dockes</surname>
|
||||
<contrib>Contributed by </contrib>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<!-- 23 November 2001 -->
|
||||
</chapterinfo>
|
||||
|
||||
<title>Sound subsystem</title>
|
||||
|
||||
<sect1 id="oss-intro">
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>The FreeBSD sound subsystem cleanly separates generic sound
|
||||
handling issues from device-specific ones. This makes it easier
|
||||
to add support for new hardware.</para>
|
||||
|
||||
<para>The &man.pcm.4; framework is the central piece of the sound
|
||||
subsystem. It mainly implements the following elements:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>A system call interface (read, write, ioctls) to
|
||||
digitized sound and mixer functions. The ioctl command set
|
||||
is compatible with the legacy <emphasis>OSS</emphasis> or
|
||||
<emphasis>Voxware</emphasis> interface, allowing common
|
||||
multimedia applications to be ported without
|
||||
modification.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Common code for processing sound data (format
|
||||
conversions, virtual channels).</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>A uniform software interface to hardware-specific audio
|
||||
interface modules.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Additional support for some common hardware interfaces
|
||||
(AC97), or shared hardware-specific code (e.g., ISA DMA
|
||||
routines).</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The support for specific sound cards is implemented by
|
||||
hardware-specific drivers, which provide channel and mixer interfaces
|
||||
to plug into the generic <devicename>pcm</devicename> code.</para>
|
||||
|
||||
<para>In this chapter, the term <devicename>pcm</devicename> will
|
||||
refer to the central, common part of the sound driver, as
|
||||
opposed to the hardware-specific modules.</para>
|
||||
|
||||
<para>The prospective driver writer will of course want to start
|
||||
from an existing module and use the code as the ultimate
|
||||
reference. But, while the sound code is nice and clean, it is
|
||||
also mostly devoid of comments. This document tries to give an
|
||||
overview of the framework interface and answer some questions
|
||||
that may arise while adapting the existing code.</para>
|
||||
|
||||
<para>As an alternative, or in addition to starting from a working
|
||||
example, you can find a commented driver template at
|
||||
<ulink url="http://people.FreeBSD.org/~cg/template.c">
|
||||
http://people.FreeBSD.org/~cg/template.c</ulink>.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="oss-files">
|
||||
<title>Files</title>
|
||||
|
||||
<para>All the relevant code currently (FreeBSD 4.4) lives in
|
||||
<filename>/usr/src/sys/dev/sound/</filename>, except for the
|
||||
public ioctl interface definitions, found in
|
||||
<filename>/usr/src/sys/sys/soundcard.h</filename>.</para>
|
||||
|
||||
<para>Under <filename>/usr/src/sys/dev/sound/</filename>, the
|
||||
<filename>pcm/</filename> directory holds the central code,
|
||||
while the <filename>isa/</filename> and
|
||||
<filename>pci/</filename> directories have the drivers for ISA
|
||||
and PCI boards.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="pcm-probe-and-attach">
|
||||
<title>Probing, attaching, etc.</title>
|
||||
|
||||
<para>Sound drivers probe and attach in almost the same way as any
|
||||
hardware driver module. You might want to look at the <link
|
||||
linkend="isa-driver"> ISA</link> or <link
|
||||
linkend="pci">PCI</link> specific sections of the handbook for
|
||||
more information.</para>
|
||||
|
||||
<para>However, sound drivers differ in some ways:</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem>
|
||||
<para>They declare themselves as <devicename>pcm</devicename>
|
||||
class devices, with a <structname>struct
|
||||
snddev_info</structname> device private structure:</para>
|
||||
|
||||
<programlisting> static driver_t xxx_driver = {
|
||||
"pcm",
|
||||
xxx_methods,
|
||||
sizeof(struct snddev_info)
|
||||
};
|
||||
|
||||
DRIVER_MODULE(snd_xxxpci, pci, xxx_driver, pcm_devclass, 0, 0);
|
||||
MODULE_DEPEND(snd_xxxpci, snd_pcm, PCM_MINVER, PCM_PREFVER,PCM_MAXVER);</programlisting>
|
||||
|
||||
<para>Most sound drivers need to store additional private
|
||||
information about their device. A private data structure is
|
||||
usually allocated in the attach routine. Its address is
|
||||
passed to <devicename>pcm</devicename> by the calls to
|
||||
<function>pcm_register()</function> and
|
||||
<function>mixer_init()</function>.
|
||||
<devicename>pcm</devicename> later passes back this address
|
||||
as a parameter in calls to the sound driver
|
||||
interfaces.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The sound driver attach routine should declare its MIXER
|
||||
or AC97 interface to <devicename>pcm</devicename> by calling
|
||||
<function>mixer_init()</function>. For a MIXER interface,
|
||||
this causes in turn a call to <link linkend="xxxmixer-init">
|
||||
<function>xxxmixer_init()</function></link>.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The sound driver attach routine declares its general
|
||||
CHANNEL configuration to <devicename>pcm</devicename> by
|
||||
calling <function>pcm_register(dev, sc, nplay,
|
||||
nrec)</function>, where <varname>sc</varname> is the address
|
||||
for the device data structure, used in further calls from
|
||||
<devicename>pcm</devicename>, and <varname>nplay</varname>
|
||||
and <varname>nrec</varname> are the number of play and
|
||||
record channels.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The sound driver attach routine declares each of its
|
||||
channel objects by calls to
|
||||
<function>pcm_addchan()</function>. This sets up the
|
||||
channel glue in <devicename>pcm</devicename> and causes in
|
||||
turn a call to
|
||||
<link linkend="xxxchannel-init">
|
||||
<function>xxxchannel_init()</function></link>.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The sound driver detach routine should call
|
||||
<function>pcm_unregister()</function> before releasing its
|
||||
resources.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>There are two possible methods to handle non-PnP devices:</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>Use a <function>device_identify()</function> method
|
||||
(example: <filename>sound/isa/es1888.c</filename>). The
|
||||
<function>device_identify()</function> method probes for the
|
||||
hardware at known addresses and, if it finds a supported
|
||||
device, creates a new pcm device which is then passed to
|
||||
probe/attach.</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>Use a custom kernel configuration with appropriate hints
|
||||
for pcm devices (example:
|
||||
<filename>sound/isa/mss.c</filename>).</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para><devicename>pcm</devicename> drivers should implement
|
||||
<function>device_suspend</function>,
|
||||
<function>device_resume</function> and
|
||||
<function>device_shutdown</function> routines, so that power
|
||||
management and module unloading function correctly.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="oss-interfaces">
|
||||
<title>Interfaces</title>
|
||||
|
||||
<para>The interface between the <devicename>pcm</devicename> core
|
||||
and the sound drivers is defined in terms of <link
|
||||
linkend="kernel-objects">kernel objects</link>.</para>
|
||||
|
||||
<para>There are two main interfaces that a sound driver will
|
||||
usually provide: <emphasis>CHANNEL</emphasis> and either
|
||||
<emphasis>MIXER</emphasis> or <emphasis>AC97</emphasis>.</para>
|
||||
|
||||
<para>The <emphasis>AC97</emphasis> interface is a very small
|
||||
hardware access (register read/write) interface, implemented by
|
||||
drivers for hardware with an AC97 codec. In this case, the
|
||||
actual MIXER interface is provided by the shared AC97 code in
|
||||
<devicename>pcm</devicename>.</para>
|
||||
|
||||
<sect2>
|
||||
<title>The CHANNEL interface</title>
|
||||
|
||||
<sect3>
|
||||
<title>Common notes for function parameters</title>
|
||||
|
||||
<para>Sound drivers usually have a private data structure to
|
||||
describe their device, and one structure for each play and
|
||||
record data channel that it supports.</para>
|
||||
|
||||
<para>For all CHANNEL interface functions, the first parameter
|
||||
is an opaque pointer.</para>
|
||||
|
||||
<para>The second parameter is a pointer to the private
|
||||
channel data structure, except for
|
||||
<function>channel_init()</function> which has a pointer to the
|
||||
private device structure (and returns the channel pointer
|
||||
for further use by <devicename>pcm</devicename>).</para>
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Overview of data transfer operations</title>
|
||||
|
||||
<para>For sound data transfers, the
|
||||
<devicename>pcm</devicename> core and the sound drivers
|
||||
communicate through a shared memory area, described by a
|
||||
<structname>struct snd_dbuf</structname>.</para>
|
||||
|
||||
<para><structname>struct snd_dbuf</structname> is private to
|
||||
<devicename>pcm</devicename>, and sound drivers obtain
|
||||
values of interest by calls to accessor functions
|
||||
(<function>sndbuf_getxxx()</function>).</para>
|
||||
|
||||
<para>The shared memory area has a size of
|
||||
<function>sndbuf_getsize()</function> and is divided into
|
||||
fixed size blocks of <function>sndbuf_getblksz()</function>
|
||||
bytes.</para>
|
||||
|
||||
<para>When playing, the general transfer mechanism is as
|
||||
follows (reverse the idea for recording):</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para><devicename>pcm</devicename> initially fills up the
|
||||
buffer, then calls the sound driver's <link
|
||||
linkend="channel-trigger">
|
||||
<function>xxxchannel_trigger()</function></link>
|
||||
function with a parameter of PCMTRIG_START.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>The sound driver then arranges to repeatedly
|
||||
transfer the whole memory area
|
||||
(<function>sndbuf_getbuf()</function>,
|
||||
<function>sndbuf_getsize()</function>) to the device, in
|
||||
blocks of <function>sndbuf_getblksz()</function> bytes.
|
||||
It calls back the <function>chn_intr()</function>
|
||||
<devicename>pcm</devicename> function for each
|
||||
transferred block (this will typically happen at
|
||||
interrupt time).</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para><function>chn_intr()</function> arranges to copy new
|
||||
data to the area that was transferred to the device (now
|
||||
free), and make appropriate updates to the
|
||||
<structname>snd_dbuf</structname> structure.</para>
|
||||
</listitem>
|
||||
|
||||
</itemizedlist>
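<para>The playback cycle described above can be modelled in a few
lines of userland C. This is only a sketch of the idea, not the
<devicename>pcm</devicename> implementation: the
<structname>struct ex_dbuf</structname> type, the fixed sizes, and
the <function>ex_*()</function> functions are hypothetical, with
comments noting which real function each one stands in for.</para>

```c
#include <stddef.h>
#include <string.h>

#define EX_BUFSZ	4096	/* plays the role of sndbuf_getsize() */
#define EX_BLKSZ	1024	/* plays the role of sndbuf_getblksz() */

struct ex_dbuf {
	unsigned char data[EX_BUFSZ];
	size_t hwptr;		/* offset the device will consume next */
	unsigned int refills;	/* how many blocks have been refilled */
};

void
ex_dbuf_init(struct ex_dbuf *b)
{
	memset(b, 0, sizeof(*b));
}

/* Plays the role of chn_intr(): refill the block the device just used. */
static void
ex_refill(struct ex_dbuf *b, size_t off)
{
	memset(b->data + off, 0x80, EX_BLKSZ);	/* stand-in for new samples */
	b->refills++;
}

/* Plays the role of the driver's interrupt handler: one block is done. */
void
ex_block_done(struct ex_dbuf *b)
{
	size_t done = b->hwptr;

	b->hwptr = (b->hwptr + EX_BLKSZ) % EX_BUFSZ;	/* wrap at the end */
	ex_refill(b, done);
}

/* Plays the role of xxxchannel_getptr(). */
size_t
ex_getptr(const struct ex_dbuf *b)
{
	return (b->hwptr);
}
```

<para>With these sizes the pointer returns to offset 0 after four
blocks, which is why the driver can keep transferring the same
memory area indefinitely while <devicename>pcm</devicename> refills
it one block behind.</para>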
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3 id="xxxchannel-init">
|
||||
<title>channel_init</title>
|
||||
|
||||
<para><function>xxxchannel_init()</function> is called to
|
||||
initialize each of the play or record channels. The calls
|
||||
are initiated from the sound driver attach routine. (See
|
||||
the <link linkend="pcm-probe-and-attach">probe and attach
|
||||
section</link>).</para>
|
||||
|
||||
<programlisting> static void *
|
||||
xxxchannel_init(kobj_t obj, void *data,
|
||||
struct snd_dbuf *b, struct pcm_channel *c, int dir)<co id="co-chinit-params">
|
||||
{
|
||||
struct xxx_info *sc = data;
|
||||
struct xxx_chinfo *ch;
|
||||
...
|
||||
return ch;<co id="co-chinit-return">
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
|
||||
<callout arearefs="co-chinit-params">
|
||||
<para><varname>b</varname> is the address for the channel
|
||||
<structname>struct snd_dbuf</structname>. It should be
|
||||
initialized in the function by calling
|
||||
<function>sndbuf_alloc()</function>. The buffer size to
|
||||
use is normally a small multiple of the 'typical' unit
|
||||
transfer size for your device.</para>
|
||||
|
||||
<para><varname>c</varname> is the
|
||||
<devicename>pcm</devicename> channel control structure
|
||||
pointer. This is an opaque object. The function should
|
||||
store it in the local channel structure, to be used in
|
||||
later calls to <devicename>pcm</devicename> (e.g.,
|
||||
<function>chn_intr(c)</function>).</para>
|
||||
|
||||
<para><varname>dir</varname> indicates the channel
|
||||
direction (<literal>PCMDIR_PLAY</literal> or
|
||||
<literal>PCMDIR_REC</literal>).</para>
|
||||
</callout>
|
||||
|
||||
<callout arearefs="co-chinit-return">
|
||||
<para>The function should return a pointer to the private
|
||||
area used to control this channel. This will be passed
|
||||
as a parameter to other channel interface calls.</para>
|
||||
</callout>
|
||||
|
||||
</calloutlist>
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_setformat</title>
|
||||
|
||||
<para><function>xxxchannel_setformat()</function> should set
|
||||
up the hardware for the specified channel for the specified
|
||||
sound format.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxchannel_setformat(kobj_t obj, void *data, u_int32_t format)<co id="co-chsetformat-params">
|
||||
{
|
||||
struct xxx_chinfo *ch = data;
|
||||
...
|
||||
return 0;
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-chsetformat-params">
|
||||
<para><varname>format</varname> is specified as an
|
||||
<literal>AFMT_XXX</literal> value
|
||||
(<filename>soundcard.h</filename>).</para>
|
||||
</callout>
|
||||
|
||||
</calloutlist>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_setspeed</title>
|
||||
|
||||
<para><function>xxxchannel_setspeed()</function> sets up the
|
||||
channel hardware for the specified sampling speed, and
|
||||
returns the possibly adjusted speed.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxchannel_setspeed(kobj_t obj, void *data, u_int32_t speed)
|
||||
{
|
||||
struct xxx_chinfo *ch = data;
|
||||
...
|
||||
return speed;
|
||||
}</programlisting>
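<para>What "possibly adjusted" means depends on the hardware. One
common case is a device that supports a continuous range of rates,
where the driver clamps the request to that range. The sketch below
assumes such a device; the limits and the
<function>ex_adjust_speed()</function> name are hypothetical, and
hardware with a fixed set of rates would pick the nearest supported
one instead.</para>

```c
#include <stdint.h>

#define EX_MIN_RATE	4000u	/* hypothetical hardware limits, in Hz */
#define EX_MAX_RATE	48000u

/* Clamp the requested sampling rate to what the device can do. */
uint32_t
ex_adjust_speed(uint32_t speed)
{
	if (speed < EX_MIN_RATE)
		return (EX_MIN_RATE);
	if (speed > EX_MAX_RATE)
		return (EX_MAX_RATE);
	return (speed);
}
```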
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_setblocksize</title>
|
||||
|
||||
<para><function>xxxchannel_setblocksize()</function> sets the
|
||||
block size, which is the size of unit transactions between
|
||||
<devicename>pcm</devicename> and the sound driver, and
|
||||
between the sound driver and the device. Typically, this
|
||||
would be the number of bytes transferred before an interrupt
|
||||
occurs. During a transfer, the sound driver should call
|
||||
<devicename>pcm</devicename>'s
|
||||
<function>chn_intr()</function> every time this size has
|
||||
been transferred.</para>
|
||||
|
||||
<para>Most sound drivers only take note of the block size
|
||||
here, to be used when an actual transfer is
|
||||
started.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxchannel_setblocksize(kobj_t obj, void *data, u_int32_t blocksize)
|
||||
{
|
||||
struct xxx_chinfo *ch = data;
|
||||
...
|
||||
return blocksize;<co id="co-chsetblocksize-return">
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-chsetblocksize-return">
|
||||
<para>The function returns the possibly adjusted block
|
||||
size. In case the block size is indeed changed,
|
||||
<function>sndbuf_resize()</function> should be called to
|
||||
adjust the buffer.</para>
|
||||
|
||||
</callout>
|
||||
</calloutlist>
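<para>As an illustration of how a block size might be adjusted, the
sketch below rounds the request down to a hardware alignment and
keeps it small enough that at least two blocks fit in the buffer.
Both rules are assumptions made for the example, as is the
<function>ex_adjust_blocksize()</function> name; actual constraints
come from the device.</para>

```c
#include <stdint.h>

/*
 * Round blocksize down to a multiple of align (a power of two),
 * never below align and never above half the buffer.
 */
uint32_t
ex_adjust_blocksize(uint32_t blocksize, uint32_t bufsize, uint32_t align)
{
	blocksize &= ~(align - 1);	/* round down to the alignment */
	if (blocksize < align)
		blocksize = align;
	if (blocksize > bufsize / 2)
		blocksize = bufsize / 2;	/* keep at least two blocks */
	return (blocksize);
}
```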
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3 id="channel-trigger">
|
||||
<title>channel_trigger</title>
|
||||
|
||||
<para><function>xxxchannel_trigger()</function> is called by
|
||||
<devicename>pcm</devicename> to control data transfer
|
||||
operations in the driver.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxchannel_trigger(kobj_t obj, void *data, int go)<co id="co-chtrigger-params">
|
||||
{
|
||||
struct xxx_chinfo *ch = data;
|
||||
...
|
||||
return 0;
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-chtrigger-params">
|
||||
<para><varname>go</varname> defines the action for the
|
||||
current call. The possible values are:</para>
|
||||
<itemizedlist>
|
||||
|
||||
<listitem>
|
||||
<para><literal>PCMTRIG_START</literal>: the driver
|
||||
should start a data transfer from or to the channel
|
||||
buffer. If needed, the buffer base and size can be
|
||||
retrieved through
|
||||
<function>sndbuf_getbuf()</function> and
|
||||
<function>sndbuf_getsize()</function>.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para><literal>PCMTRIG_EMLDMAWR</literal> /
|
||||
<literal>PCMTRIG_EMLDMARD</literal>: this tells the
|
||||
driver that the input or output buffer may have been
|
||||
updated. Most drivers just ignore these
|
||||
calls.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para><literal>PCMTRIG_STOP</literal> /
|
||||
<literal>PCMTRIG_ABORT</literal>: the driver should
|
||||
stop the current transfer.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
</callout>
|
||||
</calloutlist>
|
||||
|
||||
<note><para>If the driver uses ISA DMA,
|
||||
<function>sndbuf_isadma()</function> should be called before
|
||||
performing actions on the device, and will take care of the
|
||||
DMA chip side of things.</para>
|
||||
</note>
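<para>The body of a trigger function is typically a switch on
<varname>go</varname>. The userland sketch below shows that shape
with a single "running" flag in place of real hardware control. The
<literal>EX_TRIG_*</literal> codes, the
<structname>struct ex_chinfo</structname> type, and
<function>ex_trigger()</function> are hypothetical; a driver uses
the <literal>PCMTRIG_*</literal> values supplied by
<devicename>pcm</devicename>.</para>

```c
/* Hypothetical action codes standing in for the PCMTRIG_* values. */
enum ex_trig {
	EX_TRIG_START,
	EX_TRIG_EMLDMAWR,
	EX_TRIG_EMLDMARD,
	EX_TRIG_STOP,
	EX_TRIG_ABORT
};

struct ex_chinfo {
	int running;	/* nonzero while a transfer is in progress */
};

int
ex_trigger(struct ex_chinfo *ch, enum ex_trig go)
{
	switch (go) {
	case EX_TRIG_START:
		ch->running = 1;	/* a real driver would start DMA here */
		break;
	case EX_TRIG_STOP:
	case EX_TRIG_ABORT:
		ch->running = 0;	/* a real driver would stop DMA here */
		break;
	case EX_TRIG_EMLDMAWR:
	case EX_TRIG_EMLDMARD:
		break;			/* buffer update notice, ignored */
	}
	return (0);
}
```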
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_getptr</title>
|
||||
|
||||
<para><function>xxxchannel_getptr()</function> returns the
|
||||
current offset in the transfer buffer. This will typically
|
||||
be called by <function>chn_intr()</function>, and this is how
|
||||
<devicename>pcm</devicename> knows where it can transfer
|
||||
new data.</para>
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_free</title>
|
||||
|
||||
<para><function>xxxchannel_free()</function> is called to free
|
||||
up channel resources, for example when the driver is
|
||||
unloaded, and should be implemented if the channel data
|
||||
structures are dynamically allocated or if
|
||||
<function>sndbuf_alloc()</function> was not used for buffer
|
||||
allocation.</para>
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>channel_getcaps</title>
|
||||
|
||||
<programlisting> struct pcmchan_caps *
|
||||
xxxchannel_getcaps(kobj_t obj, void *data)
|
||||
{
|
||||
return &xxx_caps;<co id="co-chgetcaps-return">
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
|
||||
<callout arearefs="co-chgetcaps-return">
|
||||
<para>The routine returns a pointer to a (usually
|
||||
statically-defined) <structname>pcmchan_caps</structname>
|
||||
structure (defined in
|
||||
<filename>sound/pcm/channel.h</filename>). The structure holds
|
||||
the minimum and maximum sampling frequencies, and the
|
||||
accepted sound formats. Look at any sound driver for an
|
||||
example.</para>
|
||||
</callout>
|
||||
|
||||
</calloutlist>
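<para>As a rough illustration of what such a statically defined
capabilities structure carries, the following self-contained sketch
uses stand-in types. The names <structname>xxx_caps_sketch</structname>,
<literal>XXX_FMT_*</literal> and <function>xxx_caps_accepts()</function>,
the zero-terminated format list, and the rate limits are all
assumptions made for the example; a real driver uses the definitions
from <filename>sound/pcm/channel.h</filename> and
<filename>soundcard.h</filename>.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only; not the real format constants. */
#define XXX_FMT_U8	0x08u
#define XXX_FMT_S16_LE	0x10u

/* Hypothetical mirror of the information a caps structure holds. */
struct xxx_caps_sketch {
	uint32_t minspeed;		/* lowest sampling rate, in Hz */
	uint32_t maxspeed;		/* highest sampling rate, in Hz */
	const uint32_t *fmtlist;	/* accepted formats, ended by 0 */
};

static const uint32_t xxx_fmts[] = { XXX_FMT_U8, XXX_FMT_S16_LE, 0 };
static const struct xxx_caps_sketch xxx_caps = { 4000, 48000, xxx_fmts };

/* Return 1 if the channel accepts the given rate and format, else 0. */
static int
xxx_caps_accepts(const struct xxx_caps_sketch *c, uint32_t speed, uint32_t fmt)
{
	size_t i;

	if (speed < c->minspeed || speed > c->maxspeed)
		return 0;
	for (i = 0; c->fmtlist[i] != 0; i++)
		if (c->fmtlist[i] == fmt)
			return 1;
	return 0;
}
```

<para>Keeping the structure static means the getcaps routine can
simply return its address, as in the listing above.</para>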
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>More functions</title>
|
||||
|
||||
<para><function>channel_reset()</function>,
|
||||
<function>channel_resetdone()</function>, and
|
||||
<function>channel_notify()</function> are for special purposes
|
||||
and should not be implemented in a driver without discussing
|
||||
it with the authorities (&a.cg;).</para>
|
||||
|
||||
<para><function>channel_setdir()</function> is deprecated.</para>
|
||||
</sect3>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>The MIXER interface</title>
|
||||
|
||||
<sect3 id="xxxmixer-init">
|
||||
<title>mixer_init</title>
|
||||
|
||||
<para><function>xxxmixer_init()</function> initializes the
|
||||
hardware and tells <devicename>pcm</devicename> what mixer
|
||||
devices are available for playing and recording.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxmixer_init(struct snd_mixer *m)
|
||||
{
|
||||
struct xxx_info *sc = mix_getdevinfo(m);
|
||||
u_int32_t v;
|
||||
|
||||
[Initialize hardware]
|
||||
|
||||
[Set appropriate bits in v for play mixers]<co id="co-mxini-sd">
|
||||
mix_setdevs(m, v);
|
||||
[Set appropriate bits in v for record mixers]
|
||||
mix_setrecdevs(m, v);
|
||||
|
||||
return 0;
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-mxini-sd">
|
||||
<para>Set bits in an integer value and call
|
||||
<function>mix_setdevs()</function> and
|
||||
<function>mix_setrecdevs()</function> to tell
|
||||
<devicename>pcm</devicename> what devices exist.</para>
|
||||
</callout>
|
||||
</calloutlist>
|
||||
|
||||
<para>Mixer bit definitions can be found in
|
||||
<filename>soundcard.h</filename>
|
||||
(<literal>SOUND_MASK_XXX</literal> values and
|
||||
<literal>SOUND_MIXER_XXX</literal> bit shifts).</para>
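<para>The relationship between bit shifts and masks can be sketched
as follows. The <literal>XXX_MIXER_*</literal> names and their
numeric values are placeholders for this example only; a driver
would use the <literal>SOUND_MIXER_XXX</literal> and
<literal>SOUND_MASK_XXX</literal> definitions from
<filename>soundcard.h</filename> and pass the results to
<function>mix_setdevs()</function> and
<function>mix_setrecdevs()</function>.</para>

```c
#include <stdint.h>

/* Placeholder device numbers (bit shifts) for a hypothetical card. */
#define XXX_MIXER_VOLUME	0
#define XXX_MIXER_PCM		4
#define XXX_MIXER_LINE		6
#define XXX_MIXER_MIC		7

/* A mask is simply 1 shifted left by the device number. */
#define XXX_MASK(dev)		(1u << (dev))

/* Devices that have a volume control on this hypothetical card. */
static uint32_t
xxx_play_devs(void)
{
	return XXX_MASK(XXX_MIXER_VOLUME) | XXX_MASK(XXX_MIXER_PCM) |
	    XXX_MASK(XXX_MIXER_LINE) | XXX_MASK(XXX_MIXER_MIC);
}

/* Devices that can be selected as a recording source. */
static uint32_t
xxx_rec_devs(void)
{
	return XXX_MASK(XXX_MIXER_LINE) | XXX_MASK(XXX_MIXER_MIC);
}
```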
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>mixer_set</title>
|
||||
|
||||
<para><function>xxxmixer_set()</function> sets the volume
|
||||
level for one mixer device.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxmixer_set(struct snd_mixer *m, unsigned dev,
|
||||
unsigned left, unsigned right)<co id="co-mxset-params">
|
||||
{
|
||||
struct xxx_info *sc = mix_getdevinfo(m);
|
||||
[set volume level]
|
||||
return left | (right << 8);<co id="co-mxset-return">
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-mxset-params">
|
||||
<para>The device is specified as a SOUND_MIXER_XXX
|
||||
value.</para> <para>The volume values are specified in the
|
||||
range [0-100]. A value of zero should mute the
|
||||
device.</para>
|
||||
</callout>
|
||||
|
||||
<callout arearefs="co-mxset-return">
|
||||
<para>As the hardware levels probably won't match the
|
||||
input scale, and some rounding will occur, the routine
|
||||
returns the actual level values (in range 0-100) as
|
||||
shown.</para>
|
||||
</callout>
|
||||
</calloutlist>
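<para>The rounding described in the callouts can be sketched in
plain C. This is only an illustration, not code from any driver:
the helper names <function>xxx_vol_to_hw()</function>,
<function>xxx_hw_to_vol()</function> and
<function>xxx_pack_levels()</function>, and the 5-bit hardware
range, are assumptions made for the example.</para>

```c
#include <assert.h>

/* Hypothetical hardware: a 5-bit level, 0 (mute) to 31 (loudest). */
#define XXX_HW_MAX 31u

/* Map a pcm volume (0-100) onto the hardware range, rounding to nearest. */
static unsigned
xxx_vol_to_hw(unsigned vol)
{
	assert(vol <= 100);
	return (vol * XXX_HW_MAX + 50) / 100;
}

/* Map a hardware level back onto the 0-100 scale used by pcm. */
static unsigned
xxx_hw_to_vol(unsigned hw)
{
	assert(hw <= XXX_HW_MAX);
	return (hw * 100 + XXX_HW_MAX / 2) / XXX_HW_MAX;
}

/* Pack the levels actually set, left in bits 0-7 and right in bits 8-15. */
static unsigned
xxx_pack_levels(unsigned left, unsigned right)
{
	unsigned l = xxx_hw_to_vol(xxx_vol_to_hw(left));
	unsigned r = xxx_hw_to_vol(xxx_vol_to_hw(right));

	return l | (r << 8);
}
```

<para>With this hypothetical range, a requested level of 50 is
reported back as 52, because the nearest hardware step does not
correspond exactly to the requested value.</para>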
|
||||
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>mixer_setrecsrc</title>
|
||||
|
||||
<para><function>xxxmixer_setrecsrc()</function> sets the
|
||||
recording source device.</para>
|
||||
|
||||
<programlisting> static int
|
||||
xxxmixer_setrecsrc(struct snd_mixer *m, u_int32_t src)<co id="co-mxsr-params">
|
||||
{
|
||||
struct xxx_info *sc = mix_getdevinfo(m);
|
||||
|
||||
[look for non zero bit(s) in src, set up hardware]
|
||||
|
||||
[update src to reflect actual action]
|
||||
return src;<co id="co-mxsr-return">
|
||||
}</programlisting>
|
||||
|
||||
<calloutlist>
|
||||
<callout arearefs="co-mxsr-params">
|
||||
<para>The desired recording devices are specified as a
|
||||
bit field.</para>
|
||||
</callout>
|
||||
|
||||
<callout arearefs="co-mxsr-return">
|
||||
<para>The actual devices set for recording are returned.
|
||||
Some drivers can only set one device for recording. The
|
||||
function should return -1 if an error occurs.</para>
|
||||
</callout>
|
||||
</calloutlist>
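<para>For hardware that can record from only one device at a time,
reducing the requested bit field to a single source might look like
the sketch below. The <literal>XXX_REC_*</literal> masks and
<function>xxx_pick_recsrc()</function> are hypothetical, and
treating a request with no supported source as an error is an
assumption of this example.</para>

```c
#include <stdint.h>

/* Hypothetical card: exactly one of these sources can be recorded. */
#define XXX_REC_LINE	(1u << 6)
#define XXX_REC_MIC	(1u << 7)
#define XXX_REC_ALL	(XXX_REC_LINE | XXX_REC_MIC)

/*
 * Reduce the requested bit field to the single source that will be
 * used. Returns the source selected, or -1 if none is supported.
 */
static int
xxx_pick_recsrc(uint32_t src)
{
	uint32_t usable = src & XXX_REC_ALL;

	if (usable == 0)
		return -1;
	/*
	 * (~x + 1) is the two's complement negation of x, so
	 * x & (~x + 1) keeps only the lowest set bit.
	 */
	return (int)(usable & (~usable + 1u));
}
```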
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>mixer_uninit, mixer_reinit</title>
|
||||
|
||||
<para><function>xxxmixer_uninit()</function> should ensure
|
||||
that all sound is muted and if possible mixer hardware
|
||||
should be powered down.</para>
|
||||
|
||||
<para><function>xxxmixer_reinit()</function> should ensure
|
||||
that the mixer hardware is powered up and any settings not
|
||||
controlled by <function>mixer_set()</function> or
|
||||
<function>mixer_setrecsrc()</function> are restored.</para>
|
||||
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>The AC97 interface</title>
|
||||
|
||||
<para>The <emphasis>AC97</emphasis> interface is implemented
|
||||
by drivers with an AC97 codec. It only has three methods:</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para><function>xxxac97_init()</function> returns
|
||||
the number of AC97 codecs found.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem><para><function>ac97_read()</function> and
|
||||
<function>ac97_write()</function> read or write a specified
|
||||
register.</para>
|
||||
</listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>The <emphasis>AC97</emphasis> interface is used by the
|
||||
AC97 code in <devicename>pcm</devicename> to perform higher
|
||||
level operations. Look at
|
||||
<filename>sound/pci/maestro3.c</filename> or many others under
|
||||
<filename>sound/pci/</filename> for an example.</para>
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<!--
|
||||
Local Variables:
|
||||
mode: sgml
|
||||
sgml-declaration: "../chapter.decl"
|
||||
sgml-indent-data: t
|
||||
sgml-omittag: nil
|
||||
sgml-always-quote-attributes: t
|
||||
sgml-parent-document: ("../book.sgml" "part" "chapter")
|
||||
End:
|
||||
-->
|
|
@ -1,161 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="sysinit">
|
||||
<title>The Sysinit Framework</title>
|
||||
|
||||
<para>Sysinit is the framework for a generic call sort and dispatch
|
||||
mechanism. FreeBSD currently uses it for the dynamic
|
||||
initialization of the kernel. Sysinit allows FreeBSD's kernel
|
||||
subsystems to be reordered, added, removed, and replaced at
|
||||
kernel link time when the kernel or one of its modules is loaded
|
||||
without having to edit a statically ordered initialization routine
|
||||
and recompile the kernel. This system also allows kernel modules,
|
||||
currently called <firstterm>KLDs</firstterm>, to be separately
|
||||
compiled, linked, and initialized at boot time and loaded even
|
||||
later while the system is already running. This is accomplished
|
||||
using the <quote>kernel linker</quote> and <quote>linker
|
||||
sets</quote>.</para>
|
||||
|
||||
<sect1 id="sysinit-term">
|
||||
<title>Terminology</title>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>Linker Set</term>
|
||||
<listitem>
|
||||
<para>A linker technique in which the linker gathers
|
||||
statically declared data throughout a program's source files
|
||||
into a single contiguously addressable unit of
|
||||
data.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="sysinit-operation">
|
||||
<title>Sysinit Operation</title>
|
||||
|
||||
<para>Sysinit relies on the ability of the linker to take static
|
||||
data declared at multiple locations throughout a program's
|
||||
source and group it together as a single contiguous chunk of
|
||||
data. This linker technique is called a <quote>linker
|
||||
set</quote>. Sysinit uses two linker sets to maintain two data
|
||||
sets containing each consumer's call order, function, and a
|
||||
pointer to the data to pass to that function.</para>
|
||||
|
||||
<para>Sysinit uses two priorities when ordering the functions for
|
||||
execution. The first priority is a subsystem ID giving an
|
||||
overall order for Sysinit's dispatch of functions. Current predeclared
|
||||
IDs are in <filename><sys/kernel.h></filename> in the enum
|
||||
list <literal>sysinit_sub_id</literal>. The second priority used
|
||||
is an element order within the subsystem. Current predeclared
|
||||
subsystem element orders are in
|
||||
<filename><sys/kernel.h></filename> in the enum list
|
||||
<literal>sysinit_elem_order</literal>.</para>
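<para>The two-level ordering can be pictured with a small userland
sketch. It is not the kernel's implementation: the
<structname>xxx_sysinit</structname> structure, the helper names,
and the use of <function>qsort()</function> are stand-ins chosen
for brevity, and the numeric priorities in the demo are
arbitrary.</para>

```c
#include <stdlib.h>

/* Illustrative stand-in for one Sysinit entry; not the kernel's struct. */
struct xxx_sysinit {
	unsigned subsystem;		/* first priority: subsystem ID */
	unsigned order;			/* second priority: element order */
	void (*func)(void *);		/* function to dispatch */
	void *udata;			/* argument passed to func */
};

/* Compare by subsystem first, then by element order within it. */
static int
xxx_sysinit_cmp(const void *a, const void *b)
{
	const struct xxx_sysinit *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return x->subsystem < y->subsystem ? -1 : 1;
	if (x->order != y->order)
		return x->order < y->order ? -1 : 1;
	return 0;
}

/* Sort the collected entries, then call each function in turn. */
static void
xxx_sysinit_run(struct xxx_sysinit *set, size_t n)
{
	size_t i;

	qsort(set, n, sizeof(set[0]), xxx_sysinit_cmp);
	for (i = 0; i < n; i++)
		set[i].func(set[i].udata);
}

/* Example consumer: appends its tag to a trace so the order is visible. */
static char xxx_trace[8];
static size_t xxx_trace_len;

static void
xxx_note(void *tag)
{
	if (xxx_trace_len < sizeof(xxx_trace) - 1)
		xxx_trace[xxx_trace_len++] = *(const char *)tag;
}

/* Register three entries out of order and return the dispatch trace. */
static const char *
xxx_demo(void)
{
	static char a = 'a', b = 'b', c = 'c';
	struct xxx_sysinit set[] = {
		{ 2, 0, xxx_note, &c },
		{ 1, 5, xxx_note, &b },
		{ 1, 1, xxx_note, &a },
	};

	xxx_trace_len = 0;
	xxx_sysinit_run(set, 3);
	xxx_trace[xxx_trace_len] = '\0';
	return xxx_trace;
}
```

<para>In the demo, the two entries of subsystem 1 run before the
entry of subsystem 2, and within subsystem 1 the lower element
order runs first.</para>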
|
||||
|
||||
<para>There are currently two uses for Sysinit: function dispatch
|
||||
at system startup and kernel module loads, and function dispatch
|
||||
at system shutdown and kernel module unload.</para>
|
||||
</sect1>
|
||||
|
||||
|
||||
<sect1 id="sysinit-using">
|
||||
<title>Using Sysinit</title>
|
||||
|
||||
<sect2>
|
||||
<title>Interface</title>
|
||||
|
||||
<sect3>
|
||||
<title>Headers</title>
|
||||
|
||||
<programlisting><sys/kernel.h></programlisting>
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title>Macros</title>
|
||||
|
||||
<programlisting>SYSINIT(uniquifier, subsystem, order, func, ident)
|
||||
SYSUNINIT(uniquifier, subsystem, order, func, ident)</programlisting>
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Startup</title>
|
||||
|
||||
<para>The <literal>SYSINIT()</literal> macro creates the
|
||||
necessary sysinit data in Sysinit's startup data set for
|
||||
Sysinit to sort and dispatch a function at system startup and
|
||||
module load. <literal>SYSINIT()</literal> takes a uniquifier
|
||||
that Sysinit uses to identify the particular function dispatch
|
||||
data, the subsystem order, the subsystem element order, the
|
||||
function to call, and the data to pass to the function. All
|
||||
functions must take a constant pointer argument.
|
||||
</para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>#include <sys/kernel.h>
|
||||
|
||||
void foo_null(void *unused)
|
||||
{
|
||||
foo_doo();
|
||||
}
|
||||
SYSINIT(foo, SI_SUB_FOO, SI_ORDER_FOO, foo_null, NULL);
|
||||
|
||||
struct foo foo_voodoo = {
FOO_VOODOO
};
|
||||
|
||||
void foo_arg(void *vdata)
|
||||
{
|
||||
struct foo *foo = (struct foo *)vdata;
|
||||
foo_data(foo);
|
||||
}
|
||||
SYSINIT(bar, SI_SUB_FOO, SI_ORDER_FOO, foo_arg, &foo_voodoo);
|
||||
</programlisting>
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>Shutdown</title>
|
||||
|
||||
<para>The <literal>SYSUNINIT()</literal> macro behaves similarly
|
||||
to the <literal>SYSINIT()</literal> macro except that it adds
|
||||
the Sysinit data to Sysinit's shutdown data set.</para>
|
||||
|
||||
<para>For example:</para>
|
||||
|
||||
<programlisting>#include <sys/kernel.h>
|
||||
|
||||
void foo_cleanup(void *unused)
|
||||
{
|
||||
foo_kill();
|
||||
}
|
||||
SYSUNINIT(foobar, SI_SUB_FOO, SI_ORDER_FOO, foo_cleanup, NULL);
|
||||
|
||||
struct foo_stack foo_stack = {
FOO_STACK_VOODOO
};
|
||||
|
||||
void foo_flush(void *vdata)
|
||||
{
|
||||
}
|
||||
SYSUNINIT(barfoo, SI_SUB_FOO, SI_ORDER_FOO, foo_flush, &foo_stack);
|
||||
</programlisting>
|
||||
</sect2>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<!--
|
||||
Local Variables:
|
||||
mode: sgml
|
||||
sgml-declaration: "../chapter.decl"
|
||||
sgml-indent-data: t
|
||||
sgml-omittag: nil
|
||||
sgml-always-quote-attributes: t
|
||||
sgml-parent-document: ("../book.sgml" "part" "chapter")
|
||||
End:
|
||||
-->
|
|
@ -1,623 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="usb">
|
||||
<title>USB Devices</title>
|
||||
|
||||
<para><emphasis>This chapter was written by &a.nhibma;. Modifications made for
|
||||
the handbook by &a.murray;.</emphasis></para>
|
||||
|
||||
<sect1 id="usb-intro">
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>The Universal Serial Bus (USB) is a new way of attaching
|
||||
devices to personal computers. The bus architecture features
|
||||
two-way communication and has been developed as a response to
|
||||
devices becoming smarter and requiring more interaction with the
|
||||
host. USB support is included in all current PC chipsets and is
|
||||
therefore available in all recently built PCs. Apple's
|
||||
introduction of the USB-only iMac has been a major incentive for
|
||||
hardware manufacturers to produce USB versions of their devices.
|
||||
The future PC specifications specify that all legacy connectors
|
||||
on PCs should be replaced by one or more USB connectors,
|
||||
providing generic plug and play capabilities. Support for USB
|
||||
hardware was available at a very early stage in NetBSD and was
|
||||
developed by Lennart Augustsson for the NetBSD project. The
|
||||
code has been ported to FreeBSD and we are currently maintaining
|
||||
a shared code base. For the implementation of the USB subsystem
|
||||
a number of features of USB are important.</para>
|
||||
|
||||
<para><emphasis>Lennart Augustsson has done most of the implementation of
|
||||
the USB support for the NetBSD project. Many thanks for this
|
||||
incredible amount of work. Many thanks also to Ardy and Dirk for
|
||||
their comments and proofreading of this paper.</emphasis></para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>Devices connect to ports on the computer
|
||||
directly or on devices called hubs, forming a treelike device
|
||||
structure.</para></listitem>
|
||||
|
||||
<listitem><para>The devices can be connected and disconnected at
|
||||
run time.</para></listitem>
|
||||
|
||||
<listitem><para>Devices can suspend themselves and trigger
|
||||
resumes of the host system.</para></listitem>
|
||||
|
||||
<listitem><para>As the devices can be powered from the bus, the
|
||||
host software has to keep track of power budgets for each
|
||||
hub.</para></listitem>
|
||||
|
||||
<listitem><para>Different quality of service requirements by the
|
||||
different device types together with the maximum of 126
|
||||
devices that can be connected to the same bus, require proper
|
||||
scheduling of transfers on the shared bus to take full
|
||||
advantage of the 12Mbps bandwidth available (over 400Mbps
with USB 2.0).</para></listitem>
|
||||
|
||||
<listitem><para>Devices are intelligent and contain easily
|
||||
accessible information about themselves.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>The development of drivers for the USB subsystem and devices
|
||||
connected to it is supported by the specifications that have
|
||||
been developed and will be developed. These specifications are
|
||||
publicly available from the USB home pages. Apple has been very
|
||||
strong in pushing for standards based drivers, by making drivers
|
||||
for the generic classes available in their operating system
|
||||
MacOS and discouraging the use of separate drivers for each new
|
||||
device. This chapter tries to collate essential information for a
|
||||
basic understanding of the present implementation of the USB
|
||||
stack in FreeBSD/NetBSD. It is recommended however to read it
|
||||
together with the relevant specifications mentioned in the
|
||||
references below.</para>
|
||||
|
||||
<sect2>
|
||||
<title>Structure of the USB Stack</title>
|
||||
|
||||
<para>The USB support in FreeBSD can be split into three
|
||||
layers. The lowest layer contains the host controller driver,
|
||||
providing a generic interface to the hardware and its scheduling
|
||||
facilities. It supports initialisation of the hardware,
|
||||
scheduling of transfers and handling of completed and/or failed
|
||||
transfers. Each host controller driver implements a virtual hub
|
||||
providing hardware independent access to the registers
|
||||
controlling the root ports on the back of the machine.</para>
|
||||
|
||||
<para>The middle layer handles the device connection and
|
||||
disconnection, basic initialisation of the device, driver
|
||||
selection, the communication channels (pipes) and does
|
||||
resource management. This services layer also controls the
|
||||
default pipes and the device requests transferred over
|
||||
them.</para>
|
||||
|
||||
<para>The top layer contains the individual drivers supporting
|
||||
specific (classes of) devices. These drivers implement the
|
||||
protocol that is used over the pipes other than the default
|
||||
pipe. They also implement additional functionality to make the
|
||||
device available to other parts of the kernel or userland. They
|
||||
use the USB driver interface (USBDI) exposed by the services
|
||||
layer.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usb-hc">
|
||||
<title>Host Controllers</title>
|
||||
|
||||
<para>The host controller (HC) controls the transmission of
|
||||
packets on the bus. Frames of 1 millisecond are used. At the
|
||||
start of each frame the host controller generates a Start of
|
||||
Frame (SOF) packet.</para>
|
||||
|
||||
<para>The SOF packet is used to synchronise to the start of the
|
||||
frame and to keep track of the frame number. Within each frame
|
||||
packets are transferred, either from host to device (out) or
|
||||
from device to host (in). Transfers are always initiated by the
|
||||
host (polled transfers). Therefore there can only be one host
|
||||
per USB bus. Each transfer of a packet has a status stage in
|
||||
which the recipient of the data can return either ACK
|
||||
(acknowledge reception), NAK (retry), STALL (error condition) or
|
||||
nothing (garbled data stage, device not available or
|
||||
disconnected). Section 8.5 of the <ulink
|
||||
url="http://www.usb.org/developers/docs.html">USB
|
||||
specification</ulink> describes packets in more
|
||||
detail. Four different types of transfers can occur on a USB
|
||||
bus: control, bulk, interrupt and isochronous. The types of
|
||||
transfers and their characteristics are described below (`Pipes'
|
||||
subsection).</para>
|
||||
|
||||
<para>Large transfers between the device on the USB bus and the
|
||||
device driver are split up into multiple packets by the host
|
||||
controller or the HC driver.</para>
|
||||
|
||||
<para>Device requests (control transfers) to the default endpoints
|
||||
are special. They consist of two or three phases: SETUP, DATA
|
||||
(optional) and STATUS. The set-up packet is sent to the
|
||||
device. If there is a data phase, the direction of the data
|
||||
packet(s) is given in the set-up packet. The direction in the
|
||||
status phase is the opposite of the direction during the data
|
||||
phase, or IN if there was no data phase. The host controller
|
||||
hardware also provides registers with the current status of the
|
||||
root ports and the changes that have occurred since the last
|
||||
reset of the status change register. Access to these registers
|
||||
is provided through a virtualised hub as suggested in the USB
|
||||
specification [2]. The virtual hub must comply with the hub
|
||||
device class given in chapter 11 of that specification. It must
|
||||
provide a default pipe through which device requests can be sent
|
||||
to it. It returns the standard and hub class specific set of
|
||||
descriptors. It should also provide an interrupt pipe that
|
||||
reports changes happening at its ports. There are currently two
|
||||
specifications for host controllers available: <ulink
|
||||
url="http://developer.intel.com/design/USB/UHCI11D.htm">Universal
|
||||
Host Controller Interface</ulink> (UHCI; Intel) and <ulink
|
||||
url="http://www.compaq.com/productinfo/development/openhci.html">Open
|
||||
Host Controller Interface</ulink> (OHCI; Compaq, Microsoft,
|
||||
National Semiconductor). The UHCI specification has been
|
||||
designed to reduce hardware complexity by requiring the host
|
||||
controller driver to supply a complete schedule of the transfers
|
||||
for each frame. OHCI type controllers are much more independent
|
||||
by providing a more abstract interface and doing a lot of the work
|
||||
themselves. </para>
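<para>The rule given above for the direction of the status phase of
a device request can be written down as a small function. The
<literal>xxx_dir</literal> enumeration and
<function>xxx_status_dir()</function> are names invented for this
sketch and do not appear in the USB code.</para>

```c
/* Direction names are local to this sketch. */
enum xxx_dir { XXX_DIR_OUT, XXX_DIR_IN };

/*
 * Direction of the STATUS phase of a control transfer.
 * has_data: nonzero if the request has a DATA phase.
 * data_dir: direction of that DATA phase (ignored if has_data is 0).
 */
static enum xxx_dir
xxx_status_dir(int has_data, enum xxx_dir data_dir)
{
	/* Without a data phase the status phase is always IN. */
	if (!has_data)
		return XXX_DIR_IN;
	/* Otherwise it runs opposite to the data phase. */
	return data_dir == XXX_DIR_IN ? XXX_DIR_OUT : XXX_DIR_IN;
}
```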
|
||||
|
||||
<sect2>
|
||||
<title>UHCI</title>
|
||||
|
||||
<para>The UHCI host controller maintains a framelist with 1024
|
||||
pointers to per frame data structures. It understands two
|
||||
different data types: transfer descriptors (TD) and queue
|
||||
heads (QH). Each TD represents a packet to be communicated to
|
||||
or from a device endpoint. QHs are a means to group TDs (and
|
||||
QHs) together.</para>
|
||||
|
||||
<para>Each transfer consists of one or more packets. The UHCI
|
||||
driver splits large transfers into multiple packets. For every
|
||||
transfer, apart from isochronous transfers, a QH is
|
||||
allocated. For every type of transfer these QHs are collected
|
||||
at a QH for that type. Isochronous transfers have to be
|
||||
executed first because of the fixed latency requirement and
|
||||
are directly referred to by the pointer in the framelist. The
|
||||
last isochronous TD refers to the QH for interrupt transfers
|
||||
for that frame. All QHs for interrupt transfers point at the
|
||||
QH for control transfers, which in turn points at the QH for
|
||||
bulk transfers. The following diagram gives a graphical
|
||||
overview of this:</para>
|
||||
|
||||
<para>This results in the following schedule being run in each
|
||||
frame. After fetching the pointer for the current frame from
|
||||
the framelist the controller first executes the TDs for all
|
||||
the isochronous packets in that frame. The last of these TDs
|
||||
refers to the QH for the interrupt transfers for
|
||||
that frame. The host controller will then descend from that QH
|
||||
to the QHs for the individual interrupt transfers. After
|
||||
finishing that queue, the QH for the interrupt transfers will
|
||||
refer the controller to the QH for all control transfers. It
|
||||
will execute all the subqueues scheduled there, followed by
|
||||
all the transfers queued at the bulk QH. To facilitate the
|
||||
handling of finished or failed transfers different types of
|
||||
interrupts are generated by the hardware at the end of each
|
||||
frame. In the last TD for a transfer the Interrupt-On-Completion
bit is set by the HC driver to flag an interrupt
|
||||
when the transfer has completed. An error interrupt is flagged
|
||||
if a TD reaches its maximum error count. If the short packet
|
||||
detect bit is set in a TD and less than the set packet length
|
||||
is transferred this interrupt is flagged to notify
|
||||
the controller driver of the completed transfer. It is the host
|
||||
controller driver's task to find out which transfer has
|
||||
completed or produced an error. When called the interrupt
|
||||
service routine will locate all the finished transfers and
|
||||
call their callbacks.</para>
|
||||
|
||||
<para>For a more elaborate description, see the <ulink
|
||||
url="http://developer.intel.com/design/USB/UHCI11D.htm">UHCI
|
||||
specification.</ulink></para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2>
|
||||
<title>OHCI</title>
|
||||
|
||||
<para>Programming an OHCI host controller is much simpler. The
|
||||
controller assumes that a set of endpoints is available, and
|
||||
is aware of scheduling priorities and the ordering of the
|
||||
types of transfers in a frame. The main data structure used by
|
||||
the host controller is the endpoint descriptor (ED) to which
|
||||
a queue of transfer descriptors (TDs) is attached. The ED
|
||||
contains the maximum packet size allowed for an endpoint and
|
||||
the controller hardware does the splitting into packets. The
|
||||
pointers to the data buffers are updated after each transfer
|
||||
and when the start and end pointer are equal, the TD is
|
||||
retired to the done-queue. The four types of endpoints have
|
||||
their own queues. Control and bulk endpoints are each queued at
|
||||
their own queue. Interrupt EDs are queued in a tree, with the
|
||||
level in the tree defining the frequency at which they
|
||||
run.</para>
|
||||
|
||||
<para>[Diagram labels: framelist, interrupt, isochronous, control, bulk]</para>
|
||||
|
||||
<para>The schedule being run by the host controller in each
|
||||
frame looks as follows. The controller will first run the
|
||||
non-periodic control and bulk queues, up to a time limit set
|
||||
by the HC driver. Then the interrupt transfers for that frame
|
||||
number are run, by using the lower five bits of the frame
|
||||
number as an index into level 0 of the tree of interrupt
|
||||
EDs. At the end of this tree the isochronous EDs are connected
|
||||
and these are traversed subsequently. The isochronous TDs
|
||||
contain the frame number of the first frame the transfer
|
||||
should be run in. After all the periodic transfers have been
|
||||
run, the control and bulk queues are traversed
|
||||
again. Periodically the interrupt service routine is called to
|
||||
process the done queue and call the callbacks for each
|
||||
transfer and reschedule interrupt and isochronous
|
||||
endpoints.</para>
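<para>The selection of the entry point into the interrupt tree from
the frame number amounts to a mask operation. In the sketch below,
<literal>XXX_INTR_SLOTS</literal> and
<function>xxx_intr_slot()</function> are hypothetical names; the
count of 32 follows from the five bits mentioned above.</para>

```c
#include <stdint.h>

/* Five index bits give 2^5 = 32 entry points into the tree. */
#define XXX_INTR_SLOTS 32u

/* Entry point into the interrupt ED tree used for a given frame. */
static unsigned
xxx_intr_slot(uint32_t frame)
{
	return frame & (XXX_INTR_SLOTS - 1);
}
```

<para>Consecutive frames therefore cycle through all entry points,
and any one entry point is visited once every 32 frames.</para>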
|
||||
|
||||
<para>For a more elaborate description, see the <ulink
|
||||
url="http://www.compaq.com/productinfo/development/openhci.html">
|
||||
OHCI specification</ulink>.</para>

<para>The services layer (the middle layer)
|
||||
provides access to the device in a controlled way and
|
||||
maintains resources in use by the different drivers and the
|
||||
services layer. The layer takes care of the following
|
||||
aspects:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>The device configuration
|
||||
information</para></listitem>
|
||||
<listitem><para>The pipes to communicate with a
|
||||
device</para></listitem>
|
||||
<listitem><para>Probing, attaching to, and detaching from a
|
||||
device.</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usb-dev">
|
||||
<title>USB Device Information</title>
|
||||
|
||||
<sect2>
|
||||
<title>Device configuration information</title>
|
||||
|
||||
<para>Each device provides different levels of configuration
|
||||
information. Each device has one or more configurations, of
|
||||
which one is selected during probe/attach. A configuration
|
||||
provides power and bandwidth requirements. Within each
|
||||
configuration there can be multiple interfaces. A device
|
||||
interface is a collection of endpoints. For example USB
|
||||
speakers can have an interface for the audio data (Audio
|
||||
Class) and an interface for the knobs, dials and buttons (HID
|
||||
Class). All interfaces in a configuration are active at the
|
||||
same time and can be attached to by different drivers. Each
|
||||
interface can have alternates, providing different quality of
|
||||
service parameters. In cameras, for example, this is used to
|
||||
provide different frame sizes and numbers of frames per
|
||||
second.</para>
|
||||
|
||||
<para>Within each interface zero or more endpoints can be
|
||||
specified. Endpoints are the unidirectional access points for
|
||||
communicating with a device. They provide buffers to
|
||||
temporarily store incoming or outgoing data from the
|
||||
device. Each endpoint has a unique address within
|
||||
a configuration, the endpoint's number plus its direction. The
|
||||
default endpoint, endpoint 0, is not part of any interface and
|
||||
available in all configurations. It is managed by the services
|
||||
layer and not directly available to device drivers.</para>
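<para>How an endpoint's number and direction combine into one
address can be sketched as follows, assuming the layout of the
endpoint address field in the USB specification (direction in
bit 7, with 1 meaning IN, and the endpoint number in bits 0-3).
The macro and function names are made up for this example.</para>

```c
#include <stdint.h>

#define XXX_EP_DIR_IN	0x80u	/* bit 7 set: device to host (in) */
#define XXX_EP_NUM_MASK	0x0fu	/* bits 0-3: endpoint number */

/* Build an endpoint address from its number and direction. */
static uint8_t
xxx_ep_addr(unsigned num, int is_in)
{
	return (uint8_t)((num & XXX_EP_NUM_MASK) |
	    (is_in ? XXX_EP_DIR_IN : 0u));
}

/* Extract the endpoint number from an address. */
static unsigned
xxx_ep_num(uint8_t addr)
{
	return addr & XXX_EP_NUM_MASK;
}

/* Return 1 if the address names an in endpoint, 0 for out. */
static int
xxx_ep_is_in(uint8_t addr)
{
	return (addr & XXX_EP_DIR_IN) != 0;
}
```

<para>Endpoint 1 in and endpoint 1 out thus have different
addresses (0x81 and 0x01) even though they share a number.</para>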
|
||||
|
||||
<para>[Diagram labels: Level 0, Level 1, Level 2; Slot 0, Slot 1, Slot 2, Slot 3 (only 4 out of 32 slots shown)]</para>
|
||||
|
||||
<para>This hierarchical configuration information is described
|
||||
in the device by a standard set of descriptors (see section 9.6
|
||||
of the USB specification [2]). They can be requested through
|
||||
the Get Descriptor Request. The services layer caches these
|
||||
descriptors to avoid unnecessary transfers on the USB
|
||||
bus. Access to the descriptors is provided through function
|
||||
calls.</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>Device descriptors: General information about
|
||||
the device, like Vendor, Product and Revision Id, supported
|
||||
device class, subclass and protocol if applicable, maximum
|
||||
packet size for the default endpoint, etc.</para></listitem>
|
||||
|
||||
<listitem><para>Configuration descriptors: The number of
|
||||
interfaces in this configuration, suspend and resume
|
||||
functionality supported and power
|
||||
requirements.</para></listitem>
|
||||
|
||||
<listitem><para>Interface descriptors: interface class,
|
||||
subclass and protocol if applicable, number of alternate
|
||||
settings for the interface and the number of
|
||||
endpoints.</para></listitem>
|
||||
|
||||
<listitem><para>Endpoint descriptors: Endpoint address,
|
||||
direction and type, maximum packet size supported and
|
||||
polling frequency if type is interrupt endpoint. There is no
|
||||
descriptor for the default endpoint (endpoint 0) and it is
|
||||
never counted in an interface descriptor.</para></listitem>
|
||||
|
||||
<listitem><para>String descriptors: In the other descriptors
|
||||
string indices are supplied for some fields. These can be
|
||||
used to retrieve descriptive strings, possibly in multiple
|
||||
languages.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>Class specifications can add their own descriptor types
|
||||
that are available through the Get Descriptor Request.</para>
|
||||
|
||||
<para><emphasis>Pipes.</emphasis> Communication to endpoints on a device flows
|
||||
through so-called pipes. Drivers submit transfers for endpoints
|
||||
to a pipe and provide a callback to be called on completion or
|
||||
failure of the transfer (asynchronous transfers) or wait for
|
||||
completion (synchronous transfer). Transfers to an endpoint
|
||||
are serialised in the pipe. A transfer can either complete,
|
||||
fail or time-out (if a time-out has been set). There are two
|
||||
types of time-outs for transfers. Time-outs can happen due to
|
||||
time-out on the USB bus (milliseconds). These time-outs are
|
||||
seen as failures and can be due to disconnection of the
|
||||
device. A second form of time-out is implemented in software
|
||||
and is triggered when a transfer does not complete within a
|
||||
specified amount of time (seconds). These are caused by a
|
||||
device acknowledging negatively (NAK) the transferred
|
||||
packets. The cause for this is the device not being ready to
|
||||
receive data, buffer under- or overrun or protocol
|
||||
errors.</para>
|
||||
|
||||
<para>If a transfer over a pipe is larger than the maximum
|
||||
packet size specified in the associated endpoint descriptor,
|
||||
the host controller (OHCI) or the HC driver (UHCI) will split
|
||||
the transfer into packets of maximum packet size, with the
|
||||
last packet possibly smaller than the maximum
|
||||
packet size.</para>
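<para>The arithmetic behind this splitting is straightforward. The
sketch below uses hypothetical helper names and assumes a nonzero
maximum packet size; zero-length transfers are not handled.</para>

```c
#include <stddef.h>

/* Number of packets needed for len bytes (maxpkt must be nonzero). */
static size_t
xxx_packet_count(size_t len, size_t maxpkt)
{
	return (len + maxpkt - 1) / maxpkt;
}

/* Size of packet number idx (counting from 0) in that transfer. */
static size_t
xxx_packet_len(size_t len, size_t maxpkt, size_t idx)
{
	size_t off = idx * maxpkt;

	if (off >= len)
		return 0;	/* past the end of the transfer */
	return len - off < maxpkt ? len - off : maxpkt;
}
```

<para>For example, a 200 byte transfer to an endpoint with a
maximum packet size of 64 is split into three packets of 64 bytes
followed by one packet of 8 bytes.</para>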
|
||||
|
||||
<para>Sometimes it is not a problem for a device to return less
|
||||
data than requested. For example a bulk-in transfer to a modem
|
||||
might request 200 bytes of data, but the modem has only 5
|
||||
bytes available at that time. The driver can set the short
|
||||
packet (SPD) flag. It allows the host controller to accept a
|
||||
packet even if the amount of data transferred is less than
|
||||
requested. This flag is only valid for in-transfers, as the
|
||||
amount of data to be sent to a device is always known
|
||||
beforehand. If an unrecoverable error occurs in a device
|
||||
during a transfer the pipe is stalled. Before any more data is
|
||||
accepted or sent the driver needs to resolve the cause of the
|
||||
stall and clear the endpoint stall condition by sending the
|
||||
clear endpoint halt device request over the default
|
||||
pipe. The default endpoint should never stall.</para>
|
||||
|
||||
<para>There are four different types of endpoints and
|
||||
corresponding pipes:</para>

<itemizedlist>
<listitem><para>Control pipe / default pipe: There is
one control pipe per device, connected to the default endpoint
(endpoint 0). The pipe carries the device requests and
associated data. The difference between transfers over the
default pipe and other pipes is that the protocol for
the transfers is described in the USB specification [2]. These
requests are used to reset and configure the device. A basic
set of commands that must be supported by each device is
provided in chapter 9 of the USB specification [2]. The
commands supported on this pipe can be extended by a device
class specification to support additional
functionality.</para></listitem>
|
||||
<listitem><para>Bulk pipe: This is the USB equivalent to a raw
|
||||
transmission medium.</para></listitem>
|
||||
<listitem><para>Interrupt pipe: The host sends a request for
|
||||
data to the device and if the device has nothing to send, it
|
||||
will NAK the data packet. Interrupt transfers are scheduled
|
||||
at a frequency specified when creating the
|
||||
pipe.</para></listitem>
|
||||
|
||||
<listitem><para>Isochronous pipe: These pipes are intended for
|
||||
isochronous data, for example video or audio streams, with
|
||||
fixed latency, but no guaranteed delivery. Some support for
|
||||
pipes of this type is available in the current
|
||||
implementation. Packets in control, bulk and interrupt
|
||||
transfers are retried if an error occurs during transmission
|
||||
or the device acknowledges the packet negatively (NAK) due to,
for example, a lack of buffer space to store the incoming
data. Isochronous packets, however, are not retried in case of
failed delivery or NAK of a packet, as this might violate the
|
||||
timing constraints.</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The availability of the necessary bandwidth is calculated
|
||||
during the creation of the pipe. Transfers are scheduled within
|
||||
frames of 1 millisecond. The bandwidth allocation within a
|
||||
frame is prescribed by the USB specification, section 5.6
[2]. Isochronous and interrupt transfers are allowed to consume
|
||||
up to 90% of the bandwidth within a frame. Packets for control
|
||||
and bulk transfers are scheduled after all isochronous and
|
||||
interrupt packets and will consume all the remaining
|
||||
bandwidth.</para>
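<para>The admission check made when a pipe is created can be pictured
as a simple per-frame budget. The sketch below is an illustration
only. It assumes a full speed bus at 12 Mbit/s, which gives roughly
1500 bytes per 1 millisecond frame, and it ignores protocol overhead
and polling intervals longer than one frame. The structure and
function names are hypothetical and do not come from the USB
stack.</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FRAME_BYTES		1500u	/* assumed raw capacity of a 1 ms frame */
#define PERIODIC_PERCENT	90u	/* share open to isochronous and interrupt */

struct frame_budget {
	unsigned periodic_bytes;	/* bytes already reserved in each frame */
};

/* Try to reserve bytes_per_frame for a new isochronous or interrupt pipe. */
bool
reserve_periodic(struct frame_budget *fb, unsigned bytes_per_frame)
{
	unsigned limit = FRAME_BYTES * PERIODIC_PERCENT / 100u;

	assert(fb != NULL);
	/* periodic_bytes never exceeds limit, so the subtraction is safe. */
	if (bytes_per_frame > limit - fb->periodic_bytes)
		return (false);
	fb->periodic_bytes += bytes_per_frame;
	return (true);
}

/* Bytes left in a frame for control and bulk packets. */
unsigned
remaining_for_control_bulk(const struct frame_budget *fb)
{
	assert(fb != NULL);
	return (FRAME_BYTES - fb->periodic_bytes);
}
```

<para>Under these assumptions a pipe creation request that would push
the periodic reservation past 1350 bytes per frame is refused, while
control and bulk traffic simply uses whatever is left.</para>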
|
||||
|
||||
<para>More information on scheduling of transfers and bandwidth
|
||||
reclamation can be found in chapter 5 of the USB specification
[2], section 1.3 of the UHCI specification [3] and section
|
||||
3.4.2 of the OHCI specification [4].</para>
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usb-devprobe">
|
||||
<title>Device probe and attach</title>
|
||||
|
||||
<para>After the notification by the hub that a new device has been
|
||||
connected, the services layer switches on the port, providing the
|
||||
device with 100 mA of current. At this point the device is in
|
||||
its default state and listening to device address 0. The
|
||||
services layer will proceed to retrieve the various descriptors
|
||||
through the default pipe. After that it will send a Set Address
|
||||
request to move the device away from the default device address
|
||||
(address 0). Multiple device drivers might be able to support
|
||||
the device. For example a modem driver might be able to support
|
||||
an ISDN TA through the AT compatibility interface. A driver for
|
||||
that specific model of the ISDN adapter might however be able to
|
||||
provide much better support for this device. To support this
|
||||
flexibility, the probes return priorities indicating their level
|
||||
of support. Support for a specific revision of a product ranks
|
||||
the highest and the generic driver the lowest priority. It might
|
||||
also be that multiple drivers could attach to one device if
|
||||
there are multiple interfaces within one configuration. Each
|
||||
driver only needs to support a subset of the interfaces.</para>
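<para>The priority mechanism can be sketched as follows. This is a
simplified model and not the actual probe interface: the match levels,
structures, example probe routines and the vendor and product numbers
are all hypothetical and serve only to show how the most specific
driver wins.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical match levels, ordered from weakest to strongest. */
enum match_level {
	MATCH_NONE = 0,
	MATCH_GENERIC,			/* generic fallback driver */
	MATCH_CLASS,			/* device or interface class */
	MATCH_VENDOR_PRODUCT,		/* one specific product */
	MATCH_VENDOR_PRODUCT_REV	/* one specific revision of a product */
};

struct device_ids {
	unsigned vendor, product, revision, dev_class;
};

struct driver {
	const char *name;
	enum match_level (*probe)(const struct device_ids *);
};

/* Example probe: claims any device of an (assumed) modem-like class. */
static enum match_level
modem_probe(const struct device_ids *ids)
{
	return (ids->dev_class == 0x02 ? MATCH_CLASS : MATCH_NONE);
}

/* Example probe: claims one made-up vendor/product pair. */
static enum match_level
isdn_ta_probe(const struct device_ids *ids)
{
	if (ids->vendor != 0x1234 || ids->product != 0x0001)
		return (MATCH_NONE);
	return (MATCH_VENDOR_PRODUCT);
}

const struct driver example_drivers[] = {
	{ "modem", modem_probe },
	{ "isdn_ta", isdn_ta_probe },
};

/* Return the driver whose probe reports the highest level, or NULL. */
const struct driver *
pick_driver(const struct driver *drivers, size_t n,
    const struct device_ids *ids)
{
	const struct driver *best = NULL;
	enum match_level best_level = MATCH_NONE;
	size_t i;

	assert(drivers != NULL && ids != NULL);
	for (i = 0; i < n; i++) {
		enum match_level level = drivers[i].probe(ids);

		if (level > best_level) {
			best_level = level;
			best = &drivers[i];
		}
	}
	return (best);
}
```

<para>In this model the ISDN adapter is claimed by its dedicated
driver even though the modem driver also matches it, while an unknown
device of the same class falls back to the modem driver.</para>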
|
||||
|
||||
<para>The probing for a driver for a newly attached device checks
|
||||
first for device specific drivers. If not found, the probe code
|
||||
iterates over all supported configurations until a driver
|
||||
attaches in a configuration. To support devices with multiple
|
||||
drivers on different interfaces, the probe iterates over all
|
||||
interfaces in a configuration that have not yet been claimed by
|
||||
a driver. Configurations that exceed the power budget for the
|
||||
hub are ignored. During attach the driver should initialise the
|
||||
device to its proper state, but not reset it, as this will make
|
||||
the device disconnect itself from the bus and restart the
|
||||
probing process for it. To avoid consuming unnecessary bandwidth,
drivers should not claim the interrupt pipe at attach time, but
|
||||
should postpone allocating the pipe until the file is opened and
|
||||
the data is actually used. When the file is closed the pipe
|
||||
should be closed again, even though the device might still be
|
||||
attached.</para>
|
||||
|
||||
<sect2>
|
||||
<title>Device disconnect and detach</title>
|
||||
|
||||
<para>A device driver should expect to receive errors during any
|
||||
transaction with the device. The design of USB supports and
|
||||
encourages the disconnection of devices at any point in
|
||||
time. Drivers should make sure that they do the right thing
|
||||
when the device disappears.</para>
|
||||
|
||||
<para>Furthermore a device that has been disconnected and
|
||||
reconnected will not be reattached at the same device
|
||||
instance. This might change in the future when more devices
|
||||
support serial numbers (see the device descriptor) or other
|
||||
means of defining an identity for a device have been
|
||||
developed.</para>
|
||||
|
||||
<para>The disconnection of a device is signaled by a hub in the
|
||||
interrupt packet delivered to the hub driver. The status
|
||||
change information indicates which port has seen a connection
|
||||
change. The device detach methods of all device drivers for
the device connected on that port are called and the structures
cleaned up. If the port status indicates that in the meantime
|
||||
a device has been connected to that port, the procedure for
|
||||
probing and attaching the device will be started. A device
|
||||
reset will produce a disconnect-connect sequence on the hub
|
||||
and will be handled as described above.</para>
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="usb-protocol">
|
||||
<title>USB Drivers Protocol Information</title>
|
||||
|
||||
<para>The protocol used over pipes other than the default pipe is
|
||||
undefined by the USB specification. Information on this can be
|
||||
found from various sources. The most accurate source is the
|
||||
developer's section on the USB home pages [1]. From these pages
a growing number of device class specifications are
|
||||
available. These specifications specify what a compliant device
|
||||
should look like from a driver perspective, basic functionality
|
||||
it needs to provide and the protocol that is to be used over the
|
||||
communication channels. The USB specification [2] includes the
|
||||
description of the Hub Class. A class specification for Human
|
||||
Interface Devices (HID) has been created to cater for keyboards,
|
||||
tablets, bar-code readers, buttons, knobs, switches, etc. A
|
||||
third example is the class specification for mass storage
|
||||
devices. For a full list of device classes see the developer's
section on the USB home pages [1].</para>
|
||||
|
||||
<para>For many devices the protocol information has not yet been
|
||||
published, however. Information on the protocol being used might
be available from the company making the device. Some companies
will require you to sign a Non-Disclosure Agreement (NDA)
|
||||
before giving you the specifications. This in most cases
|
||||
precludes making the driver open source.</para>
|
||||
|
||||
<para>Another good source of information is the Linux driver
|
||||
sources, as a number of companies have started to provide drivers
|
||||
for Linux for their devices. It is always a good idea to contact
|
||||
the authors of those drivers for their source of
|
||||
information.</para>
|
||||
|
||||
<para>Example: Human Interface Devices. The specification for
Human Interface Devices such as keyboards, mice, tablets, buttons,
dials, etc. is referred to in other device class specifications
|
||||
and is used in many devices.</para>
|
||||
|
||||
<para>For example audio speakers provide endpoints to the digital
|
||||
to analogue converters and possibly an extra pipe for a
|
||||
microphone. They also provide a HID endpoint in a separate
|
||||
interface for the buttons and dials on the front of the
|
||||
device. The same is true for the monitor control class. It is
|
||||
straightforward to build support for these interfaces through
|
||||
the available kernel and userland libraries together with the
|
||||
HID class driver or the generic driver. Another device that
|
||||
serves as an example for interfaces within one configuration
|
||||
driven by different device drivers is a cheap keyboard with
|
||||
built-in legacy mouse port. To avoid having the cost of
|
||||
including the hardware for a USB hub in the device,
|
||||
manufacturers combined the mouse data received from the PS/2 port
|
||||
on the back of the keyboard and the key presses from the keyboard
|
||||
into two separate interfaces in the same configuration. The
|
||||
mouse and keyboard drivers each attach to the appropriate
|
||||
interface and allocate the pipes to the two independent
|
||||
endpoints.</para>
|
||||
|
||||
<para>Example: Firmware download. Many devices that have been
|
||||
developed are based on a general purpose processor with
|
||||
an additional USB core added to it. Because the development of
|
||||
drivers and firmware for USB devices is still very new, many
|
||||
devices require the downloading of the firmware after they
|
||||
have been connected.</para>
|
||||
|
||||
<para>The procedure followed is straightforward. The device
|
||||
identifies itself through a vendor and product Id. The first
|
||||
driver probes and attaches to it and downloads the firmware into
|
||||
it. After that the device soft resets itself and the driver is
|
||||
detached. After a short pause the device announces its presence
|
||||
on the bus. The device will have changed its
|
||||
vendor/product/revision Id to reflect the fact that it has been
|
||||
supplied with firmware and as a consequence a second driver will
|
||||
probe it and attach to it.</para>
|
||||
|
||||
<para>An example of these types of devices is the ActiveWire I/O
|
||||
board, based on the EZ-USB chip. For this chip a generic firmware
|
||||
downloader is available. The firmware downloaded into the
|
||||
ActiveWire board changes the revision Id. It will then perform a
|
||||
soft reset of the USB part of the EZ-USB chip to disconnect from
|
||||
the USB bus and again reconnect.</para>
|
||||
|
||||
<para>Example: Mass Storage Devices. Support for mass storage
devices is mainly built around existing protocols. The Iomega
USB Zip drive is based on the SCSI version of their drive. The
|
||||
SCSI commands and status messages are wrapped in blocks and
|
||||
transferred over the bulk pipes to and from the device,
|
||||
emulating a SCSI controller over the USB wire. ATAPI and UFI
|
||||
commands are supported in a similar fashion.</para>
|
||||
|
||||
<para>The Mass Storage Specification supports 2 different types of
|
||||
wrapping of the command block. The initial attempt was based on
sending the command and status through the default pipe and
using bulk transfers for the data to be moved between the host
and the device. Based on experience a second approach was
designed that was based on wrapping the command and status
blocks and sending them over the bulk out and in endpoints. The
specification specifies exactly what has to happen when and what
has to be done in case an error condition is encountered. The
biggest challenge when writing drivers for these devices is to
fit the USB-based protocol into the existing support for mass storage
devices. CAM provides hooks to do this in a fairly
straightforward way. ATAPI is less simple as historically the IDE
|
||||
interface has never had many different appearances.</para>
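<para>To give an idea of what wrapping a command block looks like for
the second, bulk-only approach, the sketch below builds a 31 byte
wrapper around a SCSI command. The field layout is given as recalled
from the Bulk-Only Transport class specification and is not taken from
this chapter, so it should be checked against that specification
before use. The function and macro names are hypothetical.</para>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_SIGNATURE	0x43425355u	/* reads "USBC" when stored little-endian */
#define CBW_LENGTH	31
#define CBW_DIR_IN	0x80u		/* data phase flows from device to host */

/* Store a 32-bit value in little-endian byte order. */
static void
put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Wrap a command block of cmd_len bytes (1 to 16) into out[31]. */
void
build_cbw(uint8_t out[CBW_LENGTH], uint32_t tag, uint32_t data_len,
    int dir_in, uint8_t lun, const uint8_t *cmd, uint8_t cmd_len)
{
	assert(out != NULL && cmd != NULL);
	assert(cmd_len >= 1 && cmd_len <= 16);

	memset(out, 0, CBW_LENGTH);
	put_le32(out + 0, CBW_SIGNATURE);	/* marks the block as a wrapper */
	put_le32(out + 4, tag);			/* echoed back in the status block */
	put_le32(out + 8, data_len);		/* bytes expected in the data phase */
	out[12] = dir_in ? CBW_DIR_IN : 0;	/* direction of the data phase */
	out[13] = lun & 0x0f;			/* logical unit number */
	out[14] = cmd_len;			/* valid bytes in the command block */
	memcpy(out + 15, cmd, cmd_len);		/* the command itself, zero padded */
}
```

<para>The resulting buffer would be sent over the bulk out endpoint,
followed by the data phase and a status block read from the bulk in
endpoint.</para>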
|
||||
|
||||
<para>The support for the USB floppy from Y-E Data is again less
|
||||
straightforward as a new command set has been designed.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
|
@ -1,260 +0,0 @@
|
|||
<!--
|
||||
The FreeBSD Documentation Project
|
||||
|
||||
$FreeBSD$
|
||||
-->
|
||||
|
||||
<chapter id="vm">
|
||||
<chapterinfo>
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Matthew</firstname>
|
||||
<surname>Dillon</surname>
|
||||
<contrib>Contributed by </contrib>
|
||||
</author>
|
||||
</authorgroup>
|
||||
<!-- 6 Feb 1999 -->
|
||||
</chapterinfo>
|
||||
|
||||
<title>Virtual Memory System</title>
|
||||
|
||||
<sect1 id="vm-physmem">
|
||||
<title>Management of physical
|
||||
memory—<literal>vm_page_t</literal></title>
|
||||
|
||||
<para>Physical memory is managed on a page-by-page basis through the
|
||||
<literal>vm_page_t</literal> structure. Pages of physical memory are
|
||||
categorized through the placement of their respective
|
||||
<literal>vm_page_t</literal> structures on one of several paging
|
||||
queues.</para>
|
||||
|
||||
<para>A page can be in a wired, active, inactive, cache, or free state.
|
||||
Except for the wired state, the page is typically placed in a doubly
|
||||
linked list queue representing the state that it is in. Wired pages
|
||||
are not placed on any queue.</para>
|
||||
|
||||
<para>FreeBSD implements a more involved paging queue for cached and
|
||||
free pages in order to implement page coloring. Each of these states
|
||||
involves multiple queues arranged according to the size of the
|
||||
processor's L1 and L2 caches. When a new page needs to be allocated,
|
||||
FreeBSD attempts to obtain one that is reasonably well aligned from
|
||||
the point of view of the L1 and L2 caches relative to the VM object
|
||||
the page is being allocated for.</para>
|
||||
|
||||
<para>Additionally, a page may be held with a reference count or locked
|
||||
with a busy count. The VM system also implements an <quote>ultimate
|
||||
locked</quote> state for a page using the PG_BUSY bit in the page's
|
||||
flags.</para>
|
||||
|
||||
<para>In general terms, each of the paging queues operates in a LRU
|
||||
fashion. A page is typically placed in a wired or active state
|
||||
initially. When wired, the page is usually associated with a page
|
||||
table somewhere. The VM system ages the page by scanning pages in a
|
||||
more active paging queue (LRU) in order to move them to a less-active
|
||||
paging queue. Pages that get moved into the cache are still
|
||||
associated with a VM object but are candidates for immediate reuse.
|
||||
Pages in the free queue are truly free. FreeBSD attempts to minimize
|
||||
the number of pages in the free queue, but a certain minimum number of
|
||||
truly free pages must be maintained in order to accommodate page
|
||||
allocation at interrupt time.</para>
|
||||
|
||||
<para>If a process attempts to access a page that does not exist in its
|
||||
page table but does exist in one of the paging queues (such as the
|
||||
inactive or cache queues), a relatively inexpensive page reactivation
|
||||
fault occurs which causes the page to be reactivated. If the page
|
||||
does not exist in system memory at all, the process must block while
|
||||
the page is brought in from disk.</para>
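<para>The difference between a reactivation fault and a fault that has
to go to disk can be modelled with a toy state machine. The sketch
below is a deliberately simplified illustration and not the kernel's
data structures: the type names, the <literal>resident</literal> flag
and the single-step aging are all assumptions made for the
example.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue tags. Wired pages sit on no queue at all. */
enum queue { Q_NONE, Q_ACTIVE, Q_INACTIVE, Q_CACHE, Q_FREE };

enum fault_cost { FAULT_REACTIVATE, FAULT_PAGE_IN };

struct page {
	enum queue q;
	int resident;		/* non-zero while the contents are in memory */
};

/* One aging step: move the page to the next less-active queue. */
void
age_page(struct page *pg)
{
	assert(pg != NULL);
	switch (pg->q) {
	case Q_ACTIVE:
		pg->q = Q_INACTIVE;
		break;
	case Q_INACTIVE:
		pg->q = Q_CACHE;
		break;
	case Q_CACHE:
		pg->q = Q_FREE;		/* contents are given up */
		pg->resident = 0;
		break;
	default:
		break;
	}
}

/* Handle a fault on a page that is missing from the page table. */
enum fault_cost
handle_fault(struct page *pg)
{
	assert(pg != NULL);
	if (pg->resident && (pg->q == Q_INACTIVE || pg->q == Q_CACHE)) {
		pg->q = Q_ACTIVE;	/* cheap: the data is still in memory */
		return (FAULT_REACTIVATE);
	}
	/* Expensive: the process would block while the data is read in. */
	pg->resident = 1;
	pg->q = Q_ACTIVE;
	return (FAULT_PAGE_IN);
}
```

<para>A page that has only aged into the inactive or cache queue comes
back with the cheap path, while a page that has been freed has to be
brought in again.</para>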
|
||||
|
||||
<para>FreeBSD dynamically tunes its paging queues and attempts to
|
||||
maintain reasonable ratios of pages in the various queues as well as
|
||||
attempts to maintain a reasonable breakdown of clean vs. dirty pages.
|
||||
The amount of rebalancing that occurs depends on the system's memory
|
||||
load. This rebalancing is implemented by the pageout daemon and
|
||||
involves laundering dirty pages (syncing them with their backing
|
||||
store), noticing when pages are actively referenced (resetting their
|
||||
position in the LRU queues or moving them between queues), migrating
|
||||
pages between queues when the queues are out of balance, and so forth.
|
||||
FreeBSD's VM system is willing to take a reasonable number of
|
||||
reactivation page faults to determine how active or how idle a page
|
||||
actually is. This leads to better decisions being made as to when to
|
||||
launder or swap-out a page.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vm-cache">
|
||||
<title>The unified buffer
|
||||
cache—<literal>vm_object_t</literal></title>
|
||||
|
||||
<para>FreeBSD implements the idea of a generic <quote>VM object</quote>.
|
||||
VM objects can be associated with backing store of various
|
||||
types—unbacked, swap-backed, physical device-backed, or
|
||||
file-backed storage. Since the filesystem uses the same VM objects to
|
||||
manage in-core data relating to files, the result is a unified buffer
|
||||
cache.</para>
|
||||
|
||||
<para>VM objects can be <emphasis>shadowed</emphasis>. That is, they
|
||||
can be stacked on top of each other. For example, you might have a
|
||||
swap-backed VM object stacked on top of a file-backed VM object in
|
||||
order to implement a MAP_PRIVATE mmap()ing. This stacking is also
|
||||
used to implement various sharing properties, including
|
||||
copy-on-write, for forked address spaces.</para>
|
||||
|
||||
<para>It should be noted that a <literal>vm_page_t</literal> can only be
|
||||
associated with one VM object at a time. The VM object shadowing
|
||||
implements the perceived sharing of the same page across multiple
|
||||
instances.</para>
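<para>Shadowing can be illustrated with a small model in which a page
is looked up in the top object first and then in each object it
shadows. This is a sketch of the idea only. The structure and function
names are hypothetical, pages are represented by plain strings, and
the copy step of copy-on-write is reduced to storing a new pointer in
the top object.</para>

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_PAGES 4

struct object {
	struct object *backing;		/* object this one shadows, or NULL */
	const char *pages[OBJ_PAGES];	/* page contents, NULL if not held here */
};

/* Read: search the top object, then every object beneath it. */
const char *
lookup_page(const struct object *obj, size_t idx)
{
	assert(idx < OBJ_PAGES);
	for (; obj != NULL; obj = obj->backing)
		if (obj->pages[idx] != NULL)
			return (obj->pages[idx]);
	return (NULL);
}

/*
 * Write: the private copy goes into the top object only, so the
 * backing object and anyone else mapping it are left untouched.
 */
void
write_page(struct object *top, size_t idx, const char *data)
{
	assert(top != NULL && idx < OBJ_PAGES);
	top->pages[idx] = data;
}
```

<para>In this model a MAP_PRIVATE mapping corresponds to an initially
empty top object stacked on the file's object: reads fall through to
the file until the first write places a private page on top.</para>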
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vm-fileio">
|
||||
<title>Filesystem I/O—<literal>struct buf</literal></title>
|
||||
|
||||
<para>vnode-backed VM objects, such as file-backed objects, generally
|
||||
need to maintain their own clean/dirty info independent from the VM
|
||||
system's idea of clean/dirty. For example, when the VM system decides
|
||||
to synchronize a physical page to its backing store, the VM system
|
||||
needs to mark the page clean before the page is actually written to
|
||||
its backing store. Additionally, filesystems need to be able to map
|
||||
portions of a file or file metadata into KVM in order to operate on
|
||||
it.</para>
|
||||
|
||||
<para>The entities used to manage this are known as filesystem buffers,
|
||||
<literal>struct buf</literal>'s, or
|
||||
<literal>bp</literal>'s. When a filesystem needs to operate on a
|
||||
portion of a VM object, it typically maps part of the object into a
|
||||
struct buf and then maps the pages in the struct buf into KVM. In the
|
||||
same manner, disk I/O is typically issued by mapping portions of
|
||||
objects into buffer structures and then issuing the I/O on the buffer
|
||||
structures. The underlying vm_page_t's are typically busied for the
|
||||
duration of the I/O. Filesystem buffers also have their own notion of
|
||||
being busy, which is useful to filesystem driver code which would
|
||||
rather operate on filesystem buffers instead of hard VM pages.</para>
|
||||
|
||||
<para>FreeBSD reserves a limited amount of KVM to hold mappings from
|
||||
struct bufs, but it should be made clear that this KVM is used solely
|
||||
to hold mappings and does not limit the ability to cache data.
|
||||
Physical data caching is strictly a function of
|
||||
<literal>vm_page_t</literal>'s, not filesystem buffers. However,
|
||||
since filesystem buffers are used to placehold I/O, they do inherently
|
||||
limit the amount of concurrent I/O possible. As there are usually a
|
||||
few thousand filesystem buffers available, this is not usually a
|
||||
problem.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vm-pagetables">
|
||||
<title>Mapping Page Tables—<literal>vm_map_t, vm_entry_t</literal></title>
|
||||
|
||||
<para>FreeBSD separates the physical page table topology from the VM
|
||||
system. All hard per-process page tables can be reconstructed on the
|
||||
fly and are usually considered throwaway. Special page tables such as
|
||||
those managing KVM are typically permanently preallocated. These page
|
||||
tables are not throwaway.</para>
|
||||
|
||||
<para>FreeBSD associates portions of vm_objects with address ranges in
|
||||
virtual memory through <literal>vm_map_t</literal> and
|
||||
<literal>vm_entry_t</literal> structures. Page tables are directly
|
||||
synthesized from the
|
||||
<literal>vm_map_t</literal>/<literal>vm_entry_t</literal>/
|
||||
<literal>vm_object_t</literal> hierarchy. Recall that I mentioned
|
||||
that physical pages are only directly associated with a
|
||||
<literal>vm_object</literal>; that is not quite true.
|
||||
<literal>vm_page_t</literal>'s are also linked into page tables that
|
||||
they are actively associated with. One <literal>vm_page_t</literal>
|
||||
can be linked into several <emphasis>pmaps</emphasis>, as page tables
|
||||
are called. However, the hierarchical association holds, so all
|
||||
references to the same page in the same object reference the same
|
||||
<literal>vm_page_t</literal> and thus give us buffer cache unification
|
||||
across the board.</para>
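<para>The way an address is resolved through the map hierarchy can be
sketched as a walk over a list of entries, each of which ties a range
of virtual addresses to an offset within an object. The structures
below are hypothetical stand-ins chosen for the example and are much
simpler than the real <literal>vm_map_t</literal> and
<literal>vm_entry_t</literal>.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct object {
	int id;
};

struct map_entry {
	uintptr_t start, end;		/* covers addresses in [start, end) */
	struct object *object;		/* object backing this range */
	uintptr_t offset;		/* offset of start within the object */
	struct map_entry *next;
};

struct map {
	struct map_entry *entries;
};

/*
 * Find the entry covering va. On success the matching offset within
 * the entry's object is stored in *off. Returns NULL for a hole.
 */
const struct map_entry *
map_lookup(const struct map *m, uintptr_t va, uintptr_t *off)
{
	const struct map_entry *e;

	assert(m != NULL && off != NULL);
	for (e = m->entries; e != NULL; e = e->next) {
		if (va >= e->start && va < e->end) {
			*off = e->offset + (va - e->start);
			return (e);
		}
	}
	return (NULL);
}
```

<para>Because the object and offset are all that is needed to find the
page again, a page table entry built from this lookup can be thrown
away and rebuilt at any time.</para>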
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vm-kvm">
|
||||
<title>KVM Memory Mapping</title>
|
||||
|
||||
<para>FreeBSD uses KVM to hold various kernel structures. The single
|
||||
largest entity held in KVM is the filesystem buffer cache. That is,
|
||||
mappings relating to <literal>struct buf</literal> entities.</para>
|
||||
|
||||
<para>Unlike Linux, FreeBSD does <emphasis>not</emphasis> map all of physical memory into
|
||||
KVM. This means that FreeBSD can handle memory configurations up to
|
||||
4GB on 32-bit platforms. In fact, if the MMU were capable of it,
FreeBSD could theoretically handle memory configurations up to 8TB on
a 32-bit platform. However, since most 32-bit platforms are only
capable of mapping 4GB of RAM, this is a moot point.</para>
|
||||
|
||||
<para>KVM is managed through several mechanisms. The main mechanism
|
||||
used to manage KVM is the <emphasis>zone allocator</emphasis>. The
|
||||
zone allocator takes a chunk of KVM and splits it up into
|
||||
constant-sized blocks of memory in order to allocate a specific type
|
||||
of structure. You can use <command>vmstat -m</command> to get an
|
||||
overview of current KVM utilization broken down by zone.</para>
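<para>The principle behind a zone allocator, carving a chunk into
constant-sized blocks and handing them out from a free list, can be
shown in a few lines. The sketch below is a user space illustration
with hypothetical names and is not the kernel's zone allocator
interface. It assumes that the chunk is suitably aligned for pointers
and that the item size is a multiple of the pointer size.</para>

```c
#include <assert.h>
#include <stddef.h>

struct zone {
	size_t item_size;
	void *free_list;	/* free items, linked through their first word */
};

/* Carve chunk (chunk_size bytes) into items of item_size bytes. */
void
zone_init(struct zone *z, void *chunk, size_t chunk_size, size_t item_size)
{
	char *base = chunk;
	size_t i;

	assert(z != NULL && chunk != NULL);
	assert(item_size >= sizeof(void *));
	assert(item_size % sizeof(void *) == 0);

	z->item_size = item_size;
	z->free_list = NULL;
	/* Push items in reverse so the first allocation returns the base. */
	for (i = chunk_size / item_size; i-- > 0; ) {
		void **item = (void **)(base + i * item_size);

		*item = z->free_list;
		z->free_list = item;
	}
}

/* Take one item from the zone, or return NULL if it is exhausted. */
void *
zone_alloc(struct zone *z)
{
	void **item;

	assert(z != NULL);
	item = z->free_list;
	if (item == NULL)
		return (NULL);
	z->free_list = *item;
	return (item);
}

/* Give an item back to the zone it was allocated from. */
void
zone_free(struct zone *z, void *p)
{
	void **item = p;

	assert(z != NULL && p != NULL);
	*item = z->free_list;
	z->free_list = item;
}
```

<para>Since every item in a zone has the same size, allocation and
release are constant-time list operations and the zone cannot fragment
internally.</para>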
|
||||
</sect1>
|
||||
|
||||
<sect1 id="vm-tuning">
|
||||
<title>Tuning the FreeBSD VM system</title>
|
||||
|
||||
<para>A concerted effort has been made to make the FreeBSD kernel
|
||||
dynamically tune itself. Typically you do not need to mess with
|
||||
anything beyond the <option>maxusers</option> and
|
||||
<option>NMBCLUSTERS</option> kernel config options. That is, kernel
|
||||
compilation options specified in (typically)
|
||||
<filename>/usr/src/sys/i386/conf/<replaceable>CONFIG_FILE</replaceable></filename>.
|
||||
A description of all available kernel configuration options can be
|
||||
found in <filename>/usr/src/sys/i386/conf/LINT</filename>.</para>
|
||||
|
||||
<para>In a large system configuration you may wish to increase
|
||||
<option>maxusers</option>. Values typically range from 10 to 128.
|
||||
Note that raising <option>maxusers</option> too high can cause the
|
||||
system to overflow available KVM resulting in unpredictable operation.
|
||||
It is better to leave <option>maxusers</option> at some reasonable number and add other
|
||||
options, such as <option>NMBCLUSTERS</option>, to increase specific
|
||||
resources.</para>
|
||||
|
||||
<para>If your system is going to use the network heavily, you may want
|
||||
to increase <option>NMBCLUSTERS</option>. Typical values range from
|
||||
1024 to 4096.</para>
|
||||
|
||||
<para>The <literal>NBUF</literal> parameter is also traditionally used
|
||||
to scale the system. This parameter determines the amount of KVA the
|
||||
system can use to map filesystem buffers for I/O. Note that this
|
||||
parameter has nothing whatsoever to do with the unified buffer cache!
|
||||
This parameter is dynamically tuned in 3.0-CURRENT and later kernels
|
||||
and should generally not be adjusted manually. We recommend that you
|
||||
<emphasis>not</emphasis> try to specify an <literal>NBUF</literal>
|
||||
parameter. Let the system pick it. Too small a value can result in
|
||||
extremely inefficient filesystem operation while too large a value can
|
||||
starve the page queues by causing too many pages to become wired
|
||||
down.</para>
|
||||
|
||||
<para>By default, FreeBSD kernels are not optimized. You can set
|
||||
debugging and optimization flags with the
|
||||
<literal>makeoptions</literal> directive in the kernel configuration.
|
||||
Note that you should not use <option>-g</option> unless you can
|
||||
accommodate the large (typically 7 MB+) kernels that result.</para>
|
||||
|
||||
<programlisting>makeoptions DEBUG="-g"
|
||||
makeoptions COPTFLAGS="-O -pipe"</programlisting>
|
||||
|
||||
<para>Sysctl provides a way to tune kernel parameters at run-time. You
|
||||
typically do not need to mess with any of the sysctl variables,
|
||||
especially the VM related ones.</para>
|
||||
|
||||
<para>Run time VM and system tuning is relatively straightforward.
|
||||
First, use Soft Updates on your UFS/FFS filesystems whenever possible.
|
||||
<filename>/usr/src/sys/ufs/ffs/README.softupdates</filename> contains
|
||||
instructions (and restrictions) on how to configure it.</para>
|
||||
|
||||
<para>Second, configure sufficient swap. You should have a swap
|
||||
partition configured on each physical disk, up to four, even on your
|
||||
<quote>work</quote> disks. You should have at least twice as much swap
space as main memory, and possibly even more if you do not have a
|
||||
lot of memory. You should also size your swap partition based on the
|
||||
maximum memory configuration you ever intend to put on the machine so
|
||||
you do not have to repartition your disks later on. If you want to be
|
||||
able to accommodate a crash dump, your first swap partition must be at
|
||||
least as large as main memory and <filename>/var/crash</filename> must
|
||||
have sufficient free space to hold the dump.</para>
|
||||
|
||||
<para>NFS-based swap is perfectly acceptable on 4.X or later systems,
|
||||
but you must be aware that the NFS server will take the brunt of the
|
||||
paging load.</para>
|
||||
</sect1>
|
||||
|
||||
</chapter>
|