Make the whitespace in this file consistent; that last merge was horrible.
Ceri Davies 2007-05-04 12:39:58 +00:00
parent eb6784ebf1
commit 389924ee76
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=30146

@@ -65,20 +65,20 @@
programming, but rather some general useful information.</para></listitem>
<listitem><para>The <ulink
url="&url.books.arch-handbook;/index.html">FreeBSD
Architecture Handbook</ulink> &mdash; also from the documentation
project, contains descriptions of several low-level facilities
and procedures. The most important chapter is 13, <ulink
url="&url.books.arch-handbook;/driverbasics.html">Writing
FreeBSD device drivers</ulink>.</para></listitem>
url="&url.books.arch-handbook;/index.html">FreeBSD
Architecture Handbook</ulink> &mdash; also from the documentation
project, contains descriptions of several low-level facilities
and procedures. The most important chapter is 13, <ulink
url="&url.books.arch-handbook;/driverbasics.html">Writing
FreeBSD device drivers</ulink>.</para></listitem>
<listitem><para>The Blueprints section of <ulink
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
site &mdash; contains several interesting articles on kernel
facilities.</para></listitem>
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
site &mdash; contains several interesting articles on kernel
facilities.</para></listitem>
<listitem><para>The man pages in section 9 &mdash; for important
documentation on kernel functions.</para></listitem>
documentation on kernel functions.</para></listitem>
<listitem><para>The &man.geom.4; man page and <ulink
url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
@@ -86,9 +86,9 @@
subsystem.</para></listitem>
<listitem><para>Man pages &man.g.bio.9;, &man.g.event.9;, &man.g.data.9;,
&man.g.geom.9;, &man.g.provider.9; &man.g.consumer.9;, &man.g.access.9;
&amp; others linked from those, for documentation on specific
functionalities.
&man.g.geom.9;, &man.g.provider.9; &man.g.consumer.9;, &man.g.access.9;
&amp; others linked from those, for documentation on specific
functionalities.
</para></listitem>
<listitem><para>The &man.style.9; man page &mdash; for documentation on
@@ -125,8 +125,8 @@
<title>Modifying a system for development</title>
<para>For any kernel programming a kernel with
<option>INVARIANTS</option> enabled is a must-have. So enter
these in your kernel configuration file:</para>
<option>INVARIANTS</option> enabled is a must-have. So enter
these in your kernel configuration file:</para>
<programlisting>options INVARIANT_SUPPORT
options INVARIANTS</programlisting>
@@ -138,41 +138,41 @@ options INVARIANTS</programlisting>
options WITNESS</programlisting>
<para>For debugging crash dumps, a kernel with debug symbols is
needed:</para>
needed:</para>
<programlisting> makeoptions DEBUG=-g</programlisting>
<para>With the usual way of installing the kernel (<command>make
installkernel</command>) the debug kernel will not be
automatically installed. It is called
<filename>kernel.debug</filename> and located in
<filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
convenience it should be copied to
<filename>/boot/kernel/</filename>.</para>
installkernel</command>) the debug kernel will not be
automatically installed. It is called
<filename>kernel.debug</filename> and located in
<filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
convenience it should be copied to
<filename>/boot/kernel/</filename>.</para>
<para>Another convenience is enabling the kernel debugger so you
can examine a kernel panic when it happens. For this, enter
the following lines in your kernel configuration file:</para>
can examine a kernel panic when it happens. For this, enter
the following lines in your kernel configuration file:</para>
<programlisting>options KDB
options DDB
options KDB_TRACE</programlisting>
<para>For this to work you might need to set a sysctl (if it is
not on by default):</para>
not on by default):</para>
<programlisting> debug.debugger_on_panic=1</programlisting>
<para>Kernel panics will happen, so care should be taken with
the filesystem cache. In particular, having softupdates might
mean the latest file version could be lost if a panic occurs
before it is committed to storage. Disabling softupdates
yields a great performance hit, and still does not guarantee
data consistency. Mounting filesystem with the "sync" option
is needed for that. For a compromise, the softupdates cache delays can
be shortened. There are three sysctl's that are useful for
this (best to be set in
<filename>/etc/sysctl.conf</filename>):</para>
the filesystem cache. In particular, having softupdates might
mean the latest file version could be lost if a panic occurs
before it is committed to storage. Disabling softupdates
yields a great performance hit, and still does not guarantee
data consistency. Mounting filesystem with the "sync" option
is needed for that. For a compromise, the softupdates cache delays can
be shortened. There are three sysctl's that are useful for
this (best to be set in
<filename>/etc/sysctl.conf</filename>):</para>
<programlisting>kern.filedelay=5
kern.dirdelay=4
@@ -181,35 +181,35 @@ kern.metadelay=3</programlisting>
<para>The numbers represent seconds.</para>
<para>For debugging kernel panics, kernel core dumps are
required. Since a kernel panic might make filesystems
unusable, this crash dump is first written to a raw
partition. Usually, this is the swap partition. This partition must be at
least as large as the physical RAM in the machine. On the
next boot, the dump is copied to a regular file.
This happens after filesystems are checked and mounted, and
before swap is enabled. This is controlled with two
<filename>/etc/rc.conf</filename> variables:</para>
required. Since a kernel panic might make filesystems
unusable, this crash dump is first written to a raw
partition. Usually, this is the swap partition. This partition must be at
least as large as the physical RAM in the machine. On the
next boot, the dump is copied to a regular file.
This happens after filesystems are checked and mounted, and
before swap is enabled. This is controlled with two
<filename>/etc/rc.conf</filename> variables:</para>
<programlisting>dumpdev="/dev/ad0s4b"
dumpdir="/usr/core </programlisting>
<para>The <varname>dumpdev</varname> variable specifies the swap
partition and <varname>dumpdir</varname> tells the system
where in the filesystem to relocate the core dump on reboot.</para>
partition and <varname>dumpdir</varname> tells the system
where in the filesystem to relocate the core dump on reboot.</para>
<para>Writing kernel core dumps is slow and takes a long time so
if you have lots of memory (>256M) and lots of panics it could
be frustrating to sit and wait while it is done (twice &mdash; first
to write it to swap, then to relocate it to filesystem). It is
convenient then to limit the amount of RAM the system will use
via a <filename>/boot/loader.conf</filename> tunable:</para>
if you have lots of memory (>256M) and lots of panics it could
be frustrating to sit and wait while it is done (twice &mdash; first
to write it to swap, then to relocate it to filesystem). It is
convenient then to limit the amount of RAM the system will use
via a <filename>/boot/loader.conf</filename> tunable:</para>
<programlisting> hw.physmem="256M"</programlisting>
<para>If the panics are frequent and filesystems large (or you
simply do not trust softupdates+background fsck) it is advisable
to turn background fsck off via
<filename>/etc/rc.conf</filename> variable:</para>
simply do not trust softupdates+background fsck) it is advisable
to turn background fsck off via
<filename>/etc/rc.conf</filename> variable:</para>
<programlisting> background_fsck="NO"</programlisting>
@@ -233,13 +233,13 @@ dumpdir="/usr/core </programlisting>
<title>The Makefile</title>
<para>It is good practice to create
<filename>Makefile</filename>s for every nontrivial coding
project, which of course includes kernel modules.</para>
<filename>Makefile</filename>s for every nontrivial coding
project, which of course includes kernel modules.</para>
<para>Creating the <filename>Makefile</filename> is simple
thanks to an extensive set of helper routines provided by the
system. In short, here is how a minimal <filename>Makefile</filename>
looks for a kernel module:</para>
thanks to an extensive set of helper routines provided by the
system. In short, here is how a minimal <filename>Makefile</filename>
looks for a kernel module:</para>
<programlisting>SRCS=g_journal.c
KMOD=geom_journal
@@ -247,10 +247,10 @@ KMOD=geom_journal
.include &lt;bsd.kmod.mk&gt;</programlisting>
<para>This <filename>Makefile</filename> (with changed filenames)
will do for any kernel module, and a GEOM class can reside in just
one kernel module. If more than one file is required, list it in the
<envar>SRCS</envar> variable, separated with whitespace from
other filenames.</para>
will do for any kernel module, and a GEOM class can reside in just
one kernel module. If more than one file is required, list it in the
<envar>SRCS</envar> variable, separated with whitespace from
other filenames.</para>
</sect2>
</sect1>
@@ -261,13 +261,13 @@ KMOD=geom_journal
<title>Memory allocation</title>
<para>See &man.malloc.9;. Basic memory allocation is only
slightly different than its userland equivalent. Most
notably, <function>malloc</function>() and
<function>free</function>() accept additional parameters as is
described in the man page.</para>
slightly different than its userland equivalent. Most
notably, <function>malloc</function>() and
<function>free</function>() accept additional parameters as is
described in the man page.</para>
<para>A <quote>malloc type</quote> must be declared in the
declaration section of a source file, like this:</para>
declaration section of a source file, like this:</para>
<programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
@@ -277,24 +277,24 @@ KMOD=geom_journal
included.</para>
<para>There is another mechanism for allocating memory, the UMA
(Universal Memory Allocator). See &man.uma.9; for details, but
it is a special type of allocator mainly used for speedy
allocation of lists comprised of same-sized items (for
example, dynamic arrays of structs).</para>
(Universal Memory Allocator). See &man.uma.9; for details, but
it is a special type of allocator mainly used for speedy
allocation of lists comprised of same-sized items (for
example, dynamic arrays of structs).</para>
</sect2>
<sect2 id="kernelprog-lists">
<title>Lists and queues</title>
<para>See &man.queue.3;. There are a LOT of cases when a list of
things needs to be maintained. Fortunately, this data
structure is implemented (in several ways) by C macros
included in the system. The most used list type is TAILQ
because it is the most flexible. It is also the one with largest
memory requirements (its elements are doubly-linked) and
also the slowest (although the speed variation is on
the order of several CPU instructions more, so it should not be
taken seriously).</para>
things needs to be maintained. Fortunately, this data
structure is implemented (in several ways) by C macros
included in the system. The most used list type is TAILQ
because it is the most flexible. It is also the one with largest
memory requirements (its elements are doubly-linked) and
also the slowest (although the speed variation is on
the order of several CPU instructions more, so it should not be
taken seriously).</para>
<para>If data retrieval speed is very important, see
&man.tree.3; and &man.hashinit.9;.</para>
@@ -304,31 +304,31 @@ KMOD=geom_journal
<title>BIOs</title>
<para>Structure <structname>bio</structname> is used for any and
all Input/Output operations concerning GEOM. It basically
contains information about what device ('provider') should
satisfy the request, request type, offset, length, pointer to
a buffer, and a bunch of <quote>user-specific</quote> flags
and fields that can help implement various hacks.</para>
all Input/Output operations concerning GEOM. It basically
contains information about what device ('provider') should
satisfy the request, request type, offset, length, pointer to
a buffer, and a bunch of <quote>user-specific</quote> flags
and fields that can help implement various hacks.</para>
<para>The important thing here is that <structname>bio</structname>s
are handled asynchronously. That means that, in most parts of the code,
there is no analogue to userland's &man.read.2; and
&man.write.2; calls that do not return until a request is
done. Rather, a developer-supplied function is called as a
notification when the request gets completed (or results in
error).</para>
are handled asynchronously. That means that, in most parts of the code,
there is no analogue to userland's &man.read.2; and
&man.write.2; calls that do not return until a request is
done. Rather, a developer-supplied function is called as a
notification when the request gets completed (or results in
error).</para>
<para>The asynchronous programming model (also
called "event-driven") is somewhat harder
than the much more used imperative one used in userland
(at least it takes a
while to get used to it). In some cases the helper routines
<function>g_write_data</function>() and
<function>g_read_data</function>() can be used, but <emphasis>not
always</emphasis>. In particular, they cannot be used when
a mutex is held; for example, the GEOM topology mutex or
the internal mutex held during the <function>.start</function>() and
<function>.stop</function>() functions.</para>
called "event-driven") is somewhat harder
than the much more used imperative one used in userland
(at least it takes a
while to get used to it). In some cases the helper routines
<function>g_write_data</function>() and
<function>g_read_data</function>() can be used, but <emphasis>not
always</emphasis>. In particular, they cannot be used when
a mutex is held; for example, the GEOM topology mutex or
the internal mutex held during the <function>.start</function>() and
<function>.stop</function>() functions.</para>
</sect2>
</sect1>
@@ -340,52 +340,52 @@ KMOD=geom_journal
<title>Ggate</title>
<para>If maximum performance is not needed, a much simpler way
of making a data transformation is to implement it in userland
via the ggate (GEOM gate) facility. Unfortunately, there is no
easy way to convert between, or even share code between the
two approaches.</para>
of making a data transformation is to implement it in userland
via the ggate (GEOM gate) facility. Unfortunately, there is no
easy way to convert between, or even share code between the
two approaches.</para>
</sect2>
<sect2 id="geom-class">
<title>GEOM class</title>
<para>GEOM classes are transformations on the data. These transformations
can be combined in a tree-like fashion. Instances of GEOM classes are
called <emphasis>geoms</emphasis>.</para>
can be combined in a tree-like fashion. Instances of GEOM classes are
called <emphasis>geoms</emphasis>.</para>
<para>Each GEOM class has several "class methods" that get called
when there is no geom instance available (or they are simply not
bound to a single instance):</para>
when there is no geom instance available (or they are simply not
bound to a single instance):</para>
<itemizedlist>
<listitem><para><function>.init</function> is called when GEOM
becomes aware of a GEOM class (e.g. when the kernel module
gets loaded.)</para></listitem>
becomes aware of a GEOM class (e.g. when the kernel module
gets loaded.)</para></listitem>
<listitem><para><function>.fini</function> gets called when GEOM
abandons the class (e.g. when the module gets
unloaded)</para></listitem>
<listitem><para><function>.fini</function> gets called when GEOM
abandons the class (e.g. when the module gets
unloaded)</para></listitem>
<listitem><para><function>.taste</function> is called next, once for
each provider the system has available. If applicable, this
function will usually create and start a geom
instance.</para></listitem>
<listitem><para><function>.taste</function> is called next, once for
each provider the system has available. If applicable, this
function will usually create and start a geom
instance.</para></listitem>
<listitem><para><function>.destroy_geom</function> is called when
the geom should be disbanded</para></listitem>
<listitem><para><function>.destroy_geom</function> is called when
the geom should be disbanded</para></listitem>
<listitem><para><function>.ctlconf</function> is called when user
requests reconfiguration of existing geom</para></listitem>
<listitem><para><function>.ctlconf</function> is called when user
requests reconfiguration of existing geom</para></listitem>
</itemizedlist>
<para>Also defined are the GEOM event functions, which will get
copied to the geom instance.</para>
copied to the geom instance.</para>
<para>Field <function>.geom</function> in the
<structname>g_class</structname> structure is a LIST of geoms
instantiated from the class.</para>
<structname>g_class</structname> structure is a LIST of geoms
instantiated from the class.</para>
<para>These functions are called from the g_event kernel thread.</para>
@@ -395,29 +395,29 @@ KMOD=geom_journal
<title>Softc</title>
<para>The name <quote>softc</quote> is a legacy term for
<quote>driver private data</quote>. The name most probably
comes from the archaic term <quote>software control block</quote>.
In GEOM, it is a structure (more precise: pointer to a
structure) that can be attached to a geom instance to hold
whatever data is private to the geom instance. Most GEOM classes
have the following members:</para>
<quote>driver private data</quote>. The name most probably
comes from the archaic term <quote>software control block</quote>.
In GEOM, it is a structure (more precise: pointer to a
structure) that can be attached to a geom instance to hold
whatever data is private to the geom instance. Most GEOM classes
have the following members:</para>
<itemizedlist>
<listitem><para><varname>struct g_provider *provider</varname> : The
<quote>provider</quote> this geom instantiates</para></listitem>
<listitem><para><varname>struct g_provider *provider</varname> : The
<quote>provider</quote> this geom instantiates</para></listitem>
<listitem><para><varname>uint16_t n_disks</varname> : Number of
consumer this geom consumes</para></listitem>
<listitem><para><varname>uint16_t n_disks</varname> : Number of
consumer this geom consumes</para></listitem>
<listitem><para><varname>struct g_consumer **disks</varname> : Array
of <varname>struct g_consumer*</varname>. (It is not possible
to use just single indirection because struct g_consumer*
are created on our behalf by GEOM).</para></listitem>
<listitem><para><varname>struct g_consumer **disks</varname> : Array
of <varname>struct g_consumer*</varname>. (It is not possible
to use just single indirection because struct g_consumer*
are created on our behalf by GEOM).</para></listitem>
</itemizedlist>
<para>The <structname>softc</structname> structure contains all
the state of geom instance. Every geom instance has its own
softc.</para>
the state of geom instance. Every geom instance has its own
softc.</para>
</sect2>
<sect2 id="geom-metadata">
@@ -428,15 +428,15 @@ KMOD=geom_journal
<itemizedlist>
<listitem><para>16 byte buffer for null-terminated signature
(usually the class name)</para></listitem>
<listitem><para>16 byte buffer for null-terminated signature
(usually the class name)</para></listitem>
<listitem><para>uint32 version ID</para></listitem>
<listitem><para>uint32 version ID</para></listitem>
</itemizedlist>
<para>It is assumed that geom classes know how to handle metadata
with version ID's lower than theirs.</para>
with version ID's lower than theirs.</para>
<para>Metadata is located in the last sector of the provider
(and thus must fit in it).</para>
@@ -455,15 +455,15 @@ KMOD=geom_journal
<listitem><para>user calls &man.geom.8; utility (or one of its
hardlinked friends)</para></listitem>
<listitem><para>the utility figures out which geom class it is
supposed to handle and searches for
<filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
library (usually in
<filename>/lib/geom</filename>).</para></listitem>
<listitem><para>the utility figures out which geom class it is
supposed to handle and searches for
<filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
library (usually in
<filename>/lib/geom</filename>).</para></listitem>
<listitem><para>it &man.dlopen.3;-s the library, extracts the
definitions of command-line parameters and helper
functions.</para></listitem>
<listitem><para>it &man.dlopen.3;-s the library, extracts the
definitions of command-line parameters and helper
functions.</para></listitem>
</itemizedlist>
@@ -473,23 +473,23 @@ KMOD=geom_journal
<itemizedlist>
<listitem><para>&man.geom.8; looks in the command-line definition
for the command (usually "label"), and calls a helper
function.</para></listitem>
for the command (usually "label"), and calls a helper
function.</para></listitem>
<listitem><para>helper function checks parameters and gathers
metadata, which it proceeds to write to all concerned
providers.</para></listitem>
<listitem><para>helper function checks parameters and gathers
metadata, which it proceeds to write to all concerned
providers.</para></listitem>
<listitem><para>this "spoils" existing geoms (if any) and
initializes a new round of "tasting" of the providers. The
intended geom class recognizes the metadata and brings the
geom up.</para></listitem>
<listitem><para>this "spoils" existing geoms (if any) and
initializes a new round of "tasting" of the providers. The
intended geom class recognizes the metadata and brings the
geom up.</para></listitem>
</itemizedlist>
<para>(The above sequence of events is implementation-dependent
but all existing code works like that, and it is supported by
libraries.)</para>
but all existing code works like that, and it is supported by
libraries.)</para>
</sect2>
@@ -497,9 +497,9 @@ KMOD=geom_journal
<title>Geom command structure</title>
<para>The helper <filename>geom_CLASSNAME.so</filename> library
exports <structname>class_commands</structname> structure,
which is an array of <structname>struct g_command</structname>
elements. Commands are of uniform format and look like:</para>
exports <structname>class_commands</structname> structure,
which is an array of <structname>struct g_command</structname>
elements. Commands are of uniform format and look like:</para>
<programlisting> verb [-options] geomname [other]</programlisting>
@@ -508,10 +508,10 @@ KMOD=geom_journal
<itemizedlist>
<listitem><para>label &mdash; to write metadata to devices so they can be
recognized at tasting and brought up in geoms</para></listitem>
recognized at tasting and brought up in geoms</para></listitem>
<listitem><para>destroy &mdash; to destroy metadata, so the geoms get
destroyed</para></listitem>
<listitem><para>destroy &mdash; to destroy metadata, so the geoms get
destroyed</para></listitem>
</itemizedlist>
@@ -519,26 +519,26 @@ KMOD=geom_journal
<itemizedlist>
<listitem><para><literal>-v</literal> : be verbose</para></listitem>
<listitem><para><literal>-f</literal> : force</para></listitem>
<listitem><para><literal>-f</literal> : force</para></listitem>
</itemizedlist>
<para>Many actions, such as labeling and destroying metadata can
be performed in userland. For this, <structname>struct
g_command</structname> provides field
<varname>gc_func</varname> that can be set to a function (in
the same <filename>.so</filename>) that will be called to
process a verb. If <varname>gc_func</varname> is NULL, the
command will be passed to kernel module, to
<function>.ctlreq</function> function of the geom
class.</para>
be performed in userland. For this, <structname>struct
g_command</structname> provides field
<varname>gc_func</varname> that can be set to a function (in
the same <filename>.so</filename>) that will be called to
process a verb. If <varname>gc_func</varname> is NULL, the
command will be passed to kernel module, to
<function>.ctlreq</function> function of the geom
class.</para>
</sect2>
<sect2 id="geom-geoms">
<title>Geoms</title>
<para>Geoms are instances of GEOM classes. They have internal
data (a softc structure) and some functions with which they
respond to external events.</para>
data (a softc structure) and some functions with which they
respond to external events.</para>
<para>The event functions are:</para>
@@ -549,24 +549,24 @@ KMOD=geom_journal
<listitem><para><function>.dumpconf</function> : returns
XML-formatted information about the geom</para></listitem>
<listitem><para><function>.orphan</function> : called when some
underlying provider gets disconnected</para></listitem>
<listitem><para><function>.orphan</function> : called when some
underlying provider gets disconnected</para></listitem>
<listitem><para><function>.spoiled</function> : called when some
underlying provider gets written to</para></listitem>
<listitem><para><function>.spoiled</function> : called when some
underlying provider gets written to</para></listitem>
<listitem><para><function>.start</function> : handles I/O</para></listitem>
<listitem><para><function>.start</function> : handles I/O</para></listitem>
</itemizedlist>
<para>These functions are called from the <function>g_down</function>
kernel thread and there can be no sleeping in this context,
(see definition of sleeping elsewhere) which limits what can be done
quite a bit, but forces the handling to be fast.</para>
kernel thread and there can be no sleeping in this context,
(see definition of sleeping elsewhere) which limits what can be done
quite a bit, but forces the handling to be fast.</para>
<para>Of these, the most important function for doing actual
useful work is the <function>.start</function>() function,
which is called when a BIO request arrives for a provider
managed by a instance of geom class.</para>
useful work is the <function>.start</function>() function,
which is called when a BIO request arrives for a provider
managed by a instance of geom class.</para>
</sect2>
<sect2 id="geom-threads">
@@ -576,143 +576,143 @@ KMOD=geom_journal
framework:</para>
<itemizedlist>
<listitem><para><literal>g_down</literal> : Handles requests coming
from high-level entities (such as a userland request) on the
way to physical devices</para></listitem>
<listitem><para><literal>g_down</literal> : Handles requests coming
from high-level entities (such as a userland request) on the
way to physical devices</para></listitem>
<listitem><para><literal>g_up</literal> : Handles responses from
device drivers to requests made by higher-level
entities</para></listitem>
<listitem><para><literal>g_up</literal> : Handles responses from
device drivers to requests made by higher-level
entities</para></listitem>
<listitem><para><literal>g_event</literal> : Handles all other
cases: creation of geom instances, access counting, "spoil"
events, etc.</para></listitem>
<listitem><para><literal>g_event</literal> : Handles all other
cases: creation of geom instances, access counting, "spoil"
events, etc.</para></listitem>
</itemizedlist>
<para>When a user process issues <quote>read data X at offset Y
of a file</quote> request, this is what happens:</para>
of a file</quote> request, this is what happens:</para>
<itemizedlist>
<listitem><para>The filesystem converts the request into a struct bio
instance and passes it to the GEOM subsystem. It knows what geom
instance should handle it because filesystems are hosted
directly on a geom instance.</para></listitem>
instance and passes it to the GEOM subsystem. It knows what geom
instance should handle it because filesystems are hosted
directly on a geom instance.</para></listitem>
<listitem><para>The request ends up as a call to the
<function>.start</function>() function made on the g_down
thread and reaches the top-level geom instance.</para></listitem>
<listitem><para>The request ends up as a call to the
<function>.start</function>() function made on the g_down
thread and reaches the top-level geom instance.</para></listitem>
<listitem><para>This top-level geom instance (for example the
partition slicer) determines that the request should be
routed to a lower-level instance (for example the disk
driver). It makes a copy of the bio request (bio requests
<emphasis>ALWAYS</emphasis> need to be copied between
instances, with <function>g_clone_bio</function>()!),
modifies the data offset and target provider fields and
executes the copy with
<function>g_io_request</function>()</para></listitem>
<listitem><para>This top-level geom instance (for example the
partition slicer) determines that the request should be
routed to a lower-level instance (for example the disk
driver). It makes a copy of the bio request (bio requests
<emphasis>ALWAYS</emphasis> need to be copied between
instances, with <function>g_clone_bio</function>()!),
modifies the data offset and target provider fields and
executes the copy with
<function>g_io_request</function>()</para></listitem>
<listitem><para>The disk driver gets the bio request also as a call
to <function>.start</function>() on the
<literal>g_down</literal> thread. It talks to hardware,
gets the data back, and calls
<function>g_io_deliver</function>() on the bio.</para></listitem>
<listitem><para>The disk driver gets the bio request also as a call
to <function>.start</function>() on the
<literal>g_down</literal> thread. It talks to hardware,
gets the data back, and calls
<function>g_io_deliver</function>() on the bio.</para></listitem>
<listitem><para>Now, the notification of bio completion
<quote>bubbles up</quote> in the <literal>g_up</literal>
thread. First the partition slicer gets
<function>.done</function>() called in the
<literal>g_up</literal> thread, it uses information stored
in the bio to free the cloned <structname>bio</structname>
structure (with <function>g_destroy_bio</function>()) and
calls <function>g_io_deliver</function>() on the original
request.</para></listitem>
<listitem><para>Now, the notification of bio completion
<quote>bubbles up</quote> in the <literal>g_up</literal>
thread. First the partition slicer gets
<function>.done</function>() called in the
<literal>g_up</literal> thread, it uses information stored
in the bio to free the cloned <structname>bio</structname>
structure (with <function>g_destroy_bio</function>()) and
calls <function>g_io_deliver</function>() on the original
request.</para></listitem>
<listitem><para>The filesystem gets the data and transfers it to
userland.</para></listitem>
<listitem><para>The filesystem gets the data and transfers it to
userland.</para></listitem>
</itemizedlist>
<para>See &man.g.bio.9; man page for information how the data is
passed back and forth in the <structname>bio</structname>
structure (note in particular the <varname>bio_parent</varname>
and <varname>bio_children</varname> fields and how they are
handled).</para>
passed back and forth in the <structname>bio</structname>
structure (note in particular the <varname>bio_parent</varname>
and <varname>bio_children</varname> fields and how they are
handled).</para>
<para>One important feature is: <emphasis>THERE CAN BE NO SLEEPING IN G_UP
AND G_DOWN THREADS</emphasis>. This means that none of the following
things can be done in those threads (the list is of course not
complete, but only informative):</para>
AND G_DOWN THREADS</emphasis>. This means that none of the following
things can be done in those threads (the list is of course not
complete, but only informative):</para>
<itemizedlist>
<listitem><para>Calls to <function>msleep</function>() and
<function>tsleep</function>(), obviously.</para></listitem>
<listitem><para>Calls to <function>msleep</function>() and
<function>tsleep</function>(), obviously.</para></listitem>
<listitem><para>Calls to <function>g_write_data</function>() and
<function>g_read_data</function>(), because these sleep
between passing the data to consumers and
returning.</para></listitem>
<listitem><para>Waiting for I/O.</para></listitem>
<listitem><para>Calls to &man.malloc.9; and
<function>uma_zalloc</function>() with the
<varname>M_WAITOK</varname> flag set.</para></listitem>
<listitem><para>sx and other sleepable locks</para></listitem>
</itemizedlist>
<para>This restriction is here to stop GEOM code from clogging the I/O
request path, since sleeping is usually not
time-bound and there can be no guarantee on how long it will
take (there are also some other, more technical reasons). It
also means that there is not much that can be done in those
threads; for example, almost any complex operation requires memory
allocation. Fortunately, there is a way out: creating
additional kernel threads.</para>
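<para>The practical consequence for a <function>.start</function>()
routine is that an allocation must either succeed at once or fail at
once, and a failure has to be reported on the request instead of
being waited out. The following userland sketch models that rule. It
is an illustration under stated assumptions, not kernel code: the
fixed-size <structname>pool</structname>,
<function>alloc_nowait</function>(), <function>deliver</function>()
and <function>model_start</function>() are hypothetical names
standing in for a non-sleeping allocation,
<function>g_io_deliver</function>() and a class's
<function>.start</function>() routine.</para>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_SLOTS 2                    /* hypothetical, tiny on purpose */

struct pool {
	int used[POOL_SLOTS];
	char buf[POOL_SLOTS][64];
};

struct request {
	int error;                      /* status reported to the issuer */
	int completed;                  /* nonzero once answered */
};

/* Models a non-sleeping allocation: returns NULL immediately when
 * nothing is free instead of waiting for memory to appear. */
static void *
alloc_nowait(struct pool *p)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (!p->used[i]) {
			p->used[i] = 1;
			return (p->buf[i]);
		}
	}
	return (NULL);
}

/* Models handing a request back to its issuer with a status. */
static void
deliver(struct request *rq, int error)
{
	assert(!rq->completed);         /* a request is answered only once */
	rq->error = error;
	rq->completed = 1;
}

/* Models a .start() routine: it never waits for resources. */
static void
model_start(struct pool *p, struct request *rq)
{
	void *scratch = alloc_nowait(p);

	if (scratch == NULL) {
		deliver(rq, ENOMEM);    /* fail the request right away */
		return;
	}
	/* Real code would now pass the request down; it stays in flight. */
}

/* Issues three requests against a two-slot pool and counts how many
 * were failed with ENOMEM. */
static int
start_demo(void)
{
	struct pool p = { { 0 } };
	struct request rq[3] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
	int failed = 0;

	for (size_t i = 0; i < 3; i++) {
		model_start(&p, &rq[i]);
		if (rq[i].completed && rq[i].error == ENOMEM)
			failed++;
	}
	return (failed);
}
```

<para>With two slots and three requests, the third request is
answered immediately with <literal>ENOMEM</literal> while the first
two remain in flight.</para>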
</sect2>
<sect2 id="geom-kernelthreads">
<title>Kernel threads for use in GEOM code</title>
<para>Kernel threads are created with the &man.kthread.create.9;
function. They behave much like userland threads,
except that they cannot return to the caller to signify
termination; they must call &man.kthread.exit.9; instead.</para>
<para>In GEOM code, the usual use of threads is to offload
processing of requests from the <literal>g_down</literal> thread
(the <function>.start</function>() function). These threads
look like <quote>event handlers</quote>: they have a linked
list of events associated with them (events can be posted to it
by various functions in various threads, so it must be
protected by a mutex), take the events from the list one by
one, and process them in a big <literal>switch</literal>()
statement.</para>
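<para>The shape of such an event-handler thread can be sketched in
userland with POSIX threads. This is a model of the pattern only, not
kernel code: a kernel version would use the kernel thread and mutex
facilities, and would exit through &man.kthread.exit.9; rather than
by returning. The names <structname>worker</structname>,
<structname>event</structname>, <function>post_event</function>() and
<function>worker_main</function>() are hypothetical.</para>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum ev_type { EV_REQUEST, EV_STOP };   /* hypothetical event kinds */

struct event {
	enum ev_type type;
	int payload;
	struct event *next;
};

struct worker {
	pthread_mutex_t lock;           /* protects head and tail */
	pthread_cond_t wakeup;          /* signalled when an event arrives */
	struct event *head, *tail;
	int sum;                        /* touched only by the worker */
};

/* May be called from any thread: link a caller-owned event into the list. */
static void
post_event(struct worker *w, struct event *ev)
{
	ev->next = NULL;
	pthread_mutex_lock(&w->lock);
	if (w->tail != NULL)
		w->tail->next = ev;
	else
		w->head = ev;
	w->tail = ev;
	pthread_cond_signal(&w->wakeup);
	pthread_mutex_unlock(&w->lock);
}

/* Thread body: sleep until an event is queued, then dispatch on its type. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;
	struct event *ev;

	assert(w != NULL);
	for (;;) {
		pthread_mutex_lock(&w->lock);
		while (w->head == NULL)
			pthread_cond_wait(&w->wakeup, &w->lock);
		ev = w->head;
		w->head = ev->next;
		if (w->head == NULL)
			w->tail = NULL;
		pthread_mutex_unlock(&w->lock);

		switch (ev->type) {
		case EV_REQUEST:
			w->sum += ev->payload;  /* stand-in for real work */
			break;
		case EV_STOP:
			return (NULL);
		}
	}
}

/* Posts three requests and a stop event, then waits for the worker. */
static int
worker_demo(void)
{
	struct worker w = { .head = NULL, .tail = NULL, .sum = 0 };
	struct event ev[4] = {
		{ EV_REQUEST, 1, NULL }, { EV_REQUEST, 2, NULL },
		{ EV_REQUEST, 3, NULL }, { EV_STOP, 0, NULL },
	};
	pthread_t tid;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.wakeup, NULL);
	if (pthread_create(&tid, NULL, worker_main, &w) != 0)
		return (-1);
	for (size_t i = 0; i < 4; i++)
		post_event(&w, &ev[i]);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&w.wakeup);
	pthread_mutex_destroy(&w.lock);
	return (w.sum);
}
```

<para>The posting side only links the event into the list and wakes
the worker, so it holds the lock briefly and never waits for the work
itself. The mutex is dropped before the event is processed, which
leaves the worker free to sleep while handling it.</para>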
<para>The main benefit of using a thread to handle I/O requests
is that it can sleep when needed. This sounds good, but
should be carefully thought out. Sleeping is
convenient, but it can very effectively destroy the performance of the
GEOM transformation. Extremely performance-sensitive classes
probably should do all the work in the
<function>.start</function>() function call, taking great care
to handle out-of-memory and similar errors.</para>
<para>The other benefit of having an event-handler thread like
that is that it serializes all the requests and responses coming
from different GEOM threads into one thread. This is also very
convenient but can be slow. In most cases, handling of
<function>.done</function>() requests can be left to the
<literal>g_up</literal> thread.</para>
<para>Mutexes in the FreeBSD kernel (see &man.mutex.9;) have
one distinction from their more common userland cousins: the
code cannot sleep while holding
a mutex. If the code needs to sleep a lot, &man.sx.9; locks
may be more appropriate. On the other hand, if you do almost
everything in a single thread, you may get away with no
mutexes at all.</para>
</sect2>