Small improvements and clarifications. Some are mine, most were:

Submitted by:	ivoras
Ceri Davies 2007-05-04 12:32:03 +00:00
parent 571762032c
commit a9ad3db4c7
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=30144

@ -16,7 +16,7 @@
<firstname>Ivan</firstname>
<surname>Voras</surname>
<affiliation>
<address><email>ivoras@yahoo.com</email>
<address><email>ivoras@FreeBSD.org</email>
</address>
</affiliation>
</author>
@ -34,10 +34,9 @@
<abstract>
<para>This text documents the way I created the gjournal
facility, starting with learning how to do kernel
programming. It is assumed that the reader is familiar with C
userland programming.</para>
<para>This text documents some starting points in developing
GEOM classes, and kernel modules in general. It is assumed
that the reader is familiar with C userland programming.</para>
</abstract>
@ -50,7 +49,7 @@
<sect2 id="intro-docs">
<title>Documentation</title>
<para>Documentation on kernel programming is scarce - it is one of
<para>Documentation on kernel programming is scarce &mdash; it is one of
few areas where there is nearly nothing in the way of friendly
tutorials, and the phrase <quote>use the source!</quote> really
holds true. However, there are some bits and pieces (some of
@ -61,14 +60,13 @@
<listitem><para>The <ulink
url="&url.books.developers-handbook;/index.html">FreeBSD
Developer's Handbook</ulink> - part of the documentation
project, it does not contain anything specific to kernel-land
programming, but rather some general
information.</para></listitem>
Developer's Handbook</ulink> &mdash; part of the documentation
project, it does not contain anything specific to kernel
programming, but rather some general useful information.</para></listitem>
<listitem><para>The <ulink
url="&url.books.arch-handbook;/index.html">FreeBSD
Architecture Handbook</ulink> - also from the documentation
Architecture Handbook</ulink> &mdash; also from the documentation
project, contains descriptions of several low-level facilities
and procedures. The most important chapter is 13, <ulink
url="&url.books.arch-handbook;/driverbasics.html">Writing
@ -76,18 +74,24 @@
<listitem><para>The Blueprints section of <ulink
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
site - contains several interesting articles on kernel
site &mdash; contains several interesting articles on kernel
facilities.</para></listitem>
<listitem><para>The man pages in section 9 - for important
<listitem><para>The man pages in section 9 &mdash; for important
documentation on kernel functions.</para></listitem>
<listitem><para>The &man.geom.4; man page and <ulink
url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
- for general introduction of the GEOM
subsystem.</para></listitem>
&mdash; for general introduction of the GEOM
subsystem.</para></listitem>
<listitem><para>The &man.style.9; man page - for documentation on
<listitem><para>Man pages &man.g.bio.9;, &man.g.event.9;, &man.g.data.9;,
&man.g.geom.9;, &man.g.provider.9;, &man.g.consumer.9;, &man.g.access.9;
and others linked from those, for documentation on specific
functionality.
</para></listitem>
<listitem><para>The &man.style.9; man page &mdash; for documentation on
the coding-style conventions which must be followed for any code
which is to be committed to the FreeBSD CVS tree.</para></listitem>
@ -111,18 +115,27 @@
one.</para>
<para>But, since not everybody has two or more computers handy, there are
a few things that can be done to prepare an otherwise "live"
system for developing kernel code.</para>
a few things that can be done to prepare an otherwise <quote>live</quote>
system for developing kernel code. This setup is also applicable
for developing in a <ulink url="http://www.vmware.com/">VMware</ulink>
or <ulink url="http://www.qemu.org/">QEMU</ulink> virtual machine (the
next best thing after a dedicated development machine).</para>
<sect2 id="prelim-system">
<title>Converting a system for development</title>
<title>Modifying a system for development</title>
<para>For any kernel programming a kernel with
<option>INVARIANTS</option> enabled is a must-have. So enter
these in your kernel configuration file:</para>
<programlisting> options INVARIANT_SUPPORT
options INVARIANTS</programlisting>
<programlisting>options INVARIANT_SUPPORT
options INVARIANTS</programlisting>
<para>For more debugging you should also include WITNESS support,
which will alert you to mistakes in locking:</para>
<programlisting>options WITNESS_SUPPORT
options WITNESS</programlisting>
<para>For debugging crash dumps, a kernel with debug symbols is
needed:</para>
@ -141,9 +154,9 @@
can examine a kernel panic when it happens. For this, enter
the following lines in your kernel configuration file:</para>
<programlisting> options KDB
options DDB
options KDB_TRACE</programlisting>
<programlisting>options KDB
options DDB
options KDB_TRACE</programlisting>
<para>For this to work you might need to set a sysctl (if it is
not on by default):</para>
@ -156,14 +169,14 @@
before it is committed to storage. Disabling softupdates
yields a great performance hit, and still does not guarantee
data consistency. Mounting the filesystem with the "sync" option
is needed for that. For a compromise, the cache delays can
is needed for that. For a compromise, the softupdates cache delays can
be shortened. There are three sysctls that are useful for
this (best to be set in
<filename>/etc/sysctl.conf</filename>):</para>
<programlisting> kern.filedelay=5
kern.dirdelay=4
kern.metadelay=3</programlisting>
<programlisting>kern.filedelay=5
kern.dirdelay=4
kern.metadelay=3</programlisting>
<para>The numbers represent seconds.</para>
@ -177,16 +190,16 @@
before swap is enabled. This is controlled with two
<filename>/etc/rc.conf</filename> variables:</para>
<programlisting> dumpdev="/dev/ad0s4b"
dumpdir="/usr/core"</programlisting>
<programlisting>dumpdev="/dev/ad0s4b"
dumpdir="/usr/core"</programlisting>
<para>The <varname>dumpdev</varname> variable specifies the swap
partition and <varname>dumpdir</varname> tells the system
where in the filesystem to relocate the core dump on reboot.</para>
<para>Writing kernel core dumps is slow, so
if you have lots of memory (&gt;256M) and lots of panics it could
be frustrating to sit and wait while it is done (twice - first
be frustrating to sit and wait while it is done (twice &mdash; first
to write it to swap, then to relocate it to filesystem). It is
convenient then to limit the amount of RAM the system will use
via a <filename>/boot/loader.conf</filename> tunable:</para>
@ -210,10 +223,10 @@
<sect2 id="prelim-starting">
<title>Starting the project</title>
<para>For the purpose of making gjournal, a new empty
subdirectory was created under an arbitrary user-accessible
directory. You do not have to create the module directory under
<filename>/usr/src</filename>.</para>
<para>For the purpose of creating a new GEOM class, an empty
subdirectory has to be created under an arbitrary user-accessible
directory. You do not have to create the module directory under
<filename>/usr/src</filename>.</para>
</sect2>
<sect2 id="prelim-makefile">
@ -224,17 +237,19 @@
project, which of course includes kernel modules.</para>
<para>Creating the <filename>Makefile</filename> is simple
thanks to extensive set of helper routines provided by the
system. In short, here is how it looks:</para>
thanks to an extensive set of helper routines provided by the
system. In short, here is how a minimal <filename>Makefile</filename>
looks for a kernel module:</para>
<programlisting> SRCS=g_journal.c
KMOD=geom_journal
<programlisting>SRCS=g_journal.c
KMOD=geom_journal
.include &lt;bsd.kmod.mk&gt;</programlisting>
.include &lt;bsd.kmod.mk&gt;</programlisting>
<para>This Makefile (with changed filenames) will do for any
kernel module. If more than one file is required, list it in
<envar>SRCS</envar> variable separated with whitespace from
<para>This <filename>Makefile</filename> (with changed filenames)
will do for any kernel module, and a GEOM class can reside in just
one kernel module. If more than one file is required, list them in the
<envar>SRCS</envar> variable, separated with whitespace from
other filenames.</para>
</sect2>
</sect1>
@ -246,7 +261,7 @@
<title>Memory allocation</title>
<para>See &man.malloc.9;. Basic memory allocation is only
slightly different than its user-land equivalent. Most
slightly different than its userland equivalent. Most
notably, <function>malloc</function>() and
<function>free</function>() accept additional parameters as is
described in the man page.</para>
@ -256,7 +271,7 @@
<programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
<para>To use the macro, <filename>sys/param.h</filename>,
<para>To use this macro, <filename>sys/param.h</filename>,
<filename>sys/kernel.h</filename> and
<filename>sys/malloc.h</filename> headers must be
included.</para>
@ -273,16 +288,16 @@
<para>See &man.queue.3;. There are a LOT of cases when a list of
things needs to be maintained. Fortunately, this data
structure is implemented (in several ways) by the C macros
structure is implemented (in several ways) by C macros
included in the system. The most used list type is TAILQ
because it is the most flexible. It is also the one with largest
memory requirements (its elements are doubly-linked) and
theoretically the slowest (though the speed variation is on
also the slowest (although the speed variation is on
the order of several CPU instructions more, so it should not be
taken seriously).</para>
<para>If data retrieval speed is very important, see
&man.tree.3;.</para>
&man.tree.3; and &man.hashinit.9;.</para>
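<para>As a rough illustration of the TAILQ macros, the sketch below
keeps a FIFO of requests. It is written as plain userland C so it can
be compiled outside the kernel; the names <varname>struct request</varname>,
<function>queue_push</function>(), <function>queue_pop</function>() and
<function>queue_length</function>() are made up for this example and do not
come from any GEOM class. Kernel code would use &man.malloc.9; with a malloc
type instead of the userland allocator.</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical request record; the names are illustrative only. */
struct request {
	int id;
	TAILQ_ENTRY(request) link;	/* next and previous pointers */
};

/* Declares struct request_queue, the list head type. */
TAILQ_HEAD(request_queue, request);

/* Append a new request to the tail. Returns 0 on success, -1 on failure. */
static int
queue_push(struct request_queue *q, int id)
{
	struct request *r;

	assert(q != NULL);
	r = malloc(sizeof(*r));
	if (r == NULL)
		return (-1);
	r->id = id;
	TAILQ_INSERT_TAIL(q, r, link);
	return (0);
}

/* Remove the oldest request and return its id, or -1 if the queue is empty. */
static int
queue_pop(struct request_queue *q)
{
	struct request *r;
	int id;

	assert(q != NULL);
	r = TAILQ_FIRST(q);
	if (r == NULL)
		return (-1);
	TAILQ_REMOVE(q, r, link);
	id = r->id;
	free(r);
	return (id);
}

/* Count the queued requests by walking the list from head to tail. */
static int
queue_length(struct request_queue *q)
{
	struct request *r;
	int n = 0;

	assert(q != NULL);
	TAILQ_FOREACH(r, q, link)
		n++;
	return (n);
}
```

<para>The head must be initialized with <function>TAILQ_INIT</function>()
before the first insertion. Because the elements are doubly-linked,
<function>TAILQ_REMOVE</function>() does not need to walk the list, which
is a large part of why TAILQ is the usual default.</para>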
</sect2>
<sect2 id="kernelprog-bios">
@ -295,21 +310,25 @@
a buffer, and a bunch of <quote>user-specific</quote> flags
and fields that can help implement various hacks.</para>
<para>The important thing here is that bios are dealt with
asynchronously. That means that, in most parts of the code,
<para>The important thing here is that <structname>bio</structname>s
are handled asynchronously. That means that, in most parts of the code,
there is no analogue to userland's &man.read.2; and
&man.write.2; calls that do not return until a request is
done. Rather, a developer-supplied function is called as a
notification when the request gets completed (or results in
error).</para>
<para>Unfortunately, the asynchronous programming model (also
called "event-driven") imposed this way is somewhat harder
than the much more used imperative one (at least it takes a
while to get used to it). In some cases helper routines
<para>The asynchronous programming model (also
called <quote>event-driven</quote>) is somewhat harder
than the imperative one that is much more common in userland
(at least it takes a
while to get used to it). In some cases the helper routines
<function>g_write_data</function>() and
<function>g_read_data</function>() can be used, but <emphasis>NOT
ALWAYS</emphasis>!.</para>
<function>g_read_data</function>() can be used, but <emphasis>not
always</emphasis>. In particular, they cannot be used when
a mutex is held; for example, the GEOM topology mutex or
the internal mutex held during the <function>.start</function>() and
<function>.stop</function>() functions.</para>
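<para>The difference between the two models can be sketched in plain
userland C. This is only an analogy under simplifying assumptions:
<varname>struct io_request</varname>, <function>io_submit</function>(),
<function>io_complete</function>() and <function>note_done</function>()
are hypothetical names, not the real <structname>bio</structname>
interface, and only one request can be pending at a time. The point is
that submitting returns immediately and the result arrives later through
a callback.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request, loosely inspired by the idea of a bio. */
struct io_request {
	int error;				/* 0 on success */
	void (*done)(struct io_request *);	/* completion callback */
	void *caller_arg;			/* private data for the callback */
};

/* Single pending slot; a real implementation would keep a queue. */
static struct io_request *pending;

/* Hand the request to the "lower layer" and return at once. */
static void
io_submit(struct io_request *req)
{
	assert(req != NULL && req->done != NULL);
	pending = req;
}

/*
 * Finish the pending request. In a kernel this would be driven by the
 * layer below; here the caller triggers it by hand.
 */
static void
io_complete(int error)
{
	struct io_request *req = pending;

	if (req == NULL)
		return;
	pending = NULL;
	req->error = error;
	req->done(req);		/* notify the submitter */
}

/* Example callback: record success (1) or failure (-1) for the caller. */
static void
note_done(struct io_request *req)
{
	int *status = req->caller_arg;

	*status = (req->error == 0) ? 1 : -1;
}
```

<para>After <function>io_submit</function>() returns, the caller knows
nothing yet about the outcome; all follow-up work has to live in the
callback or be triggered by it.</para>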
</sect2>
</sect1>
@ -330,7 +349,11 @@
<sect2 id="geom-class">
<title>GEOM class</title>
<para>GEOM class has several "class methods" that get called
<para>GEOM classes are transformations on the data. These transformations
can be combined in a tree-like fashion. Instances of GEOM classes are
called <emphasis>geoms</emphasis>.</para>
<para>Each GEOM class has several "class methods" that get called
when there is no geom instance available (or they are simply not
bound to a single instance):</para>
@ -364,8 +387,7 @@
<structname>g_class</structname> structure is a LIST of geoms
instantiated from the class.</para>
<para>These functions are called from g_event? kernel
thread.</para>
<para>These functions are called from the g_event kernel thread.</para>
</sect2>
@ -377,9 +399,8 @@
comes from the archaic term <quote>software control block</quote>.
In GEOM, it is a structure (more precisely, a pointer to a
structure) that can be attached to a geom instance to hold
whatever data is private to the geom instance. In gjournal
(and most of the other GEOM classes), some of its members
are:</para>
whatever data is private to the geom instance. Most GEOM classes
have the following members:</para>
<itemizedlist>
<listitem><para><varname>struct g_provider *provider</varname> : The
@ -486,10 +507,10 @@
<itemizedlist>
<listitem><para>label - to write metadata to devices so they can be
<listitem><para>label &mdash; to write metadata to devices so they can be
recognized at tasting and brought up in geoms</para></listitem>
<listitem><para>destroy - to destroy metadata, so the geoms get
<listitem><para>destroy &mdash; to destroy metadata, so the geoms get
destroyed</para></listitem>
</itemizedlist>
@ -515,7 +536,7 @@
<sect2 id="geom-geoms">
<title>Geoms</title>
<para>Geoms are instances of geom classes. They have internal
<para>Geoms are instances of GEOM classes. They have internal
data (a softc structure) and some functions with which they
respond to external events.</para>
@ -537,9 +558,9 @@
<listitem><para><function>.start</function> : handles I/O</para></listitem>
</itemizedlist>
<para>These functions are called from the g_down? kernel thread and
there can be no sleeping in this context (no blocking on a
mutex or any kind of locks) which limits what can be done
<para>These functions are called from the <literal>g_down</literal>
kernel thread and there can be no sleeping in this context
(see the definition of sleeping elsewhere), which limits what can be done
quite a bit, but forces the handling to be fast.</para>
<para>Of these, the most important function for doing actual
@ -632,15 +653,17 @@
between passing the data to consumers and
returning.</para></listitem>
<listitem><para>Waiting for I/O.</para></listitem>
<listitem><para>Calls to &man.malloc.9; and
<function>uma_zalloc</function>() with
<varname>M_WAITOK</varname> flag set</para></listitem>
<listitem><para>sx locks</para></listitem>
<listitem><para>sx and other sleepable locks</para></listitem>
</itemizedlist>
<para>This restriction is here to stop geom code clogging the I/O
request path, because sleeping in the code is usually not
<para>This restriction is here to stop GEOM code clogging the I/O
request path, since sleeping is usually not
time-bound and there can be no guarantees on how long it will
take (there are also some other, more technical reasons). It
also means that there is not much that can be done in those
@ -657,7 +680,7 @@
behaviour, only they cannot return to caller to signify
termination, but must call &man.kthread.exit.9;.</para>
<para>In geom code, the usual use of threads is to offload
<para>In GEOM code, the usual use of threads is to offload
processing of requests from <literal>g_down</literal> thread
(the <function>.start</function>() function). These threads
look like <quote>event handlers</quote>: they have a linked
@ -683,9 +706,9 @@
<function>.done</function>() requests can be left to the
<literal>g_up</literal> thread.</para>
<para>Mutexes in FreeBSD kernel (see &man.mutex.9; man page) have
one distinction from their more common userland cousins - they
disallow sleeping (meaning: the code cannot sleep while holding
<para>Mutexes in the FreeBSD kernel (see &man.mutex.9;) have
one distinction from their more common userland cousins &mdash; the
code cannot sleep while holding
a mutex). If the code needs to sleep a lot, &man.sx.9; locks
may be more appropriate. On the other hand, if you do almost
everything in a single thread, you may get away with no