Small improvements and clarifications. Some are mine, most were:

Submitted by:	ivoras
Ceri Davies 2007-05-04 12:32:03 +00:00
parent 571762032c
commit a9ad3db4c7
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=30144

@@ -16,7 +16,7 @@
 <firstname>Ivan</firstname>
 <surname>Voras</surname>
 <affiliation>
-<address><email>ivoras@yahoo.com</email>
+<address><email>ivoras@FreeBSD.org</email>
 </address>
 </affiliation>
 </author>
@@ -34,10 +34,9 @@
 <abstract>
-<para>This text documents the way I created the gjournal
-facility, starting with learning how to do kernel
-programming. It is assumed that the reader is familiar with C
-userland programming.</para>
+<para>This text documents some starting points in developing
+GEOM classes, and kernel modules in general. It is assumed
+that the reader is familiar with C userland programming.</para>
 </abstract>
@@ -50,7 +49,7 @@
 <sect2 id="intro-docs">
 <title>Documentation</title>
-<para>Documentation on kernel programming is scarce - it is one of
+<para>Documentation on kernel programming is scarce &mdash; it is one of
 few areas where there is nearly nothing in the way of friendly
 tutorials, and the phrase <quote>use the source!</quote> really
 holds true. However, there are some bits and pieces (some of
@@ -61,14 +60,13 @@
 <listitem><para>The <ulink
 url="&url.books.developers-handbook;/index.html">FreeBSD
-Developer's Handbook</ulink> - part of the documentation
-project, it does not contain anything specific to kernel-land
-programming, but rather some general
-information.</para></listitem>
+Developer's Handbook</ulink> &mdash; part of the documentation
+project, it does not contain anything specific to kernel
+programming, but rather some general useful information.</para></listitem>
 <listitem><para>The <ulink
 url="&url.books.arch-handbook;/index.html">FreeBSD
-Architecture Handbook</ulink> - also from the documentation
+Architecture Handbook</ulink> &mdash; also from the documentation
 project, contains descriptions of several low-level facilities
 and procedures. The most important chapter is 13, <ulink
 url="&url.books.arch-handbook;/driverbasics.html">Writing
@@ -76,18 +74,24 @@
 <listitem><para>The Blueprints section of <ulink
 url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
-site - contains several interesting articles on kernel
+site &mdash; contains several interesting articles on kernel
 facilities.</para></listitem>
-<listitem><para>The man pages in section 9 - for important
+<listitem><para>The man pages in section 9 &mdash; for important
 documentation on kernel functions.</para></listitem>
 <listitem><para>The &man.geom.4; man page and <ulink
 url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
-- for general introduction of the GEOM
+&mdash; for general introduction of the GEOM
 subsystem.</para></listitem>
-<listitem><para>The &man.style.9; man page - for documentation on
+<listitem><para>Man pages &man.g.bio.9;, &man.g.event.9;, &man.g.data.9;,
+&man.g.geom.9;, &man.g.provider.9; &man.g.consumer.9;, &man.g.access.9;
+&amp; others linked from those, for documentation on specific
+functionalities.
+</para></listitem>
+<listitem><para>The &man.style.9; man page &mdash; for documentation on
 the coding-style conventions which must be followed for any code
 which is to be committed to the FreeBSD CVS tree.</para></listitem>
@@ -111,18 +115,27 @@
 one.</para>
 <para>But, since not everybody has two or more computers handy, there are
-a few things that can be done to prepare an otherwise "live"
-system for developing kernel code.</para>
+a few things that can be done to prepare an otherwise <quote>live</quote>
+system for developing kernel code. This setup is also applicable
+for developing in a <ulink url="http://www.vmware.com/">VMWare</ulink>
+or <ulink url="http://www.qemu.org/">QEmu</ulink> virtual machine (the
+next best thing after a dedicated development machine).</para>
 <sect2 id="prelim-system">
-<title>Converting a system for development</title>
+<title>Modifying a system for development</title>
 <para>For any kernel programming a kernel with
 <option>INVARIANTS</option> enabled is a must-have. So enter
 these in your kernel configuration file:</para>
-<programlisting> options INVARIANT_SUPPORT
+<programlisting>options INVARIANT_SUPPORT
 options INVARIANTS</programlisting>
+<para>For more debugging you should also include WITNESS support,
+which will alert you of mistakes in locking:</para>
+<programlisting>options WITNESS_SUPPORT
+options WITNESS</programlisting>
 <para>For debugging crash dumps, a kernel with debug symbols is
 needed:</para>
@@ -141,9 +154,9 @@
 can examine a kernel panic when it happens. For this, enter
 the following lines in your kernel configuration file:</para>
-<programlisting> options KDB
+<programlisting>options KDB
 options DDB
 options KDB_TRACE</programlisting>
 <para>For this to work you might need to set a sysctl (if it is
 not on by default):</para>
@@ -156,14 +169,14 @@
 before it is committed to storage. Disabling softupdates
 yields a great performance hit, and still does not guarantee
 data consistency. Mounting filesystem with the "sync" option
-is needed for that. For a compromise, the cache delays can
+is needed for that. For a compromise, the softupdates cache delays can
 be shortened. There are three sysctl's that are useful for
 this (best to be set in
 <filename>/etc/sysctl.conf</filename>):</para>
-<programlisting> kern.filedelay=5
+<programlisting>kern.filedelay=5
 kern.dirdelay=4
 kern.metadelay=3</programlisting>
 <para>The numbers represent seconds.</para>
@@ -177,8 +190,8 @@
 before swap is enabled. This is controlled with two
 <filename>/etc/rc.conf</filename> variables:</para>
-<programlisting> dumpdev="/dev/ad0s4b"
-dumpdir="/usr/core"</programlisting>
+<programlisting>dumpdev="/dev/ad0s4b"
+dumpdir="/usr/core </programlisting>
 <para>The <varname>dumpdev</varname> variable specifies the swap
 partition and <varname>dumpdir</varname> tells the system
@@ -186,7 +199,7 @@
 <para>Writing kernel core dumps is slow and takes a long time so
 if you have lots of memory (>256M) and lots of panics it could
-be frustrating to sit and wait while it is done (twice - first
+be frustrating to sit and wait while it is done (twice &mdash; first
 to write it to swap, then to relocate it to filesystem). It is
 convenient then to limit the amount of RAM the system will use
 via a <filename>/boot/loader.conf</filename> tunable:</para>
@@ -210,10 +223,10 @@
 <sect2 id="prelim-starting">
 <title>Starting the project</title>
-<para>For the purpose of making gjournal, a new empty
-subdirectory was created under an arbitrary user-accessible
+<para>For the purpose of creating a new GEOM class, an empty
+subdirectory has to be created under an arbitrary user-accessible
 directory. You do not have to create the module directory under
 <filename>/usr/src</filename>.</para>
 </sect2>
 <sect2 id="prelim-makefile">
@@ -224,17 +237,19 @@
 project, which of course includes kernel modules.</para>
 <para>Creating the <filename>Makefile</filename> is simple
-thanks to extensive set of helper routines provided by the
-system. In short, here is how it looks:</para>
-<programlisting> SRCS=g_journal.c
+thanks to an extensive set of helper routines provided by the
+system. In short, here is how a minimal <filename>Makefile</filename>
+looks for a kernel module:</para>
+<programlisting>SRCS=g_journal.c
 KMOD=geom_journal
 .include &lt;bsd.kmod.mk&gt;</programlisting>
-<para>This Makefile (with changed filenames) will do for any
-kernel module. If more than one file is required, list it in
-<envar>SRCS</envar> variable separated with whitespace from
+<para>This <filename>Makefile</filename> (with changed filenames)
+will do for any kernel module, and a GEOM class can reside in just
+one kernel module. If more than one file is required, list it in the
+<envar>SRCS</envar> variable, separated with whitespace from
 other filenames.</para>
 </sect2>
 </sect1>
@@ -246,7 +261,7 @@
 <title>Memory allocation</title>
 <para>See &man.malloc.9;. Basic memory allocation is only
-slightly different than its user-land equivalent. Most
+slightly different than its userland equivalent. Most
 notably, <function>malloc</function>() and
 <function>free</function>() accept additional parameters as is
 described in the man page.</para>
@@ -256,7 +271,7 @@
 <programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
-<para>To use the macro, <filename>sys/param.h</filename>,
+<para>To use this macro, <filename>sys/param.h</filename>,
 <filename>sys/kernel.h</filename> and
 <filename>sys/malloc.h</filename> headers must be
 included.</para>
@@ -273,16 +288,16 @@
 <para>See &man.queue.3;. There are a LOT of cases when a list of
 things needs to be maintained. Fortunately, this data
-structure is implemented (in several ways) by the C macros
+structure is implemented (in several ways) by C macros
 included in the system. The most used list type is TAILQ
 because it is the most flexible. It is also the one with largest
 memory requirements (its elements are doubly-linked) and
-theoretically the slowest (though the speed variation is on
+also the slowest (although the speed variation is on
 the order of several CPU instructions more, so it should not be
 taken seriously).</para>
 <para>If data retrieval speed is very important, see
-&man.tree.3;.</para>
+&man.tree.3; and &man.hashinit.9;.</para>
 </sect2>
 <sect2 id="kernelprog-bios">
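To illustrate the TAILQ usage discussed in this hunk: the &man.queue.3; macros are also available to userland programs, so a small sketch can be compiled and run outside the kernel. The struct req element and the req_* helpers are hypothetical names chosen for this example. The point it shows is that the list linkage (TAILQ_ENTRY) lives inside each element, so an element can be unlinked from the middle of the queue in constant time given only a pointer to it.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical queued request; TAILQ_ENTRY embeds the prev/next links. */
struct req {
	int			id;
	TAILQ_ENTRY(req)	link;
};

/* Declares "struct req_queue", the list head type. */
TAILQ_HEAD(req_queue, req);

/* Append a request at the tail; returns NULL if allocation fails. */
static struct req *
req_enqueue(struct req_queue *q, int id)
{
	struct req *r = malloc(sizeof(*r));

	if (r == NULL)
		return (NULL);
	r->id = id;
	TAILQ_INSERT_TAIL(q, r, link);
	/* INVARIANTS-style sanity check (KASSERT() in the kernel). */
	assert(!TAILQ_EMPTY(q));
	return (r);
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
static struct req *
req_dequeue(struct req_queue *q)
{
	struct req *r = TAILQ_FIRST(q);

	if (r != NULL)
		TAILQ_REMOVE(q, r, link);
	return (r);
}

/* Walk the queue front to back and sum the ids. */
static int
req_sum(struct req_queue *q)
{
	struct req *r;
	int sum = 0;

	TAILQ_FOREACH(r, q, link)
		sum += r->id;
	return (sum);
}
```

A kernel version would allocate elements with &man.malloc.9; (using M_NOWAIT where sleeping is not allowed) and would normally protect the list with a mutex.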
@@ -295,21 +310,25 @@
 a buffer, and a bunch of <quote>user-specific</quote> flags
 and fields that can help implement various hacks.</para>
-<para>The important thing here is that bios are dealt with
-asynchronously. That means that, in most parts of the code,
+<para>The important thing here is that <structname>bio</structname>s
+are handled asynchronously. That means that, in most parts of the code,
 there is no analogue to userland's &man.read.2; and
 &man.write.2; calls that do not return until a request is
 done. Rather, a developer-supplied function is called as a
 notification when the request gets completed (or results in
 error).</para>
-<para>Unfortunately, the asynchronous programming model (also
-called "event-driven") imposed this way is somewhat harder
-than the much more used imperative one (at least it takes a
-while to get used to it). In some cases helper routines
+<para>The asynchronous programming model (also
+called "event-driven") is somewhat harder
+than the much more used imperative one used in userland
+(at least it takes a
+while to get used to it). In some cases the helper routines
 <function>g_write_data</function>() and
-<function>g_read_data</function>() can be used, but <emphasis>NOT
-ALWAYS</emphasis>!.</para>
+<function>g_read_data</function>() can be used, but <emphasis>not
+always</emphasis>. In particular, they cannot be used when
+a mutex is held; for example, the GEOM topology mutex or
+the internal mutex held during the <function>.start</function>() and
+<function>.stop</function>() functions.</para>
 </sect2>
 </sect1>
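A minimal userland model of the asynchronous style described in this hunk, assuming a made-up memory-backed device: dev_start() only queues the request and returns at once, and the outcome is reported later through the request's done callback when dev_complete_all() runs (standing in for the hardware finishing the transfer). The io_req structure and all function names are hypothetical; in GEOM the corresponding roles are played by struct bio, its bio_done callback and g_io_deliver().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request, loosely modelled on the roles of struct bio. */
struct io_req {
	long	  offset;
	size_t	  length;
	char	 *data;
	int	  error;			/* filled in on completion */
	void	(*done)(struct io_req *);	/* completion notification */
	void	 *caller1;			/* issuer's private pointer */
};

#define	DEV_SIZE	64
#define	MAX_PENDING	8

static char dev_storage[DEV_SIZE];		/* contents of the fake disk */
static struct io_req *pending[MAX_PENDING];
static int npending;

/* Issue a read: queue it and return immediately, never waiting. */
static void
dev_start(struct io_req *r)
{
	/* A request without a handler is a bug (KASSERT() in the kernel). */
	assert(r->done != NULL);
	if (npending == MAX_PENDING) {
		r->error = EBUSY;
		r->done(r);		/* even failures go through done */
		return;
	}
	pending[npending++] = r;
}

/* Later (think: an interrupt), finish every queued read. */
static void
dev_complete_all(void)
{
	for (int i = 0; i < npending; i++) {
		struct io_req *r = pending[i];

		if (r->offset < 0 || r->offset + (long)r->length > DEV_SIZE) {
			r->error = EIO;
		} else {
			memcpy(r->data, dev_storage + r->offset, r->length);
			r->error = 0;
		}
		r->done(r);
	}
	npending = 0;
}

/* Sample completion handler: publishes the result through caller1. */
static void
record_done(struct io_req *r)
{
	int *status = r->caller1;

	*status = r->error;
}
```

Even the immediate failure (the EBUSY case) is reported through the callback rather than a return value, so the issuer has a single completion path to handle, which is the same discipline the asynchronous model imposes on GEOM code.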
@@ -330,7 +349,11 @@
 <sect2 id="geom-class">
 <title>GEOM class</title>
-<para>GEOM class has several "class methods" that get called
+<para>GEOM classes are transformations on the data. These transformations
+can be combined in a tree-like fashion. Instances of GEOM classes are
+called <emphasis>geoms</emphasis>.</para>
+<para>Each GEOM class has several "class methods" that get called
 when there is no geom instance available (or they are simply not
 bound to a single instance):</para>
@@ -364,8 +387,7 @@
 <structname>g_class</structname> structure is a LIST of geoms
 instantiated from the class.</para>
-<para>These functions are called from g_event? kernel
-thread.</para>
+<para>These functions are called from the g_event kernel thread.</para>
 </sect2>
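The idea of GEOM classes as transformations that can be stacked, with each instance carrying its own private data (its softc), can be sketched in userland as two layers: a memory-backed bottom layer and a "slice" layer that shifts offsets and forwards the request downwards. Everything here is hypothetical and simplified. Calls are synchronous and there are no separate provider and consumer objects. The sketch only shows how a start method recovers its private state from an opaque void pointer, much as a real class does with the softc field of struct g_geom.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, synchronous model of a geom: a start method plus softc. */
struct node {
	const char	*name;
	void		*softc;		/* instance-private data, opaque here */
	struct node	*below;		/* where this layer sends requests */
	int		(*start)(struct node *, long, char *, size_t);
};

/* Bottom layer: a memory-backed "disk". */
struct disk_softc {
	char	*mem;
	long	 size;
};

static int
disk_start(struct node *n, long off, char *buf, size_t len)
{
	struct disk_softc *sc = n->softc;	/* recover private state */

	if (off < 0 || off + (long)len > sc->size)
		return (EIO);
	memcpy(buf, sc->mem + off, len);
	return (0);
}

/* Upper layer: a window onto the layer below, similar to a partition. */
struct slice_softc {
	long	base;			/* where the window starts below */
	long	size;			/* length of the window */
};

static int
slice_start(struct node *n, long off, char *buf, size_t len)
{
	struct slice_softc *sc = n->softc;

	assert(n->below != NULL);
	if (off < 0 || off + (long)len > sc->size)
		return (EIO);
	/* Transform the request, then pass it down the stack. */
	return (n->below->start(n->below, sc->base + off, buf, len));
}
```

Because the upper layer only knows that something is below it, further layers (another slice, a mirror, a journal) could be stacked the same way, which is the tree-like combination the text refers to.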
@@ -377,9 +399,8 @@
 comes from the archaic term <quote>software control block</quote>.
 In GEOM, it is a structure (more precise: pointer to a
 structure) that can be attached to a geom instance to hold
-whatever data is private to the geom instance. In gjournal
-(and most of the other GEOM classes), some of its members
-are:</para>
+whatever data is private to the geom instance. Most GEOM classes
+have the following members:</para>
 <itemizedlist>
 <listitem><para><varname>struct g_provider *provider</varname> : The
@@ -486,10 +507,10 @@
 <itemizedlist>
-<listitem><para>label - to write metadata to devices so they can be
+<listitem><para>label &mdash; to write metadata to devices so they can be
 recognized at tasting and brought up in geoms</para></listitem>
-<listitem><para>destroy - to destroy metadata, so the geoms get
+<listitem><para>destroy &mdash; to destroy metadata, so the geoms get
 destroyed</para></listitem>
 </itemizedlist>
@@ -515,7 +536,7 @@
 <sect2 id="geom-geoms">
 <title>Geoms</title>
-<para>Geoms are instances of geom classes. They have internal
+<para>Geoms are instances of GEOM classes. They have internal
 data (a softc structure) and some functions with which they
 respond to external events.</para>
@@ -537,9 +558,9 @@
 <listitem><para><function>.start</function> : handles I/O</para></listitem>
 </itemizedlist>
-<para>These functions are called from the g_down? kernel thread and
-there can be no sleeping in this context (no blocking on a
-mutex or any kind of locks) which limits what can be done
+<para>These functions are called from the <function>g_down</function>
+kernel thread and there can be no sleeping in this context,
+(see definition of sleeping elsewhere) which limits what can be done
 quite a bit, but forces the handling to be fast.</para>
 <para>Of these, the most important function for doing actual
@@ -632,15 +653,17 @@
 between passing the data to consumers and
 returning.</para></listitem>
+<listitem><para>Waiting for I/O.</para></listitem>
 <listitem><para>Calls to &man.malloc.9; and
 <function>uma_zalloc</function>() with
 <varname>M_WAITOK</varname> flag set</para></listitem>
-<listitem><para>sx locks</para></listitem>
+<listitem><para>sx and other sleepable locks</para></listitem>
 </itemizedlist>
-<para>This restriction is here to stop geom code clogging the I/O
-request path, because sleeping in the code is usually not
+<para>This restriction is here to stop GEOM code clogging the I/O
+request path, since sleeping is usually not
 time-bound and there can be no guarantees on how long will it
 take (there are some other, more technical reasons also). It
 also means that there is not much that can be done in those
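One common way to live within these restrictions is to use only allocations that fail instead of waiting, and to report the failure to the issuer rather than block. The sketch below models this with a fixed-size pool: pool_alloc_nowait() plays the role of an M_NOWAIT allocation, and model_start() fails the request with ENOMEM when no slot is free. The names are hypothetical. In real GEOM code the usual form of this pattern is to check the result of g_clone_bio() for NULL and complete the original request with g_io_deliver(bp, ENOMEM).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical child requests, taken from a fixed-size pool. */
#define	POOL_SLOTS	2

struct child_req {
	int	in_use;
	long	offset;
};

static struct child_req pool[POOL_SLOTS];

/* Like an M_NOWAIT allocation: returns NULL instead of waiting. */
static struct child_req *
pool_alloc_nowait(void)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			return (&pool[i]);
		}
	}
	return (NULL);
}

static void
pool_free(struct child_req *c)
{
	assert(c->in_use);		/* catch double frees */
	c->in_use = 0;
}

/*
 * Non-sleeping start routine: set up a child request for the layer
 * below, or fail the parent with ENOMEM instead of waiting for memory.
 * Returns the error that would be delivered to the issuer.
 */
static int
model_start(long offset, struct child_req **childp)
{
	struct child_req *c = pool_alloc_nowait();

	*childp = c;
	if (c == NULL)
		return (ENOMEM);	/* do not wait for a free slot */
	c->offset = offset;
	return (0);
}
```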
@@ -657,7 +680,7 @@
 behaviour, only they cannot return to caller to signify
 termination, but must call &man.kthread.exit.9;.</para>
-<para>In geom code, the usual use of threads is to offload
+<para>In GEOM code, the usual use of threads is to offload
 processing of requests from <literal>g_down</literal> thread
 (the <function>.start</function>() function). These threads
 look like <quote>event handlers</quote>: they have a linked
@@ -683,9 +706,9 @@
 <function>.done</function>() requests can be left to the
 <literal>g_up</literal> thread.</para>
-<para>Mutexes in FreeBSD kernel (see &man.mutex.9; man page) have
-one distinction from their more common userland cousins - they
-disallow sleeping (meaning: the code cannot sleep while holding
+<para>Mutexes in FreeBSD kernel (see &man.mutex.9;) have
+one distinction from their more common userland cousins &mdash; the
+code cannot sleep while holding
 a mutex). If the code needs to sleep a lot, &man.sx.9; locks
 may be more appropriate. On the other hand, if you do almost
 everything in a single thread, you may get away with no
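The offloading pattern described in the last two hunks can be modelled in userland with POSIX threads. In this model a queue of requests is filled by .start() and drained by a dedicated thread, and a mutex protects the shared list. worker_start() stands in for .start(): it only appends the request to a TAILQ under the mutex and wakes the worker. worker_main() then does the processing with the mutex released. The pthread mutex and condition variable stand in for &man.mutex.9; and msleep(9)/wakeup(9). All names are hypothetical, and this is a sketch of the pattern, not GEOM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical work item queued by the fast path. */
struct work {
	int			value;
	TAILQ_ENTRY(work)	link;
};

struct worker {
	pthread_mutex_t		lock;		/* protects queue, stopping */
	pthread_cond_t		cv;		/* signalled on new work */
	TAILQ_HEAD(, work)	queue;
	int			stopping;
	long			processed_sum;	/* touched only by the worker */
	pthread_t		thread;
};

/* Fast path: hold the lock only long enough to queue and wake. */
static void
worker_start(struct worker *w, struct work *item)
{
	pthread_mutex_lock(&w->lock);
	assert(!w->stopping);
	TAILQ_INSERT_TAIL(&w->queue, item, link);
	pthread_cond_signal(&w->cv);
	pthread_mutex_unlock(&w->lock);
}

/* Worker thread: drain the queue, doing the slow part unlocked. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;
	struct work *item;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (TAILQ_EMPTY(&w->queue) && !w->stopping)
			pthread_cond_wait(&w->cv, &w->lock);
		item = TAILQ_FIRST(&w->queue);
		if (item == NULL)		/* stopping and fully drained */
			break;
		TAILQ_REMOVE(&w->queue, item, link);
		pthread_mutex_unlock(&w->lock);
		w->processed_sum += item->value;	/* "slow" processing */
		free(item);
		pthread_mutex_lock(&w->lock);
	}
	pthread_mutex_unlock(&w->lock);
	return (NULL);
}

static int
worker_init(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cv, NULL);
	TAILQ_INIT(&w->queue);
	w->stopping = 0;
	w->processed_sum = 0;
	return (pthread_create(&w->thread, NULL, worker_main, w));
}

/* Let the worker finish what is queued, then wait for it to exit. */
static void
worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stopping = 1;
	pthread_cond_signal(&w->cv);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cv);
	pthread_mutex_destroy(&w->lock);
}
```

On systems where POSIX threads live in a separate library the program may need to be linked with -pthread. The worker drops the mutex while processing an item; in the kernel that is what allows such a thread, unlike .start(), to sleep, for example in an M_WAITOK allocation.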