Small improvements and clarifications. Some are mine, most were:
Submitted by: ivoras
parent
571762032c
commit
a9ad3db4c7
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=30144
1 changed files with 99 additions and 76 deletions
|
@ -16,7 +16,7 @@
|
|||
<firstname>Ivan</firstname>
|
||||
<surname>Voras</surname>
|
||||
<affiliation>
|
||||
<address><email>ivoras@yahoo.com</email>
|
||||
<address><email>ivoras@FreeBSD.org</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
|
@ -34,10 +34,9 @@
|
|||
|
||||
<abstract>
|
||||
|
||||
<para>This text documents the way I created the gjournal
|
||||
facility, starting with learning how to do kernel
|
||||
programming. It is assumed that the reader is familiar with C
|
||||
userland programming.</para>
|
||||
<para>This text documents some starting points in developing
|
||||
GEOM classes, and kernel modules in general. It is assumed
|
||||
that the reader is familiar with C userland programming.</para>
|
||||
|
||||
</abstract>
|
||||
|
||||
|
@ -50,7 +49,7 @@
|
|||
<sect2 id="intro-docs">
|
||||
<title>Documentation</title>
|
||||
|
||||
<para>Documentation on kernel programming is scarce - it is one of
|
||||
<para>Documentation on kernel programming is scarce — it is one of
|
||||
few areas where there is nearly nothing in the way of friendly
|
||||
tutorials, and the phrase <quote>use the source!</quote> really
|
||||
holds true. However, there are some bits and pieces (some of
|
||||
|
@ -61,14 +60,13 @@
|
|||
|
||||
<listitem><para>The <ulink
|
||||
url="&url.books.developers-handbook;/index.html">FreeBSD
|
||||
Developer's Handbook</ulink> - part of the documentation
|
||||
project, it does not contain anything specific to kernel-land
|
||||
programming, but rather some general
|
||||
information.</para></listitem>
|
||||
Developer's Handbook</ulink> — part of the documentation
|
||||
project, it does not contain anything specific to kernel
|
||||
programming, but rather some general useful information.</para></listitem>
|
||||
|
||||
<listitem><para>The <ulink
|
||||
url="&url.books.arch-handbook;/index.html">FreeBSD
|
||||
Architecture Handbook</ulink> - also from the documentation
|
||||
Architecture Handbook</ulink> — also from the documentation
|
||||
project, contains descriptions of several low-level facilities
|
||||
and procedures. The most important chapter is 13, <ulink
|
||||
url="&url.books.arch-handbook;/driverbasics.html">Writing
|
||||
|
@ -76,18 +74,24 @@
|
|||
|
||||
<listitem><para>The Blueprints section of <ulink
|
||||
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
|
||||
site - contains several interesting articles on kernel
|
||||
site — contains several interesting articles on kernel
|
||||
facilities.</para></listitem>
|
||||
|
||||
<listitem><para>The man pages in section 9 - for important
|
||||
<listitem><para>The man pages in section 9 — for important
|
||||
documentation on kernel functions.</para></listitem>
|
||||
|
||||
<listitem><para>The &man.geom.4; man page and <ulink
|
||||
url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
|
||||
- for general introduction of the GEOM
|
||||
subsystem.</para></listitem>
|
||||
— for general introduction of the GEOM
|
||||
subsystem.</para></listitem>
|
||||
|
||||
<listitem><para>The &man.style.9; man page - for documentation on
|
||||
<listitem><para>Man pages &man.g.bio.9;, &man.g.event.9;, &man.g.data.9;,
|
||||
&man.g.geom.9;, &man.g.provider.9;, &man.g.consumer.9;, &man.g.access.9;
|
||||
and others linked from those, for documentation on specific
|
||||
functionalities.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The &man.style.9; man page — for documentation on
|
||||
the coding-style conventions which must be followed for any code
|
||||
which is to be committed to the FreeBSD CVS tree.</para></listitem>
|
||||
|
||||
|
@ -111,18 +115,27 @@
|
|||
one.</para>
|
||||
|
||||
<para>But, since not everybody has two or more computers handy, there are
|
||||
a few things that can be done to prepare an otherwise "live"
|
||||
system for developing kernel code.</para>
|
||||
a few things that can be done to prepare an otherwise <quote>live</quote>
|
||||
system for developing kernel code. This setup is also applicable
|
||||
for developing in a <ulink url="http://www.vmware.com/">VMWare</ulink>
|
||||
or <ulink url="http://www.qemu.org/">QEmu</ulink> virtual machine (the
|
||||
next best thing after a dedicated development machine).</para>
|
||||
|
||||
<sect2 id="prelim-system">
|
||||
<title>Converting a system for development</title>
|
||||
<title>Modifying a system for development</title>
|
||||
|
||||
<para>For any kernel programming a kernel with
|
||||
<option>INVARIANTS</option> enabled is a must-have. So enter
|
||||
these in your kernel configuration file:</para>
|
||||
|
||||
<programlisting> options INVARIANT_SUPPORT
|
||||
options INVARIANTS</programlisting>
|
||||
<programlisting>options INVARIANT_SUPPORT
|
||||
options INVARIANTS</programlisting>
|
||||
|
||||
<para>For more debugging you should also include WITNESS support,
|
||||
which will alert you of mistakes in locking:</para>
|
||||
|
||||
<programlisting>options WITNESS_SUPPORT
|
||||
options WITNESS</programlisting>
|
||||
|
||||
<para>For debugging crash dumps, a kernel with debug symbols is
|
||||
needed:</para>
|
||||
|
@ -141,9 +154,9 @@
|
|||
can examine a kernel panic when it happens. For this, enter
|
||||
the following lines in your kernel configuration file:</para>
|
||||
|
||||
<programlisting> options KDB
|
||||
options DDB
|
||||
options KDB_TRACE</programlisting>
|
||||
<programlisting>options KDB
|
||||
options DDB
|
||||
options KDB_TRACE</programlisting>
|
||||
|
||||
<para>For this to work you might need to set a sysctl (if it is
|
||||
not on by default):</para>
|
||||
|
@ -156,14 +169,14 @@
|
|||
before it is committed to storage. Disabling softupdates
|
||||
yields a great performance hit, and still does not guarantee
|
||||
data consistency. Mounting filesystem with the "sync" option
|
||||
is needed for that. For a compromise, the cache delays can
|
||||
is needed for that. For a compromise, the softupdates cache delays can
|
||||
be shortened. There are three sysctl's that are useful for
|
||||
this (best to be set in
|
||||
<filename>/etc/sysctl.conf</filename>):</para>
|
||||
|
||||
<programlisting> kern.filedelay=5
|
||||
kern.dirdelay=4
|
||||
kern.metadelay=3</programlisting>
|
||||
<programlisting>kern.filedelay=5
|
||||
kern.dirdelay=4
|
||||
kern.metadelay=3</programlisting>
|
||||
|
||||
<para>The numbers represent seconds.</para>
|
||||
|
||||
|
@ -177,16 +190,16 @@
|
|||
before swap is enabled. This is controlled with two
|
||||
<filename>/etc/rc.conf</filename> variables:</para>
|
||||
|
||||
<programlisting> dumpdev="/dev/ad0s4b"
|
||||
dumpdir="/usr/core"</programlisting>
|
||||
|
||||
<programlisting>dumpdev="/dev/ad0s4b"
|
||||
dumpdir="/usr/core"</programlisting>
|
||||
|
||||
<para>The <varname>dumpdev</varname> variable specifies the swap
|
||||
partition and <varname>dumpdir</varname> tells the system
|
||||
where in the filesystem to relocate the core dump on reboot.</para>
|
||||
|
||||
<para>Writing kernel core dumps is slow and takes a long time so
|
||||
if you have lots of memory (>256M) and lots of panics it could
|
||||
be frustrating to sit and wait while it is done (twice - first
|
||||
be frustrating to sit and wait while it is done (twice — first
|
||||
to write it to swap, then to relocate it to filesystem). It is
|
||||
convenient then to limit the amount of RAM the system will use
|
||||
via a <filename>/boot/loader.conf</filename> tunable:</para>
|
||||
|
@ -210,10 +223,10 @@
|
|||
<sect2 id="prelim-starting">
|
||||
<title>Starting the project</title>
|
||||
|
||||
<para>For the purpose of making gjournal, a new empty
|
||||
subdirectory was created under an arbitrary user-accessible
|
||||
directory. You do not have to create the module directory under
|
||||
<filename>/usr/src</filename>.</para>
|
||||
<para>For the purpose of creating a new GEOM class, an empty
|
||||
subdirectory has to be created under an arbitrary user-accessible
|
||||
directory. You do not have to create the module directory under
|
||||
<filename>/usr/src</filename>.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="prelim-makefile">
|
||||
|
@ -224,17 +237,19 @@
|
|||
project, which of course includes kernel modules.</para>
|
||||
|
||||
<para>Creating the <filename>Makefile</filename> is simple
|
||||
thanks to extensive set of helper routines provided by the
|
||||
system. In short, here is how it looks:</para>
|
||||
thanks to an extensive set of helper routines provided by the
|
||||
system. In short, here is how a minimal <filename>Makefile</filename>
|
||||
looks for a kernel module:</para>
|
||||
|
||||
<programlisting> SRCS=g_journal.c
|
||||
KMOD=geom_journal
|
||||
<programlisting>SRCS=g_journal.c
|
||||
KMOD=geom_journal
|
||||
|
||||
.include <bsd.kmod.mk></programlisting>
|
||||
.include <bsd.kmod.mk></programlisting>
|
||||
|
||||
<para>This Makefile (with changed filenames) will do for any
|
||||
kernel module. If more than one file is required, list it in
|
||||
<envar>SRCS</envar> variable separated with whitespace from
|
||||
<para>This <filename>Makefile</filename> (with changed filenames)
|
||||
will do for any kernel module, and a GEOM class can reside in just
|
||||
one kernel module. If more than one file is required, list it in the
|
||||
<envar>SRCS</envar> variable, separated with whitespace from
|
||||
other filenames.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
@ -246,7 +261,7 @@
|
|||
<title>Memory allocation</title>
|
||||
|
||||
<para>See &man.malloc.9;. Basic memory allocation is only
|
||||
slightly different than its user-land equivalent. Most
|
||||
slightly different than its userland equivalent. Most
|
||||
notably, <function>malloc</function>() and
|
||||
<function>free</function>() accept additional parameters as is
|
||||
described in the man page.</para>
|
||||
|
@ -256,7 +271,7 @@
|
|||
|
||||
<programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
|
||||
|
||||
<para>To use the macro, <filename>sys/param.h</filename>,
|
||||
<para>To use this macro, <filename>sys/param.h</filename>,
|
||||
<filename>sys/kernel.h</filename> and
|
||||
<filename>sys/malloc.h</filename> headers must be
|
||||
included.</para>
|
||||
|
@ -273,16 +288,16 @@
|
|||
|
||||
<para>See &man.queue.3;. There are a LOT of cases when a list of
|
||||
things needs to be maintained. Fortunately, this data
|
||||
structure is implemented (in several ways) by the C macros
|
||||
structure is implemented (in several ways) by C macros
|
||||
included in the system. The most used list type is TAILQ
|
||||
because it is the most flexible. It is also the one with largest
|
||||
memory requirements (its elements are doubly-linked) and
|
||||
theoretically the slowest (though the speed variation is on
|
||||
also the slowest (although the speed variation is on
|
||||
the order of several CPU instructions more, so it should not be
|
||||
taken seriously).</para>
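The TAILQ macros described above can be tried out in userland first, since &man.queue.3; is also available to ordinary programs. The following is only a minimal sketch: the names <literal>struct job</literal>, <literal>job_queue</literal>, <literal>job_enqueue</literal>, <literal>job_dequeue</literal> and <literal>job_count</literal> are hypothetical and do not come from any GEOM class, and the userland <function>malloc</function>() is used here, so kernel code would need the &man.malloc.9; variants instead.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical element type; the TAILQ_ENTRY field holds the list linkage. */
struct job {
	int			id;
	TAILQ_ENTRY(job)	link;
};

/* Declares "struct job_queue", the type of the list head. */
TAILQ_HEAD(job_queue, job);

/* Append a new job at the tail; returns 0 on success, -1 if out of memory. */
static int
job_enqueue(struct job_queue *q, int id)
{
	struct job *j;

	j = malloc(sizeof(*j));
	if (j == NULL)
		return (-1);
	j->id = id;
	TAILQ_INSERT_TAIL(q, j, link);
	return (0);
}

/* Remove the job at the head and return its id, or -1 if the queue is empty. */
static int
job_dequeue(struct job_queue *q)
{
	struct job *j;
	int id;

	j = TAILQ_FIRST(q);
	if (j == NULL)
		return (-1);
	TAILQ_REMOVE(q, j, link);
	id = j->id;
	free(j);
	return (id);
}

/* Count the elements by walking the list from head to tail. */
static int
job_count(struct job_queue *q)
{
	struct job *j;
	int n = 0;

	TAILQ_FOREACH(j, q, link)
		n++;
	return (n);
}
```

A caller declares a <literal>struct job_queue</literal>, initializes it once with <function>TAILQ_INIT</function>(), and then uses the helpers. Because the linkage lives inside each element, inserting and removing elements requires no allocation beyond the element itself.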
|
||||
|
||||
<para>If data retrieval speed is very important, see
|
||||
&man.tree.3;.</para>
|
||||
&man.tree.3; and &man.hashinit.9;.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="kernelprog-bios">
|
||||
|
@ -295,21 +310,25 @@
|
|||
a buffer, and a bunch of <quote>user-specific</quote> flags
|
||||
and fields that can help implement various hacks.</para>
|
||||
|
||||
<para>The important thing here is that bios are dealt with
|
||||
asynchronously. That means that, in most parts of the code,
|
||||
<para>The important thing here is that <structname>bio</structname>s
|
||||
are handled asynchronously. That means that, in most parts of the code,
|
||||
there is no analogue to userland's &man.read.2; and
|
||||
&man.write.2; calls that do not return until a request is
|
||||
done. Rather, a developer-supplied function is called as a
|
||||
notification when the request gets completed (or results in
|
||||
error).</para>
|
||||
|
||||
<para>Unfortunately, the asynchronous programming model (also
|
||||
called "event-driven") imposed this way is somewhat harder
|
||||
than the much more used imperative one (at least it takes a
|
||||
while to get used to it). In some cases helper routines
|
||||
<para>The asynchronous programming model (also
|
||||
called "event-driven") is somewhat harder
|
||||
than the much more common imperative one used in userland
|
||||
(at least it takes a
|
||||
while to get used to it). In some cases the helper routines
|
||||
<function>g_write_data</function>() and
|
||||
<function>g_read_data</function>() can be used, but <emphasis>NOT
|
||||
ALWAYS</emphasis>!.</para>
|
||||
<function>g_read_data</function>() can be used, but <emphasis>not
|
||||
always</emphasis>. In particular, they cannot be used when
|
||||
a mutex is held; for example, the GEOM topology mutex or
|
||||
the internal mutex held during the <function>.start</function>() and
|
||||
<function>.stop</function>() functions.</para>
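The completion-callback style described above can be illustrated with a small userland analogue. This is a sketch under stated assumptions, not GEOM code: <literal>struct request</literal>, <function>submit</function>(), <function>complete_pending</function>() and <function>count_done</function>() are hypothetical names invented for the illustration, and the "lower layer" is simulated by a function that the caller invokes by hand. The point it shows is that submitting a request returns immediately, and the result only becomes known when the callback runs.

```c
#include <stddef.h>

/* Hypothetical request type, loosely modelled on the idea of a bio. */
struct request {
	int		 error;				/* 0 on success */
	void		(*done)(struct request *);	/* completion callback */
	void		*arg;				/* private data for the callback */
	struct request	*next;				/* linkage in the pending queue */
};

/* Singly-linked FIFO of requests that have been submitted but not completed. */
static struct request *pending_head;
static struct request **pending_tail = &pending_head;

/* Queue a request and return at once; nothing has been "done" yet. */
static void
submit(struct request *r)
{
	r->next = NULL;
	*pending_tail = r;
	pending_tail = &r->next;
}

/*
 * Stand-in for the lower layer: finish every pending request and notify
 * its owner through the callback. Returns the number of completed requests.
 */
static int
complete_pending(void)
{
	struct request *r;
	int n = 0;

	while ((r = pending_head) != NULL) {
		pending_head = r->next;
		if (pending_head == NULL)
			pending_tail = &pending_head;
		r->error = 0;		/* this sketch never fails a request */
		r->done(r);
		n++;
	}
	return (n);
}

/* Example callback: bump the counter that the submitter passed in arg. */
static void
count_done(struct request *r)
{
	int *counter = r->arg;

	if (r->error == 0)
		(*counter)++;
}
```

A caller fills in <literal>done</literal> and <literal>arg</literal>, calls <function>submit</function>(), and continues with other work. The counter changes only once <function>complete_pending</function>() has run, which is why code written in this model has to keep its state in the request (or in a structure reachable from it) rather than on the stack of the submitting function.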
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
@ -330,7 +349,11 @@
|
|||
<sect2 id="geom-class">
|
||||
<title>GEOM class</title>
|
||||
|
||||
<para>GEOM class has several "class methods" that get called
|
||||
<para>GEOM classes are transformations on the data. These transformations
|
||||
can be combined in a tree-like fashion. Instances of GEOM classes are
|
||||
called <emphasis>geoms</emphasis>.</para>
|
||||
|
||||
<para>Each GEOM class has several "class methods" that get called
|
||||
when there is no geom instance available (or they are simply not
|
||||
bound to a single instance):</para>
|
||||
|
||||
|
@ -364,8 +387,7 @@
|
|||
<structname>g_class</structname> structure is a LIST of geoms
|
||||
instantiated from the class.</para>
|
||||
|
||||
<para>These functions are called from g_event? kernel
|
||||
thread.</para>
|
||||
<para>These functions are called from the g_event kernel thread.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
@ -377,9 +399,8 @@
|
|||
comes from the archaic term <quote>software control block</quote>.
|
||||
In GEOM, it is a structure (more precise: pointer to a
|
||||
structure) that can be attached to a geom instance to hold
|
||||
whatever data is private to the geom instance. In gjournal
|
||||
(and most of the other GEOM classes), some of its members
|
||||
are:</para>
|
||||
whatever data is private to the geom instance. Most GEOM classes
|
||||
have the following members:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para><varname>struct g_provider *provider</varname> : The
|
||||
|
@ -486,10 +507,10 @@
|
|||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>label - to write metadata to devices so they can be
|
||||
<listitem><para>label — to write metadata to devices so they can be
|
||||
recognized at tasting and brought up in geoms</para></listitem>
|
||||
|
||||
<listitem><para>destroy - to destroy metadata, so the geoms get
|
||||
<listitem><para>destroy — to destroy metadata, so the geoms get
|
||||
destroyed</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
@ -515,7 +536,7 @@
|
|||
<sect2 id="geom-geoms">
|
||||
<title>Geoms</title>
|
||||
|
||||
<para>Geoms are instances of geom classes. They have internal
|
||||
<para>Geoms are instances of GEOM classes. They have internal
|
||||
data (a softc structure) and some functions with which they
|
||||
respond to external events.</para>
|
||||
|
||||
|
@ -537,9 +558,9 @@
|
|||
<listitem><para><function>.start</function> : handles I/O</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>These functions are called from the g_down? kernel thread and
|
||||
there can be no sleeping in this context (no blocking on a
|
||||
mutex or any kind of locks) which limits what can be done
|
||||
<para>These functions are called from the <function>g_down</function>
|
||||
kernel thread and there can be no sleeping in this context
|
||||
(see definition of sleeping elsewhere) which limits what can be done
|
||||
quite a bit, but forces the handling to be fast.</para>
|
||||
|
||||
<para>Of these, the most important function for doing actual
|
||||
|
@ -632,15 +653,17 @@
|
|||
between passing the data to consumers and
|
||||
returning.</para></listitem>
|
||||
|
||||
<listitem><para>Waiting for I/O.</para></listitem>
|
||||
|
||||
<listitem><para>Calls to &man.malloc.9; and
|
||||
<function>uma_zalloc</function>() with
|
||||
<varname>M_WAITOK</varname> flag set</para></listitem>
|
||||
|
||||
<listitem><para>sx locks</para></listitem>
|
||||
<listitem><para>sx and other sleepable locks</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>This restriction is here to stop geom code clogging the I/O
|
||||
request path, because sleeping in the code is usually not
|
||||
<para>This restriction is here to stop GEOM code clogging the I/O
|
||||
request path, since sleeping is usually not
|
||||
time-bound and there can be no guarantees on how long it will
|
||||
take (there are some other, more technical reasons also). It
|
||||
also means that there is not much that can be done in those
|
||||
|
@ -657,7 +680,7 @@
|
|||
behaviour, only they cannot return to caller to signify
|
||||
termination, but must call &man.kthread.exit.9;.</para>
|
||||
|
||||
<para>In geom code, the usual use of threads is to offload
|
||||
<para>In GEOM code, the usual use of threads is to offload
|
||||
processing of requests from <literal>g_down</literal> thread
|
||||
(the <function>.start</function>() function). These threads
|
||||
look like <quote>event handlers</quote>: they have a linked
|
||||
|
@ -683,9 +706,9 @@
|
|||
<function>.done</function>() requests can be left to the
|
||||
<literal>g_up</literal> thread.</para>
|
||||
|
||||
<para>Mutexes in FreeBSD kernel (see &man.mutex.9; man page) have
|
||||
one distinction from their more common userland cousins - they
|
||||
disallow sleeping (meaning: the code cannot sleep while holding
|
||||
<para>Mutexes in the FreeBSD kernel (see &man.mutex.9;) have
|
||||
one distinction from their more common userland cousins — the
|
||||
code cannot sleep while holding
|
||||
a mutex). If the code needs to sleep a lot, &man.sx.9; locks
|
||||
may be more appropriate. On the other hand, if you do almost
|
||||
everything in a single thread, you may get away with no
|
||||
|
|