Minor wording and grammar fixes.

Approved by:	murray
Craig Rodrigues 2005-09-01 01:50:27 +00:00
parent 4f09d6756f
commit 6611bdac7c
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=25527

@@ -36,7 +36,7 @@
 <para>This text documents the way I created the gjournal
 facility, starting with learning how to do kernel
-programming. It's assumed the reader is familiar with C
+programming. It is assumed that the reader is familiar with C
 userland programming.</para>
 </abstract>
@@ -50,8 +50,8 @@
 <sect2 id="intro-docs">
 <title>Documentation</title>
-<para>Documentation on kernel programming is scarce - it's one of
-few areas where there's nearly nothing in the way of friendly
+<para>Documentation on kernel programming is scarce - it is one of
+few areas where there is nearly nothing in the way of friendly
 tutorials, and the phrase <quote>use the source!</quote> really
 holds true. However, there are some bits and pieces (some of
 them seriously outdated) floating around that should be studied
@@ -59,14 +59,14 @@
 <itemizedlist>
-<listitem><para><ulink
+<listitem><para>The <ulink
 url="&url.books.developers-handbook;/index.html">FreeBSD
 Developer's Handbook</ulink> - part of the documentation
-project, it doesn't contain anything specific to kernel-land
+project, it does not contain anything specific to kernel-land
 programming, but rather some general
 information.</para></listitem>
-<listitem><para><ulink
+<listitem><para>The <ulink
 url="&url.books.arch-handbook;/index.html">FreeBSD
 Architecture Handbook</ulink> - also from the documentation
 project, contains descriptions of several low-level facilities
@@ -79,15 +79,17 @@
 site - contains several interesting articles on kernel
 facilities.</para></listitem>
-<listitem><para>The man pages in section 9 - most important
-kernel-land calls are documented here.</para></listitem>
-<listitem><para>The &man.geom.4; man page and PHK's GEOM slides
+<listitem><para>The man pages in section 9 - for important
+documentation on kernel functions.</para></listitem>
+<listitem><para>The &man.geom.4; man page and <ulink
+url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
 - for general introduction of the GEOM
 subsystem.</para></listitem>
-<listitem><para>&man.style.9; man page, if the code should go to
-FreeBSD CVS tree</para></listitem>
+<listitem><para>The &man.style.9; man page - for documentation on
+the coding-style conventions which must be followed for any code
+which is to be committed to the FreeBSD CVS tree.</para></listitem>
 </itemizedlist>
@@ -97,18 +99,18 @@
 <sect1 id="prelim">
 <title>Preliminaries</title>
-<para>The best way to do kernel developing is to have (at least)
+<para>The best way to do kernel development is to have (at least)
 two separate computers. One of these would contain the
 development environment and sources, and the other would be used
 to test the newly written code by network-booting and
 network-mounting filesystems from the first one. This way if
-the new code contains bugs and crashes the machine, it won't
+the new code contains bugs and crashes the machine, it will not
 mess up the sources (and other <quote>live</quote> data). The
-second system doesn't event have to have a proper display - it
+second system does not even require a proper display. Instead, it
 could be connected with a serial cable or KVM to the first
 one.</para>
-<para>But, since not everybody has two+ computers handy, there are
+<para>But, since not everybody has two or more computers handy, there are
 a few things that can be done to prepare an otherwise "live"
 system for developing kernel code.</para>
@@ -116,7 +118,7 @@
 <title>Converting a system for development</title>
 <para>For any kernel programming a kernel with
-<option>INVARIANTS</option> enabled is a must have. So enter
+<option>INVARIANTS</option> enabled is a must-have. So enter
 these in your kernel configuration file:</para>
 <programlisting> options INVARIANT_SUPPORT
@@ -129,7 +131,7 @@
 <para>With the usual way of installing the kernel (<command>make
 installkernel</command>) the debug kernel will not be
-automatically installed. It's called
+automatically installed. It is called
 <filename>kernel.debug</filename> and located in
 <filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
 convenience it should be copied to
@@ -143,18 +145,18 @@
 options DDB
 options KDB_TRACE</programlisting>
-<para>For this to work you might need to set a sysctl (if it's
+<para>For this to work you might need to set a sysctl (if it is
 not on by default):</para>
 <programlisting> debug.debugger_on_panic=1</programlisting>
 <para>Kernel panics will happen, so care should be taken with
 the filesystem cache. In particular, having softupdates might
-mean a latest file version could be lost if a panic occurs
-before it's committed to storage. Disabling softupdates
-yields a great performance hit (and it still doesn't guarantee
-data consistency - mounting filesystem with the "sync" option
-is needed for that) so for a compromise, the cache delays can
+mean the latest file version could be lost if a panic occurs
+before it is committed to storage. Disabling softupdates
+yields a great performance hit, and still does not guarantee
+data consistency. Mounting filesystem with the "sync" option
+is needed for that. For a compromise, the cache delays can
 be shortened. There are three sysctl's that are useful for
 this (best to be set in
 <filename>/etc/sysctl.conf</filename>):</para>
@@ -168,11 +170,11 @@
 <para>For debugging kernel panics, kernel core dumps are
 required. Since a kernel panic might make filesystems
 unusable, this crash dump is first written to a raw
-partition. Usually, this is the swap partition (it must be at
-least as large as the physical RAM in the machine). On the
-next boot (after filesystems are checked and mounted and
-before swap is enabled), the dump is copied to a regular
-file. This is controlled with two
+partition. Usually, this is the swap partition. This partition must be at
+least as large as the physical RAM in the machine. On the
+next boot, the dump is copied to a regular file.
+This happens after filesystems are checked and mounted, and
+before swap is enabled. This is controlled with two
 <filename>/etc/rc.conf</filename> variables:</para>
 <programlisting> dumpdev="/dev/ad0s4b"
@@ -184,24 +186,24 @@
 <para>Writing kernel core dumps is slow and takes a long time so
 if you have lots of memory (>256M) and lots of panics it could
-be frustrating to sit and wait while it's done (twice - first
-to write it to swap, then to relocate it to filesystem). It's
+be frustrating to sit and wait while it is done (twice - first
+to write it to swap, then to relocate it to filesystem). It is
 convenient then to limit the amount of RAM the system will use
 via a <filename>/boot/loader.conf</filename> tunable:</para>
 <programlisting> hw.physmem="256M"</programlisting>
 <para>If the panics are frequent and filesystems large (or you
-simply don't trust softupdates+background fsck) it's advisable
+simply do not trust softupdates+background fsck) it is advisable
 to turn background fsck off via
 <filename>/etc/rc.conf</filename> variable:</para>
 <programlisting> background_fsck="NO"</programlisting>
 <para>This way, the filesystems will always get checked when
-needed (with background fsck, a new panic could happen while
-it's checking the disks). Again, the safest way is not to have
-many local filesystems by using another computer as NFS
+needed. Note that with background fsck, a new panic could happen while
+it is checking the disks. Again, the safest way is not to have
+many local filesystems by using another computer as an NFS
 server.</para>
 </sect2>
@@ -210,20 +212,20 @@
 <para>For the purpose of making gjournal, a new empty
 subdirectory was created under an arbitrary user-accessible
-directory. You don't have to create the module directory under
+directory. You do not have to create the module directory under
 <filename>/usr/src</filename>.</para>
 </sect2>
 <sect2 id="prelim-makefile">
 <title>The Makefile</title>
-<para>It's good practice to create
+<para>It is good practice to create
 <filename>Makefile</filename>s for every nontrivial coding
 project, which of course includes kernel modules.</para>
 <para>Creating the <filename>Makefile</filename> is simple
 thanks to extensive set of helper routines provided by the
-system. In short, here's how it looks:</para>
+system. In short, here is how it looks:</para>
 <programlisting> SRCS=g_journal.c
 KMOD=geom_journal
@@ -259,9 +261,9 @@
 <filename>sys/malloc.h</filename> headers must be
 included.</para>
-<para>There's another mechanism for allocating memory, the UMA
+<para>There is another mechanism for allocating memory, the UMA
 (Universal Memory Allocator). See &man.uma.9; for details, but
-it's a special type of allocator mainly used for speedy
+it is a special type of allocator mainly used for speedy
 allocation of lists comprised of same-sized items (for
 example, dynamic arrays of structs).</para>
 </sect2>
@@ -273,10 +275,10 @@
 things needs to be maintained. Fortunately, this data
 structure is implemented (in several ways) by the C macros
 included in the system. The most used list type is TAILQ
-because it's the most flexible. It's also the one with largest
+because it is the most flexible. It is also the one with largest
 memory requirements (its elements are doubly-linked) and
 theoretically the slowest (though the speed variation is on
-the order of several CPU instructions more, so it shouldn't be
+the order of several CPU instructions more, so it should not be
 taken seriously).</para>
 <para>If data retrieval speed is very important, see
@@ -295,8 +297,8 @@
 <para>The important thing here is that bios are dealt with
 asynchronously. That means that, in most parts of the code,
-there's no analogue to userland's &man.read.2; and
-&man.write.2; calls that don't return until a request is
+there is no analogue to userland's &man.read.2; and
+&man.write.2; calls that do not return until a request is
 done. Rather, a developer-supplied function is called as a
 notification when the request gets completed (or results in
 error).</para>
@@ -306,8 +308,8 @@
 than the much more used imperative one (at least it takes a
 while to get used to it). In some cases helper routines
 <function>g_write_data</function>() and
-<function>g_read_data</function>() can be used (NOT
-ALWAYS!).</para>
+<function>g_read_data</function>() can be used, but <emphasis>NOT
+ALWAYS</emphasis>!.</para>
 </sect2>
 </sect1>
@@ -320,7 +322,7 @@
 <para>If maximum performance is not needed, a much simpler way
 of making a data transformation is to implement it in userland
-via the ggate (GEOM gate) facility. Unfortunately, there's no
+via the ggate (GEOM gate) facility. Unfortunately, there is no
 easy way to convert between, or even share code between the
 two approaches.</para>
 </sect2>
@@ -329,7 +331,7 @@
 <title>GEOM class</title>
 <para>GEOM class has several "class methods" that get called
-when there's no geom instance available (or they're simply not
+when there is no geom instance available (or they are simply not
 bound to a single instance):</para>
 <itemizedlist>
@@ -372,11 +374,11 @@
 <para>The name <quote>softc</quote> is a legacy term for
 <quote>driver private data</quote>. The name most probably
-comes from archaic term <quote>software control block</quote>.
-In GEOM, it's a structure (more precise: pointer to a
+comes from the archaic term <quote>software control block</quote>.
+In GEOM, it is a structure (more precise: pointer to a
 structure) that can be attached to a geom instance to hold
 whatever data is private to the geom instance. In gjournal
-(and most of the other GEOM classes), some of it's members
+(and most of the other GEOM classes), some of its members
 are:</para>
 <itemizedlist>
@@ -387,7 +389,7 @@
 consumer this geom consumes</para></listitem>
 <listitem><para><varname>struct g_consumer **disks</varname> : Array
-of <varname>struct g_consumer*</varname>. (It's not possible
+of <varname>struct g_consumer*</varname>. (It is not possible
 to use just single indirection because struct g_consumer*
 are created on our behalf by GEOM).</para></listitem>
 </itemizedlist>
@@ -412,14 +414,14 @@
 </itemizedlist>
-<para>It's assumed that geom classes know how to handle metadata
+<para>It is assumed that geom classes know how to handle metadata
 with version ID's lower than theirs.</para>
 <para>Metadata is located in the last sector of the provider
 (and thus must fit in it).</para>
 <para>(All this is implementation-dependent but all existing
-code works like that, and it's supported by libraries.)</para>
+code works like that, and it is supported by libraries.)</para>
 </sect2>
 <sect2 id="geom-creating">
@@ -429,10 +431,10 @@
 <itemizedlist>
-<listitem><para>user calls &man.geom.8; utility (or one of it's
+<listitem><para>user calls &man.geom.8; utility (or one of its
 hardlinked friends)</para></listitem>
-<listitem><para>the utility figures out which geom class it's
+<listitem><para>the utility figures out which geom class it is
 supposed to handle and searches for
 <filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
 library (usually in
@@ -450,10 +452,10 @@
 <itemizedlist>
 <listitem><para>&man.geom.8; looks in the command-line definition
-for the command (usually "label"), calls a helper
+for the command (usually "label"), and calls a helper
 function.</para></listitem>
-<listitem><para>helper function checks parameters & gathers
+<listitem><para>helper function checks parameters and gathers
 metadata, which it proceeds to write to all concerned
 providers.</para></listitem>
@@ -465,7 +467,7 @@
 </itemizedlist>
 <para>(The above sequence of events is implementation-dependent
-but all existing code works like that, and it's supported by
+but all existing code works like that, and it is supported by
 libraries.)</para>
 </sect2>
@@ -532,10 +534,10 @@
 <listitem><para><function>.spoiled</function> : called when some
 underlying provider gets written to</para></listitem>
-<listitem><para><function>.start</function> : handles IO</para></listitem>
+<listitem><para><function>.start</function> : handles I/O</para></listitem>
 </itemizedlist>
-<para>These functions are called from g_down? kernel thread and
+<para>These functions are called from the g_down? kernel thread and
 there can be no sleeping in this context (no blocking on a
 mutex or any kind of locks) which limits what can be done
 quite a bit, but forces the handling to be fast.</para>
@@ -567,16 +569,16 @@
 </itemizedlist>
 <para>When a user process issues <quote>read data X at offset Y
-of a file</quote> request, this is what happenes:</para>
+of a file</quote> request, this is what happens:</para>
 <itemizedlist>
-<listitem><para>The filesystem converts the request into struct bio
-instance and passes it to GEOM subsystem. It knows what geom
+<listitem><para>The filesystem converts the request into a struct bio
+instance and passes it to the GEOM subsystem. It knows what geom
 instance should handle it because filesystems are hosted
 directly on a geom instance.</para></listitem>
-<listitem><para>The request ends up as a call to
+<listitem><para>The request ends up as a call to the
 <function>.start</function>() function made on the g_down
 thread and reaches the top-level geom instance.</para></listitem>
@@ -612,12 +614,12 @@
 <para>See &man.g.bio.9; man page for information how the data is
 passed back and forth in the <structname>bio</structname>
-structure (note particular the <varname>bio_parent</varname>
+structure (note in particular the <varname>bio_parent</varname>
 and <varname>bio_children</varname> fields and how they are
 handled).</para>
-<para>One important feature is: THERE CAN BE NO SLEEPING IN G_UP
-AND G_DOWN THREADS. This means that none of the following
+<para>One important feature is: <emphasis>THERE CAN BE NO SLEEPING IN G_UP
+AND G_DOWN THREADS</emphasis>. This means that none of the following
 things can be done in those threads (the list is of course not
 complete, but only informative):</para>
@@ -637,11 +639,11 @@
 <listitem><para>sx locks</para></listitem>
 </itemizedlist>
-<para>This restriction is here to stop geom code clogging the IO
+<para>This restriction is here to stop geom code clogging the I/O
 request path, because sleeping in the code is usually not
 time-bound and there can be no guarantiees on how long will it
 take (there are some other, more technical reasons also). It
-also means that there's not much that can be done in those
+also means that there is not much that can be done in those
 threads; for example, almost any complex thing requires memory
 allocation. Fortunately, there is a way out: creating
 additional kernel threads.</para>
@@ -652,20 +654,20 @@
 <para>Kernel threads are created with &man.kthread.create.9;
 function, and they are sort of similar to userland threads in
-behaviour, only they can't return to caller to signify
+behaviour, only they cannot return to caller to signify
 termination, but must call &man.kthread.exit.9;.</para>
 <para>In geom code, the usual use of threads is to offload
 processing of requests from <literal>g_down</literal> thread
 (the <function>.start</function>() function). These threads
 look like <quote>event handlers</quote>: they have a linked
-list of event associated with them (on which events can posted
+list of event associated with them (on which events can be posted
 by various functions in various threads so it must be
 protected by a mutex), take the events from the list one by
 one and process them in a big <literal>switch</literal>()
 statement.</para>
-<para>The main benefit of using a thread to handle IO requests
+<para>The main benefit of using a thread to handle I/O requests
 is that it can sleep when needed. Now, this sounds good, but
 should be carefully thought out. Sleeping is well and very
 convenient but can very effectively destroy performance of the
@@ -683,11 +685,11 @@
 <para>Mutexes in FreeBSD kernel (see &man.mutex.9; man page) have
 one distinction from their more common userland cousins - they
-disallow sleeping (meaning: the code can't sleep while holding
+disallow sleeping (meaning: the code cannot sleep while holding
 a mutex). If the code needs to sleep a lot, &man.sx.9; locks
-may be more appropriate. (On the other hand, if you do almost
+may be more appropriate. On the other hand, if you do almost
 everything in a single thread, you may get away with no
-mutexes at all).</para>
+mutexes at all.</para>
 </sect2>