Minor wording and grammar fixes.

Approved by:	murray
Craig Rodrigues 2005-09-01 01:50:27 +00:00
parent 4f09d6756f
commit 6611bdac7c
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=25527

@ -36,7 +36,7 @@
<para>This text documents the way I created the gjournal
facility, starting with learning how to do kernel
programming. It's assumed the reader is familiar with C
programming. It is assumed that the reader is familiar with C
userland programming.</para>
</abstract>
@ -50,8 +50,8 @@
<sect2 id="intro-docs">
<title>Documentation</title>
<para>Documentation on kernel programming is scarce - it's one of
few areas where there's nearly nothing in the way of friendly
<para>Documentation on kernel programming is scarce - it is one of
few areas where there is nearly nothing in the way of friendly
tutorials, and the phrase <quote>use the source!</quote> really
holds true. However, there are some bits and pieces (some of
them seriously outdated) floating around that should be studied
@ -59,14 +59,14 @@
<itemizedlist>
<listitem><para><ulink
<listitem><para>The <ulink
url="&url.books.developers-handbook;/index.html">FreeBSD
Developer's Handbook</ulink> - part of the documentation
project, it doesn't contain anything specific to kernel-land
project, it does not contain anything specific to kernel-land
programming, but rather some general
information.</para></listitem>
<listitem><para><ulink
<listitem><para>The <ulink
url="&url.books.arch-handbook;/index.html">FreeBSD
Architecture Handbook</ulink> - also from the documentation
project, contains descriptions of several low-level facilities
@ -79,15 +79,17 @@
site - contains several interesting articles on kernel
facilities.</para></listitem>
<listitem><para>The man pages in section 9 - most important
kernel-land calls are documented here.</para></listitem>
<listitem><para>The man pages in section 9 - for important
documentation on kernel functions.</para></listitem>
<listitem><para>The &man.geom.4; man page and PHK's GEOM slides
<listitem><para>The &man.geom.4; man page and <ulink
url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
- for general introduction of the GEOM
subsystem.</para></listitem>
<listitem><para>&man.style.9; man page, if the code should go to
FreeBSD CVS tree</para></listitem>
<listitem><para>The &man.style.9; man page - for documentation on
the coding-style conventions which must be followed for any code
which is to be committed to the FreeBSD CVS tree.</para></listitem>
</itemizedlist>
@ -97,18 +99,18 @@
<sect1 id="prelim">
<title>Preliminaries</title>
<para>The best way to do kernel developing is to have (at least)
<para>The best way to do kernel development is to have (at least)
two separate computers. One of these would contain the
development environment and sources, and the other would be used
to test the newly written code by network-booting and
network-mounting filesystems from the first one. This way if
the new code contains bugs and crashes the machine, it won't
the new code contains bugs and crashes the machine, it will not
mess up the sources (and other <quote>live</quote> data). The
second system doesn't event have to have a proper display - it
second system does not even require a proper display. Instead, it
could be connected with a serial cable or KVM to the first
one.</para>
<para>But, since not everybody has two+ computers handy, there are
<para>But, since not everybody has two or more computers handy, there are
a few things that can be done to prepare an otherwise "live"
system for developing kernel code.</para>
@ -116,7 +118,7 @@
<title>Converting a system for development</title>
<para>For any kernel programming a kernel with
<option>INVARIANTS</option> enabled is a must have. So enter
<option>INVARIANTS</option> enabled is a must-have. So enter
these in your kernel configuration file:</para>
<programlisting> options INVARIANT_SUPPORT
@ -129,7 +131,7 @@
<para>With the usual way of installing the kernel (<command>make
installkernel</command>) the debug kernel will not be
automatically installed. It's called
automatically installed. It is called
<filename>kernel.debug</filename> and located in
<filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
convenience it should be copied to
@ -143,18 +145,18 @@
options DDB
options KDB_TRACE</programlisting>
<para>For this to work you might need to set a sysctl (if it's
<para>For this to work you might need to set a sysctl (if it is
not on by default):</para>
<programlisting> debug.debugger_on_panic=1</programlisting>
<para>Kernel panics will happen, so care should be taken with
the filesystem cache. In particular, having softupdates might
mean a latest file version could be lost if a panic occurs
before it's committed to storage. Disabling softupdates
yields a great performance hit (and it still doesn't guarantee
data consistency - mounting filesystem with the "sync" option
is needed for that) so for a compromise, the cache delays can
mean the latest file version could be lost if a panic occurs
before it is committed to storage. Disabling softupdates
yields a great performance hit, and still does not guarantee
data consistency. Mounting filesystem with the "sync" option
is needed for that. For a compromise, the cache delays can
be shortened. There are three sysctl's that are useful for
this (best to be set in
<filename>/etc/sysctl.conf</filename>):</para>
@ -168,11 +170,11 @@
<para>For debugging kernel panics, kernel core dumps are
required. Since a kernel panic might make filesystems
unusable, this crash dump is first written to a raw
partition. Usually, this is the swap partition (it must be at
least as large as the physical RAM in the machine). On the
next boot (after filesystems are checked and mounted and
before swap is enabled), the dump is copied to a regular
file. This is controlled with two
partition. Usually, this is the swap partition. This partition must be at
least as large as the physical RAM in the machine. On the
next boot, the dump is copied to a regular file.
This happens after filesystems are checked and mounted, and
before swap is enabled. This is controlled with two
<filename>/etc/rc.conf</filename> variables:</para>
<programlisting> dumpdev="/dev/ad0s4b"
@ -184,24 +186,24 @@
<para>Writing kernel core dumps is slow and takes a long time so
if you have lots of memory (>256M) and lots of panics it could
be frustrating to sit and wait while it's done (twice - first
to write it to swap, then to relocate it to filesystem). It's
be frustrating to sit and wait while it is done (twice - first
to write it to swap, then to relocate it to filesystem). It is
convenient then to limit the amount of RAM the system will use
via a <filename>/boot/loader.conf</filename> tunable:</para>
<programlisting> hw.physmem="256M"</programlisting>
<para>If the panics are frequent and filesystems large (or you
simply don't trust softupdates+background fsck) it's advisable
simply do not trust softupdates+background fsck) it is advisable
to turn background fsck off via
<filename>/etc/rc.conf</filename> variable:</para>
<programlisting> background_fsck="NO"</programlisting>
<para>This way, the filesystems will always get checked when
needed (with background fsck, a new panic could happen while
it's checking the disks). Again, the safest way is not to have
many local filesystems by using another computer as NFS
needed. Note that with background fsck, a new panic could happen while
it is checking the disks. Again, the safest way is not to have
many local filesystems by using another computer as an NFS
server.</para>
</sect2>
@ -210,20 +212,20 @@
<para>For the purpose of making gjournal, a new empty
subdirectory was created under an arbitrary user-accessible
directory. You don't have to create the module directory under
directory. You do not have to create the module directory under
<filename>/usr/src</filename>.</para>
</sect2>
<sect2 id="prelim-makefile">
<title>The Makefile</title>
<para>It's good practice to create
<para>It is good practice to create
<filename>Makefile</filename>s for every nontrivial coding
project, which of course includes kernel modules.</para>
<para>Creating the <filename>Makefile</filename> is simple
thanks to extensive set of helper routines provided by the
system. In short, here's how it looks:</para>
system. In short, here is how it looks:</para>
<programlisting> SRCS=g_journal.c
KMOD=geom_journal
@ -259,9 +261,9 @@
<filename>sys/malloc.h</filename> headers must be
included.</para>
<para>There's another mechanism for allocating memory, the UMA
<para>There is another mechanism for allocating memory, the UMA
(Universal Memory Allocator). See &man.uma.9; for details, but
it's a special type of allocator mainly used for speedy
it is a special type of allocator mainly used for speedy
allocation of lists comprised of same-sized items (for
example, dynamic arrays of structs).</para>
</sect2>
@ -273,10 +275,10 @@
things needs to be maintained. Fortunately, this data
structure is implemented (in several ways) by the C macros
included in the system. The most used list type is TAILQ
because it's the most flexible. It's also the one with largest
because it is the most flexible. It is also the one with largest
memory requirements (its elements are doubly-linked) and
theoretically the slowest (though the speed variation is on
the order of several CPU instructions more, so it shouldn't be
the order of several CPU instructions more, so it should not be
taken seriously).</para>
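<para>As a rough illustration of the TAILQ macros mentioned above, the following is a minimal sketch and not code taken from gjournal. The <varname>struct item</varname> type and the <function>item_list_*</function>() helpers are hypothetical names invented for this example, and it uses the userland <filename>sys/queue.h</filename> header so that it can be compiled outside the kernel.</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical element type; TAILQ_ENTRY holds the prev/next links. */
struct item {
	int value;
	TAILQ_ENTRY(item) link;
};

/* Declares struct item_list, the head type of the list. */
TAILQ_HEAD(item_list, item);

/* Append a new element; returns 0 on success, -1 if allocation fails. */
static int
item_list_append(struct item_list *head, int value)
{
	struct item *it = malloc(sizeof(*it));

	if (it == NULL)
		return (-1);
	it->value = value;
	TAILQ_INSERT_TAIL(head, it, link);
	return (0);
}

/* Walk the list front to back and add up the values. */
static int
item_list_sum(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return (sum);
}

/* Remove and free every element, leaving an empty list. */
static void
item_list_clear(struct item_list *head)
{
	struct item *it;

	while ((it = TAILQ_FIRST(head)) != NULL) {
		TAILQ_REMOVE(head, it, link);
		free(it);
	}
}
```

<para>A caller would declare a <varname>struct item_list</varname>, initialise it with <function>TAILQ_INIT</function>(), and then use the helpers. Kernel code would allocate the elements with the kernel allocator instead of userland <function>malloc</function>().</para>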
<para>If data retrieval speed is very important, see
@ -295,8 +297,8 @@
<para>The important thing here is that bios are dealt with
asynchronously. That means that, in most parts of the code,
there's no analogue to userland's &man.read.2; and
&man.write.2; calls that don't return until a request is
there is no analogue to userland's &man.read.2; and
&man.write.2; calls that do not return until a request is
done. Rather, a developer-supplied function is called as a
notification when the request gets completed (or results in
error).</para>
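<para>The completion-callback style described above can be sketched in plain userland C. This is only an analogy under stated assumptions: <varname>struct request</varname>, <function>request_submit</function>() and <function>request_run_pending</function>() are hypothetical names made up for this example and are not part of the GEOM API. The point is that submitting returns immediately, and the caller learns the result only when its own function is invoked later.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct request;
typedef void request_done_t(struct request *);

/* Hypothetical stand-in for an I/O request; not the kernel's struct bio. */
struct request {
	const char	*src;		/* memory standing in for a device */
	char		*buf;		/* destination buffer */
	size_t		 length;	/* bytes to transfer */
	int		 error;		/* result, valid once completed */
	int		 completed;	/* set when the request is finished */
	request_done_t	*done;		/* caller-supplied notification */
	void		*caller_arg;	/* opaque pointer for the callback */
	struct request	*next;		/* link in the pending list */
};

/* Requests that were submitted but not yet serviced. */
static struct request *pending;

/* Queue the request and return at once; no data has moved yet. */
static void
request_submit(struct request *rq)
{
	rq->completed = 0;
	rq->next = pending;
	pending = rq;
}

/* Service every queued request, then notify each caller. */
static void
request_run_pending(void)
{
	struct request *rq;

	while ((rq = pending) != NULL) {
		pending = rq->next;
		memcpy(rq->buf, rq->src, rq->length);
		rq->error = 0;
		rq->completed = 1;
		if (rq->done != NULL)
			rq->done(rq);
	}
}

/* Example callback: count completions through caller_arg. */
static void
count_done(struct request *rq)
{
	int *counter = rq->caller_arg;

	if (rq->error == 0)
		(*counter)++;
}
```

<para>In this sketch, the code that calls <function>request_submit</function>() must not assume the buffer is filled when the call returns. It continues its work in <function>count_done</function>() or a similar callback.</para>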
@ -306,8 +308,8 @@
than the much more used imperative one (at least it takes a
while to get used to it). In some cases helper routines
<function>g_write_data</function>() and
<function>g_read_data</function>() can be used (NOT
ALWAYS!).</para>
<function>g_read_data</function>() can be used, but <emphasis>NOT
ALWAYS</emphasis>!</para>
</sect2>
</sect1>
@ -320,7 +322,7 @@
<para>If maximum performance is not needed, a much simpler way
of making a data transformation is to implement it in userland
via the ggate (GEOM gate) facility. Unfortunately, there's no
via the ggate (GEOM gate) facility. Unfortunately, there is no
easy way to convert between, or even share code between the
two approaches.</para>
</sect2>
@ -329,7 +331,7 @@
<title>GEOM class</title>
<para>GEOM class has several "class methods" that get called
when there's no geom instance available (or they're simply not
when there is no geom instance available (or they are simply not
bound to a single instance):</para>
<itemizedlist>
@ -372,11 +374,11 @@
<para>The name <quote>softc</quote> is a legacy term for
<quote>driver private data</quote>. The name most probably
comes from archaic term <quote>software control block</quote>.
In GEOM, it's a structure (more precise: pointer to a
comes from the archaic term <quote>software control block</quote>.
In GEOM, it is a structure (more precisely: a pointer to a
structure) that can be attached to a geom instance to hold
whatever data is private to the geom instance. In gjournal
(and most of the other GEOM classes), some of it's members
(and most of the other GEOM classes), some of its members
are:</para>
<itemizedlist>
@ -387,7 +389,7 @@
consumer this geom consumes</para></listitem>
<listitem><para><varname>struct g_consumer **disks</varname> : Array
of <varname>struct g_consumer*</varname>. (It's not possible
of <varname>struct g_consumer*</varname>. (It is not possible
to use just single indirection because struct g_consumer*
are created on our behalf by GEOM).</para></listitem>
</itemizedlist>
@ -412,14 +414,14 @@
</itemizedlist>
<para>It's assumed that geom classes know how to handle metadata
<para>It is assumed that geom classes know how to handle metadata
with version ID's lower than theirs.</para>
<para>Metadata is located in the last sector of the provider
(and thus must fit in it).</para>
<para>(All this is implementation-dependent but all existing
code works like that, and it's supported by libraries.)</para>
code works like that, and it is supported by libraries.)</para>
</sect2>
<sect2 id="geom-creating">
@ -429,10 +431,10 @@
<itemizedlist>
<listitem><para>user calls &man.geom.8; utility (or one of it's
<listitem><para>user calls &man.geom.8; utility (or one of its
hardlinked friends)</para></listitem>
<listitem><para>the utility figures out which geom class it's
<listitem><para>the utility figures out which geom class it is
supposed to handle and searches for
<filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
library (usually in
@ -450,10 +452,10 @@
<itemizedlist>
<listitem><para>&man.geom.8; looks in the command-line definition
for the command (usually "label"), calls a helper
for the command (usually "label"), and calls a helper
function.</para></listitem>
<listitem><para>helper function checks parameters & gathers
<listitem><para>helper function checks parameters and gathers
metadata, which it proceeds to write to all concerned
providers.</para></listitem>
@ -465,7 +467,7 @@
</itemizedlist>
<para>(The above sequence of events is implementation-dependent
but all existing code works like that, and it's supported by
but all existing code works like that, and it is supported by
libraries.)</para>
</sect2>
@ -532,10 +534,10 @@
<listitem><para><function>.spoiled</function> : called when some
underlying provider gets written to</para></listitem>
<listitem><para><function>.start</function> : handles IO</para></listitem>
<listitem><para><function>.start</function> : handles I/O</para></listitem>
</itemizedlist>
<para>These functions are called from g_down? kernel thread and
<para>These functions are called from the g_down? kernel thread and
there can be no sleeping in this context (no blocking on a
mutex or any kind of locks) which limits what can be done
quite a bit, but forces the handling to be fast.</para>
@ -567,16 +569,16 @@
</itemizedlist>
<para>When a user process issues <quote>read data X at offset Y
of a file</quote> request, this is what happenes:</para>
of a file</quote> request, this is what happens:</para>
<itemizedlist>
<listitem><para>The filesystem converts the request into struct bio
instance and passes it to GEOM subsystem. It knows what geom
<listitem><para>The filesystem converts the request into a struct bio
instance and passes it to the GEOM subsystem. It knows what geom
instance should handle it because filesystems are hosted
directly on a geom instance.</para></listitem>
<listitem><para>The request ends up as a call to
<listitem><para>The request ends up as a call to the
<function>.start</function>() function made on the g_down
thread and reaches the top-level geom instance.</para></listitem>
@ -612,12 +614,12 @@
<para>See the &man.g.bio.9; man page for information on how the data is
passed back and forth in the <structname>bio</structname>
structure (note particular the <varname>bio_parent</varname>
structure (note in particular the <varname>bio_parent</varname>
and <varname>bio_children</varname> fields and how they are
handled).</para>
<para>One important feature is: THERE CAN BE NO SLEEPING IN G_UP
AND G_DOWN THREADS. This means that none of the following
<para>One important feature is: <emphasis>THERE CAN BE NO SLEEPING IN G_UP
AND G_DOWN THREADS</emphasis>. This means that none of the following
things can be done in those threads (the list is of course not
complete, but only informative):</para>
@ -637,11 +639,11 @@
<listitem><para>sx locks</para></listitem>
</itemizedlist>
<para>This restriction is here to stop geom code clogging the IO
<para>This restriction is here to stop geom code clogging the I/O
request path, because sleeping in the code is usually not
time-bound and there can be no guarantees on how long it will
take (there are some other, more technical reasons also). It
also means that there's not much that can be done in those
also means that there is not much that can be done in those
threads; for example, almost any complex thing requires memory
allocation. Fortunately, there is a way out: creating
additional kernel threads.</para>
@ -652,20 +654,20 @@
<para>Kernel threads are created with &man.kthread.create.9;
function, and they are sort of similar to userland threads in
behaviour, only they can't return to caller to signify
behaviour, only they cannot return to caller to signify
termination, but must call &man.kthread.exit.9;.</para>
<para>In geom code, the usual use of threads is to offload
processing of requests from <literal>g_down</literal> thread
(the <function>.start</function>() function). These threads
look like <quote>event handlers</quote>: they have a linked
list of event associated with them (on which events can posted
list of events associated with them (on which events can be posted
by various functions in various threads so it must be
protected by a mutex), take the events from the list one by
one and process them in a big <literal>switch</literal>()
statement.</para>
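<para>The <quote>event handler</quote> pattern described above can be sketched with userland POSIX threads. This is an analogy only and is not the gjournal implementation: a kernel module would use &man.kthread.create.9; and kernel mutexes, and every name below (<varname>struct worker</varname>, <function>worker_post</function>(), the <literal>EV_*</literal> event types) is hypothetical.</para>

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

enum event_type { EV_ADD, EV_STOP };

struct event {
	enum event_type	type;
	int		arg;
	TAILQ_ENTRY(event) link;
};

struct worker {
	pthread_t	thread;
	pthread_mutex_t	lock;		/* protects the events list */
	pthread_cond_t	wakeup;		/* signalled when an event is posted */
	TAILQ_HEAD(, event) events;
	int		total;		/* touched only by the worker thread */
};

/* Worker loop: take events one by one and dispatch them in a switch. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;
	struct event *ev;
	int running = 1;

	while (running) {
		pthread_mutex_lock(&w->lock);
		while ((ev = TAILQ_FIRST(&w->events)) == NULL)
			pthread_cond_wait(&w->wakeup, &w->lock);
		TAILQ_REMOVE(&w->events, ev, link);
		pthread_mutex_unlock(&w->lock);

		/* The lock is dropped here, so handlers are free to block. */
		switch (ev->type) {
		case EV_ADD:
			w->total += ev->arg;
			break;
		case EV_STOP:
			running = 0;
			break;
		}
		free(ev);
	}
	return (NULL);
}

/* Initialise the worker and start its thread; returns 0 on success. */
static int
worker_start(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wakeup, NULL);
	TAILQ_INIT(&w->events);
	w->total = 0;
	return (pthread_create(&w->thread, NULL, worker_main, w));
}

/* Post an event from any thread; returns 0 on success, -1 on failure. */
static int
worker_post(struct worker *w, enum event_type type, int arg)
{
	struct event *ev = malloc(sizeof(*ev));

	if (ev == NULL)
		return (-1);
	ev->type = type;
	ev->arg = arg;
	pthread_mutex_lock(&w->lock);
	TAILQ_INSERT_TAIL(&w->events, ev, link);
	pthread_cond_signal(&w->wakeup);
	pthread_mutex_unlock(&w->lock);
	return (0);
}

/* Wait for the worker to finish after an EV_STOP has been posted. */
static int
worker_join(struct worker *w)
{
	return (pthread_join(w->thread, NULL));
}
```

<para>The posting side only holds the mutex long enough to link the event into the list, so it never waits on the handler itself. Events posted after <literal>EV_STOP</literal> are never processed in this sketch.</para>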
<para>The main benefit of using a thread to handle IO requests
<para>The main benefit of using a thread to handle I/O requests
is that it can sleep when needed. Now, this sounds good, but
should be carefully thought out. Sleeping is well and very
convenient but can very effectively destroy performance of the
@ -683,11 +685,11 @@
<para>Mutexes in FreeBSD kernel (see &man.mutex.9; man page) have
one distinction from their more common userland cousins - they
disallow sleeping (meaning: the code can't sleep while holding
disallow sleeping (meaning: the code cannot sleep while holding
a mutex). If the code needs to sleep a lot, &man.sx.9; locks
may be more appropriate. (On the other hand, if you do almost
may be more appropriate. On the other hand, if you do almost
everything in a single thread, you may get away with no
mutexes at all).</para>
mutexes at all.</para>
</sect2>