Make the whitespace in this file consistent; that last merge was
horrible.
parent eb6784ebf1
commit 389924ee76
Notes (svn2git, 2020-12-08 03:00:23 +00:00): svn path=/head/; revision=30146
1 changed file with 285 additions and 285 deletions
|
@ -65,20 +65,20 @@
|
|||
programming, but rather some general useful information.</para></listitem>
|
||||
|
||||
<listitem><para>The <ulink
|
||||
url="&url.books.arch-handbook;/index.html">FreeBSD
|
||||
Architecture Handbook</ulink> — also from the documentation
|
||||
project, contains descriptions of several low-level facilities
|
||||
and procedures. The most important chapter is 13, <ulink
|
||||
url="&url.books.arch-handbook;/driverbasics.html">Writing
|
||||
FreeBSD device drivers</ulink>.</para></listitem>
|
||||
url="&url.books.arch-handbook;/index.html">FreeBSD
|
||||
Architecture Handbook</ulink> — also from the documentation
|
||||
project, contains descriptions of several low-level facilities
|
||||
and procedures. The most important chapter is 13, <ulink
|
||||
url="&url.books.arch-handbook;/driverbasics.html">Writing
|
||||
FreeBSD device drivers</ulink>.</para></listitem>
|
||||
|
||||
<listitem><para>The Blueprints section of <ulink
|
||||
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
|
||||
site — contains several interesting articles on kernel
|
||||
facilities.</para></listitem>
|
||||
url="http://www.freebsddiary.org">FreeBSD Diary</ulink> web
|
||||
site — contains several interesting articles on kernel
|
||||
facilities.</para></listitem>
|
||||
|
||||
<listitem><para>The man pages in section 9 — for important
|
||||
documentation on kernel functions.</para></listitem>
|
||||
documentation on kernel functions.</para></listitem>
|
||||
|
||||
<listitem><para>The &man.geom.4; man page and <ulink
|
||||
url="http://phk.freebsd.dk/pubs/">PHK's GEOM slides</ulink>
|
||||
|
@ -86,9 +86,9 @@
|
|||
subsystem.</para></listitem>
|
||||
|
||||
<listitem><para>Man pages &man.g.bio.9;, &man.g.event.9;, &man.g.data.9;,
|
||||
&man.g.geom.9;, &man.g.provider.9; &man.g.consumer.9;, &man.g.access.9;
|
||||
& others linked from those, for documentation on specific
|
||||
functionalities.
|
||||
&man.g.geom.9;, &man.g.provider.9; &man.g.consumer.9;, &man.g.access.9;
|
||||
& others linked from those, for documentation on specific
|
||||
functionalities.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>The &man.style.9; man page — for documentation on
|
||||
|
@ -125,8 +125,8 @@
|
|||
<title>Modifying a system for development</title>
|
||||
|
||||
<para>For any kernel programming a kernel with
|
||||
<option>INVARIANTS</option> enabled is a must-have. So enter
|
||||
these in your kernel configuration file:</para>
|
||||
<option>INVARIANTS</option> enabled is a must-have. So enter
|
||||
these in your kernel configuration file:</para>
|
||||
|
||||
<programlisting>options INVARIANT_SUPPORT
|
||||
options INVARIANTS</programlisting>
|
||||
|
@ -138,41 +138,41 @@ options INVARIANTS</programlisting>
|
|||
options WITNESS</programlisting>
|
||||
|
||||
<para>For debugging crash dumps, a kernel with debug symbols is
|
||||
needed:</para>
|
||||
needed:</para>
|
||||
|
||||
<programlisting> makeoptions DEBUG=-g</programlisting>
|
||||
|
||||
<para>With the usual way of installing the kernel (<command>make
|
||||
installkernel</command>) the debug kernel will not be
|
||||
automatically installed. It is called
|
||||
<filename>kernel.debug</filename> and located in
|
||||
<filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
|
||||
convenience it should be copied to
|
||||
<filename>/boot/kernel/</filename>.</para>
|
||||
installkernel</command>) the debug kernel will not be
|
||||
automatically installed. It is called
|
||||
<filename>kernel.debug</filename> and located in
|
||||
<filename>/usr/obj/usr/src/sys/KERNELNAME/</filename>. For
|
||||
convenience it should be copied to
|
||||
<filename>/boot/kernel/</filename>.</para>
|
||||
|
||||
<para>Another convenience is enabling the kernel debugger so you
|
||||
can examine a kernel panic when it happens. For this, enter
|
||||
the following lines in your kernel configuration file:</para>
|
||||
can examine a kernel panic when it happens. For this, enter
|
||||
the following lines in your kernel configuration file:</para>
|
||||
|
||||
<programlisting>options KDB
|
||||
options DDB
|
||||
options KDB_TRACE</programlisting>
|
||||
|
||||
<para>For this to work you might need to set a sysctl (if it is
|
||||
not on by default):</para>
|
||||
not on by default):</para>
|
||||
|
||||
<programlisting> debug.debugger_on_panic=1</programlisting>
|
||||
|
||||
<para>Kernel panics will happen, so care should be taken with
|
||||
the filesystem cache. In particular, having softupdates might
|
||||
mean the latest file version could be lost if a panic occurs
|
||||
before it is committed to storage. Disabling softupdates
|
||||
yields a great performance hit, and still does not guarantee
|
||||
data consistency. Mounting the filesystem with the "sync" option
|
||||
is needed for that. As a compromise, the softupdates cache delays can
|
||||
be shortened. There are three sysctls that are useful for
|
||||
this (best to be set in
|
||||
<filename>/etc/sysctl.conf</filename>):</para>
|
||||
the filesystem cache. In particular, having softupdates might
|
||||
mean the latest file version could be lost if a panic occurs
|
||||
before it is committed to storage. Disabling softupdates
|
||||
yields a great performance hit, and still does not guarantee
|
||||
data consistency. Mounting the filesystem with the "sync" option
|
||||
is needed for that. As a compromise, the softupdates cache delays can
|
||||
be shortened. There are three sysctls that are useful for
|
||||
this (best to be set in
|
||||
<filename>/etc/sysctl.conf</filename>):</para>
|
||||
|
||||
<programlisting>kern.filedelay=5
|
||||
kern.dirdelay=4
|
||||
|
@ -181,35 +181,35 @@ kern.metadelay=3</programlisting>
|
|||
<para>The numbers represent seconds.</para>
|
||||
|
||||
<para>For debugging kernel panics, kernel core dumps are
|
||||
required. Since a kernel panic might make filesystems
|
||||
unusable, this crash dump is first written to a raw
|
||||
partition. Usually, this is the swap partition. This partition must be at
|
||||
least as large as the physical RAM in the machine. On the
|
||||
next boot, the dump is copied to a regular file.
|
||||
This happens after filesystems are checked and mounted, and
|
||||
before swap is enabled. This is controlled with two
|
||||
<filename>/etc/rc.conf</filename> variables:</para>
|
||||
required. Since a kernel panic might make filesystems
|
||||
unusable, this crash dump is first written to a raw
|
||||
partition. Usually, this is the swap partition. This partition must be at
|
||||
least as large as the physical RAM in the machine. On the
|
||||
next boot, the dump is copied to a regular file.
|
||||
This happens after filesystems are checked and mounted, and
|
||||
before swap is enabled. This is controlled with two
|
||||
<filename>/etc/rc.conf</filename> variables:</para>
|
||||
|
||||
<programlisting>dumpdev="/dev/ad0s4b"
|
||||
dumpdir="/usr/core"</programlisting>
|
||||
|
||||
<para>The <varname>dumpdev</varname> variable specifies the swap
|
||||
partition and <varname>dumpdir</varname> tells the system
|
||||
where in the filesystem to relocate the core dump on reboot.</para>
|
||||
partition and <varname>dumpdir</varname> tells the system
|
||||
where in the filesystem to relocate the core dump on reboot.</para>
|
||||
|
||||
<para>Writing kernel core dumps is slow, so
|
||||
if you have lots of memory (>256M) and lots of panics it could
|
||||
be frustrating to sit and wait while it is done (twice — first
|
||||
to write it to swap, then to relocate it to filesystem). It is
|
||||
convenient then to limit the amount of RAM the system will use
|
||||
via a <filename>/boot/loader.conf</filename> tunable:</para>
|
||||
if you have lots of memory (>256M) and lots of panics it could
|
||||
be frustrating to sit and wait while it is done (twice — first
|
||||
to write it to swap, then to relocate it to filesystem). It is
|
||||
convenient then to limit the amount of RAM the system will use
|
||||
via a <filename>/boot/loader.conf</filename> tunable:</para>
|
||||
|
||||
<programlisting> hw.physmem="256M"</programlisting>
|
||||
|
||||
<para>If the panics are frequent and filesystems large (or you
|
||||
simply do not trust softupdates+background fsck) it is advisable
|
||||
to turn background fsck off via
|
||||
<filename>/etc/rc.conf</filename> variable:</para>
|
||||
simply do not trust softupdates+background fsck) it is advisable
|
||||
to turn background fsck off via
|
||||
<filename>/etc/rc.conf</filename> variable:</para>
|
||||
|
||||
<programlisting> background_fsck="NO"</programlisting>
|
||||
|
||||
|
@ -233,13 +233,13 @@ dumpdir="/usr/core </programlisting>
|
|||
<title>The Makefile</title>
|
||||
|
||||
<para>It is good practice to create
|
||||
<filename>Makefile</filename>s for every nontrivial coding
|
||||
project, which of course includes kernel modules.</para>
|
||||
<filename>Makefile</filename>s for every nontrivial coding
|
||||
project, which of course includes kernel modules.</para>
|
||||
|
||||
<para>Creating the <filename>Makefile</filename> is simple
|
||||
thanks to an extensive set of helper routines provided by the
|
||||
system. In short, here is how a minimal <filename>Makefile</filename>
|
||||
looks for a kernel module:</para>
|
||||
thanks to an extensive set of helper routines provided by the
|
||||
system. In short, here is how a minimal <filename>Makefile</filename>
|
||||
looks for a kernel module:</para>
|
||||
|
||||
<programlisting>SRCS=g_journal.c
|
||||
KMOD=geom_journal
|
||||
|
@ -247,10 +247,10 @@ KMOD=geom_journal
|
|||
.include <bsd.kmod.mk></programlisting>
|
||||
|
||||
<para>This <filename>Makefile</filename> (with changed filenames)
|
||||
will do for any kernel module, and a GEOM class can reside in just
|
||||
one kernel module. If more than one file is required, list it in the
|
||||
<envar>SRCS</envar> variable, separated with whitespace from
|
||||
other filenames.</para>
|
||||
will do for any kernel module, and a GEOM class can reside in just
|
||||
one kernel module. If more than one file is required, list it in the
|
||||
<envar>SRCS</envar> variable, separated with whitespace from
|
||||
other filenames.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
|
@ -261,13 +261,13 @@ KMOD=geom_journal
|
|||
<title>Memory allocation</title>
|
||||
|
||||
<para>See &man.malloc.9;. Basic memory allocation is only
|
||||
slightly different from its userland equivalent. Most
|
||||
notably, <function>malloc</function>() and
|
||||
<function>free</function>() accept additional parameters, as
|
||||
described in the man page.</para>
|
||||
slightly different from its userland equivalent. Most
|
||||
notably, <function>malloc</function>() and
|
||||
<function>free</function>() accept additional parameters, as
|
||||
described in the man page.</para>
|
||||
|
||||
<para>A <quote>malloc type</quote> must be declared in the
|
||||
declaration section of a source file, like this:</para>
|
||||
declaration section of a source file, like this:</para>
|
||||
|
||||
<programlisting> static MALLOC_DEFINE(M_GJOURNAL, "gjournal data", "GEOM_JOURNAL Data");</programlisting>
|
||||
|
||||
|
@ -277,24 +277,24 @@ KMOD=geom_journal
|
|||
included.</para>
|
||||
|
||||
<para>There is another mechanism for allocating memory, the UMA
|
||||
(Universal Memory Allocator). See &man.uma.9; for details, but
|
||||
it is a special type of allocator mainly used for speedy
|
||||
allocation of lists comprised of same-sized items (for
|
||||
example, dynamic arrays of structs).</para>
|
||||
(Universal Memory Allocator). See &man.uma.9; for details, but
|
||||
it is a special type of allocator mainly used for speedy
|
||||
allocation of lists comprised of same-sized items (for
|
||||
example, dynamic arrays of structs).</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="kernelprog-lists">
|
||||
<title>Lists and queues</title>
|
||||
|
||||
<para>See &man.queue.3;. There are a LOT of cases when a list of
|
||||
things needs to be maintained. Fortunately, this data
|
||||
structure is implemented (in several ways) by C macros
|
||||
included in the system. The most used list type is TAILQ
|
||||
because it is the most flexible. It is also the one with largest
|
||||
memory requirements (its elements are doubly-linked) and
|
||||
also the slowest (although the speed variation is on
|
||||
the order of several CPU instructions more, so it should not be
|
||||
taken seriously).</para>
|
||||
things needs to be maintained. Fortunately, this data
|
||||
structure is implemented (in several ways) by C macros
|
||||
included in the system. The most used list type is TAILQ
|
||||
because it is the most flexible. It is also the one with largest
|
||||
memory requirements (its elements are doubly-linked) and
|
||||
also the slowest (although the speed variation is on
|
||||
the order of several CPU instructions more, so it should not be
|
||||
taken seriously).</para>
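<para>As a rough illustration of the TAILQ macros from &man.queue.3;, the
sketch below builds, walks and empties a list. It is written as userland C
so it can be compiled and tried outside the kernel. The names
<varname>struct item</varname>, <varname>item_list</varname> and the three
helper functions are made up for this example.</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element; the TAILQ_ENTRY field holds the links. */
struct item {
	int value;
	TAILQ_ENTRY(item) link;
};

/* Declares "struct item_list", the list head type. */
TAILQ_HEAD(item_list, item);

/* Append a newly allocated item; returns 0 on success, -1 on failure. */
int
item_list_append(struct item_list *list, int value)
{
	struct item *it;

	assert(list != NULL);
	it = malloc(sizeof(*it));
	if (it == NULL)
		return (-1);
	it->value = value;
	TAILQ_INSERT_TAIL(list, it, link);
	return (0);
}

/* Walk the list from head to tail and add up the values. */
int
item_list_sum(struct item_list *list)
{
	struct item *it;
	int sum = 0;

	TAILQ_FOREACH(it, list, link)
		sum += it->value;
	return (sum);
}

/* Remove and free every element, always taking the first one. */
void
item_list_clear(struct item_list *list)
{
	struct item *it;

	while ((it = TAILQ_FIRST(list)) != NULL) {
		TAILQ_REMOVE(list, it, link);
		free(it);
	}
}
```

<para>A caller initializes the head with <function>TAILQ_INIT</function>()
before the first append. In kernel code the allocation would go through
&man.malloc.9; rather than the userland allocator used here.</para>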
|
||||
|
||||
<para>If data retrieval speed is very important, see
|
||||
&man.tree.3; and &man.hashinit.9;.</para>
|
||||
|
@ -304,31 +304,31 @@ KMOD=geom_journal
|
|||
<title>BIOs</title>
|
||||
|
||||
<para>Structure <structname>bio</structname> is used for any and
|
||||
all Input/Output operations concerning GEOM. It basically
|
||||
contains information about what device ('provider') should
|
||||
satisfy the request, request type, offset, length, pointer to
|
||||
a buffer, and a bunch of <quote>user-specific</quote> flags
|
||||
and fields that can help implement various hacks.</para>
|
||||
all Input/Output operations concerning GEOM. It basically
|
||||
contains information about what device ('provider') should
|
||||
satisfy the request, request type, offset, length, pointer to
|
||||
a buffer, and a bunch of <quote>user-specific</quote> flags
|
||||
and fields that can help implement various hacks.</para>
|
||||
|
||||
<para>The important thing here is that <structname>bio</structname>s
|
||||
are handled asynchronously. That means that, in most parts of the code,
|
||||
there is no analogue to userland's &man.read.2; and
|
||||
&man.write.2; calls that do not return until a request is
|
||||
done. Rather, a developer-supplied function is called as a
|
||||
notification when the request gets completed (or results in
|
||||
error).</para>
|
||||
are handled asynchronously. That means that, in most parts of the code,
|
||||
there is no analogue to userland's &man.read.2; and
|
||||
&man.write.2; calls that do not return until a request is
|
||||
done. Rather, a developer-supplied function is called as a
|
||||
notification when the request gets completed (or results in
|
||||
error).</para>
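<para>The callback style described above can be sketched in plain userland
C. This is only a toy model: <varname>struct request</varname>,
<function>request_submit</function>(),
<function>request_run_pending</function>() and
<function>my_done</function>() are hypothetical names and are not part of
GEOM. In the kernel the notification hook is, as far as this sketch assumes,
the <varname>bio_done</varname> field described in &man.g.bio.9;.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio-like request; not the real struct bio. */
struct request {
	int error;				/* 0 on success */
	int completed;				/* set once the callback ran */
	void (*done)(struct request *);		/* completion notification */
	struct request *next;			/* pending-queue link */
};

static struct request *pending_head;	/* submitted, not yet completed */

/* Queue the request and return immediately, without waiting. */
void
request_submit(struct request *rq, void (*done)(struct request *))
{
	assert(rq != NULL && done != NULL);
	rq->error = 0;
	rq->completed = 0;
	rq->done = done;
	rq->next = pending_head;
	pending_head = rq;
}

/* Stand-in for the lower layer: finish all pending requests later on. */
int
request_run_pending(void)
{
	int n = 0;

	while (pending_head != NULL) {
		struct request *rq = pending_head;

		pending_head = rq->next;
		rq->done(rq);		/* notify the submitter */
		n++;
	}
	return (n);
}

/* Example completion handler supplied by the developer. */
void
my_done(struct request *rq)
{
	rq->completed = 1;
}
```

<para>The point of the sketch is that
<function>request_submit</function>() returns while the request is still
pending. The submitter learns the outcome only when its handler is
invoked.</para>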
|
||||
|
||||
<para>The asynchronous programming model (also
|
||||
called "event-driven") is somewhat harder
|
||||
than the much more used imperative one used in userland
|
||||
(at least it takes a
|
||||
while to get used to it). In some cases the helper routines
|
||||
<function>g_write_data</function>() and
|
||||
<function>g_read_data</function>() can be used, but <emphasis>not
|
||||
always</emphasis>. In particular, they cannot be used when
|
||||
a mutex is held; for example, the GEOM topology mutex or
|
||||
the internal mutex held during the <function>.start</function>() and
|
||||
<function>.stop</function>() functions.</para>
|
||||
called "event-driven") is somewhat harder
|
||||
than the much more used imperative one used in userland
|
||||
(at least it takes a
|
||||
while to get used to it). In some cases the helper routines
|
||||
<function>g_write_data</function>() and
|
||||
<function>g_read_data</function>() can be used, but <emphasis>not
|
||||
always</emphasis>. In particular, they cannot be used when
|
||||
a mutex is held; for example, the GEOM topology mutex or
|
||||
the internal mutex held during the <function>.start</function>() and
|
||||
<function>.stop</function>() functions.</para>
|
||||
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
@ -340,52 +340,52 @@ KMOD=geom_journal
|
|||
<title>Ggate</title>
|
||||
|
||||
<para>If maximum performance is not needed, a much simpler way
|
||||
of making a data transformation is to implement it in userland
|
||||
via the ggate (GEOM gate) facility. Unfortunately, there is no
|
||||
easy way to convert between, or even share code between the
|
||||
two approaches.</para>
|
||||
of making a data transformation is to implement it in userland
|
||||
via the ggate (GEOM gate) facility. Unfortunately, there is no
|
||||
easy way to convert between, or even share code between the
|
||||
two approaches.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="geom-class">
|
||||
<title>GEOM class</title>
|
||||
|
||||
<para>GEOM classes are transformations on the data. These transformations
|
||||
can be combined in a tree-like fashion. Instances of GEOM classes are
|
||||
called <emphasis>geoms</emphasis>.</para>
|
||||
can be combined in a tree-like fashion. Instances of GEOM classes are
|
||||
called <emphasis>geoms</emphasis>.</para>
|
||||
|
||||
<para>Each GEOM class has several "class methods" that get called
|
||||
when there is no geom instance available (or they are simply not
|
||||
bound to a single instance):</para>
|
||||
when there is no geom instance available (or they are simply not
|
||||
bound to a single instance):</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para><function>.init</function> is called when GEOM
|
||||
becomes aware of a GEOM class (e.g. when the kernel module
|
||||
gets loaded.)</para></listitem>
|
||||
becomes aware of a GEOM class (e.g. when the kernel module
|
||||
gets loaded.)</para></listitem>
|
||||
|
||||
<listitem><para><function>.fini</function> gets called when GEOM
|
||||
abandons the class (e.g. when the module gets
|
||||
unloaded)</para></listitem>
|
||||
<listitem><para><function>.fini</function> gets called when GEOM
|
||||
abandons the class (e.g. when the module gets
|
||||
unloaded)</para></listitem>
|
||||
|
||||
<listitem><para><function>.taste</function> is called next, once for
|
||||
each provider the system has available. If applicable, this
|
||||
function will usually create and start a geom
|
||||
instance.</para></listitem>
|
||||
<listitem><para><function>.taste</function> is called next, once for
|
||||
each provider the system has available. If applicable, this
|
||||
function will usually create and start a geom
|
||||
instance.</para></listitem>
|
||||
|
||||
<listitem><para><function>.destroy_geom</function> is called when
|
||||
the geom should be disbanded</para></listitem>
|
||||
<listitem><para><function>.destroy_geom</function> is called when
|
||||
the geom should be disbanded</para></listitem>
|
||||
|
||||
<listitem><para><function>.ctlconf</function> is called when a user
|
||||
requests reconfiguration of an existing geom</para></listitem>
|
||||
<listitem><para><function>.ctlconf</function> is called when a user
|
||||
requests reconfiguration of an existing geom</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>Also defined are the GEOM event functions, which will get
|
||||
copied to the geom instance.</para>
|
||||
copied to the geom instance.</para>
|
||||
|
||||
<para>Field <function>.geom</function> in the
|
||||
<structname>g_class</structname> structure is a LIST of geoms
|
||||
instantiated from the class.</para>
|
||||
<structname>g_class</structname> structure is a LIST of geoms
|
||||
instantiated from the class.</para>
|
||||
|
||||
<para>These functions are called from the g_event kernel thread.</para>
|
||||
|
||||
|
@ -395,29 +395,29 @@ KMOD=geom_journal
|
|||
<title>Softc</title>
|
||||
|
||||
<para>The name <quote>softc</quote> is a legacy term for
|
||||
<quote>driver private data</quote>. The name most probably
|
||||
comes from the archaic term <quote>software control block</quote>.
|
||||
In GEOM, it is a structure (more precisely, a pointer to a
|
||||
structure) that can be attached to a geom instance to hold
|
||||
whatever data is private to the geom instance. Most GEOM classes
|
||||
have the following members:</para>
|
||||
<quote>driver private data</quote>. The name most probably
|
||||
comes from the archaic term <quote>software control block</quote>.
|
||||
In GEOM, it is a structure (more precisely, a pointer to a
|
||||
structure) that can be attached to a geom instance to hold
|
||||
whatever data is private to the geom instance. Most GEOM classes
|
||||
have the following members:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para><varname>struct g_provider *provider</varname> : The
|
||||
<quote>provider</quote> this geom instantiates</para></listitem>
|
||||
<listitem><para><varname>struct g_provider *provider</varname> : The
|
||||
<quote>provider</quote> this geom instantiates</para></listitem>
|
||||
|
||||
<listitem><para><varname>uint16_t n_disks</varname> : Number of
|
||||
consumers this geom consumes</para></listitem>
|
||||
<listitem><para><varname>uint16_t n_disks</varname> : Number of
|
||||
consumers this geom consumes</para></listitem>
|
||||
|
||||
<listitem><para><varname>struct g_consumer **disks</varname> : Array
|
||||
of <varname>struct g_consumer*</varname>. (It is not possible
|
||||
to use just single indirection because struct g_consumer*
|
||||
are created on our behalf by GEOM).</para></listitem>
|
||||
<listitem><para><varname>struct g_consumer **disks</varname> : Array
|
||||
of <varname>struct g_consumer*</varname>. (It is not possible
|
||||
to use just single indirection because struct g_consumer*
|
||||
are created on our behalf by GEOM).</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The <structname>softc</structname> structure contains all
|
||||
the state of a geom instance. Every geom instance has its own
|
||||
softc.</para>
|
||||
the state of a geom instance. Every geom instance has its own
|
||||
softc.</para>
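<para>Put together, the members listed above might be declared as in the
following sketch. The structure name
<structname>g_example_softc</structname> and the two helper functions are
hypothetical. The GEOM types are only forward-declared so that the fragment
compiles in userland, and <function>calloc</function>() stands in for the
zeroing allocation a kernel module would request from
&man.malloc.9;.</para>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Opaque here; in the kernel these come from <geom/geom.h>. */
struct g_provider;
struct g_consumer;

/* Hypothetical softc for an example class. */
struct g_example_softc {
	struct g_provider *provider;	/* provider this geom instantiates */
	uint16_t n_disks;		/* number of consumers */
	struct g_consumer **disks;	/* array of consumer pointers */
};

/* Allocate a softc with room for n_disks consumer pointers, all NULL. */
struct g_example_softc *
example_softc_alloc(uint16_t n_disks)
{
	struct g_example_softc *sc = calloc(1, sizeof(*sc));

	if (sc == NULL)
		return (NULL);
	sc->disks = calloc(n_disks, sizeof(*sc->disks));
	if (sc->disks == NULL && n_disks != 0) {
		free(sc);
		return (NULL);
	}
	sc->n_disks = n_disks;
	return (sc);
}

/* Release the pointer array and the softc itself. */
void
example_softc_free(struct g_example_softc *sc)
{
	if (sc == NULL)
		return;
	free(sc->disks);
	free(sc);
}
```

<para>The double indirection in <varname>disks</varname> means the softc
owns only the array of pointers. The consumers themselves are created and
owned by GEOM.</para>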
|
||||
</sect2>
|
||||
|
||||
<sect2 id="geom-metadata">
|
||||
|
@ -428,15 +428,15 @@ KMOD=geom_journal
|
|||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>16 byte buffer for null-terminated signature
|
||||
(usually the class name)</para></listitem>
|
||||
<listitem><para>16 byte buffer for null-terminated signature
|
||||
(usually the class name)</para></listitem>
|
||||
|
||||
<listitem><para>uint32 version ID</para></listitem>
|
||||
<listitem><para>uint32 version ID</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>It is assumed that geom classes know how to handle metadata
|
||||
with version IDs lower than theirs.</para>
|
||||
with version IDs lower than theirs.</para>
|
||||
|
||||
<para>Metadata is located in the last sector of the provider
|
||||
(and thus must fit in it).</para>
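<para>The layout and placement rules above can be sketched as follows. The
structure name, the signature string and the version number are
assumptions made up for this example. The sketch also ignores byte order,
which an on-disk format would have to fix explicitly.</para>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAGIC	"GEOM::EXAMPLE"	/* hypothetical signature */
#define EXAMPLE_VERSION	2		/* hypothetical current version */

/* Hypothetical metadata header following the layout described above. */
struct example_metadata {
	char md_magic[16];	/* null-terminated signature */
	uint32_t md_version;	/* version ID */
};

/* Byte offset of the last sector, where the metadata is stored. */
uint64_t
metadata_offset(uint64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize != 0 && mediasize >= sectorsize);
	return (mediasize - sectorsize);
}

/* Fill in a header; the signature must fit with its terminator. */
void
metadata_init(struct example_metadata *md)
{
	memset(md, 0, sizeof(*md));
	strncpy(md->md_magic, EXAMPLE_MAGIC, sizeof(md->md_magic) - 1);
	md->md_version = EXAMPLE_VERSION;
}

/* Accept our signature with any version not newer than ours. */
int
metadata_is_usable(const struct example_metadata *md)
{
	if (strncmp(md->md_magic, EXAMPLE_MAGIC, sizeof(md->md_magic)) != 0)
		return (0);
	return (md->md_version <= EXAMPLE_VERSION);
}
```

<para>A tasting routine would read the sector at
<function>metadata_offset</function>() and bring a geom up only when
<function>metadata_is_usable</function>() accepts the header.</para>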
|
||||
|
@ -455,15 +455,15 @@ KMOD=geom_journal
|
|||
<listitem><para>user calls &man.geom.8; utility (or one of its
|
||||
hardlinked friends)</para></listitem>
|
||||
|
||||
<listitem><para>the utility figures out which geom class it is
|
||||
supposed to handle and searches for
|
||||
<filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
|
||||
library (usually in
|
||||
<filename>/lib/geom</filename>).</para></listitem>
|
||||
<listitem><para>the utility figures out which geom class it is
|
||||
supposed to handle and searches for
|
||||
<filename>geom_<replaceable>CLASSNAME</replaceable>.so</filename>
|
||||
library (usually in
|
||||
<filename>/lib/geom</filename>).</para></listitem>
|
||||
|
||||
<listitem><para>it &man.dlopen.3;-s the library, extracts the
|
||||
definitions of command-line parameters and helper
|
||||
functions.</para></listitem>
|
||||
<listitem><para>it &man.dlopen.3;-s the library, extracts the
|
||||
definitions of command-line parameters and helper
|
||||
functions.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
|
@ -473,23 +473,23 @@ KMOD=geom_journal
|
|||
<itemizedlist>
|
||||
|
||||
<listitem><para>&man.geom.8; looks in the command-line definition
|
||||
for the command (usually "label"), and calls a helper
|
||||
function.</para></listitem>
|
||||
for the command (usually "label"), and calls a helper
|
||||
function.</para></listitem>
|
||||
|
||||
<listitem><para>helper function checks parameters and gathers
|
||||
metadata, which it proceeds to write to all concerned
|
||||
providers.</para></listitem>
|
||||
<listitem><para>helper function checks parameters and gathers
|
||||
metadata, which it proceeds to write to all concerned
|
||||
providers.</para></listitem>
|
||||
|
||||
<listitem><para>this "spoils" existing geoms (if any) and
|
||||
initiates a new round of "tasting" of the providers. The
|
||||
intended geom class recognizes the metadata and brings the
|
||||
geom up.</para></listitem>
|
||||
<listitem><para>this "spoils" existing geoms (if any) and
|
||||
initiates a new round of "tasting" of the providers. The
|
||||
intended geom class recognizes the metadata and brings the
|
||||
geom up.</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>(The above sequence of events is implementation-dependent
|
||||
but all existing code works like that, and it is supported by
|
||||
libraries.)</para>
|
||||
but all existing code works like that, and it is supported by
|
||||
libraries.)</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
@ -497,9 +497,9 @@ KMOD=geom_journal
|
|||
<title>Geom command structure</title>
|
||||
|
||||
<para>The helper <filename>geom_CLASSNAME.so</filename> library
|
||||
exports <structname>class_commands</structname> structure,
|
||||
which is an array of <structname>struct g_command</structname>
|
||||
elements. Commands are of uniform format and look like:</para>
|
||||
exports <structname>class_commands</structname> structure,
|
||||
which is an array of <structname>struct g_command</structname>
|
||||
elements. Commands are of uniform format and look like:</para>
|
||||
|
||||
<programlisting> verb [-options] geomname [other]</programlisting>
|
||||
|
||||
|
@ -508,10 +508,10 @@ KMOD=geom_journal
|
|||
<itemizedlist>
|
||||
|
||||
<listitem><para>label — to write metadata to devices so they can be
|
||||
recognized at tasting and brought up in geoms</para></listitem>
|
||||
recognized at tasting and brought up in geoms</para></listitem>
|
||||
|
||||
<listitem><para>destroy — to destroy metadata, so the geoms get
|
||||
destroyed</para></listitem>
|
||||
<listitem><para>destroy — to destroy metadata, so the geoms get
|
||||
destroyed</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
|
@ -519,26 +519,26 @@ KMOD=geom_journal
|
|||
|
||||
<itemizedlist>
|
||||
<listitem><para><literal>-v</literal> : be verbose</para></listitem>
|
||||
<listitem><para><literal>-f</literal> : force</para></listitem>
|
||||
<listitem><para><literal>-f</literal> : force</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>Many actions, such as labeling and destroying metadata, can
|
||||
be performed in userland. For this, <structname>struct
|
||||
g_command</structname> provides field
|
||||
<varname>gc_func</varname> that can be set to a function (in
|
||||
the same <filename>.so</filename>) that will be called to
|
||||
process a verb. If <varname>gc_func</varname> is NULL, the
|
||||
command will be passed to kernel module, to
|
||||
<function>.ctlreq</function> function of the geom
|
||||
class.</para>
|
||||
be performed in userland. For this, <structname>struct
|
||||
g_command</structname> provides field
|
||||
<varname>gc_func</varname> that can be set to a function (in
|
||||
the same <filename>.so</filename>) that will be called to
|
||||
process a verb. If <varname>gc_func</varname> is NULL, the
|
||||
command will be passed to kernel module, to
|
||||
<function>.ctlreq</function> function of the geom
|
||||
class.</para>
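<para>The dispatch rule just described (a handler in the library if one is
set, the kernel module otherwise) can be modelled with a small table. The
types and names below are simplified and hypothetical. In particular
<structname>example_command</structname> is not the real
<structname>struct g_command</structname>, and the <literal>stop</literal>
verb is only an illustration.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified command entry; not the real struct g_command. */
struct example_command {
	const char *name;			/* verb, e.g. "label" */
	int (*func)(const char *geomname);	/* userland handler or NULL */
};

enum dispatch_result { HANDLED_IN_USERLAND, PASSED_TO_KERNEL, UNKNOWN_VERB };

/* Userland handler for "label"; a real one would write metadata. */
int
example_label(const char *geomname)
{
	return (geomname != NULL ? 0 : -1);
}

/* Table terminated by an entry with a NULL name. */
const struct example_command example_commands[] = {
	{ "label", example_label },
	{ "stop", NULL },		/* no handler: goes to the kernel */
	{ NULL, NULL }
};

/* Look the verb up and decide where it is processed. */
enum dispatch_result
dispatch(const struct example_command *cmds, const char *verb,
    const char *geomname)
{
	assert(cmds != NULL && verb != NULL);
	for (; cmds->name != NULL; cmds++) {
		if (strcmp(cmds->name, verb) != 0)
			continue;
		if (cmds->func == NULL)
			return (PASSED_TO_KERNEL);
		(void)cmds->func(geomname);
		return (HANDLED_IN_USERLAND);
	}
	return (UNKNOWN_VERB);
}
```

<para>Keeping the handler pointer optional lets one table describe both
kinds of verbs, so the front end needs no per-class knowledge of where a
command is implemented.</para>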
|
||||
</sect2>
|
||||
|
||||
<sect2 id="geom-geoms">
|
||||
<title>Geoms</title>
|
||||
|
||||
<para>Geoms are instances of GEOM classes. They have internal
|
||||
data (a softc structure) and some functions with which they
|
||||
respond to external events.</para>
|
||||
data (a softc structure) and some functions with which they
|
||||
respond to external events.</para>
|
||||
|
||||
<para>The event functions are:</para>
|
||||
|
||||
|
@ -549,24 +549,24 @@ KMOD=geom_journal
|
|||
<listitem><para><function>.dumpconf</function> : returns
|
||||
XML-formatted information about the geom</para></listitem>
|
||||
|
||||
<listitem><para><function>.orphan</function> : called when some
|
||||
underlying provider gets disconnected</para></listitem>
|
||||
<listitem><para><function>.orphan</function> : called when some
|
||||
underlying provider gets disconnected</para></listitem>
|
||||
|
||||
<listitem><para><function>.spoiled</function> : called when some
|
||||
underlying provider gets written to</para></listitem>
|
||||
<listitem><para><function>.spoiled</function> : called when some
|
||||
underlying provider gets written to</para></listitem>
|
||||
|
||||
<listitem><para><function>.start</function> : handles I/O</para></listitem>
|
||||
<listitem><para><function>.start</function> : handles I/O</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>These functions are called from the <function>g_down</function>
|
||||
kernel thread and there can be no sleeping in this context
|
||||
(see the definition of sleeping elsewhere), which limits what can be done
|
||||
quite a bit, but forces the handling to be fast.</para>
|
||||
kernel thread and there can be no sleeping in this context
|
||||
(see the definition of sleeping elsewhere), which limits what can be done
|
||||
quite a bit, but forces the handling to be fast.</para>
|
||||
|
||||
<para>Of these, the most important function for doing actual
|
||||
useful work is the <function>.start</function>() function,
|
||||
which is called when a BIO request arrives for a provider
|
||||
managed by an instance of a GEOM class.</para>
|
||||
useful work is the <function>.start</function>() function,
|
||||
which is called when a BIO request arrives for a provider
|
||||
managed by an instance of a GEOM class.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="geom-threads">
|
||||
|
@ -576,143 +576,143 @@ KMOD=geom_journal
|
|||
framework:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para><literal>g_down</literal> : Handles requests coming
|
||||
from high-level entities (such as a userland request) on the
|
||||
way to physical devices</para></listitem>
|
||||
<listitem><para><literal>g_down</literal> : Handles requests coming
|
||||
from high-level entities (such as a userland request) on the
|
||||
way to physical devices</para></listitem>
|
||||
|
||||
<listitem><para><literal>g_up</literal> : Handles responses from
|
||||
device drivers to requests made by higher-level
|
||||
entities</para></listitem>
|
||||
<listitem><para><literal>g_up</literal> : Handles responses from
|
||||
device drivers to requests made by higher-level
|
||||
entities</para></listitem>
|
||||
|
||||
<listitem><para><literal>g_event</literal> : Handles all other
|
||||
cases: creation of geom instances, access counting, "spoil"
|
||||
events, etc.</para></listitem>
|
||||
<listitem><para><literal>g_event</literal> : Handles all other
|
||||
cases: creation of geom instances, access counting, "spoil"
|
||||
events, etc.</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>When a user process issues <quote>read data X at offset Y
|
||||
of a file</quote> request, this is what happens:</para>
|
||||
of a file</quote> request, this is what happens:</para>
|
||||
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>The filesystem converts the request into a struct bio
|
||||
instance and passes it to the GEOM subsystem. It knows what geom
|
||||
instance should handle it because filesystems are hosted
|
||||
directly on a geom instance.</para></listitem>
|
||||
instance and passes it to the GEOM subsystem. It knows what geom
|
||||
instance should handle it because filesystems are hosted
|
||||
directly on a geom instance.</para></listitem>
|
||||
|
||||
<listitem><para>The request ends up as a call to the
|
||||
<function>.start</function>() function made on the g_down
|
||||
thread and reaches the top-level geom instance.</para></listitem>
|
||||
<listitem><para>The request ends up as a call to the
|
||||
<function>.start</function>() function made on the g_down
|
||||
thread and reaches the top-level geom instance.</para></listitem>
|
||||
|
||||
<listitem><para>This top-level geom instance (for example the
|
||||
partition slicer) determines that the request should be
|
||||
routed to a lower-level instance (for example the disk
|
||||
driver). It makes a copy of the bio request (bio requests
|
||||
<emphasis>ALWAYS</emphasis> need to be copied between
|
||||
instances, with <function>g_clone_bio</function>()!),
|
||||
modifies the data offset and target provider fields and
|
||||
executes the copy with
|
||||
<function>g_io_request</function>().</para></listitem>
|
||||
<listitem><para>This top-level geom instance (for example the
|
||||
partition slicer) determines that the request should be
|
||||
routed to a lower-level instance (for example the disk
|
||||
driver). It makes a copy of the bio request (bio requests
|
||||
<emphasis>ALWAYS</emphasis> need to be copied between
|
||||
instances, with <function>g_clone_bio</function>()!),
|
||||
modifies the data offset and target provider fields and
|
||||
executes the copy with
|
||||
<function>g_io_request</function>().</para></listitem>
|
||||
|
||||
<listitem><para>The disk driver gets the bio request also as a call
|
||||
to <function>.start</function>() on the
|
||||
<literal>g_down</literal> thread. It talks to hardware,
|
||||
gets the data back, and calls
|
||||
<function>g_io_deliver</function>() on the bio.</para></listitem>
|
||||
<listitem><para>The disk driver gets the bio request also as a call
|
||||
to <function>.start</function>() on the
|
||||
<literal>g_down</literal> thread. It talks to hardware,
|
||||
gets the data back, and calls
|
||||
<function>g_io_deliver</function>() on the bio.</para></listitem>
|
||||
|
||||
<listitem><para>Now, the notification of bio completion
|
||||
<quote>bubbles up</quote> in the <literal>g_up</literal>
|
||||
thread. First the partition slicer gets
|
||||
<function>.done</function>() called in the
|
||||
<literal>g_up</literal> thread; it uses information stored
|
||||
in the bio to free the cloned <structname>bio</structname>
|
||||
structure (with <function>g_destroy_bio</function>()) and
|
||||
calls <function>g_io_deliver</function>() on the original
|
||||
request.</para></listitem>
|
||||
<listitem><para>Now, the notification of bio completion
|
||||
<quote>bubbles up</quote> in the <literal>g_up</literal>
|
||||
thread. First the partition slicer gets
|
||||
<function>.done</function>() called in the
|
||||
<literal>g_up</literal> thread; it uses information stored
|
||||
in the bio to free the cloned <structname>bio</structname>
|
||||
structure (with <function>g_destroy_bio</function>()) and
|
||||
calls <function>g_io_deliver</function>() on the original
|
||||
request.</para></listitem>
|
||||
|
||||
<listitem><para>The filesystem gets the data and transfers it to
|
||||
userland.</para></listitem>
|
||||
<listitem><para>The filesystem gets the data and transfers it to
|
||||
userland.</para></listitem>
|
||||
</itemizedlist>
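<para>The request flow above can be sketched as a small userland model. This is only an illustration, not kernel code: <structname>model_bio</structname>, the <function>model_*</function>() helpers and the <literal>SLICE_START</literal> value are hypothetical stand-ins for <structname>bio</structname>, <function>g_clone_bio</function>(), <function>g_io_deliver</function>() and a real partition offset. It also assumes direct function calls where the kernel hands requests over to the <literal>g_down</literal> and <literal>g_up</literal> threads.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much simplified stand-in for the kernel's struct bio. */
struct model_bio {
	long offset;                       /* offset of the request */
	long data;                         /* fake "data" read from the disk */
	int error;                         /* completion status, 0 = success */
	int finished;                      /* set once the issuer was notified */
	struct model_bio *parent;          /* request this one was cloned from */
	void (*done)(struct model_bio *);  /* completion callback (.done) */
};

#define SLICE_START 1024L /* assumed start of the slice on the disk */

/* Models g_clone_bio(): make a copy that remembers its parent. */
static struct model_bio *
model_clone(struct model_bio *bp)
{
	struct model_bio *cbp = malloc(sizeof(*cbp));

	if (cbp == NULL)
		return (NULL);
	*cbp = *bp;
	cbp->parent = bp;
	return (cbp);
}

/* Models g_io_deliver(): report completion to whoever issued the bio. */
static void
model_deliver(struct model_bio *bp, int error)
{
	bp->error = error;
	bp->done(bp);
}

/* Disk driver .start: pretend each sector contains its own offset. */
static void
disk_start(struct model_bio *bp)
{
	bp->data = bp->offset;
	model_deliver(bp, 0);
}

/* Slicer .done: copy the result up, free the clone, finish the original. */
static void
slice_done(struct model_bio *cbp)
{
	struct model_bio *pbp = cbp->parent;
	int error = cbp->error;

	pbp->data = cbp->data;
	free(cbp);
	model_deliver(pbp, error);
}

/* Slicer .start: clone, adjust the offset, pass the clone downwards. */
static void
slice_start(struct model_bio *bp)
{
	struct model_bio *cbp = model_clone(bp);

	if (cbp == NULL) {
		model_deliver(bp, 12); /* 12 is ENOMEM on FreeBSD */
		return;
	}
	cbp->offset = bp->offset + SLICE_START;
	cbp->done = slice_done;
	disk_start(cbp);
}

/* The "filesystem" completion handler. */
static void
fs_done(struct model_bio *bp)
{
	bp->finished = 1;
}

/* Issue a read at a slice-relative offset; returns the data or -1. */
static long
model_read(long offset)
{
	struct model_bio req = { .offset = offset, .done = fs_done };

	slice_start(&req);
	if (!req.finished || req.error != 0)
		return (-1);
	return (req.data);
}
```

<para>Calling <function>model_read</function>(512) in this model sends a clone with offset 1536 to the disk layer, while the original request keeps its own offset and is completed only after the clone has been freed.</para>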
|
||||
|
||||
<para>See the &man.g.bio.9; man page for information on how the data is
|
||||
passed back and forth in the <structname>bio</structname>
|
||||
structure (note in particular the <varname>bio_parent</varname>
|
||||
and <varname>bio_children</varname> fields and how they are
|
||||
handled).</para>
|
||||
passed back and forth in the <structname>bio</structname>
|
||||
structure (note in particular the <varname>bio_parent</varname>
|
||||
and <varname>bio_children</varname> fields and how they are
|
||||
handled).</para>
|
||||
|
||||
<para>One important feature is: <emphasis>THERE CAN BE NO SLEEPING IN G_UP
|
||||
AND G_DOWN THREADS</emphasis>. This means that none of the following
|
||||
things can be done in those threads (the list is of course not
|
||||
complete, but only informative):</para>
|
||||
AND G_DOWN THREADS</emphasis>. This means that none of the following
|
||||
things can be done in those threads (the list is of course not
|
||||
complete, but only informative):</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem><para>Calls to <function>msleep</function>() and
|
||||
<function>tsleep</function>(), obviously.</para></listitem>
|
||||
<listitem><para>Calls to <function>msleep</function>() and
|
||||
<function>tsleep</function>(), obviously.</para></listitem>
|
||||
|
||||
<listitem><para>Calls to <function>g_write_data</function>() and
|
||||
<function>g_read_data</function>(), because these sleep
|
||||
between passing the data to consumers and
|
||||
returning.</para></listitem>
|
||||
<listitem><para>Calls to <function>g_write_data</function>() and
|
||||
<function>g_read_data</function>(), because these sleep
|
||||
between passing the data to consumers and
|
||||
returning.</para></listitem>
|
||||
|
||||
<listitem><para>Waiting for I/O.</para></listitem>
|
||||
<listitem><para>Waiting for I/O.</para></listitem>
|
||||
|
||||
<listitem><para>Calls to &man.malloc.9; and
|
||||
<function>uma_zalloc</function>() with
|
||||
the <varname>M_WAITOK</varname> flag set.</para></listitem>
|
||||
<listitem><para>Calls to &man.malloc.9; and
|
||||
<function>uma_zalloc</function>() with
|
||||
the <varname>M_WAITOK</varname> flag set.</para></listitem>
|
||||
|
||||
<listitem><para>sx and other sleepable locks</para></listitem>
|
||||
<listitem><para>sx and other sleepable locks</para></listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>This restriction is here to stop GEOM code clogging the I/O
|
||||
request path, since sleeping is usually not
|
||||
time-bound and there can be no guarantees on how long it will
|
||||
take (there are some other, more technical reasons also). It
|
||||
also means that there is not much that can be done in those
|
||||
threads; for example, almost any complex thing requires memory
|
||||
allocation. Fortunately, there is a way out: creating
|
||||
additional kernel threads.</para>
|
||||
request path, since sleeping is usually not
|
||||
time-bound and there can be no guarantees on how long it will
|
||||
take (there are some other, more technical reasons also). It
|
||||
also means that there is not much that can be done in those
|
||||
threads; for example, almost any complex thing requires memory
|
||||
allocation. Fortunately, there is a way out: creating
|
||||
additional kernel threads.</para>
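<para>One consequence of the no-sleeping rule is that an allocation in these threads has to be allowed to fail. The following userland sketch shows that pattern under stated assumptions: <function>model_alloc_nowait</function>() and <function>model_start</function>() are hypothetical names, and the <varname>model_memory_available</varname> switch merely simulates memory pressure. It does not reproduce the kernel allocator interface.</para>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch used to simulate memory pressure in this model. */
static int model_memory_available = 1;

/*
 * Models a non-sleeping allocation: it returns NULL at once when no
 * memory is available instead of waiting for memory to be freed.
 */
static void *
model_alloc_nowait(size_t size)
{
	return (model_memory_available ? malloc(size) : NULL);
}

/*
 * Models the body of a .start() routine that needs scratch memory.
 * Returns 0 when the request was processed, or ENOMEM when it has to
 * be failed (or handed to another thread) because waiting is not allowed.
 */
static int
model_start(size_t scratch_size)
{
	void *scratch = model_alloc_nowait(scratch_size);

	if (scratch == NULL)
		return (ENOMEM);
	/* ... the real work on the request would happen here ... */
	free(scratch);
	return (0);
}
```

<para>In this model the caller decides what to do with <literal>ENOMEM</literal>: fail the request immediately or queue it for a thread that is allowed to sleep.</para>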
|
||||
</sect2>
|
||||
|
||||
<sect2 id="geom-kernelthreads">
|
||||
<title>Kernel threads for use in geom code</title>
|
||||
|
||||
<para>Kernel threads are created with the &man.kthread.create.9;
|
||||
function, and they are sort of similar to userland threads in
|
||||
behaviour, only they cannot return to the caller to signify
|
||||
termination, but must call &man.kthread.exit.9;.</para>
|
||||
function, and they are sort of similar to userland threads in
|
||||
behaviour, only they cannot return to the caller to signify
|
||||
termination, but must call &man.kthread.exit.9;.</para>
|
||||
|
||||
<para>In GEOM code, the usual use of threads is to offload
|
||||
processing of requests from <literal>g_down</literal> thread
|
||||
(the <function>.start</function>() function). These threads
|
||||
look like <quote>event handlers</quote>: they have a linked
|
||||
list of events associated with them (on which events can be posted
|
||||
by various functions in various threads so it must be
|
||||
protected by a mutex), take the events from the list one by
|
||||
one and process them in a big <literal>switch</literal>()
|
||||
statement.</para>
|
||||
processing of requests from <literal>g_down</literal> thread
|
||||
(the <function>.start</function>() function). These threads
|
||||
look like <quote>event handlers</quote>: they have a linked
|
||||
list of events associated with them (on which events can be posted
|
||||
by various functions in various threads so it must be
|
||||
protected by a mutex), take the events from the list one by
|
||||
one and process them in a big <literal>switch</literal>()
|
||||
statement.</para>
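<para>The event-handler pattern described above can be sketched in userland with POSIX threads. This is an analogy under stated assumptions rather than kernel code: <function>pthread_create</function>() stands in for &man.kthread.create.9;, a pthread mutex and condition variable stand in for the kernel mutex and sleep/wakeup, and all <literal>model_</literal> names and event types are hypothetical.</para>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum model_event_type { MODEL_EV_READ, MODEL_EV_WRITE, MODEL_EV_STOP };

/* An event is owned by the poster, much like a queued bio. */
struct model_event {
	enum model_event_type type;
	struct model_event *next;
};

struct model_queue {
	pthread_mutex_t lock;      /* protects head and tail */
	pthread_cond_t nonempty;   /* signalled when an event is posted */
	struct model_event *head;
	struct model_event *tail;
	int reads_handled;         /* touched only by the worker thread */
	int writes_handled;
};

static void
model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
	q->head = q->tail = NULL;
	q->reads_handled = q->writes_handled = 0;
}

/* Called from any thread: append an event and wake the worker. */
static void
model_post(struct model_queue *q, struct model_event *ev)
{
	ev->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	pthread_cond_signal(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
}

/* The worker: take events one by one and dispatch them in a switch. */
static void *
model_worker(void *arg)
{
	struct model_queue *q = arg;
	struct model_event *ev;

	for (;;) {
		pthread_mutex_lock(&q->lock);
		while (q->head == NULL)
			pthread_cond_wait(&q->nonempty, &q->lock);
		ev = q->head;
		q->head = ev->next;
		if (q->head == NULL)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);

		/* The lock is dropped here, so handlers are free to sleep. */
		switch (ev->type) {
		case MODEL_EV_READ:
			q->reads_handled++;
			break;
		case MODEL_EV_WRITE:
			q->writes_handled++;
			break;
		case MODEL_EV_STOP:
			return (NULL);
		}
	}
}

/* Post two reads, one write and a stop event; report the counters. */
static int
model_run_demo(int *reads, int *writes)
{
	struct model_queue q;
	struct model_event ev[4] = {
		{ MODEL_EV_READ, NULL }, { MODEL_EV_WRITE, NULL },
		{ MODEL_EV_READ, NULL }, { MODEL_EV_STOP, NULL }
	};
	pthread_t tid;
	int i;

	model_queue_init(&q);
	if (pthread_create(&tid, NULL, model_worker, &q) != 0)
		return (-1);
	for (i = 0; i < 4; i++)
		model_post(&q, &ev[i]);
	pthread_join(tid, NULL);
	*reads = q.reads_handled;
	*writes = q.writes_handled;
	pthread_cond_destroy(&q.nonempty);
	pthread_mutex_destroy(&q.lock);
	return (0);
}
```

<para>The mutex is held only while the list is manipulated and is released before an event is processed, which is what leaves the worker free to sleep while handling it.</para>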
|
||||
|
||||
<para>The main benefit of using a thread to handle I/O requests
|
||||
is that it can sleep when needed. Now, this sounds good, but
|
||||
should be carefully thought out. Sleeping is fine and very
|
||||
convenient but can very effectively destroy performance of the
|
||||
geom transformation. Extremely performance-sensitive classes
|
||||
probably should do all the work in
|
||||
<function>.start</function>() function call, taking great care
|
||||
to handle out-of-memory and similar errors.</para>
|
||||
is that it can sleep when needed. Now, this sounds good, but
|
||||
should be carefully thought out. Sleeping is fine and very
|
||||
convenient but can very effectively destroy performance of the
|
||||
geom transformation. Extremely performance-sensitive classes
|
||||
probably should do all the work in
|
||||
<function>.start</function>() function call, taking great care
|
||||
to handle out-of-memory and similar errors.</para>
|
||||
|
||||
<para>The other benefit of having an event-handler thread like
|
||||
that is to serialize all the requests and responses coming
|
||||
from different geom threads into one thread. This is also very
|
||||
convenient but can be slow. In most cases, handling of
|
||||
<function>.done</function>() requests can be left to the
|
||||
<literal>g_up</literal> thread.</para>
|
||||
that is to serialize all the requests and responses coming
|
||||
from different geom threads into one thread. This is also very
|
||||
convenient but can be slow. In most cases, handling of
|
||||
<function>.done</function>() requests can be left to the
|
||||
<literal>g_up</literal> thread.</para>
|
||||
|
||||
<para>Mutexes in the FreeBSD kernel (see &man.mutex.9;) have
|
||||
one distinction from their more common userland cousins — the
|
||||
code cannot sleep while holding
|
||||
a mutex. If the code needs to sleep a lot, &man.sx.9; locks
|
||||
may be more appropriate. On the other hand, if you do almost
|
||||
everything in a single thread, you may get away with no
|
||||
mutexes at all.</para>
|
||||
one distinction from their more common userland cousins — the
|
||||
code cannot sleep while holding
|
||||
a mutex. If the code needs to sleep a lot, &man.sx.9; locks
|
||||
may be more appropriate. On the other hand, if you do almost
|
||||
everything in a single thread, you may get away with no
|
||||
mutexes at all.</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
|