diff --git a/en_US.ISO8859-1/articles/linux-emulation/article.xml b/en_US.ISO8859-1/articles/linux-emulation/article.xml index a4d9c13fda..96a38a865b 100644 --- a/en_US.ISO8859-1/articles/linux-emulation/article.xml +++ b/en_US.ISO8859-1/articles/linux-emulation/article.xml @@ -3,13 +3,23 @@ "http://www.FreeBSD.org/XML/share/xml/freebsd50.dtd"> -
- &linux; emulation in &os; - +
+ + &linux; emulation in &os; - RomanDivacky -
rdivacky@FreeBSD.org
-
+ + + Roman + Divacky + + +
+ rdivacky@FreeBSD.org +
+
+
&tm-attrib.adobe; @@ -28,151 +38,165 @@ $FreeBSD$ - This masters thesis deals with updating the &linux; emulation layer - (the so called Linuxulator). The task was to update the layer to match - the functionality of &linux; 2.6. As a reference implementation, the - &linux; 2.6.16 kernel was chosen. The concept is loosely based on - the NetBSD implementation. Most of the work was done in the summer - of 2006 as a part of the Google Summer of Code students program. - The focus was on bringing the NPTL (new &posix; - thread library) support into the emulation layer, including - TLS (thread local storage), + This masters thesis deals with updating the &linux; + emulation layer (the so called + Linuxulator). The task was to update + the layer to match the functionality of &linux; 2.6. As a + reference implementation, the &linux; 2.6.16 kernel was + chosen. The concept is loosely based on the NetBSD + implementation. Most of the work was done in the summer of + 2006 as a part of the Google Summer of Code students program. + The focus was on bringing the NPTL (new + &posix; thread library) support into the emulation layer, + including TLS (thread local storage), futexes (fast user space mutexes), PID mangling, and some other minor things. Many small problems were identified and fixed in the process. My work was integrated into the main &os; source - repository and will be shipped in the upcoming 7.0R release. We, - the emulation development team, are working on making the - &linux; 2.6 emulation the default emulation layer in &os;. + repository and will be shipped in the upcoming 7.0R release. + We, the emulation development team, are working on making the + &linux; 2.6 emulation the default emulation layer in + &os;.
Introduction - In the last few years the open source &unix; based operating systems - started to be widely deployed on server and client machines. Among - these operating systems I would like to point out two: &os;, for its BSD - heritage, time proven code base and many interesting features and - &linux; for its wide user base, enthusiastic open developer community - and support from large companies. &os; tends to be used on server - class machines serving heavy duty networking tasks with less usage on - desktop class machines for ordinary users. While &linux; has the same - usage on servers, but it is used much more by home based users. This - leads to a situation where there are many binary only programs available - for &linux; that lack support for &os;. + In the last few years the open source &unix; based operating + systems started to be widely deployed on server and client + machines. Among these operating systems I would like to point + out two: &os;, for its BSD heritage, time proven code base and + many interesting features and &linux; for its wide user base, + enthusiastic open developer community and support from large + companies. &os; tends to be used on server class machines + serving heavy duty networking tasks with less usage on desktop + class machines for ordinary users. While &linux; has the same + usage on servers, but it is used much more by home based users. + This leads to a situation where there are many binary only + programs available for &linux; that lack support for + &os;. - Naturally, a need for the ability to run &linux; binaries on a &os; - system arises and this is what this thesis deals with: the emulation of - the &linux; kernel in the &os; operating system. + Naturally, a need for the ability to run &linux; binaries on + a &os; system arises and this is what this thesis deals with: + the emulation of the &linux; kernel in the &os; operating + system. - During the Summer of 2006 Google Inc. sponsored a project which - focused on extending the &linux; emulation layer (the so called Linuxulator) - in &os; to include &linux; 2.6 facilities. This thesis is written as a - part of this project. + During the Summer of 2006 Google Inc. sponsored a project + which focused on extending the &linux; emulation layer (the so + called Linuxulator) in &os; to include &linux; 2.6 facilities. + This thesis is written as a part of this project. A look inside… - In this section we are going to describe every operating system in - question. How they deal with syscalls, trapframes etc., all the low-level - stuff. We also describe the way they understand common &unix; - primitives like what a PID is, what a thread is, etc. In the third - subsection we talk about how &unix; on &unix; emulation could be done - in general. + In this section we are going to describe every operating + system in question. How they deal with syscalls, trapframes + etc., all the low-level stuff. We also describe the way they + understand common &unix; primitives like what a PID is, what a + thread is, etc. In the third subsection we talk about how + &unix; on &unix; emulation could be done in general. What is &unix; &unix; is an operating system with a long history that has - influenced almost every other operating system currently in use. - Starting in the 1960s, its development continues to this day (although - in different projects). &unix; development soon forked into two main - ways: the BSDs and System III/V families. They mutually influenced - themselves by growing a common &unix; standard. 
Among the - contributions originated in BSD we can name virtual memory, TCP/IP - networking, FFS, and many others. The System V branch contributed to - SysV interprocess communication primitives, copy-on-write, etc. &unix; - itself does not exist any more but its ideas have been used by many - other operating systems world wide thus forming the so called &unix;-like - operating systems. These days the most influential ones are &linux;, - Solaris, and possibly (to some extent) &os;. There are in-company - &unix; derivatives (AIX, HP-UX etc.), but these have been more and - more migrated to the aforementioned systems. Let us summarize typical - &unix; characteristics. + influenced almost every other operating system currently in + use. Starting in the 1960s, its development continues to this + day (although in different projects). &unix; development soon + forked into two main ways: the BSDs and System III/V families. + They mutually influenced themselves by growing a common &unix; + standard. Among the contributions originated in BSD we can + name virtual memory, TCP/IP networking, FFS, and many others. + The System V branch contributed to SysV interprocess + communication primitives, copy-on-write, etc. &unix; itself + does not exist any more but its ideas have been used by many + other operating systems world wide thus forming the so called + &unix;-like operating systems. These days the most + influential ones are &linux;, Solaris, and possibly (to some + extent) &os;. There are in-company &unix; derivatives (AIX, + HP-UX etc.), but these have been more and more migrated to the + aforementioned systems. Let us summarize typical &unix; + characteristics. Technical details - Every running program constitutes a process that represents a state - of the computation. Running process is divided between kernel-space - and user-space. Some operations can be done only from kernel space - (dealing with hardware etc.), but the process should spend most of its - lifetime in the user space. The kernel is where the management of the - processes, hardware, and low-level details take place. The kernel - provides a standard unified &unix; API to the user space. The most - important ones are covered below. + Every running program constitutes a process that + represents a state of the computation. Running process is + divided between kernel-space and user-space. Some operations + can be done only from kernel space (dealing with hardware + etc.), but the process should spend most of its lifetime in + the user space. The kernel is where the management of the + processes, hardware, and low-level details take place. The + kernel provides a standard unified &unix; API to the user + space. The most important ones are covered below. - Communication between kernel and user space process + Communication between kernel and user space + process - Common &unix; API defines a syscall as a way to issue commands - from a user space process to the kernel. The most common - implementation is either by using an interrupt or specialized - instruction (think of - SYSENTER/SYSCALL instructions - for ia32). Syscalls are defined by a number. For example in &os;, - the syscall number 85 is the &man.swapon.2; syscall and the - syscall number 132 is &man.mkfifo.2;. Some syscalls need - parameters, which are passed from the user-space to the kernel-space - in various ways (implementation dependant). Syscalls are + Common &unix; API defines a syscall as a way to issue + commands from a user space process to the kernel. 
The most + common implementation is either by using an interrupt or + specialized instruction (think of + SYSENTER/SYSCALL + instructions for ia32). Syscalls are defined by a number. + For example in &os;, the syscall number 85 is the + &man.swapon.2; syscall and the syscall number 132 is + &man.mkfifo.2;. Some syscalls need parameters, which are + passed from the user-space to the kernel-space in various + ways (implementation dependant). Syscalls are synchronous. Another possible way to communicate is by using a - trap. Traps occur asynchronously after - some event occurs (division by zero, page fault etc.). A trap - can be transparent for a process (page fault) or can result in - a reaction like sending a signal - (division by zero). + trap. Traps occur asynchronously + after some event occurs (division by zero, page fault etc.). + A trap can be transparent for a process (page fault) or can + result in a reaction like sending a + signal (division by zero). Communication between processes - There are other APIs (System V IPC, shared memory etc.) but the - single most important API is signal. Signals are sent by processes - or by the kernel and received by processes. Some signals - can be ignored or handled by a user supplied routine, some result - in a predefined action that cannot be altered or ignored. + There are other APIs (System V IPC, shared memory etc.) + but the single most important API is signal. Signals are + sent by processes or by the kernel and received by + processes. Some signals can be ignored or handled by a user + supplied routine, some result in a predefined action that + cannot be altered or ignored. Process management - Kernel instances are processed first in the system (so called - init). Every running process can create its identical copy using - the &man.fork.2; syscall. Some slightly modified versions of this - syscall were introduced but the basic semantic is the same. Every - running process can morph into some other process using the - &man.exec.3; syscall. Some modifications of this syscall were - introduced but all serve the same basic purpose. Processes end - their lives by calling the &man.exit.2; syscall. Every process is - identified by a unique number called PID. Every process has a - defined parent (identified by its PID). + Kernel instances are processed first in the system (so + called init). Every running process can create its + identical copy using the &man.fork.2; syscall. Some + slightly modified versions of this syscall were introduced + but the basic semantic is the same. Every running process + can morph into some other process using the &man.exec.3; + syscall. Some modifications of this syscall were introduced + but all serve the same basic purpose. Processes end their + lives by calling the &man.exit.2; syscall. Every process is + identified by a unique number called PID. Every process has + a defined parent (identified by its PID). Thread management - Traditional &unix; does not define any API nor implementation - for threading, while &posix; defines its threading API but the - implementation is undefined. Traditionally there were two ways of - implementing threads. Handling them as separate processes (1:1 - threading) or envelope the whole thread group in one process and - managing the threading in userspace (1:N threading). Comparing - main features of each approach: + Traditional &unix; does not define any API nor + implementation for threading, while &posix; defines its + threading API but the implementation is undefined. 
+ Traditionally there were two ways of implementing threads. + Handling them as separate processes (1:1 threading) or + envelope the whole thread group in one process and managing + the threading in userspace (1:N threading). Comparing main + features of each approach: 1:1 threading @@ -199,10 +223,11 @@ + lightweight threads - + scheduling can be easily altered by the user + + scheduling can be easily altered by the + user - - syscalls must be wrapped + - syscalls must be wrapped - cannot utilize more than one CPU @@ -214,24 +239,26 @@ What is &os;? - The &os; project is one of the oldest open source operating - systems currently available for daily use. It is a direct descendant - of the genuine &unix; so it could be claimed that it is a true &unix; - although licensing issues do not permit that. The start of the project - dates back to the early 1990's when a crew of fellow BSD users patched - the 386BSD operating system. Based on this patchkit a new operating - system arose named &os; for its liberal license. Another group created - the NetBSD operating system with different goals in mind. We will - focus on &os;. + The &os; project is one of the oldest open source + operating systems currently available for daily use. It is a + direct descendant of the genuine &unix; so it could be claimed + that it is a true &unix; although licensing issues do not + permit that. The start of the project dates back to the early + 1990's when a crew of fellow BSD users patched the 386BSD + operating system. Based on this patchkit a new operating + system arose named &os; for its liberal license. Another + group created the NetBSD operating system with different goals + in mind. We will focus on &os;. - &os; is a modern &unix;-based operating system with all the - features of &unix;. Preemptive multitasking, multiuser facilities, - TCP/IP networking, memory protection, symmetric multiprocessing - support, virtual memory with merged VM and buffer cache, they are all - there. One of the interesting and extremely useful features is the - ability to emulate other &unix;-like operating systems. As of - December 2006 and 7-CURRENT development, the following - emulation functionalities are supported: + &os; is a modern &unix;-based operating system with all + the features of &unix;. Preemptive multitasking, multiuser + facilities, TCP/IP networking, memory protection, symmetric + multiprocessing support, virtual memory with merged VM and + buffer cache, they are all there. One of the interesting and + extremely useful features is the ability to emulate other + &unix;-like operating systems. As of December 2006 and + 7-CURRENT development, the following emulation functionalities + are supported: @@ -241,10 +268,12 @@ &os;/i386 emulation on &os;/ia64 - &linux;-emulation of &linux; operating system on &os; + &linux;-emulation of &linux; operating system on + &os; - NDIS-emulation of Windows networking drivers interface + NDIS-emulation of Windows networking drivers + interface NetBSD-emulation of NetBSD operating system @@ -257,62 +286,70 @@ - Actively developed emulations are the &linux; layer and various - &os;-on-&os; layers. Others are not supposed to work properly nor - be usable these days. + Actively developed emulations are the &linux; layer and + various &os;-on-&os; layers. Others are not supposed to work + properly nor be usable these days. Technical details - &os; is traditional flavor of &unix; in the sense of dividing the - run of processes into two halves: kernel space and user space run. 
- There are two types of process entry to the kernel: a syscall and a - trap. There is only one way to return. In the subsequent sections - we will describe the three gates to/from the kernel. The whole - description applies to the i386 architecture as the Linuxulator - only exists there but the concept is similar on other architectures. - The information was taken from [1] and the source code. + &os; is traditional flavor of &unix; in the sense of + dividing the run of processes into two halves: kernel space + and user space run. There are two types of process entry to + the kernel: a syscall and a trap. There is only one way to + return. In the subsequent sections we will describe the + three gates to/from the kernel. The whole description + applies to the i386 architecture as the Linuxulator only + exists there but the concept is similar on other + architectures. The information was taken from [1] and the + source code. System entries - &os; has an abstraction called an execution class loader, - which is a wedge into the &man.execve.2; syscall. This employs a - structure sysentvec, which describes an - executable ABI. It contains things like errno translation table, - signal translation table, various functions to serve syscall needs - (stack fixup, coredumping, etc.). Every ABI the &os; kernel wants - to support must define this structure, as it is used later in the - syscall processing code and at some other places. System entries - are handled by trap handlers, where we can access both the - kernel-space and the user-space at once. + &os; has an abstraction called an execution class + loader, which is a wedge into the &man.execve.2; syscall. + This employs a structure sysentvec, + which describes an executable ABI. It contains things + like errno translation table, signal translation table, + various functions to serve syscall needs (stack fixup, + coredumping, etc.). Every ABI the &os; kernel wants to + support must define this structure, as it is used later in + the syscall processing code and at some other places. + System entries are handled by trap handlers, where we can + access both the kernel-space and the user-space at + once. Syscalls Syscalls on &os; are issued by executing interrupt - 0x80 with register %eax set - to a desired syscall number with arguments passed on the stack. + 0x80 with register + %eax set to a desired syscall number + with arguments passed on the stack. - When a process issues an interrupt 0x80, the - int0x80 syscall trap handler is issued (defined - in sys/i386/i386/exception.s), which prepares - arguments (i.e. copies them on to the stack) for a - call to a C function &man.syscall.2; (defined in - sys/i386/i386/trap.c), which processes the - passed in trapframe. The processing consists of preparing the - syscall (depending on the sysvec entry), - determining if the syscall is 32-bit or 64-bit one (changes size - of the parameters), then the parameters are copied, including the - syscall. Next, the actual syscall function is executed with - processing of the return code (special cases for - ERESTART and EJUSTRETURN - errors). Finally an userret() is scheduled, - switching the process back to the users-pace. The parameters to - the actual syscall handler are passed in the form of - struct thread *td, - struct syscall args * arguments where the second + When a process issues an interrupt + 0x80, the int0x80 + syscall trap handler is issued (defined in + sys/i386/i386/exception.s), which + prepares arguments (i.e. 
copies them on to the stack) for + a call to a C function &man.syscall.2; (defined in + sys/i386/i386/trap.c), which + processes the passed in trapframe. The processing + consists of preparing the syscall (depending on the + sysvec entry), determining if the + syscall is 32-bit or 64-bit one (changes size of the + parameters), then the parameters are copied, including the + syscall. Next, the actual syscall function is executed + with processing of the return code (special cases for + ERESTART and + EJUSTRETURN errors). Finally an + userret() is scheduled, switching the + process back to the users-pace. The parameters to the + actual syscall handler are passed in the form of + struct thread *td, struct + syscall args * arguments where the second parameter is a pointer to the copied in structure of parameters. @@ -320,68 +357,76 @@ Traps - Handling of traps in &os; is similar to the handling of - syscalls. Whenever a trap occurs, an assembler handler is called. - It is chosen between alltraps, alltraps with regs pushed or - calltrap depending on the type of the trap. This handler prepares - arguments for a call to a C function trap() - (defined in sys/i386/i386/trap.c), which then - processes the occurred trap. After the processing it might send a - signal to the process and/or exit to userland using - userret(). + Handling of traps in &os; is similar to the handling + of syscalls. Whenever a trap occurs, an assembler handler + is called. It is chosen between alltraps, alltraps with + regs pushed or calltrap depending on the type of the trap. + This handler prepares arguments for a call to a C function + trap() (defined in + sys/i386/i386/trap.c), which then + processes the occurred trap. After the processing it + might send a signal to the process and/or exit to userland + using userret(). Exits - Exits from kernel to userspace happen using the assembler - routine doreti regardless of whether the kernel - was entered via a trap or via a syscall. This restores the program - status from the stack and returns to the userspace. + Exits from kernel to userspace happen using the + assembler routine doreti regardless of + whether the kernel was entered via a trap or via a + syscall. This restores the program status from the stack + and returns to the userspace. &unix; primitives - &os; operating system adheres to the traditional &unix; scheme, - where every process has a unique identification number, the so - called PID (Process ID). PID numbers are + &os; operating system adheres to the traditional + &unix; scheme, where every process has a unique + identification number, the so called + PID (Process ID). PID numbers are allocated either linearly or randomly ranging from - 0 to PID_MAX. The allocation - of PID numbers is done using linear searching of PID space. Every - thread in a process receives the same PID number as result of the - &man.getpid.2; call. + 0 to PID_MAX. The + allocation of PID numbers is done using linear searching + of PID space. Every thread in a process receives the same + PID number as result of the &man.getpid.2; call. - There are currently two ways to implement threading in &os;. - The first way is M:N threading followed by the 1:1 threading model. - The default library used is M:N threading - (libpthread) and you can switch at runtime to - 1:1 threading (libthr). The plan is to switch - to 1:1 library by default soon. Although those two libraries use - the same kernel primitives, they are accessed through different - API(es). 
The M:N library uses the kse_* family - of syscalls while the 1:1 library uses the thr_* - family of syscalls. Because of this, there is no general concept - of thread ID shared between kernel and userspace. Of course, both - threading libraries implement the pthread thread ID API. Every - kernel thread (as described by struct thread) - has td tid identifier but this is not directly accessible - from userland and solely serves the kernel's needs. It is also - used for 1:1 threading library as pthread's thread ID but handling - of this is internal to the library and cannot be relied on. + There are currently two ways to implement threading in + &os;. The first way is M:N threading followed by the 1:1 + threading model. The default library used is M:N + threading (libpthread) and you can + switch at runtime to 1:1 threading + (libthr). The plan is to switch to 1:1 + library by default soon. Although those two libraries use + the same kernel primitives, they are accessed through + different API(es). The M:N library uses the + kse_* family of syscalls while the 1:1 + library uses the thr_* family of + syscalls. Because of this, there is no general concept of + thread ID shared between kernel and userspace. Of course, + both threading libraries implement the pthread thread ID + API. Every kernel thread (as described by struct + thread) has td tid identifier but this is not + directly accessible from userland and solely serves the + kernel's needs. It is also used for 1:1 threading library + as pthread's thread ID but handling of this is internal to + the library and cannot be relied on. - As stated previously there are two implementations of threading - in &os;. The M:N library divides the work between kernel space and - userspace. Thread is an entity that gets scheduled in the kernel - but it can represent various number of userspace threads. - M userspace threads get mapped to N kernel threads thus saving - resources while keeping the ability to exploit multiprocessor - parallelism. Further information about the implementation can be - obtained from the man page or [1]. The 1:1 library directly maps a - userland thread to a kernel thread thus greatly simplifying the - scheme. None of these designs implement a fairness mechanism (such - a mechanism was implemented but it was removed recently because it - caused serious slowdown and made the code more difficult to deal + As stated previously there are two implementations of + threading in &os;. The M:N library divides the work + between kernel space and userspace. Thread is an entity + that gets scheduled in the kernel but it can represent + various number of userspace threads. M userspace threads + get mapped to N kernel threads thus saving resources while + keeping the ability to exploit multiprocessor parallelism. + Further information about the implementation can be + obtained from the man page or [1]. The 1:1 library + directly maps a userland thread to a kernel thread thus + greatly simplifying the scheme. None of these designs + implement a fairness mechanism (such a mechanism was + implemented but it was removed recently because it caused + serious slowdown and made the code more difficult to deal with). @@ -390,64 +435,70 @@ What is &linux; - &linux; is a &unix;-like kernel originally developed by Linus - Torvalds, and now being contributed to by a massive crowd of - programmers all around the world. 
From its mere beginnings to today, - with wide support from companies such as IBM or Google, &linux; is - being associated with its fast development pace, full hardware support - and benevolent dictator model of organization. + &linux; is a &unix;-like kernel originally developed by + Linus Torvalds, and now being contributed to by a massive + crowd of programmers all around the world. From its mere + beginnings to today, with wide support from companies such as + IBM or Google, &linux; is being associated with its fast + development pace, full hardware support and benevolent + dictator model of organization. - &linux; development started in 1991 as a hobbyist project at - University of Helsinki in Finland. Since then it has obtained all the - features of a modern &unix;-like OS: multiprocessing, multiuser - support, virtual memory, networking, basically everything is there. - There are also highly advanced features like virtualization etc. + &linux; development started in 1991 as a hobbyist project + at University of Helsinki in Finland. Since then it has + obtained all the features of a modern &unix;-like OS: + multiprocessing, multiuser support, virtual memory, + networking, basically everything is there. There are also + highly advanced features like virtualization etc. - As of 2006 &linux; seems to be the most widely used open source - operating system with support from independent software vendors like - Oracle, RealNetworks, Adobe, etc. Most of the commercial software - distributed for &linux; can only be obtained in a binary form so - recompilation for other operating systems is impossible. + As of 2006 &linux; seems to be the most widely used open + source operating system with support from independent software + vendors like Oracle, RealNetworks, Adobe, etc. Most of the + commercial software distributed for &linux; can only be + obtained in a binary form so recompilation for other operating + systems is impossible. Most of the &linux; development happens in a Git version control system. - Git is a distributed system so there is - no central source of the &linux; code, but some branches are considered - prominent and official. The version number scheme implemented by - &linux; consists of four numbers A.B.C.D. Currently development - happens in 2.6.C.D, where C represents major version, where new - features are added or changed while D is a minor version for bugfixes - only. + Git is a distributed system so + there is no central source of the &linux; code, but some + branches are considered prominent and official. The version + number scheme implemented by &linux; consists of four numbers + A.B.C.D. Currently development happens in 2.6.C.D, where C + represents major version, where new features are added or + changed while D is a minor version for bugfixes only. More information can be obtained from [3]. Technical details - &linux; follows the traditional &unix; scheme of dividing the run - of a process in two halves: the kernel and user space. The kernel can - be entered in two ways: via a trap or via a syscall. The return is - handled only in one way. The further description applies to - &linux; 2.6 on the &i386; architecture. This information was - taken from [2]. + &linux; follows the traditional &unix; scheme of + dividing the run of a process in two halves: the kernel and + user space. The kernel can be entered in two ways: via a + trap or via a syscall. The return is handled only in one + way. The further description applies to &linux; 2.6 on + the &i386; architecture. 
This information was taken from + [2]. Syscalls Syscalls in &linux; are performed (in userspace) using - syscallX macros where X substitutes a number - representing the number of parameters of the given syscall. This - macro translates to a code that loads %eax - register with a number of the syscall and executes interrupt - 0x80. After this syscall return is called, - which translates negative return values to positive - errno values and sets res to - -1 in case of an error. Whenever the interrupt - 0x80 is called the process enters the kernel in - system call trap handler. This routine saves all registers on the - stack and calls the selected syscall entry. Note that the &linux; - calling convention expects parameters to the syscall to be passed - via registers as shown here: + syscallX macros where X substitutes a + number representing the number of parameters of the given + syscall. This macro translates to a code that loads + %eax register with a number of the + syscall and executes interrupt 0x80. + After this syscall return is called, which translates + negative return values to positive + errno values and sets + res to -1 in case of + an error. Whenever the interrupt 0x80 + is called the process enters the kernel in system call + trap handler. This routine saves all registers on the + stack and calls the selected syscall entry. Note that the + &linux; calling convention expects parameters to the + syscall to be passed via registers as shown here: @@ -470,53 +521,58 @@ - There are some exceptions to this, where &linux; uses different - calling convention (most notably the clone - syscall). + There are some exceptions to this, where &linux; uses + different calling convention (most notably the + clone syscall). Traps The trap handlers are introduced in - arch/i386/kernel/traps.c and most of these - handlers live in arch/i386/kernel/entry.S, - where handling of the traps happens. + arch/i386/kernel/traps.c and most of + these handlers live in + arch/i386/kernel/entry.S, where + handling of the traps happens. Exits - Return from the syscall is managed by syscall &man.exit.3;, - which checks for the process having unfinished work, then checks - whether we used user-supplied selectors. If this happens stack - fixing is applied and finally the registers are restored from the - stack and the process returns to the userspace. + Return from the syscall is managed by syscall + &man.exit.3;, which checks for the process having + unfinished work, then checks whether we used user-supplied + selectors. If this happens stack fixing is applied and + finally the registers are restored from the stack and the + process returns to the userspace. &unix; primitives - In the 2.6 version, the &linux; operating system redefined some - of the traditional &unix; primitives, notably PID, TID and thread. - PID is defined not to be unique for every process, so for some - processes (threads) &man.getppid.2; returns the same value. Unique - identification of process is provided by TID. This is because - NPTL (New &posix; Thread Library) defines - threads to be normal processes (so called 1:1 threading). Spawning - a new process in &linux; 2.6 happens using the - clone syscall (fork variants are reimplemented using - it). This clone syscall defines a set of flags that affect - behavior of the cloning process regarding thread implementation. - The semantic is a bit fuzzy as there is no single flag telling the - syscall to create a thread. 
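To make the fuzzy clone semantics concrete, here is a small illustrative userland sketch (not taken from the thesis nor from NPTL) that creates a thread-like task through the glibc clone(3) wrapper. The worker routine, the stack size and the omitted error handling are arbitrary; a real program would call pthread_create(3), which also sets up TLS and the thread descriptor. The point is that the caller must combine several flags by hand, because no single flag means "create a thread":

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* hypothetical worker routine executed by the new task */
static int
worker(void *arg)
{
        static const char msg[] = "hello from a clone()d thread\n";

        (void)arg;
        (void)write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        return (0);
}

int
main(void)
{
        size_t stacksize = 64 * 1024;
        char *stack = malloc(stacksize);

        if (stack == NULL)
                return (1);
        /* CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM;
         * the stack grows downwards on i386, so the top is passed in. */
        if (clone(worker, stack + stacksize,
            CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
            CLONE_THREAD | CLONE_SYSVSEM, NULL) == -1) {
                perror("clone");
                return (1);
        }
        sleep(1);       /* crude: give the worker time to finish */
        return (0);
}

Because CLONE_THREAD is passed, the new task joins the caller's thread group: it shares the caller's PID and only gets a TID of its own, which matches the PID/TID behavior described above.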
+ In the 2.6 version, the &linux; operating system + redefined some of the traditional &unix; primitives, + notably PID, TID and thread. PID is defined not to be + unique for every process, so for some processes (threads) + &man.getppid.2; returns the same value. Unique + identification of process is provided by TID. This is + because NPTL (New &posix; Thread + Library) defines threads to be normal processes (so called + 1:1 threading). Spawning a new process in + &linux; 2.6 happens using the + clone syscall (fork variants are + reimplemented using it). This clone syscall defines a set + of flags that affect behavior of the cloning process + regarding thread implementation. The semantic is a bit + fuzzy as there is no single flag telling the syscall to + create a thread. Implemented clone flags are: - CLONE_VM - processes share their memory - space + CLONE_VM - processes share + their memory space CLONE_FS - share umask, cwd and @@ -527,72 +583,78 @@ files - CLONE_SIGHAND - share signal handlers - and blocked signals + CLONE_SIGHAND - share signal + handlers and blocked signals - CLONE_PARENT - share parent + CLONE_PARENT - share + parent - CLONE_THREAD - be thread (further - explanation below) + CLONE_THREAD - be thread + (further explanation below) - CLONE_NEWNS - new namespace + CLONE_NEWNS - new + namespace CLONE_SYSVSEM - share SysV undo structures - CLONE_SETTLS - setup TLS at supplied - address + CLONE_SETTLS - setup TLS at + supplied address - CLONE_PARENT_SETTID - set TID in the - parent + CLONE_PARENT_SETTID - set TID + in the parent - CLONE_CHILD_CLEARTID - clear TID in the - child + CLONE_CHILD_CLEARTID - clear + TID in the child - CLONE_CHILD_SETTID - set TID in the - child + CLONE_CHILD_SETTID - set TID in + the child - CLONE_PARENT sets the real parent to the - parent of the caller. This is useful for threads because if thread - A creates thread B we want thread B to be parented to the parent of - the whole thread group. CLONE_THREAD does - exactly the same thing as CLONE_PARENT, - CLONE_VM and CLONE_SIGHAND, - rewrites PID to be the same as PID of the caller, sets exit signal - to be none and enters the thread group. - CLONE_SETTLS sets up GDT entries for TLS - handling. The CLONE_*_*TID set of flags - sets/clears user supplied address to TID or 0. + CLONE_PARENT sets the real parent + to the parent of the caller. This is useful for threads + because if thread A creates thread B we want thread B to + be parented to the parent of the whole thread group. + CLONE_THREAD does exactly the same + thing as CLONE_PARENT, + CLONE_VM and + CLONE_SIGHAND, rewrites PID to be the + same as PID of the caller, sets exit signal to be none and + enters the thread group. CLONE_SETTLS + sets up GDT entries for TLS handling. The + CLONE_*_*TID set of flags sets/clears + user supplied address to TID or 0. - As you can see the CLONE_THREAD does most - of the work and does not seem to fit the scheme very well. The - original intention is unclear (even for authors, according to - comments in the code) but I think originally there was one - threading flag, which was then parcelled among many other flags - but this separation was never fully finished. It is also unclear - what this partition is good for as glibc does not use that so only - hand-written use of the clone permits a programmer to access this - features. + As you can see the CLONE_THREAD + does most of the work and does not seem to fit the scheme + very well. 
The original intention is unclear (even for + authors, according to comments in the code) but I think + originally there was one threading flag, which was then + parcelled among many other flags but this separation was + never fully finished. It is also unclear what this + partition is good for as glibc does not use that so only + hand-written use of the clone permits a programmer to + access this features. - For non-threaded programs the PID and TID are the same. For - threaded programs the first thread PID and TID are the same and - every created thread shares the same PID and gets assigned a - unique TID (because CLONE_THREAD is passed in) - also parent is shared for all processes forming this threaded + For non-threaded programs the PID and TID are the + same. For threaded programs the first thread PID and TID + are the same and every created thread shares the same PID + and gets assigned a unique TID (because + CLONE_THREAD is passed in) also parent + is shared for all processes forming this threaded program. - The code that implements &man.pthread.create.3; in NPTL defines - the clone flags like this: + The code that implements &man.pthread.create.3; in + NPTL defines the clone flags like this: int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL @@ -606,12 +668,13 @@ | 0); - The CLONE_SIGNAL is defined like + The CLONE_SIGNAL is defined + like #define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD) - the last 0 means no signal is sent when any of the threads - exits. + the last 0 means no signal is sent when any of the + threads exits. @@ -619,71 +682,80 @@ What is emulation - According to a dictionary definition, emulation is the ability of - a program or device to imitate another program or device. This is - achieved by providing the same reaction to a given stimulus as the - emulated object. In practice, the software world mostly sees three - types of emulation - a program used to emulate a machine (QEMU, various - game console emulators etc.), software emulation of a hardware facility - (OpenGL emulators, floating point units emulation etc.) and operating - system emulation (either in kernel of the operating system or as a - userspace program). + According to a dictionary definition, emulation is the + ability of a program or device to imitate another program or + device. This is achieved by providing the same reaction to a + given stimulus as the emulated object. In practice, the + software world mostly sees three types of emulation - a + program used to emulate a machine (QEMU, various game console + emulators etc.), software emulation of a hardware facility + (OpenGL emulators, floating point units emulation etc.) and + operating system emulation (either in kernel of the operating + system or as a userspace program). - Emulation is usually used in a place, where using the original - component is not feasible nor possible at all. For example someone - might want to use a program developed for a different operating - system than they use. Then emulation comes in handy. Sometimes - there is no other way but to use emulation - e.g. when the hardware - device you try to use does not exist (yet/anymore) then there is no - other way but emulation. This happens often when porting an operating + Emulation is usually used in a place, where using the + original component is not feasible nor possible at all. For + example someone might want to use a program developed for a + different operating system than they use. Then emulation + comes in handy. 
Sometimes there is no other way but to use + emulation - e.g. when the hardware device you try to use does + not exist (yet/anymore) then there is no other way but + emulation. This happens often when porting an operating system to a new (non-existent) platform. Sometimes it is just cheaper to emulate. - Looking from an implementation point of view, there are two main - approaches to the implementation of emulation. You can either emulate - the whole thing - accepting possible inputs of the original object, - maintaining inner state and emitting correct output based on the state - and/or input. This kind of emulation does not require any special - conditions and basically can be implemented anywhere for any - device/program. The drawback is that implementing such emulation is - quite difficult, time-consuming and error-prone. In some cases we can - use a simpler approach. Imagine you want to emulate a printer that - prints from left to right on a printer that prints from right to left. - It is obvious that there is no need for a complex emulation layer but - simply reversing of the printed text is sufficient. Sometimes the - emulating environment is very similar to the emulated one so just a - thin layer of some translation is necessary to provide fully working - emulation! As you can see this is much less demanding to implement, - so less time-consuming and error-prone than the previous approach. But - the necessary condition is that the two environments must be similar - enough. The third approach combines the two previous. Most of the - time the objects do not provide the same capabilities so in a case of - emulating the more powerful one on the less powerful we have to emulate - the missing features with full emulation described above. + Looking from an implementation point of view, there are + two main approaches to the implementation of emulation. You + can either emulate the whole thing - accepting possible inputs + of the original object, maintaining inner state and emitting + correct output based on the state and/or input. This kind of + emulation does not require any special conditions and + basically can be implemented anywhere for any device/program. + The drawback is that implementing such emulation is quite + difficult, time-consuming and error-prone. In some cases we + can use a simpler approach. Imagine you want to emulate a + printer that prints from left to right on a printer that + prints from right to left. It is obvious that there is no + need for a complex emulation layer but simply reversing of the + printed text is sufficient. Sometimes the + emulating environment is very similar to the emulated one so + just a thin layer of some translation is necessary to provide + fully working emulation! As you can see this is much less + demanding to implement, so less time-consuming and error-prone + than the previous approach. But the necessary condition is + that the two environments must be similar enough. The third + approach combines the two previous. Most of the time the + objects do not provide the same capabilities so in a case of + emulating the more powerful one on the less powerful we have + to emulate the missing features with full emulation described + above. - This master thesis deals with emulation of &unix; on &unix;, which - is exactly the case, where only a thin layer of translation is - sufficient to provide full emulation. The &unix; API consists of a set - of syscalls, which are usually self contained and do not affect some - global kernel state. 
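As a concrete illustration of how thin such a translation layer can be, consider errno handling, one of the translation tables carried by the sysentvec structure mentioned earlier. The fragment below is only an abridged, hypothetical sketch (not the actual Linuxulator table): the error number produced by a native &os; syscall is mapped to the value the emulated &linux; binary expects before it is returned to user space.

#include <sys/types.h>
#include <sys/errno.h>

/* abridged, illustrative mapping; index = native (BSD) errno value */
static const int bsd_to_linux_errno_sketch[] = {
        0,      /* no error     */
        1,      /* EPERM        */
        2,      /* ENOENT       */
        3,      /* ESRCH        */
        4,      /* EINTR        */
        5,      /* EIO          */
        /* ... a real table covers the whole errno range ... */
};

static int
sketch_translate_errno(int bsd_error)
{
        int n = sizeof(bsd_to_linux_errno_sketch) /
            sizeof(bsd_to_linux_errno_sketch[0]);

        if (bsd_error >= 0 && bsd_error < n)
                return (bsd_to_linux_errno_sketch[bsd_error]);
        return (bsd_error);     /* unknown: pass the native value through */
}

Much of a thin emulation layer amounts to this kind of table-driven translation of constants, flags and structure layouts wrapped around the native syscalls.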
- - There are a few syscalls that affect inner state but this can be - dealt with by providing some structures that maintain the extra + This master thesis deals with emulation of &unix; on + &unix;, which is exactly the case, where only a thin layer of + translation is sufficient to provide full emulation. The + &unix; API consists of a set of syscalls, which are usually + self contained and do not affect some global kernel state. - No emulation is perfect and emulations tend to lack some parts but - this usually does not cause any serious drawbacks. Imagine a game - console emulator that emulates everything but music output. No doubt - that the games are playable and one can use the emulator. It might - not be that comfortable as the original game console but its an - acceptable compromise between price and comfort. + There are a few syscalls that affect inner state but this + can be dealt with by providing some structures that maintain + the extra state. - The same goes with the &unix; API. Most programs can live with a - very limited set of syscalls working. Those syscalls tend to be the - oldest ones (&man.read.2;/&man.write.2;, &man.fork.2; family, - &man.signal.3; handling, &man.exit.3;, &man.socket.2; API) hence it is - easy to emulate because their semantics is shared among all - &unix;es, which exist todays. + No emulation is perfect and emulations tend to lack some + parts but this usually does not cause any serious drawbacks. + Imagine a game console emulator that emulates everything but + music output. No doubt that the games are playable and one + can use the emulator. It might not be that comfortable as the + original game console but its an acceptable compromise between + price and comfort. + + The same goes with the &unix; API. Most programs can live + with a very limited set of syscalls working. Those syscalls + tend to be the oldest ones (&man.read.2;/&man.write.2;, + &man.fork.2; family, &man.signal.3; handling, &man.exit.3;, + &man.socket.2; API) hence it is easy to emulate because their + semantics is shared among all &unix;es, which exist + todays. @@ -693,63 +765,69 @@ How emulation works in &os; - As stated earlier, &os; supports running binaries from several - other &unix;es. This works because &os; has an abstraction called the - execution class loader. This wedges into the &man.execve.2; syscall, - so when &man.execve.2; is about to execute a binary it examines its - type. + As stated earlier, &os; supports running binaries from + several other &unix;es. This works because &os; has an + abstraction called the execution class loader. This wedges + into the &man.execve.2; syscall, so when &man.execve.2; is + about to execute a binary it examines its type. - There are basically two types of binaries in &os;. Shell-like text - scripts which are identified by #! as their first - two characters and normal (typically ELF) - binaries, which are a representation of a compiled executable object. - The vast majority (one could say all of them) of binaries in &os; are - from type ELF. ELF files contain a header, which specifies the OS ABI - for this ELF file. By reading this information, the operating system - can accurately determine what type of binary the given file is. + There are basically two types of binaries in &os;. + Shell-like text scripts which are identified by + #! as their first two characters and normal + (typically ELF) binaries, which are a + representation of a compiled executable object. 
The vast + majority (one could say all of them) of binaries in &os; are + from type ELF. ELF files contain a header, which specifies + the OS ABI for this ELF file. By reading this information, + the operating system can accurately determine what type of + binary the given file is. - Every OS ABI must be registered in the &os; kernel. This applies - to the &os; native OS ABI, as well. So when &man.execve.2; executes a - binary it iterates through the list of registered APIs and when it - finds the right one it starts to use the information contained in the - OS ABI description (its syscall table, errno - translation table, etc.). So every time the process calls a syscall, - it uses its own set of syscalls instead of some global one. This + Every OS ABI must be registered in the &os; kernel. This + applies to the &os; native OS ABI, as well. So when + &man.execve.2; executes a binary it iterates through the list + of registered APIs and when it finds the right one it starts + to use the information contained in the OS ABI description + (its syscall table, errno translation + table, etc.). So every time the process calls a syscall, it + uses its own set of syscalls instead of some global one. This effectively provides a very elegant and easy way of supporting execution of various binary formats. - The nature of emulation of different OSes (and also some other - subsystems) led developers to invite a handler event mechanism. There - are various places in the kernel, where a list of event handlers are - called. Every subsystem can register an event handler and they are - called accordingly. For example, when a process exits there is a - handler called that possibly cleans up whatever the subsystem needs - to be cleaned. + The nature of emulation of different OSes (and also some + other subsystems) led developers to invite a handler event + mechanism. There are various places in the kernel, where a + list of event handlers are called. Every subsystem can + register an event handler and they are called accordingly. + For example, when a process exits there is a handler called + that possibly cleans up whatever the subsystem needs to be + cleaned. - Those simple facilities provide basically everything that is needed - for the emulation infrastructure and in fact these are basically the - only things necessary to implement the &linux; emulation layer. + Those simple facilities provide basically everything that + is needed for the emulation infrastructure and in fact these + are basically the only things necessary to implement the + &linux; emulation layer. Common primitives in the &os; kernel - Emulation layers need some support from the operating system. I am - going to describe some of the supported primitives in the &os; - operating system. + Emulation layers need some support from the operating + system. I am going to describe some of the supported + primitives in the &os; operating system. Locking primitives Contributed by: &a.attilio.email; - The &os; synchronization primitive set is based on the idea to - supply a rather huge number of different primitives in a way that - the better one can be used for every particular, appropriate - situation. + The &os; synchronization primitive set is based on the + idea to supply a rather huge number of different primitives + in a way that the better one can be used for every + particular, appropriate situation. 
- To a high level point of view you can consider three kinds of - synchronization primitives in the &os; kernel: + To a high level point of view you can consider three + kinds of synchronization primitives in the &os; + kernel: @@ -763,62 +841,68 @@ - Below there are descriptions for the 3 families. For every lock, - you should really check the linked manpage (where possible) for - more detailed explanations. + Below there are descriptions for the 3 families. For + every lock, you should really check the linked manpage + (where possible) for more detailed explanations. Atomic operations and memory barriers - Atomic operations are implemented through a set of functions - performing simple arithmetics on memory operands in an atomic way - with respect to external events (interrupts, preemption, etc.). - Atomic operations can guarantee atomicity just on small data types - (in the magnitude order of the .long. - architecture C data type), so should be rarely used directly in the - end-level code, if not only for very simple operations (like flag - setting in a bitmap, for example). In fact, it is rather simple - and common to write down a wrong semantic based on just atomic - operations (usually referred as lock-less). The &os; kernel offers - a way to perform atomic operations in conjunction with a memory - barrier. The memory barriers will guarantee that an atomic - operation will happen following some specified ordering with - respect to other memory accesses. For example, if we need that an - atomic operation happen just after all other pending writes (in - terms of instructions reordering buffers activities) are completed, - we need to explicitly use a memory barrier in conjunction to this - atomic operation. So it is simple to understand why memory - barriers play a key role for higher-level locks building (just - as refcounts, mutexes, etc.). For a detailed explanatory on atomic - operations, please refer to &man.atomic.9;. It is far, however, - noting that atomic operations (and memory barriers as well) should - ideally only be used for building front-ending locks (as - mutexes). + Atomic operations are implemented through a set of + functions performing simple arithmetics on memory operands + in an atomic way with respect to external events + (interrupts, preemption, etc.). Atomic operations can + guarantee atomicity just on small data types (in the + magnitude order of the .long. + architecture C data type), so should be rarely used + directly in the end-level code, if not only for very + simple operations (like flag setting in a bitmap, for + example). In fact, it is rather simple and common to + write down a wrong semantic based on just atomic + operations (usually referred as lock-less). The &os; + kernel offers a way to perform atomic operations in + conjunction with a memory barrier. The memory barriers + will guarantee that an atomic operation will happen + following some specified ordering with respect to other + memory accesses. For example, if we need that an atomic + operation happen just after all other pending writes (in + terms of instructions reordering buffers activities) are + completed, we need to explicitly use a memory barrier in + conjunction to this atomic operation. So it is simple to + understand why memory barriers play a key role for + higher-level locks building (just as refcounts, mutexes, + etc.). For a detailed explanatory on atomic operations, + please refer to &man.atomic.9;. 
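To make the role of memory barriers concrete, here is a tiny illustrative sketch (not from the thesis) of the classic publish/consume pattern built on the acquire/release variants of the atomic(9) interface; the variable names are hypothetical. The release store guarantees that the data written before it is visible to any thread that observes the flag through the acquire load.

#include <sys/types.h>
#include <machine/atomic.h>

/* hypothetical shared state: a producer publishes data, then raises a flag */
static int              shared_data;
static volatile u_int   data_ready;

static void
producer(int value)
{
        shared_data = value;
        /* release barrier: the write above is ordered before the flag */
        atomic_store_rel_int(&data_ready, 1);
}

static int
consumer(void)
{
        /* acquire barrier: seeing the flag implies seeing shared_data */
        while (atomic_load_acq_int(&data_ready) == 0)
                ;       /* spin; a real consumer would block instead */
        return (shared_data);
}
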
It is far, however, + noting that atomic operations (and memory barriers as + well) should ideally only be used for building + front-ending locks (as mutexes). Refcounts - Refcounts are interfaces for handling reference counters. - They are implemented through atomic operations and are intended to - be used just for cases, where the reference counter is the only - one thing to be protected, so even something like a spin-mutex is - deprecated. Using the refcount interface for structures, where - a mutex is already used is often wrong since we should probably - close the reference counter in some already protected paths. A - manpage discussing refcount does not exist currently, just check - sys/refcount.h for an overview of the - existing API. + Refcounts are interfaces for handling reference + counters. They are implemented through atomic operations + and are intended to be used just for cases, where the + reference counter is the only one thing to be protected, + so even something like a spin-mutex is deprecated. Using + the refcount interface for structures, where a mutex is + already used is often wrong since we should probably close + the reference counter in some already protected paths. A + manpage discussing refcount does not exist currently, just + check sys/refcount.h for an overview + of the existing API. Locks - &os; kernel has huge classes of locks. Every lock is defined - by some peculiar properties, but probably the most important is the - event linked to contesting holders (or in other terms, the - behavior of threads unable to acquire the lock). &os;'s locking - scheme presents three different behaviors for contenders: + &os; kernel has huge classes of locks. Every lock is + defined by some peculiar properties, but probably the most + important is the event linked to contesting holders (or in + other terms, the behavior of threads unable to acquire the + lock). &os;'s locking scheme presents three different + behaviors for contenders: @@ -840,55 +924,60 @@ Spinning locks - Spin locks let waiters to spin until they cannot acquire the - lock. An important matter do deal with is when a thread contests - on a spin lock if it is not descheduled. Since the &os; kernel - is preemptive, this exposes spin lock at the risk of deadlocks - that can be solved just disabling interrupts while they are - acquired. For this and other reasons (like lack of priority - propagation support, poorness in load balancing schemes between - CPUs, etc.), spin locks are intended to protect very small paths - of code, or ideally not to be used at all if not explicitly - requested (explained later). + Spin locks let waiters to spin until they cannot + acquire the lock. An important matter do deal with is + when a thread contests on a spin lock if it is not + descheduled. Since the &os; kernel is preemptive, this + exposes spin lock at the risk of deadlocks that can be + solved just disabling interrupts while they are acquired. + For this and other reasons (like lack of priority + propagation support, poorness in load balancing schemes + between CPUs, etc.), spin locks are intended to protect + very small paths of code, or ideally not to be used at all + if not explicitly requested (explained later). Blocking - Block locks let waiters to be descheduled and blocked until - the lock owner does not drop it and wakes up one or more - contenders. In order to avoid starvation issues, blocking locks - do priority propagation from the waiters to the owner. 
Block - locks must be implemented through the turnstile interface and are - intended to be the most used kind of locks in the kernel, if no - particular conditions are met. + Block locks let waiters to be descheduled and blocked + until the lock owner does not drop it and wakes up one or + more contenders. In order to avoid starvation issues, + blocking locks do priority propagation from the waiters to + the owner. Block locks must be implemented through the + turnstile interface and are intended to be the most used + kind of locks in the kernel, if no particular conditions + are met. Sleeping - Sleep locks let waiters to be descheduled and fall asleep - until the lock holder does not drop it and wakes up one or more - waiters. Since sleep locks are intended to protect large paths - of code and to cater asynchronous events, they do not do any form - of priority propagation. They must be implemented through the - &man.sleepqueue.9; interface. + Sleep locks let waiters to be descheduled and fall + asleep until the lock holder does not drop it and wakes up + one or more waiters. Since sleep locks are intended to + protect large paths of code and to cater asynchronous + events, they do not do any form of priority propagation. + They must be implemented through the &man.sleepqueue.9; + interface. - The order used to acquire locks is very important, not only for - the possibility to deadlock due at lock order reversals, but even - because lock acquisition should follow specific rules linked to - locks natures. If you give a look at the table above, the - practical rule is that if a thread holds a lock of level n (where - the level is the number listed close to the kind of lock) it is not - allowed to acquire a lock of superior levels, since this would - break the specified semantic for a path. For example, if a thread - holds a block lock (level 2), it is allowed to acquire a spin lock - (level 1) but not a sleep lock (level 3), since block locks are - intended to protect smaller paths than sleep lock (these rules are - not about atomic operations or scheduling barriers, - however). + The order used to acquire locks is very important, not + only for the possibility to deadlock due at lock order + reversals, but even because lock acquisition should follow + specific rules linked to locks natures. If you give a + look at the table above, the practical rule is that if a + thread holds a lock of level n (where the level is the + number listed close to the kind of lock) it is not allowed + to acquire a lock of superior levels, since this would + break the specified semantic for a path. For example, if + a thread holds a block lock (level 2), it is allowed to + acquire a spin lock (level 1) but not a sleep lock (level + 3), since block locks are intended to protect smaller + paths than sleep lock (these rules are not about atomic + operations or scheduling barriers, however). - This is a list of lock with their respective behaviors: + This is a list of lock with their respective + behaviors: @@ -901,8 +990,8 @@ pool mutex - blocking - &man.mtx.pool.9; - sleep family - sleeping - &man.sleep.9; pause tsleep - msleep msleep spin msleep rw msleep sx + sleep family - sleeping - &man.sleep.9; pause + tsleep msleep msleep spin msleep rw msleep sx condvar - sleeping - &man.condvar.9; @@ -921,17 +1010,18 @@ - Among these locks only mutexes, sxlocks, rwlocks and lockmgrs - are intended to handle recursion, but currently recursion is only - supported by mutexes and lockmgrs. 
+ Among these locks only mutexes, sxlocks, rwlocks and + lockmgrs are intended to handle recursion, but currently + recursion is only supported by mutexes and + lockmgrs. Scheduling barriers - Scheduling barriers are intended to be used in order to drive - scheduling of threading. They consist mainly of three - different stubs: + Scheduling barriers are intended to be used in order + to drive scheduling of threading. They consist mainly of + three different stubs: @@ -945,28 +1035,31 @@ - Generally, these should be used only in a particular context - and even if they can often replace locks, they should be avoided - because they do not let the diagnose of simple eventual problems - with locking debugging tools (as &man.witness.4;). + Generally, these should be used only in a particular + context and even if they can often replace locks, they + should be avoided because they do not let the diagnose of + simple eventual problems with locking debugging tools (as + &man.witness.4;). Critical sections - The &os; kernel has been made preemptive basically to deal with - interrupt threads. In fact, in order to avoid high interrupt - latency, time-sharing priority threads can be preempted by - interrupt threads (in this way, they do not need to wait to be - scheduled as the normal path previews). Preemption, however, - introduces new racing points that need to be handled, as well. - Often, in order to deal with preemption, the simplest thing to do - is to completely disable it. A critical section defines a piece of - code (borderlined by the pair of functions &man.critical.enter.9; - and &man.critical.exit.9;, where preemption is guaranteed to not - happen (until the protected code is fully executed). This can - often replace a lock effectively but should be used carefully in - order to not lose the whole advantage that preemption + The &os; kernel has been made preemptive basically to + deal with interrupt threads. In fact, in order to avoid + high interrupt latency, time-sharing priority threads can + be preempted by interrupt threads (in this way, they do + not need to wait to be scheduled as the normal path + previews). Preemption, however, introduces new racing + points that need to be handled, as well. Often, in order + to deal with preemption, the simplest thing to do is to + completely disable it. A critical section defines a piece + of code (borderlined by the pair of functions + &man.critical.enter.9; and &man.critical.exit.9;, where + preemption is guaranteed to not happen (until the + protected code is fully executed). This can often replace + a lock effectively but should be used carefully in order + to not lose the whole advantage that preemption brings. @@ -974,29 +1067,32 @@ sched_pin/sched_unpin Another way to deal with preemption is the - sched_pin() interface. If a piece of code - is closed in the sched_pin() and - sched_unpin() pair of functions it is - guaranteed that the respective thread, even if it can be preempted, - it will always be executed on the same CPU. Pinning is very - effective in the particular case when we have to access at - per-cpu datas and we assume other threads will not change those - data. The latter condition will determine a critical section - as a too strong condition for our code. + sched_pin() interface. If a piece of + code is closed in the sched_pin() + and sched_unpin() pair of functions + it is guaranteed that the respective thread, even if it + can be preempted, it will always be executed on the same + CPU. 
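A minimal sketch of the pattern might look like the following; the per-CPU field read here was picked just for illustration:

#include <sys/param.h>
#include <sys/pcpu.h>
#include <sys/sched.h>

static int
current_cpu(void)
{
        int cpu;

        sched_pin();            /* we may still be preempted, but never migrated */
        cpu = PCPU_GET(cpuid);  /* per-CPU data is stable for us here */
        sched_unpin();

        return (cpu);
}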
Pinning is very effective in the particular case + when we have to access at per-cpu datas and we assume + other threads will not change those data. The latter + condition will determine a critical section as a too + strong condition for our code. sched_bind/sched_unbind - sched_bind is an API used in order to bind - a thread to a particular CPU for all the time it executes the code, - until a sched_unbind function call does not - unbind it. This feature has a key role in situations where you - cannot trust the current state of CPUs (for example, at very early - stages of boot), as you want to avoid your thread to migrate on - inactive CPUs. Since sched_bind and - sched_unbind manipulate internal scheduler - structures, they need to be enclosed in + sched_bind is an API used in + order to bind a thread to a particular CPU for all the + time it executes the code, until a + sched_unbind function call does not + unbind it. This feature has a key role in situations + where you cannot trust the current state of CPUs (for + example, at very early stages of boot), as you want to + avoid your thread to migrate on inactive CPUs. Since + sched_bind and + sched_unbind manipulate internal + scheduler structures, they need to be enclosed in sched_lock acquisition/releasing when used. @@ -1005,71 +1101,78 @@ Proc structure - Various emulation layers sometimes require some additional - per-process data. It can manage separate structures (a list, a tree - etc.) containing these data for every process but this tends to be - slow and memory consuming. To solve this problem the &os; + Various emulation layers sometimes require some + additional per-process data. It can manage separate + structures (a list, a tree etc.) containing these data for + every process but this tends to be slow and memory + consuming. To solve this problem the &os; proc structure contains - p_emuldata, which is a void pointer to some - emulation layer specific data. This proc entry - is protected by the proc mutex. + p_emuldata, which is a void pointer to + some emulation layer specific data. This + proc entry is protected by the proc + mutex. The &os; proc structure contains a - p_sysent entry that identifies, which ABI this - process is running. In fact, it is a pointer to the - sysentvec described above. So by comparing this - pointer to the address where the sysentvec - structure for the given ABI is stored we can effectively determine - whether the process belongs to our emulation layer. The code - typically looks like: + p_sysent entry that identifies, which ABI + this process is running. In fact, it is a pointer to the + sysentvec described above. So by + comparing this pointer to the address where the + sysentvec structure for the given ABI is + stored we can effectively determine whether the process + belongs to our emulation layer. The code typically looks + like: if (__predict_true(p->p_sysent != &elf_&linux;_sysvec)) return; As you can see, we effectively use the - __predict_true modifier to collapse the most - common case (&os; process) to a simple return operation thus - preserving high performance. This code should be turned into a - macro because currently it is not very flexible, i.e. we do not - support &linux;64 emulation nor A.OUT &linux; processes - on i386. + __predict_true modifier to collapse the + most common case (&os; process) to a simple return operation + thus preserving high performance. This code should be + turned into a macro because currently it is not very + flexible, i.e. 
we do not support &linux;64 emulation nor + A.OUT &linux; processes on i386. VFS - The &os; VFS subsystem is very complex but the &linux; emulation - layer uses just a small subset via a well defined API. It can either - operate on vnodes or file handlers. Vnode represents a virtual - vnode, i.e. representation of a node in VFS. Another representation - is a file handler, which represents an opened file from the - perspective of a process. A file handler can represent a socket or - an ordinary file. A file handler contains a pointer to its vnode. - More then one file handler can point to the same vnode. + The &os; VFS subsystem is very complex but the &linux; + emulation layer uses just a small subset via a well defined + API. It can either operate on vnodes or file handlers. + Vnode represents a virtual vnode, i.e. representation of a + node in VFS. Another representation is a file handler, + which represents an opened file from the perspective of a + process. A file handler can represent a socket or an + ordinary file. A file handler contains a pointer to its + vnode. More then one file handler can point to the same + vnode. namei - The &man.namei.9; routine is a central entry point to pathname - lookup and translation. It traverses the path point by point from - the starting point to the end point using lookup function, which is - internal to VFS. The &man.namei.9; syscall can cope with symlinks, - absolute and relative paths. When a path is looked up using - &man.namei.9; it is inputed to the name cache. This behavior can - be suppressed. This routine is used all over the kernel and its - performance is very critical. + The &man.namei.9; routine is a central entry point to + pathname lookup and translation. It traverses the path + point by point from the starting point to the end point + using lookup function, which is internal to VFS. The + &man.namei.9; syscall can cope with symlinks, absolute and + relative paths. When a path is looked up using + &man.namei.9; it is inputed to the name cache. This + behavior can be suppressed. This routine is used all over + the kernel and its performance is very critical. vn_fullpath - The &man.vn.fullpath.9; function takes the best effort to - traverse VFS name cache and returns a path for a given (locked) - vnode. This process is unreliable but works just fine for the most - common cases. The unreliability is because it relies on VFS cache - (it does not traverse the on medium structures), it does not work - with hardlinks, etc. This routine is used in several places in the - Linuxulator. + The &man.vn.fullpath.9; function takes the best effort + to traverse VFS name cache and returns a path for a given + (locked) vnode. This process is unreliable but works just + fine for the most common cases. The unreliability is + because it relies on VFS cache (it does not traverse the + on medium structures), it does not work with hardlinks, + etc. This routine is used in several places in the + Linuxulator. 
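A hedged usage sketch follows; the thread argument matches the prototype of that era and may differ in later &os; versions, and the vnode vp is assumed to be supplied (referenced and locked) by the caller:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/proc.h>
#include <sys/vnode.h>

static void
print_vnode_path(struct thread *td, struct vnode *vp)
{
        char *path, *freebuf;

        if (vn_fullpath(td, vp, &path, &freebuf) == 0) {
                printf("vnode resolves to %s\n", path);
                free(freebuf, M_TEMP); /* caller of vn_fullpath frees this */
        }
        /* On failure there is simply no path available; callers must cope. */
}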
@@ -1077,22 +1180,24 @@ - fgetvp - given a thread and a file - descriptor number it returns the associated vnode + fgetvp - given a thread and a + file descriptor number it returns the associated + vnode &man.vn.lock.9; - locks a vnode - vn_unlock - unlocks a vnode + vn_unlock - unlocks a + vnode - &man.VOP.READDIR.9; - reads a directory referenced by - a vnode + &man.VOP.READDIR.9; - reads a directory referenced + by a vnode - &man.VOP.GETATTR.9; - gets attributes of a file or a - directory referenced by a vnode + &man.VOP.GETATTR.9; - gets attributes of a file or + a directory referenced by a vnode &man.VOP.LOOKUP.9; - looks up a path to a given @@ -1107,14 +1212,16 @@ vnode - &man.vput.9; - decrements the use count for a vnode and - unlocks it + &man.vput.9; - decrements the use count for a + vnode and unlocks it - &man.vrele.9; - decrements the use count for a vnode + &man.vrele.9; - decrements the use count for a + vnode - &man.vref.9; - increments the use count for a vnode + &man.vref.9; - increments the use count for a + vnode @@ -1124,13 +1231,13 @@ - fget - given a thread and a file - descriptor number it returns associated file handler and - references it + fget - given a thread and a + file descriptor number it returns associated file + handler and references it - fdrop - drops a reference to a file - handler + fdrop - drops a reference to + a file handler fhold - references a file @@ -1145,46 +1252,50 @@ &linux; emulation layer -MD part - This section deals with implementation of &linux; emulation layer in - &os; operating system. It first describes the machine dependent part - talking about how and where interaction between userland and kernel is - implemented. It talks about syscalls, signals, ptrace, traps, stack - fixup. This part discusses i386 but it is written generally so other - architectures should not differ very much. The next part is the machine - independent part of the Linuxulator. This section only covers i386 and ELF + This section deals with implementation of &linux; emulation + layer in &os; operating system. It first describes the machine + dependent part talking about how and where interaction between + userland and kernel is implemented. It talks about syscalls, + signals, ptrace, traps, stack fixup. This part discusses i386 + but it is written generally so other architectures should not + differ very much. The next part is the machine independent part + of the Linuxulator. This section only covers i386 and ELF handling. A.OUT is obsolete and untested. Syscall handling Syscall handling is mostly written in - linux_sysvec.c, which covers most of the routines - pointed out in the sysentvec structure. When a - &linux; process running on &os; issues a syscall, the general syscall - routine calls linux prepsyscall routine for the &linux; ABI. + linux_sysvec.c, which covers most of the + routines pointed out in the sysentvec + structure. When a &linux; process running on &os; issues a + syscall, the general syscall routine calls linux prepsyscall + routine for the &linux; ABI. &linux; prepsyscall - &linux; passes arguments to syscalls via registers (that is why - it is limited to 6 parameters on i386) while &os; uses the stack. - The &linux; prepsyscall routine must copy parameters from registers - to the stack. The order of the registers is: - %ebx, %ecx, - %edx, %esi, - %edi, %ebp. The catch is that - this is true for only most of the syscalls. 
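Stripped of details, the routine does little more than the following simplified sketch; the real linux_prepsyscall in linux_sysvec.c differs in some particulars, though the trapframe field names are the genuine i386 ones:

static void
linux_prepsyscall(struct trapframe *tf, int *args, u_int *code,
    caddr_t *params)
{
        /* Linux passes arguments in registers, FreeBSD expects them in args[]. */
        args[0] = tf->tf_ebx;
        args[1] = tf->tf_ecx;
        args[2] = tf->tf_edx;
        args[3] = tf->tf_esi;
        args[4] = tf->tf_edi;
        args[5] = tf->tf_ebp;
        *params = NULL;         /* nothing to copy in from the user stack */
}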
- Some (most notably clone) uses a different - order but it is luckily easy to fix by inserting a dummy parameter + &linux; passes arguments to syscalls via registers (that + is why it is limited to 6 parameters on i386) while &os; + uses the stack. The &linux; prepsyscall routine must copy + parameters from registers to the stack. The order of the + registers is: %ebx, + %ecx, %edx, + %esi, %edi, + %ebp. The catch is that this is true for + only most of the syscalls. Some (most + notably clone) uses a different order + but it is luckily easy to fix by inserting a dummy parameter in the linux_clone prototype. Syscall writing - Every syscall implemented in the Linuxulator must have its - prototype with various flags in syscalls.master. - The form of the file is: + Every syscall implemented in the Linuxulator must have + its prototype with various flags in + syscalls.master. The form of the file + is: ... AUE_FORK STD { int linux_fork(void); } @@ -1192,26 +1303,28 @@ AUE_CLOSE NOPROTO { int close(int fd); } ... - The first column represents the syscall number. The second - column is for auditing support. The third column represents the - syscall type. It is either STD, - OBSOL, NOPROTO and - UNIMPL. STD is a standard - syscall with full prototype and implementation. - OBSOL is obsolete and defines just the prototype. - NOPROTO means that the syscall is implemented - elsewhere so do not prepend ABI prefix, etc. + The first column represents the syscall number. The + second column is for auditing support. The third column + represents the syscall type. It is either + STD, OBSOL, + NOPROTO and UNIMPL. + STD is a standard syscall with full + prototype and implementation. OBSOL is + obsolete and defines just the prototype. + NOPROTO means that the syscall is + implemented elsewhere so do not prepend ABI prefix, etc. UNIMPL means that the syscall will be - substituted with the nosys syscall - (a syscall just printing out a message about the syscall not being - implemented and returning ENOSYS). + substituted with the nosys syscall (a + syscall just printing out a message about the syscall not + being implemented and returning + ENOSYS). - From syscalls.master a script generates - three files: linux_syscall.h, + From syscalls.master a script + generates three files: linux_syscall.h, linux_proto.h and linux_sysent.c. The - linux_syscall.h contains definitions of syscall - names and their numerical value, e.g.: + linux_syscall.h contains definitions of + syscall names and their numerical value, e.g.: ... #define LINUX_SYS_linux_fork 2 @@ -1219,130 +1332,142 @@ #define LINUX_SYS_close 6 ... - The linux_proto.h contains structure - definitions of arguments to every syscall, e.g.: + The linux_proto.h contains + structure definitions of arguments to every syscall, + e.g.: struct linux_fork_args { register_t dummy; }; - And finally, linux_sysent.c contains - structure describing the system entry table, used to actually - dispatch a syscall, e.g.: + And finally, linux_sysent.c + contains structure describing the system entry table, used + to actually dispatch a syscall, e.g.: { 0, (sy_call_t *)linux_fork, AUE_FORK, NULL, 0, 0 }, /* 2 = linux_fork */ { AS(close_args), (sy_call_t *)close, AUE_CLOSE, NULL, 0, 0 }, /* 6 = close */ - As you can see linux_fork is implemented - in Linuxulator itself so the definition is of STD - type and has no argument, which is exhibited by the dummy argument - structure. 
On the other hand close is just an - alias for real &os; &man.close.2; so it has no linux arguments - structure associated and in the system entry table it is not prefixed - with linux as it calls the real &man.close.2; in the kernel. + As you can see linux_fork is + implemented in Linuxulator itself so the definition is of + STD type and has no argument, which is + exhibited by the dummy argument structure. On the other + hand close is just an alias for real + &os; &man.close.2; so it has no linux arguments structure + associated and in the system entry table it is not prefixed + with linux as it calls the real &man.close.2; in the + kernel. Dummy syscalls - The &linux; emulation layer is not complete, as some syscalls are - not implemented properly and some are not implemented at all. The - emulation layer employs a facility to mark unimplemented syscalls - with the DUMMY macro. These dummy definitions + The &linux; emulation layer is not complete, as some + syscalls are not implemented properly and some are not + implemented at all. The emulation layer employs a facility + to mark unimplemented syscalls with the + DUMMY macro. These dummy definitions reside in linux_dummy.c in a form of - DUMMY(syscall);, which is then translated to - various syscall auxiliary files and the implementation consists - of printing a message saying that this syscall is not implemented. - The UNIMPL prototype is not used because we want - to be able to identify the name of the syscall that was called in - order to know what syscalls are more important to implement. + DUMMY(syscall);, which is then translated + to various syscall auxiliary files and the implementation + consists of printing a message saying that this syscall is + not implemented. The UNIMPL prototype is + not used because we want to be able to identify the name of + the syscall that was called in order to know what syscalls + are more important to implement. Signal handling - Signal handling is done generally in the &os; kernel for all - binary compatibilities with a call to a compat-dependent layer. - &linux; compatibility layer defines - linux_sendsig routine for this purpose. + Signal handling is done generally in the &os; kernel for + all binary compatibilities with a call to a compat-dependent + layer. &linux; compatibility layer defines + linux_sendsig routine for this + purpose. &linux; sendsig - This routine first checks whether the signal has been installed - with a SA_SIGINFO in which case it calls - linux_rt_sendsig routine instead. Furthermore, - it allocates (or reuses an already existing) signal handle context, - then it builds a list of arguments for the signal handler. It - translates the signal number based on the signal translation table, - assigns a handler, translates sigset. Then it saves context for the - sigreturn routine (various registers, translated - trap number and signal mask). Finally, it copies out the signal - context to the userspace and prepares context for the actual - signal handler to run. + This routine first checks whether the signal has been + installed with a SA_SIGINFO in which case + it calls linux_rt_sendsig routine + instead. Furthermore, it allocates (or reuses an already + existing) signal handle context, then it builds a list of + arguments for the signal handler. It translates the signal + number based on the signal translation table, assigns a + handler, translates sigset. Then it saves context for the + sigreturn routine (various registers, + translated trap number and signal mask). 
Finally, it copies + out the signal context to the userspace and prepares context + for the actual signal handler to run. linux_rt_sendsig - This routine is similar to linux_sendsig - just the signal context preparation is different. It adds - siginfo, ucontext, and some - &posix; parts. It might be worth considering whether those two - functions could not be merged with a benefit of less code duplication - and possibly even faster execution. + This routine is similar to + linux_sendsig just the signal context + preparation is different. It adds + siginfo, ucontext, and + some &posix; parts. It might be worth considering whether + those two functions could not be merged with a benefit of + less code duplication and possibly even faster + execution. linux_sigreturn - This syscall is used for return from the signal handler. It does - some security checks and restores the original process context. It - also unmasks the signal in process signal mask. + This syscall is used for return from the signal handler. + It does some security checks and restores the original + process context. It also unmasks the signal in process + signal mask. Ptrace - Many &unix; derivates implement the &man.ptrace.2; syscall in order - to allow various tracking and debugging features. This facility - enables the tracing process to obtain various information about the - traced process, like register dumps, any memory from the process - address space, etc. and also to trace the process like in stepping an - instruction or between system entries (syscalls and traps). - &man.ptrace.2; also lets you set various information in the traced - process (registers etc.). &man.ptrace.2; is a &unix;-wide standard - implemented in most &unix;es around the world. + Many &unix; derivates implement the &man.ptrace.2; syscall + in order to allow various tracking and debugging features. + This facility enables the tracing process to obtain various + information about the traced process, like register dumps, any + memory from the process address space, etc. and also to trace + the process like in stepping an instruction or between system + entries (syscalls and traps). &man.ptrace.2; also lets you + set various information in the traced process (registers + etc.). &man.ptrace.2; is a &unix;-wide standard implemented + in most &unix;es around the world. - &linux; emulation in &os; implements the &man.ptrace.2; facility - in linux_ptrace.c. The routines for converting - registers between &linux; and &os; and the actual &man.ptrace.2; - syscall emulation syscall. The syscall is a long switch block that - implements its counterpart in &os; for every &man.ptrace.2; command. - The &man.ptrace.2; commands are mostly equal between &linux; and &os; - so usually just a small modification is needed. For example, - PT_GETREGS in &linux; operates on direct data while - &os; uses a pointer to the data so after performing a (native) - &man.ptrace.2; syscall, a copyout must be done to preserve &linux; - semantics. + &linux; emulation in &os; implements the &man.ptrace.2; + facility in linux_ptrace.c. The routines + for converting registers between &linux; and &os; and the + actual &man.ptrace.2; syscall emulation syscall. The syscall + is a long switch block that implements its counterpart in &os; + for every &man.ptrace.2; command. The &man.ptrace.2; commands + are mostly equal between &linux; and &os; so usually just a + small modification is needed. 
For example, + PT_GETREGS in &linux; operates on direct + data while &os; uses a pointer to the data so after performing + a (native) &man.ptrace.2; syscall, a copyout must be done to + preserve &linux; semantics. - The &man.ptrace.2; implementation in Linuxulator has some known - weaknesses. There have been panics seen when using - strace (which is a &man.ptrace.2; consumer) in the - Linuxulator environment. Also PT_SYSCALL is not - implemented. + The &man.ptrace.2; implementation in Linuxulator has some + known weaknesses. There have been panics seen when using + strace (which is a &man.ptrace.2; consumer) + in the Linuxulator environment. Also + PT_SYSCALL is not implemented. Traps - Whenever a &linux; process running in the emulation layer traps - the trap itself is handled transparently with the only exception of - the trap translation. &linux; and &os; differs in opinion on what a - trap is so this is dealt with here. The code is actually very - short: + Whenever a &linux; process running in the emulation layer + traps the trap itself is handled transparently with the only + exception of the trap translation. &linux; and &os; differs + in opinion on what a trap is so this is dealt with here. The + code is actually very short: static int translate_traps(int signal, int trap_code) @@ -1368,12 +1493,13 @@ translate_traps(int signal, int trap_code) Stack fixup - The RTLD run-time link-editor expects so called AUX tags on stack - during an execve so a fixup must be done to ensure - this. Of course, every RTLD system is different so the emulation layer - must provide its own stack fixup routine to do this. So does - Linuxulator. The elf_linux_fixup simply copies - out AUX tags to the stack and adjusts the stack of the user space + The RTLD run-time link-editor expects so called AUX tags + on stack during an execve so a fixup must + be done to ensure this. Of course, every RTLD system is + different so the emulation layer must provide its own stack + fixup routine to do this. So does Linuxulator. The + elf_linux_fixup simply copies out AUX + tags to the stack and adjusts the stack of the user space process to point right after those tags. So RTLD works in a smart way. @@ -1381,15 +1507,17 @@ translate_traps(int signal, int trap_code) A.OUT support - The &linux; emulation layer on i386 also supports &linux; A.OUT - binaries. Pretty much everything described in the previous sections - must be implemented for A.OUT support (beside traps translation and - signals sending). The support for A.OUT binaries is no longer - maintained, especially the 2.6 emulation does not work with it but - this does not cause any problem, as the linux-base in ports probably - do not support A.OUT binaries at all. This support will probably be - removed in future. Most of the stuff necessary for loading &linux; - A.OUT binaries is in imgact_linux.c file. + The &linux; emulation layer on i386 also supports &linux; + A.OUT binaries. Pretty much everything described in the + previous sections must be implemented for A.OUT support + (beside traps translation and signals sending). The support + for A.OUT binaries is no longer maintained, especially the 2.6 + emulation does not work with it but this does not cause any + problem, as the linux-base in ports probably do not support + A.OUT binaries at all. This support will probably be removed + in future. Most of the stuff necessary for loading &linux; + A.OUT binaries is in imgact_linux.c + file. 
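Returning for a moment to the ELF stack fixup described above, a condensed sketch of elf_linux_fixup is shown below; the macro and field names follow the generic &os; ELF image activator, and the real routine copies out a few more AUX entries than shown here:

static int
elf_linux_fixup(register_t **stack_base, struct image_params *imgp)
{
        Elf32_Auxargs *args;
        register_t *pos;

        args = (Elf32_Auxargs *)imgp->auxargs;
        pos = *stack_base + (imgp->args->argc + imgp->args->envc + 2);

        /* The AUX vector RTLD expects, terminated by AT_NULL. */
        AUXARGS_ENTRY(pos, AT_PHDR, args->phdr);
        AUXARGS_ENTRY(pos, AT_PHENT, args->phent);
        AUXARGS_ENTRY(pos, AT_PHNUM, args->phnum);
        AUXARGS_ENTRY(pos, AT_PAGESZ, args->pagesz);
        AUXARGS_ENTRY(pos, AT_BASE, args->base);
        AUXARGS_ENTRY(pos, AT_ENTRY, args->entry);
        AUXARGS_ENTRY(pos, AT_NULL, 0);

        free(imgp->auxargs, M_TEMP);
        imgp->auxargs = NULL;

        /* Leave the user stack pointing at argc, right below the tags. */
        (*stack_base)--;
        suword(*stack_base, imgp->args->argc);
        return (0);
}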
@@ -1397,94 +1525,103 @@ translate_traps(int signal, int trap_code) &linux; emulation layer -MI part This section talks about machine independent part of the - Linuxulator. It covers the emulation infrastructure needed for &linux; - 2.6 emulation, the thread local storage (TLS) implementation (on i386) - and futexes. Then we talk briefly about some syscalls. + Linuxulator. It covers the emulation infrastructure needed for + &linux; 2.6 emulation, the thread local storage (TLS) + implementation (on i386) and futexes. Then we talk briefly + about some syscalls. Description of NPTL - One of the major areas of progress in development of &linux; 2.6 - was threading. Prior to 2.6, the &linux; threading support was - implemented in the linuxthreads library. - The library was a partial implementation of &posix; threading. The - threading was implemented using separate processes for each thread - using the clone syscall to let them share the - address space (and other things). The main weaknesses of this - approach was that every thread had a different PID, signal handling - was broken (from the pthreads perspective), etc. Also the performance - was not very good (use of SIGUSR signals for - threads synchronization, kernel resource consumption, etc.) so to - overcome these problems a new threading system was developed and - named NPTL. + One of the major areas of progress in development of + &linux; 2.6 was threading. Prior to 2.6, the &linux; + threading support was implemented in the + linuxthreads library. The library + was a partial implementation of &posix; threading. The + threading was implemented using separate processes for each + thread using the clone syscall to let + them share the address space (and other things). The main + weaknesses of this approach was that every thread had a + different PID, signal handling was broken (from the pthreads + perspective), etc. Also the performance was not very good + (use of SIGUSR signals for threads + synchronization, kernel resource consumption, etc.) so to + overcome these problems a new threading system was developed + and named NPTL. - The NPTL library focused on two things but a third thing came - along so it is usually considered a part of NPTL. Those two things - were embedding of threads into a process structure and futexes. The - additional third thing was TLS, which is not directly required by NPTL - but the whole NPTL userland library depends on it. Those improvements - yielded in much improved performance and standards conformance. NPTL - is a standard threading library in &linux; systems these days. + The NPTL library focused on two things but a third thing + came along so it is usually considered a part of NPTL. Those + two things were embedding of threads into a process structure + and futexes. The additional third thing was TLS, which is not + directly required by NPTL but the whole NPTL userland library + depends on it. Those improvements yielded in much improved + performance and standards conformance. NPTL is a standard + threading library in &linux; systems these days. - The &os; Linuxulator implementation approaches the NPTL in three - main areas. The TLS, futexes and PID mangling, which is meant to - simulate the &linux; threads. Further sections describe each of these - areas. + The &os; Linuxulator implementation approaches the NPTL in + three main areas. The TLS, futexes and PID mangling, which is + meant to simulate the &linux; threads. Further sections + describe each of these areas. 
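To give a feeling for what an NPTL thread creation boils down to on the &linux; side, the kernel request roughly corresponds to the following illustrative userland C (this is not the actual glibc source):

#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <asm/ldt.h>            /* struct user_desc for CLONE_SETTLS */

static int
create_nptl_thread(int (*fn)(void *), void *stack_top, void *arg,
    struct user_desc *tls, pid_t *ptid, pid_t *ctid)
{
        /* Share nearly everything; install TLS and TID bookkeeping. */
        return (clone(fn, stack_top,
            CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
            CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS |
            CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID,
            arg, ptid, tls, ctid));
}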
&linux; 2.6 emulation infrastructure - These sections deal with the way &linux; threads are managed and - how we simulate that in &os;. + These sections deal with the way &linux; threads are + managed and how we simulate that in &os;. Runtime determining of 2.6 emulation - The &linux; emulation layer in &os; supports runtime setting of - the emulated version. This is done via &man.sysctl.8;, namely - compat.linux.osrelease. - Setting this &man.sysctl.8; affects runtime - behavior of the emulation layer. When set to 2.6.x it sets the - value of linux_use_linux26 while setting to - something else keeps it unset. This variable (plus per-prison - variables of the very same kind) determines whether 2.6 - infrastructure (mainly PID mangling) is used in the code or not. - The version setting is done system-wide and this affects all &linux; - processes. The &man.sysctl.8; should not be changed when running any - &linux; binary as it might harm things. + The &linux; emulation layer in &os; supports runtime + setting of the emulated version. This is done via + &man.sysctl.8;, namely + compat.linux.osrelease. Setting this + &man.sysctl.8; affects runtime behavior of the emulation + layer. When set to 2.6.x it sets the value of + linux_use_linux26 while setting to + something else keeps it unset. This variable (plus + per-prison variables of the very same kind) determines + whether 2.6 infrastructure (mainly PID mangling) is used in + the code or not. The version setting is done system-wide + and this affects all &linux; processes. The &man.sysctl.8; + should not be changed when running any &linux; binary as it + might harm things. &linux; processes and thread identifiers - The semantics of &linux; threading are a little confusing and - uses entirely different nomenclature to &os;. A process in - &linux; consists of a struct task embedding two - identifier fields - PID and TGID. PID is not - a process ID but it is a thread ID. The TGID identifies a thread - group in other words a process. For single-threaded process the - PID equals the TGID. + The semantics of &linux; threading are a little + confusing and uses entirely different nomenclature to &os;. + A process in &linux; consists of a struct + task embedding two identifier fields - PID and + TGID. PID is not a process ID but it + is a thread ID. The TGID identifies a thread group in other + words a process. For single-threaded process the PID equals + the TGID. - The thread in NPTL is just an ordinary process that happens to - have TGID not equal to PID and have a group leader not equal to - itself (and shared VM etc. of course). Everything else happens in - the same way as to an ordinary process. There is no separation of - a shared status to some external structure like in &os;. This - creates some duplication of information and possible data - inconsistency. The &linux; kernel seems to use task -> group - information in some places and task information elsewhere and it is + The thread in NPTL is just an ordinary process that + happens to have TGID not equal to PID and have a group + leader not equal to itself (and shared VM etc. of course). + Everything else happens in the same way as to an ordinary + process. There is no separation of a shared status to some + external structure like in &os;. This creates some + duplication of information and possible data inconsistency. + The &linux; kernel seems to use task -> group information + in some places and task information elsewhere and it is really not very consistent and looks error-prone. 
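Seen from userland the scheme is simple: every thread of a &linux; process reports the same getpid() (the TGID) but a different thread ID. A tiny illustration (glibc of that era had no gettid() wrapper, hence the raw syscall):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void
report(const char *who)
{
        /* getpid() returns the TGID, SYS_gettid the per-task PID. */
        printf("%s: TGID %d, TID %ld\n",
            who, (int)getpid(), (long)syscall(SYS_gettid));
}

Called from several threads of one process the first number stays the same while the second one differs.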
Every NPTL thread is created by a call to the - clone syscall with a specific set of flags - (more in the next subsection). The NPTL implements strict - 1:1 threading. + clone syscall with a specific set of + flags (more in the next subsection). The NPTL implements + strict 1:1 threading. - In &os; we emulate NPTL threads with ordinary &os; processes that - share VM space, etc. and the PID gymnastic is just mimicked in the - emulation specific structure attached to the process. The - structure attached to the process looks like: + In &os; we emulate NPTL threads with ordinary &os; + processes that share VM space, etc. and the PID gymnastic is + just mimicked in the emulation specific structure attached + to the process. The structure attached to the process looks + like: struct linux_emuldata { pid_t pid; @@ -1499,16 +1636,18 @@ translate_traps(int signal, int trap_code) LIST_ENTRY(linux_emuldata) threads; /* list of linux threads */ }; - The PID is used to identify the &os; process that attaches this - structure. The child_se_tid and - child_clear_tid are used for TID address - copyout when a process exits and is created. The - shared pointer points to a structure shared - among threads. The pdeath_signal variable - identifies the parent death signal and the - threads pointer is used to link this structure - to the list of threads. The linux_emuldata_shared - structure looks like: + The PID is used to identify the &os; process that + attaches this structure. The + child_se_tid and + child_clear_tid are used for TID + address copyout when a process exits and is created. The + shared pointer points to a structure + shared among threads. The pdeath_signal + variable identifies the parent death signal and the + threads pointer is used to link this + structure to the list of threads. The + linux_emuldata_shared structure looks + like: struct linux_emuldata_shared { @@ -1519,121 +1658,135 @@ translate_traps(int signal, int trap_code) LIST_HEAD(, linux_emuldata) threads; /* head of list of linux threads */ }; - The refs is a reference counter being used - to determine when we can free the structure to avoid memory leaks. - The group_pid is to identify PID ( = TGID) of the - whole process ( = thread group). The threads - pointer is the head of the list of threads in the process. + The refs is a reference counter being + used to determine when we can free the structure to avoid + memory leaks. The group_pid is to + identify PID ( = TGID) of the whole process ( = thread + group). The threads pointer is the head + of the list of threads in the process. - The linux_emuldata structure can be obtained - from the process using em_find. The prototype - of the function is: + The linux_emuldata structure can be + obtained from the process using + em_find. The prototype of the function + is: struct linux_emuldata *em_find(struct proc *, int locked); - Here, proc is the process we want the emuldata - structure from and the locked parameter determines whether we want to - lock or not. The accepted values are EMUL_DOLOCK - and EMUL_DOUNLOCK. More about locking + Here, proc is the process we want the + emuldata structure from and the locked parameter determines + whether we want to lock or not. The accepted values are + EMUL_DOLOCK and + EMUL_DOUNLOCK. More about locking later. PID mangling - Because of the described different view knowing what a process - ID and thread ID is between &os; and &linux; we have to translate - the view somehow. We do it by PID mangling. 
This means that we - fake what a PID (=TGID) and TID (=PID) is between kernel and - userland. The rule of thumb is that in kernel (in Linuxulator) - PID = PID and TGID = shared -> group pid and to userland we - present PID = shared -> group_pid and - TID = proc -> p_pid. - The PID member of linux_emuldata structure is - a &os; PID. + Because of the described different view knowing what a + process ID and thread ID is between &os; and &linux; we have + to translate the view somehow. We do it by PID mangling. + This means that we fake what a PID (=TGID) and TID (=PID) is + between kernel and userland. The rule of thumb is that in + kernel (in Linuxulator) PID = PID and TGID = shared -> + group pid and to userland we present PID = shared + -> group_pid and TID = proc -> + p_pid. The PID member of + linux_emuldata structure is a &os; + PID. - The above affects mainly getpid, getppid, gettid syscalls. Where - we use PID/TGID respectively. In copyout of TIDs in - child_clear_tid and - child_set_tid we copy out &os; PID. + The above affects mainly getpid, getppid, gettid + syscalls. Where we use PID/TGID respectively. In copyout + of TIDs in child_clear_tid and + child_set_tid we copy out &os; + PID. Clone syscall - The clone syscall is the way threads are - created in &linux;. The syscall prototype looks like this: + The clone syscall is the way + threads are created in &linux;. The syscall prototype looks + like this: int linux_clone(l_int flags, void *stack, void *parent_tidptr, int dummy, void * child_tidptr); - The flags parameter tells the syscall how - exactly the processes should be cloned. As described above, &linux; - can create processes sharing various things independently, for - example two processes can share file descriptors but not VM, etc. - Last byte of the flags parameter is the exit - signal of the newly created process. The stack - parameter if non-NULL tells, where the thread - stack is and if it is NULL we are supposed to - copy-on-write the calling process stack (i.e. do what normal - &man.fork.2; routine does). The parent_tidptr - parameter is used as an address for copying out process PID (i.e. - thread id) once the process is sufficiently instantiated but is - not runnable yet. The dummy parameter is here - because of the very strange calling convention of this syscall on - i386. It uses the registers directly and does not let the compiler - do it what results in the need of a dummy syscall. The - child_tidptr parameter is used as an address - for copying out PID once the process has finished forking and when - the process exits. + The flags parameter tells the syscall + how exactly the processes should be cloned. As described + above, &linux; can create processes sharing various things + independently, for example two processes can share file + descriptors but not VM, etc. Last byte of the + flags parameter is the exit signal of the + newly created process. The stack + parameter if non-NULL tells, where the + thread stack is and if it is NULL we are + supposed to copy-on-write the calling process stack (i.e. do + what normal &man.fork.2; routine does). The + parent_tidptr parameter is used as an + address for copying out process PID (i.e. thread id) once + the process is sufficiently instantiated but is not runnable + yet. The dummy parameter is here because + of the very strange calling convention of this syscall on + i386. It uses the registers directly and does not let the + compiler do it what results in the need of a dummy syscall. 
+ The child_tidptr parameter is used as an + address for copying out PID once the process has finished + forking and when the process exits. - The syscall itself proceeds by setting corresponding flags - depending on the flags passed in. For example, - CLONE_VM maps to RFMEM (sharing of VM), etc. - The only nit here is CLONE_FS and - CLONE_FILES because &os; does not allow setting - this separately so we fake it by not setting RFFDG (copying of fd - table and other fs information) if either of these is defined. This - does not cause any problems, because those flags are always set - together. After setting the flags the process is forked using - the internal fork1 routine, the process is - instrumented not to be put on a run queue, i.e. not to be set - runnable. After the forking is done we possibly reparent the newly - created process to emulate CLONE_PARENT semantics. - Next part is creating the emulation data. Threads in &linux; does - not signal their parents so we set exit signal to be 0 to disable - this. After that setting of child_set_tid and + The syscall itself proceeds by setting corresponding + flags depending on the flags passed in. For example, + CLONE_VM maps to RFMEM (sharing of VM), + etc. The only nit here is CLONE_FS and + CLONE_FILES because &os; does not allow + setting this separately so we fake it by not setting RFFDG + (copying of fd table and other fs information) if either of + these is defined. This does not cause any problems, because + those flags are always set together. After setting the + flags the process is forked using the internal + fork1 routine, the process is + instrumented not to be put on a run queue, i.e. not to be + set runnable. After the forking is done we possibly + reparent the newly created process to emulate + CLONE_PARENT semantics. Next part is + creating the emulation data. Threads in &linux; does not + signal their parents so we set exit signal to be 0 to + disable this. After that setting of + child_set_tid and child_clear_tid is performed enabling the - functionality later in the code. At this point we copy out the PID - to the address specified by parent_tidptr. The - setting of process stack is done by simply rewriting thread frame - %esp register (%rsp on amd64). - Next part is setting up TLS for the newly created process. After - this &man.vfork.2; semantics might be emulated and finally the newly - created process is put on a run queue and copying out its PID to the - parent process via clone return value is - done. + functionality later in the code. At this point we copy out + the PID to the address specified by + parent_tidptr. The setting of process + stack is done by simply rewriting thread frame + %esp register (%rsp on + amd64). Next part is setting up TLS for the newly created + process. After this &man.vfork.2; semantics might be + emulated and finally the newly created process is put on a + run queue and copying out its PID to the parent process via + clone return value is done. - The clone syscall is able and in fact is - used for emulating classic &man.fork.2; and &man.vfork.2; syscalls. - Newer glibc in a case of 2.6 kernel uses clone - to implement &man.fork.2; and &man.vfork.2; syscalls. + The clone syscall is able and in + fact is used for emulating classic &man.fork.2; and + &man.vfork.2; syscalls. Newer glibc in a case of 2.6 kernel + uses clone to implement &man.fork.2; + and &man.vfork.2; syscalls. Locking - The locking is implemented to be per-subsystem because we do not - expect a lot of contention on these. 
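Before describing the individual locks, this is roughly how a Linuxulator routine picks up its per-process data with the em_find interface shown earlier; the function name and the unlock macro in this sketch are illustrative and may not match the sources exactly:

static int
linux_set_pdeath_signal(struct thread *td, int sig)
{
        struct linux_emuldata *em;

        /* Look the emulation data up and return with emul_lock held. */
        em = em_find(td->td_proc, EMUL_DOLOCK);
        if (em == NULL)
                return (ESRCH);         /* not a Linux process after all */
        em->pdeath_signal = sig;
        EMUL_UNLOCK(&emul_lock);        /* unlock macro name is illustrative */
        return (0);
}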
There are two locks: - emul_lock used to protect manipulating of - linux_emuldata and + The locking is implemented to be per-subsystem because + we do not expect a lot of contention on these. There are + two locks: emul_lock used to protect + manipulating of linux_emuldata and emul_shared_lock used to manipulate linux_emuldata_shared. The - emul_lock is a nonsleepable blocking mutex while - emul_shared_lock is a sleepable blocking - sx_lock. Because of the per-subsystem locking we - can coalesce some locks and that is why the em find offers the - non-locking access. + emul_lock is a nonsleepable blocking + mutex while emul_shared_lock is a + sleepable blocking sx_lock. Because of + the per-subsystem locking we can coalesce some locks and + that is why the em find offers the non-locking + access. @@ -1646,39 +1799,45 @@ void * child_tidptr); Introduction to threading - Threads in computer science are entities within a process that - can be scheduled independently from each other. The threads in the - process share process wide data (file descriptors, etc.) but also - have their own stack for their own data. Sometimes there is a need - for process-wide data specific to a given thread. Imagine a name of - the thread in execution or something like that. The traditional - &unix; threading API, pthreads provides + Threads in computer science are entities within a + process that can be scheduled independently from each other. + The threads in the process share process wide data (file + descriptors, etc.) but also have their own stack for their + own data. Sometimes there is a need for process-wide data + specific to a given thread. Imagine a name of the thread in + execution or something like that. The traditional &unix; + threading API, pthreads provides a way to do it via &man.pthread.key.create.3;, - &man.pthread.setspecific.3; and &man.pthread.getspecific.3; where a - thread can create a key to the thread local data and using - &man.pthread.getspecific.3; or &man.pthread.getspecific.3; to - manipulate those data. You can easily see that this is not the most - comfortable way this could be accomplished. So various producers of - C/C++ compilers introduced a better way. They defined a new modifier - keyword thread that specifies that a variable is thread specific. A - new method of accessing such variables was developed as well (at - least on i386). The pthreads method tends - to be implemented in userspace as a trivial lookup table. The - performance of such a solution is not very good. So the new method - uses (on i386) segment registers to address a segment, where TLS area - is stored so the actual accessing of a thread variable is just - appending the segment register to the address thus addressing via it. - The segment registers are usually %gs and - %fs acting like segment selectors. Every thread - has its own area where the thread local data are stored and the - segment must be loaded on every context switch. This method is very - fast and used almost exclusively in the whole i386 &unix; world. - Both &os; and &linux; implement this approach and it yields very good - results. The only drawback is the need to reload the segment on - every context switch which can slowdown context switches. &os; tries - to avoid this overhead by using only 1 segment descriptor for this - while &linux; uses 3. 
Interesting thing is that almost nothing uses - more than 1 descriptor (only Wine seems to + &man.pthread.setspecific.3; and &man.pthread.getspecific.3; + where a thread can create a key to the thread local data and + using &man.pthread.getspecific.3; or + &man.pthread.getspecific.3; to manipulate those data. You + can easily see that this is not the most comfortable way + this could be accomplished. So various producers of C/C++ + compilers introduced a better way. They defined a new + modifier keyword thread that specifies that a variable is + thread specific. A new method of accessing such variables + was developed as well (at least on i386). The + pthreads method tends to be + implemented in userspace as a trivial lookup table. The + performance of such a solution is not very good. So the new + method uses (on i386) segment registers to address a + segment, where TLS area is stored so the actual accessing of + a thread variable is just appending the segment register to + the address thus addressing via it. The segment registers + are usually %gs and + %fs acting like segment selectors. Every + thread has its own area where the thread local data are + stored and the segment must be loaded on every context + switch. This method is very fast and used almost + exclusively in the whole i386 &unix; world. Both &os; and + &linux; implement this approach and it yields very good + results. The only drawback is the need to reload the + segment on every context switch which can slowdown context + switches. &os; tries to avoid this overhead by using only 1 + segment descriptor for this while &linux; uses 3. + Interesting thing is that almost nothing uses more than 1 + descriptor (only Wine seems to use 2) so &linux; pays this unnecessary price for context switches. @@ -1686,44 +1845,49 @@ void * child_tidptr); Segments on i386 - The i386 architecture implements the so called segments. A - segment is a description of an area of memory. The base address - (bottom) of the memory area, the end of it (ceiling), type, - protection, etc. The memory described by a segment can be accessed - using segment selector registers (%cs, - %ds, %ss, - %es, %fs, - %gs). For example let us suppose we have a - segment which base address is 0x1234 and length and this code: + The i386 architecture implements the so called segments. + A segment is a description of an area of memory. The base + address (bottom) of the memory area, the end of it + (ceiling), type, protection, etc. The memory described by a + segment can be accessed using segment selector registers + (%cs, %ds, + %ss, %es, + %fs, %gs). For + example let us suppose we have a segment which base address + is 0x1234 and length and this code: mov %edx,%gs:0x10 - This will load the content of the %edx - register into memory location 0x1244. Some segment registers have - a special use, for example %cs is used for code - segment and %ss is used for stack segment but - %fs and %gs are generally - unused. Segments are either stored in a global GDT table or in a - local LDT table. LDT is accessed via an entry in the GDT. The - LDT can store more types of segments. LDT can be per process. - Both tables define up to 8191 entries. + This will load the content of the + %edx register into memory location + 0x1244. Some segment registers have a special use, for + example %cs is used for code segment and + %ss is used for stack segment but + %fs and %gs are + generally unused. Segments are either stored in a global + GDT table or in a local LDT table. 
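Tying this back to TLS: when a compiler implements the thread-local storage keyword on i386, an access to such a variable becomes exactly the kind of segment-relative access shown above. A hedged sketch:

/* Every thread gets its own copy of this variable. */
static __thread int request_count;

int
bump_request_count(void)
{
        /*
         * On i386 the compiler typically emits something like
         *      movl %gs:OFFSET, %eax
         * here: one segment-relative access, no table lookup, no call.
         */
        return (++request_count);
}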
LDT is accessed via an + entry in the GDT. The LDT can store more types of segments. + LDT can be per process. Both tables define up to 8191 + entries. Implementation on &linux; i386 - There are two main ways of setting up TLS in &linux;. It can be - set when cloning a process using the clone - syscall or it can call set_thread_area. When a - process passes CLONE_SETTLS flag to - clone, the kernel expects the memory pointed to - by the %esi register a &linux; user space - representation of a segment, which gets translated to the machine - representation of a segment and loaded into a GDT slot. The - GDT slot can be specified with a number or -1 can be used meaning - that the system itself should choose the first free slot. In - practice, the vast majority of programs use only one TLS entry and - does not care about the number of the entry. We exploit this in the + There are two main ways of setting up TLS in &linux;. + It can be set when cloning a process using the + clone syscall or it can call + set_thread_area. When a process passes + CLONE_SETTLS flag to + clone, the kernel expects the memory + pointed to by the %esi register a &linux; + user space representation of a segment, which gets + translated to the machine representation of a segment and + loaded into a GDT slot. The GDT slot can be specified with + a number or -1 can be used meaning that the system itself + should choose the first free slot. In practice, the vast + majority of programs use only one TLS entry and does not + care about the number of the entry. We exploit this in the emulation and in fact depend on it. @@ -1733,48 +1897,53 @@ void * child_tidptr); i386 - Loading of TLS for the current thread happens by calling - set_thread_area while loading TLS for a - second process in clone is done in the - separate block in clone. Those two functions - are very similar. The only difference being the actual loading of - the GDT segment, which happens on the next context switch for the - newly created process while set_thread_area - must load this directly. The code basically does this. It copies - the &linux; form segment descriptor from the userland. The code - checks for the number of the descriptor but because this differs - between &os; and &linux; we fake it a little. We only support - indexes of 6, 3 and -1. The 6 is genuine &linux; number, 3 is - genuine &os; one and -1 means autoselection. Then we set the - descriptor number to constant 3 and copy out this to the - userspace. We rely on the userspace process using the number from - the descriptor but this works most of the time (have never seen a - case where this did not work) as the userspace process typically - passes in 1. Then we convert the descriptor from the &linux; form - to a machine dependant form (i.e. operating system independent - form) and copy this to the &os; defined segment descriptor. - Finally we can load it. We assign the descriptor to threads PCB - (process control block) and load the %gs - segment using load_gs. This loading must be - done in a critical section so that nothing can interrupt us. - The CLONE_SETTLS case works exactly like this - just the loading using load_gs is not - performed. The segment used for this (segment number 3) is - shared for this use between &os; processes and &linux; processes - so the &linux; emulation layer does not add any overhead over + Loading of TLS for the current thread happens by + calling set_thread_area while loading + TLS for a second process in clone is + done in the separate block in clone. 
+ Those two functions are very similar. The only difference + being the actual loading of the GDT segment, which happens + on the next context switch for the newly created process + while set_thread_area must load this + directly. The code basically does this. It copies the + &linux; form segment descriptor from the userland. The + code checks for the number of the descriptor but because + this differs between &os; and &linux; we fake it a little. + We only support indexes of 6, 3 and -1. The 6 is genuine + &linux; number, 3 is genuine &os; one and -1 means + autoselection. Then we set the descriptor number to + constant 3 and copy out this to the userspace. We rely on + the userspace process using the number from the descriptor + but this works most of the time (have never seen a case + where this did not work) as the userspace process + typically passes in 1. Then we convert the descriptor + from the &linux; form to a machine dependant form (i.e. + operating system independent form) and copy this to the + &os; defined segment descriptor. Finally we can load it. + We assign the descriptor to threads PCB (process control + block) and load the %gs segment using + load_gs. This loading must be done + in a critical section so that nothing can interrupt us. + The CLONE_SETTLS case works exactly + like this just the loading using + load_gs is not performed. The + segment used for this (segment number 3) is shared for + this use between &os; processes and &linux; processes so + the &linux; emulation layer does not add any overhead over plain &os;. amd64 - The amd64 implementation is similar to the i386 one but there - was initially no 32bit segment descriptor used for this purpose - (hence not even native 32bit TLS users worked) so we had to add - such a segment and implement its loading on every context switch - (when a flag signaling use of 32bit is set). Apart from this the - TLS loading is exactly the same just the segment numbers are - different and the descriptor format and the loading differs + The amd64 implementation is similar to the i386 one + but there was initially no 32bit segment descriptor used + for this purpose (hence not even native 32bit TLS users + worked) so we had to add such a segment and implement its + loading on every context switch (when a flag signaling use + of 32bit is set). Apart from this the TLS loading is + exactly the same just the segment numbers are different + and the descriptor format and the loading differs slightly. @@ -1786,56 +1955,60 @@ void * child_tidptr); Introduction to synchronization - Threads need some kind of synchronization and &posix; provides - some of them: mutexes for mutual exclusion, read-write locks for - mutual exclusion with biased ratio of reads and writes and condition - variables for signaling a status change. It is interesting to note - that &posix; threading API lacks support for semaphores. Those - synchronization routines implementations are heavily dependant on - the type threading support we have. In pure 1:M (userspace) model - the implementation can be solely done in userspace and thus be very - fast (the condition variables will probably end up being implemented - using signals, i.e. not fast) and simple. In 1:1 model, the - situation is also quite clear - the threads must be synchronized - using kernel facilities (which is very slow because a syscall must be - performed). The mixed M:N scenario just combines the first and - second approach or rely solely on kernel. 
Threads synchronization is - a vital part of thread-enabled programming and its performance can - affect resulting program a lot. Recent benchmarks on &os; operating - system showed that an improved sx_lock implementation yielded 40% - speedup in ZFS (a heavy sx user), this - is in-kernel stuff but it shows clearly how important the performance - of synchronization primitives is. + Threads need some kind of synchronization and &posix; + provides some of them: mutexes for mutual exclusion, + read-write locks for mutual exclusion with biased ratio of + reads and writes and condition variables for signaling a + status change. It is interesting to note that &posix; + threading API lacks support for semaphores. Those + synchronization routines implementations are heavily + dependant on the type threading support we have. In pure + 1:M (userspace) model the implementation can be solely done + in userspace and thus be very fast (the condition variables + will probably end up being implemented using signals, i.e. + not fast) and simple. In 1:1 model, the situation is also + quite clear - the threads must be synchronized using kernel + facilities (which is very slow because a syscall must be + performed). The mixed M:N scenario just combines the first + and second approach or rely solely on kernel. Threads + synchronization is a vital part of thread-enabled + programming and its performance can affect resulting program + a lot. Recent benchmarks on &os; operating system showed + that an improved sx_lock implementation yielded 40% speedup + in ZFS (a heavy sx user), this is + in-kernel stuff but it shows clearly how important the + performance of synchronization primitives is. - Threaded programs should be written with as little contention on - locks as possible. Otherwise, instead of doing useful work the - thread just waits on a lock. Because of this, the most well written - threaded programs show little locks contention. + Threaded programs should be written with as little + contention on locks as possible. Otherwise, instead of + doing useful work the thread just waits on a lock. Because + of this, the most well written threaded programs show little + locks contention. Futexes introduction - &linux; implements 1:1 threading, i.e. it has to use in-kernel - synchronization primitives. As stated earlier, well written threaded - programs have little lock contention. So a typical sequence - could be performed as two atomic increase/decrease mutex reference - counter, which is very fast, as presented by the following - example: + &linux; implements 1:1 threading, i.e. it has to use + in-kernel synchronization primitives. As stated earlier, + well written threaded programs have little lock contention. + So a typical sequence could be performed as two atomic + increase/decrease mutex reference counter, which is very + fast, as presented by the following example: pthread_mutex_lock(&mutex); .... pthread_mutex_unlock(&mutex); - 1:1 threading forces us to perform two syscalls for those mutex - calls, which is very slow. + 1:1 threading forces us to perform two syscalls for + those mutex calls, which is very slow. - The solution &linux; 2.6 implements is called futexes. - Futexes implement the check for contention in userspace and call - kernel primitives only in a case of contention. Thus the typical - case takes place without any kernel intervention. This yields - reasonably fast and flexible synchronization primitives - implementation. + The solution &linux; 2.6 implements is called + futexes. 
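The following simplified sketch (my own illustration of the classic three-state futex mutex, not the actual NPTL or glibc code) shows the idea that the next paragraphs describe: the lock word is manipulated with atomic instructions in userspace and the futex syscall is entered only when contention is detected, so the common uncontended case needs no kernel intervention at all.

/*
 * Simplified futex-based mutex (Linux only, illustrative).
 * Lock word states: 0 = free, 1 = locked, 2 = locked with waiters.
 */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>

static int lock_word;                   /* the futex word itself */
static int shared_counter;

static long
sys_futex(int *uaddr, int op, int val)
{
        return (syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0));
}

static void
futex_lock(int *lock)
{
        int c = 0;

        /* Fast path: 0 -> 1, a single atomic instruction, no syscall. */
        if (__atomic_compare_exchange_n(lock, &c, 1, 0,
            __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
                return;
        /* Contended: mark the word as "locked with waiters" and sleep. */
        if (c != 2)
                c = __atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE);
        while (c != 0) {
                sys_futex(lock, FUTEX_WAIT, 2); /* sleeps only if word is still 2 */
                c = __atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE);
        }
}

static void
futex_unlock(int *lock)
{
        /* Fast path: 1 -> 0.  Only wake somebody if waiters were marked. */
        if (__atomic_fetch_sub(lock, 1, __ATOMIC_RELEASE) != 1) {
                __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
                sys_futex(lock, FUTEX_WAKE, 1);
        }
}

static void *
worker(void *arg)
{
        int i;

        (void)arg;
        for (i = 0; i < 100000; i++) {
                futex_lock(&lock_word);
                shared_counter++;
                futex_unlock(&lock_word);
        }
        return (NULL);
}

int
main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("shared_counter = %d\n", shared_counter); /* expect 200000 */
        return (0);
}

When there is no contention, neither futex_lock nor futex_unlock enters the kernel, which is exactly the property described below.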
Futexes implement the check for contention in + userspace and call kernel primitives only in a case of + contention. Thus the typical case takes place without any + kernel intervention. This yields reasonably fast and + flexible synchronization primitives implementation. @@ -1845,10 +2018,10 @@ pthread_mutex_unlock(&mutex); int futex(void *uaddr, int op, int val, struct timespec *timeout, void *uaddr2, int val3); - In this example uaddr is an address of the - mutex in userspace, op is an operation we are - about to perform and the other parameters have per-operation - meaning. + In this example uaddr is an address + of the mutex in userspace, op is an + operation we are about to perform and the other parameters + have per-operation meaning. Futexes implement the following operations: @@ -1879,34 +2052,37 @@ pthread_mutex_unlock(&mutex); This operation verifies that on address uaddr the value val is written. If not, EWOULDBLOCK is - returned, otherwise the thread is queued on the futex and gets - suspended. If the argument timeout is - non-zero it specifies the maximum time for the sleeping, - otherwise the sleeping is infinite. + returned, otherwise the thread is queued on the futex and + gets suspended. If the argument + timeout is non-zero it specifies the + maximum time for the sleeping, otherwise the sleeping is + infinite. FUTEX_WAKE - This operation takes a futex at uaddr - and wakes up val first futexes queued - on this futex. + This operation takes a futex at + uaddr and wakes up + val first futexes queued on this + futex. FUTEX_FD - This operations associates a file descriptor with a given - futex. + This operations associates a file descriptor with a + given futex. FUTEX_REQUEUE This operation takes val threads - queued on futex at uaddr, wakes them up, - and takes val2 next threads and requeues them - on futex at uaddr2. + queued on futex at uaddr, wakes them + up, and takes val2 next threads and + requeues them on futex at + uaddr2. @@ -1922,12 +2098,13 @@ pthread_mutex_unlock(&mutex); FUTEX_WAKE_OP This operation performs an atomic operation on - val3 (which contains coded some other value) - and uaddr. Then it wakes up + val3 (which contains coded some other + value) and uaddr. Then it wakes up val threads on futex at - uaddr and if the atomic operation returned a - positive number it wakes up val2 threads on - futex at uaddr2. + uaddr and if the atomic operation + returned a positive number it wakes up + val2 threads on futex at + uaddr2. The operations implemented in FUTEX_WAKE_OP: @@ -1952,9 +2129,10 @@ pthread_mutex_unlock(&mutex); There is no val2 parameter in the - futex prototype. The val2 is taken from the - struct timespec *timeout parameter - for operations FUTEX_REQUEUE, + futex prototype. The val2 is taken + from the struct timespec *timeout + parameter for operations + FUTEX_REQUEUE, FUTEX_CMP_REQUEUE and FUTEX_WAKE_OP. @@ -1964,9 +2142,10 @@ pthread_mutex_unlock(&mutex); Futex emulation in &os; - The futex emulation in &os; is taken from NetBSD and further - extended by us. It is placed in linux_futex.c - and linux_futex.h files. The + The futex emulation in &os; is taken from NetBSD and + further extended by us. It is placed in + linux_futex.c and + linux_futex.h files. 
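Before going into the emulation internals, it may help to see what the encoded operation (the val3 argument of FUTEX_WAKE_OP) actually carries. The sketch below is my own illustration and follows the bit layout used by &linux; (operation and comparison in the top two nibbles, the 12-bit operation and comparison arguments below them); the FUTEX_OP macro and the FUTEX_OP_* constants come from &linux;'s futex.h, while the helper name, the example values and the omission of the FUTEX_OP_OPARG_SHIFT variant and of fault handling are mine.

/*
 * Illustrative decode of the FUTEX_WAKE_OP argument and the atomic step
 * it requests: oldval = *uaddr2; *uaddr2 = oldval OP oparg; the result
 * of "oldval CMP cmparg" then decides whether the second futex is woken.
 */
#include <linux/futex.h>                /* FUTEX_OP() and FUTEX_OP_* */
#include <stdio.h>

static int
futex_atomic_op(int *uaddr2, unsigned int encoded_op)
{
        int op = (encoded_op >> 28) & 0xf;
        int cmp = (encoded_op >> 24) & 0xf;
        int oparg = (encoded_op >> 12) & 0xfff;
        int cmparg = encoded_op & 0xfff;
        int oldval;

        switch (op) {
        case FUTEX_OP_SET:
                oldval = __atomic_exchange_n(uaddr2, oparg, __ATOMIC_SEQ_CST);
                break;
        case FUTEX_OP_ADD:
                oldval = __atomic_fetch_add(uaddr2, oparg, __ATOMIC_SEQ_CST);
                break;
        case FUTEX_OP_OR:
                oldval = __atomic_fetch_or(uaddr2, oparg, __ATOMIC_SEQ_CST);
                break;
        case FUTEX_OP_ANDN:
                oldval = __atomic_fetch_and(uaddr2, ~oparg, __ATOMIC_SEQ_CST);
                break;
        case FUTEX_OP_XOR:
                oldval = __atomic_fetch_xor(uaddr2, oparg, __ATOMIC_SEQ_CST);
                break;
        default:
                return (0);
        }

        switch (cmp) {
        case FUTEX_OP_CMP_EQ:   return (oldval == cmparg);
        case FUTEX_OP_CMP_NE:   return (oldval != cmparg);
        case FUTEX_OP_CMP_LT:   return (oldval < cmparg);
        case FUTEX_OP_CMP_LE:   return (oldval <= cmparg);
        case FUTEX_OP_CMP_GT:   return (oldval > cmparg);
        case FUTEX_OP_CMP_GE:   return (oldval >= cmparg);
        default:                return (0);
        }
}

int
main(void)
{
        int word = 5;

        /* "Add 1 to the word; wake the second queue if the old value was > 0." */
        printf("wake second futex: %d (word is now %d)\n",
            futex_atomic_op(&word, FUTEX_OP(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0)),
            word);
        return (0);
}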
The futex structure looks like: struct futex { @@ -1978,7 +2157,8 @@ pthread_mutex_unlock(&mutex); TAILQ_HEAD(lf_waiting_paroc, waiting_proc) f_waiting_proc; }; - And the structure waiting_proc is: + And the structure waiting_proc + is: struct waiting_proc { @@ -1992,84 +2172,93 @@ pthread_mutex_unlock(&mutex); futex_get / futex_put - A futex is obtained using the futex_get - function, which searches a linear list of futexes and returns the - found one or creates a new futex. When releasing a futex from the - use we call the futex_put function, which - decreases a reference counter of the futex and if the refcount - reaches zero it is released. + A futex is obtained using the + futex_get function, which searches a + linear list of futexes and returns the found one or + creates a new futex. When releasing a futex from the use + we call the futex_put function, which + decreases a reference counter of the futex and if the + refcount reaches zero it is released. futex_sleep When a futex queues a thread for sleeping it creates a - working_proc structure and puts this structure - to the list inside the futex structure then it just performs a - &man.tsleep.9; to suspend the thread. The sleep can be timed out. - After &man.tsleep.9; returns (the thread was woken up or it timed - out) the working_proc structure is removed - from the list and is destroyed. All this is done in the - futex_sleep function. If we got woken up - from futex_wake we have - wp_new_futex set so we sleep on it. This way - the actual requeueing is done in this function. + working_proc structure and puts this + structure to the list inside the futex structure then it + just performs a &man.tsleep.9; to suspend the thread. The + sleep can be timed out. After &man.tsleep.9; returns (the + thread was woken up or it timed out) the + working_proc structure is removed from + the list and is destroyed. All this is done in the + futex_sleep function. If we got + woken up from futex_wake we have + wp_new_futex set so we sleep on it. + This way the actual requeueing is done in this + function. futex_wake - Waking up a thread sleeping on a futex is performed in the - futex_wake function. First in this function - we mimic the strange &linux; behavior, where it wakes up N threads - for all operations, the only exception is that the REQUEUE - operations are performed on N+1 threads. But this usually does not - make any difference as we are waking up all threads. Next in the - function in the loop we wake up n threads, after this we check if - there is a new futex for requeueing. If so, we requeue up to n2 - threads on the new futex. This cooperates with - futex_sleep. + Waking up a thread sleeping on a futex is performed in + the futex_wake function. First in + this function we mimic the strange &linux; behavior, where + it wakes up N threads for all operations, the only + exception is that the REQUEUE operations are performed on + N+1 threads. But this usually does not make any + difference as we are waking up all threads. Next in the + function in the loop we wake up n threads, after this we + check if there is a new futex for requeueing. If so, we + requeue up to n2 threads on the new futex. This + cooperates with futex_sleep. futex_wake_op - The FUTEX_WAKE_OP operation is quite - complicated. First we obtain two futexes at addresses - uaddr and uaddr2 then we - perform the atomic operation using val3 and - uaddr2. Then val waiters - on the first futex is woken up and if the atomic operation - condition holds we wake up val2 (i.e. 
- timeout) waiter on the second futex. + The FUTEX_WAKE_OP operation is + quite complicated. First we obtain two futexes at + addresses uaddr and + uaddr2 then we perform the atomic + operation using val3 and + uaddr2. Then val + waiters on the first futex is woken up and if the atomic + operation condition holds we wake up + val2 (i.e. timeout) + waiter on the second futex. futex atomic operation The atomic operation takes two parameters - encoded_op and uaddr. - The encoded operation encodes the operation itself, - comparing value, operation argument, and comparing argument. - The pseudocode for the operation is like this one: + encoded_op and + uaddr. The encoded operation encodes + the operation itself, comparing value, operation argument, + and comparing argument. The pseudocode for the operation + is like this one: oldval = *uaddr2 *uaddr2 = oldval OP oparg - And this is done atomically. First a copying in of the number - at uaddr is performed and the operation is - done. The code handles page faults and if no page fault occurs - oldval is compared to - cmparg argument with cmp comparator. + And this is done atomically. First a copying in of + the number at uaddr is performed and + the operation is done. The code handles page faults and + if no page fault occurs oldval is + compared to cmparg argument with cmp + comparator. Futex locking Futex implementation uses two lock lists protecting - sx_lock and global locks (either Giant - or another sx_lock). Every operation is - performed locked from the start to the very end. + sx_lock and global locks (either + Giant or another sx_lock). Every + operation is performed locked from the start to the very + end. @@ -2077,26 +2266,29 @@ pthread_mutex_unlock(&mutex); Various syscalls implementation - In this section I am going to describe some smaller syscalls that - are worth mentioning because their implementation is not obvious or - those syscalls are interesting from other point of view. + In this section I am going to describe some smaller + syscalls that are worth mentioning because their + implementation is not obvious or those syscalls are + interesting from other point of view. *at family of syscalls - During development of &linux; 2.6.16 kernel, the *at syscalls - were added. Those syscalls (openat for example) - work exactly like their at-less counterparts with the slight - exception of the dirfd parameter. This - parameter changes where the given file, on which the syscall is to be - performed, is. When the filename parameter is - absolute dirfd is ignored but when the path to - the file is relative, it comes to the play. The - dirfd parameter is a directory relative to which - the relative pathname is checked. The dirfd - parameter is a file descriptor of some directory or - AT_FDCWD. So for example the - openat syscall can be like this: + During development of &linux; 2.6.16 kernel, the *at + syscalls were added. Those syscalls + (openat for example) work exactly like + their at-less counterparts with the slight exception of the + dirfd parameter. This parameter changes + where the given file, on which the syscall is to be + performed, is. When the filename + parameter is absolute dirfd is ignored + but when the path to the file is relative, it comes to the + play. The dirfd parameter is a directory + relative to which the relative pathname is checked. The + dirfd parameter is a file descriptor of + some directory or AT_FDCWD. 
So for + example the openat syscall can be like + this: file descriptor 123 = /tmp/foo/, current working directory = /tmp/ @@ -2105,15 +2297,17 @@ openat(123, bah\, flags, mode) /* opens /tmp/foo/bah */ openat(AT_FDWCWD, bah\, flags, mode) /* opens /tmp/bah */ openat(stdio, bah\, flags, mode) /* returns error because stdio is not a directory */ - This infrastructure is necessary to avoid races when opening - files outside the working directory. Imagine that a process consists - of two threads, thread A and thread B. Thread A - issues open(./tmp/foo/bah., flags, mode) and + This infrastructure is necessary to avoid races when + opening files outside the working directory. Imagine that a + process consists of two threads, thread A and + thread B. Thread A issues + open(./tmp/foo/bah., flags, mode) and before returning it gets preempted and thread B runs. - Thread B does not care about the needs of thread A and - renames or removes /tmp/foo/. We got a race. - To avoid this we can open /tmp/foo and use it - as dirfd for openat + Thread B does not care about the needs of thread A + and renames or removes /tmp/foo/. We + got a race. To avoid this we can open + /tmp/foo and use it as + dirfd for openat syscall. This also enables user to implement per-thread working directories. @@ -2130,84 +2324,91 @@ openat(stdio, bah\, flags, mode) /* returns error because stdio is not a directo linux_symlinkat, linux_readlinkat, linux_fchmodat and - linux_faccessat. All these are implemented - using the modified &man.namei.9; routine and simple - wrapping layer. + linux_faccessat. All these are + implemented using the modified &man.namei.9; routine and + simple wrapping layer. Implementation The implementation is done by altering the - &man.namei.9; routine (described above) to take - additional parameter dirfd in its - nameidata structure, which specifies the - starting point of the pathname lookup instead of using the - current working directory every time. The resolution of - dirfd from file descriptor number to a - vnode is done in native *at syscalls. When - dirfd is AT_FDCWD the - dvp entry in nameidata - structure is NULL but when - dirfd is a different number we obtain a - file for this file descriptor, check whether this file - is valid and if there is vnode attached to it then we get a vnode. - Then we check this vnode for being a directory. In the actual - &man.namei.9; routine we simply substitute the - dvp vnode for dp variable - in the &man.namei.9; function, which determines the - starting point. The &man.namei.9; is not used - directly but via a trace of different functions on various - levels. For example the openat goes like - this: + &man.namei.9; routine (described above) to take additional + parameter dirfd in its + nameidata structure, which specifies + the starting point of the pathname lookup instead of using + the current working directory every time. The resolution + of dirfd from file descriptor number to + a vnode is done in native *at syscalls. When + dirfd is AT_FDCWD + the dvp entry in + nameidata structure is + NULL but when dirfd + is a different number we obtain a file for this file + descriptor, check whether this file is valid and if there + is vnode attached to it then we get a vnode. Then we + check this vnode for being a directory. In the actual + &man.namei.9; routine we simply substitute the + dvp vnode for dp + variable in the &man.namei.9; function, which determines + the starting point. 
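Stepping back to userspace for a moment, the race-avoidance idiom described above looks like this from the application side. The sketch and the paths in it are my own; openat and O_DIRECTORY are the real interfaces discussed in the text.

/*
 * Race-free "open a file under a directory" idiom with openat(2).
 * The directory descriptor pins /tmp/foo, so a concurrent rename or
 * removal of the directory no longer changes which file is opened.
 * The paths are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        int dirfd, fd;

        dirfd = open("/tmp/foo", O_RDONLY | O_DIRECTORY);
        if (dirfd == -1) {
                perror("open /tmp/foo");
                return (1);
        }

        /* Opens what /tmp/foo/bah was when dirfd was obtained. */
        fd = openat(dirfd, "bah", O_RDONLY);
        if (fd == -1)
                perror("openat");
        else
                close(fd);

        close(dirfd);
        return (0);
}

The same directory descriptor could also serve as a per-thread working directory, as mentioned above.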
The &man.namei.9; is not used + directly but via a trace of different functions on various + levels. For example the openat goes + like this: openat() --> kern_openat() --> vn_open() -> namei() For this reason kern_open and - vn_open must be altered to incorporate - the additional dirfd parameter. No compat - layer is created for those because there are not many users of - this and the users can be easily converted. This general - implementation enables &os; to implement their own *at syscalls. - This is being discussed right now. + vn_open must be altered to + incorporate the additional dirfd + parameter. No compat layer is created for those because + there are not many users of this and the users can be + easily converted. This general implementation enables + &os; to implement their own *at syscalls. This is being + discussed right now. Ioctl - The ioctl interface is quite fragile due to its generality. - We have to bear in mind that devices differ between &linux; and &os; - so some care must be applied to do ioctl emulation work right. The - ioctl handling is implemented in linux_ioctl.c, - where linux_ioctl function is defined. This - function simply iterates over sets of ioctl handlers to find a - handler that implements a given command. The ioctl syscall has three - parameters, the file descriptor, command and an argument. The - command is a 16-bit number, which in theory is divided into high - 8 bits determining class of the ioctl command and low - 8 bits, which are the actual command within the given set. - The emulation takes advantage of this division. We implement - handlers for each set, like sound_handler - or disk_handler. Each handler has a maximum - command and a minimum command defined, which is used for determining - what handler is used. There are slight problems with this approach - because &linux; does not use the set division consistently so - sometimes ioctls for a different set are inside a set they should - not belong to (SCSI generic ioctls inside cdrom set, etc.). &os; - currently does not implement many &linux; ioctls (compared to - NetBSD, for example) but the plan is to port those from NetBSD. - The trend is to use &linux; ioctls even in the native &os; drivers - because of the easy porting of applications. + The ioctl interface is quite fragile due to its + generality. We have to bear in mind that devices differ + between &linux; and &os; so some care must be applied to do + ioctl emulation work right. The ioctl handling is + implemented in linux_ioctl.c, where + linux_ioctl function is defined. This + function simply iterates over sets of ioctl handlers to find + a handler that implements a given command. The ioctl + syscall has three parameters, the file descriptor, command + and an argument. The command is a 16-bit number, which in + theory is divided into high 8 bits determining class of + the ioctl command and low 8 bits, which are the actual + command within the given set. The emulation takes advantage + of this division. We implement handlers for each set, like + sound_handler or + disk_handler. Each handler has a + maximum command and a minimum command defined, which is used + for determining what handler is used. There are slight + problems with this approach because &linux; does not use the + set division consistently so sometimes ioctls for a + different set are inside a set they should not belong to + (SCSI generic ioctls inside cdrom set, etc.). 
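The handler-table scheme just described can be pictured with a small simplified sketch. The structure layout, the handler names and the command ranges below are invented for illustration and are not the actual linux_ioctl.c code; only the idea of walking a list of handlers, each covering a minimum-to-maximum command range, is taken from the text above.

/*
 * Simplified picture of range-based ioctl dispatch.  The ranges and
 * handlers are placeholders; the real code registers its handler sets
 * differently and passes richer arguments.
 */
#include <stddef.h>
#include <stdio.h>

struct ioctl_handler {
        unsigned int    min_cmd;        /* lowest command in the set */
        unsigned int    max_cmd;        /* highest command in the set */
        int             (*func)(int fd, unsigned int cmd, void *arg);
};

static int
sound_handler(int fd, unsigned int cmd, void *arg)
{
        (void)fd; (void)arg;
        printf("sound set handles ioctl 0x%x\n", cmd);
        return (0);
}

static int
disk_handler(int fd, unsigned int cmd, void *arg)
{
        (void)fd; (void)arg;
        printf("disk set handles ioctl 0x%x\n", cmd);
        return (0);
}

/* One entry per ioctl set; the high 8 bits of cmd select the class. */
static const struct ioctl_handler handlers[] = {
        { 0x5000, 0x50ff, sound_handler },      /* made-up range */
        { 0x0300, 0x03ff, disk_handler },       /* made-up range */
};

static int
dispatch_ioctl(int fd, unsigned int cmd, void *arg)
{
        size_t i;

        for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
                if (cmd >= handlers[i].min_cmd && cmd <= handlers[i].max_cmd)
                        return (handlers[i].func(fd, cmd, arg));
        return (-1);                    /* no set claims this command */
}

int
main(void)
{
        dispatch_ioctl(0, 0x5004, NULL);        /* falls into the "sound" range */
        return (0);
}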
&os; + currently does not implement many &linux; ioctls (compared + to NetBSD, for example) but the plan is to port those from + NetBSD. The trend is to use &linux; ioctls even in the + native &os; drivers because of the easy porting of + applications. Debugging Every syscall should be debuggable. For this purpose we - introduce a small infrastructure. We have the ldebug facility, which - tells whether a given syscall should be debugged (settable via a - sysctl). For printing we have LMSG and ARGS macros. Those are used - for altering a printable string for uniform debugging messages. + introduce a small infrastructure. We have the ldebug + facility, which tells whether a given syscall should be + debugged (settable via a sysctl). For printing we have LMSG + and ARGS macros. Those are used for altering a printable + string for uniform debugging messages. @@ -2219,67 +2420,74 @@ openat(stdio, bah\, flags, mode) /* returns error because stdio is not a directo Results As of April 2007 the &linux; emulation layer is capable of - emulating the &linux; 2.6.16 kernel quite well. The remaining - problems concern futexes, unfinished *at family of syscalls, - problematic signals delivery, missing epoll and - inotify and probably some bugs we have not - discovered yet. Despite this we are capable of running basically all - the &linux; programs included in &os; Ports Collection with - Fedora Core 4 at 2.6.16 and there are some rudimentary - reports of success with Fedora Core 6 at 2.6.16. The - Fedora Core 6 linux_base was recently committed enabling - some further testing of the emulation layer and giving us some more - hints where we should put our effort in implementing missing - stuff. + emulating the &linux; 2.6.16 kernel quite well. The + remaining problems concern futexes, unfinished *at family of + syscalls, problematic signals delivery, missing + epoll and inotify + and probably some bugs we have not discovered yet. Despite + this we are capable of running basically all the &linux; + programs included in &os; Ports Collection with + Fedora Core 4 at 2.6.16 and there are some + rudimentary reports of success with Fedora Core 6 at + 2.6.16. The Fedora Core 6 linux_base was recently + committed enabling some further testing of the emulation layer + and giving us some more hints where we should put our effort + in implementing missing stuff. We are able to run the most used applications like www/linux-firefox, www/linux-opera, - net-im/skype and some games from - the Ports Collection. Some of the programs exhibit bad behavior - under 2.6 emulation but this is currently under investigation and - hopefully will be fixed soon. The only big application that is - known not to work is the &linux; &java; Development Kit and this is - because of the requirement of epoll - facility which is not directly related to the &linux; - kernel 2.6. + net-im/skype and some games from the + Ports Collection. Some of the programs exhibit bad + behavior under 2.6 emulation but this is currently under + investigation and hopefully will be fixed soon. The only big + application that is known not to work is the &linux; &java; + Development Kit and this is because of the requirement of + epoll facility which is not directly + related to the &linux; kernel 2.6. - We hope to enable 2.6.16 emulation by default some time after - &os; 7.0 is released at least to expose the 2.6 emulation parts for - some wider testing. Once this is done we can switch to - Fedora Core 6 linux_base, which is the ultimate plan. 
+ We hope to enable 2.6.16 emulation by default some time + after &os; 7.0 is released at least to expose the 2.6 + emulation parts for some wider testing. Once this is done we + can switch to Fedora Core 6 linux_base, which is the + ultimate plan. Future work - Future work should focus on fixing the remaining issues with - futexes, implement the rest of the *at family of syscalls, fix the - signal delivery and possibly implement the epoll - and inotify facilities. + Future work should focus on fixing the remaining issues + with futexes, implement the rest of the *at family of + syscalls, fix the signal delivery and possibly implement the + epoll and inotify + facilities. - We hope to be able to run the most important programs flawlessly - soon, so we will be able to switch to the 2.6 emulation by default and - make the Fedora Core 6 the default linux_base because our - currently used Fedora Core 4 is not supported any - more. + We hope to be able to run the most important programs + flawlessly soon, so we will be able to switch to the 2.6 + emulation by default and make the Fedora Core 6 the + default linux_base because our currently used + Fedora Core 4 is not supported any more. - The other possible goal is to share our code with NetBSD and - DragonflyBSD. NetBSD has some support for 2.6 emulation but its far - from finished and not really tested. DragonflyBSD has expressed some - interest in porting the 2.6 improvements. + The other possible goal is to share our code with NetBSD + and DragonflyBSD. NetBSD has some support for 2.6 emulation + but its far from finished and not really tested. DragonflyBSD + has expressed some interest in porting the 2.6 + improvements. - Generally, as &linux; develops we would like to keep up with their - development, implementing newly added syscalls. Splice comes to mind - first. Some already implemented syscalls are also heavily crippled, - for example mremap and others. Some performance - improvements can also be made, finer grained locking and others. + Generally, as &linux; develops we would like to keep up + with their development, implementing newly added syscalls. + Splice comes to mind first. Some already implemented syscalls + are also heavily crippled, for example + mremap and others. Some performance + improvements can also be made, finer grained locking and + others. Team - I cooperated on this project with (in alphabetical order): + I cooperated on this project with (in alphabetical + order): @@ -2311,8 +2519,8 @@ openat(stdio, bah\, flags, mode) /* returns error because stdio is not a directo - I would like to thank all those people for their advice, code - reviews and general support. + I would like to thank all those people for their advice, + code reviews and general support. @@ -2321,16 +2529,18 @@ openat(stdio, bah\, flags, mode) /* returns error because stdio is not a directo - Marshall Kirk McKusick - George V. Nevile-Neil. Design - and Implementation of the &os; operating system. Addison-Wesley, - 2005. + Marshall Kirk McKusick - George V. Nevile-Neil. Design + and Implementation of the &os; operating system. + Addison-Wesley, 2005. - https://tldp.org + https://tldp.org - https://www.kernel.org - + https://www.kernel.org +