Update this chapter with respect to our current state of jail
infrastructure, plus several markup corrections.

Submitted by: MQ <antinvidia at gmail dot com>
svn path=/head/; revision=31407
2008-02-01 18:54:32 +00:00 · 2008-02-01 18:54:32 +00:00 · 0b444add8d · 2020-12-08 03:00:23 +00:00
commit 0b444add8d
parent 26fad0c4c1
1 changed files with 329 additions and 259 deletions
--- a/en_US.ISO8859-1/books/arch-handbook/jail/chapter.sgml
+++ b/en_US.ISO8859-1/books/arch-handbook/jail/chapter.sgml
@@ -24,62 +24,65 @@
  <indexterm><primary>Jail</primary></indexterm>
  <indexterm><primary>root</primary></indexterm>

-  <para>On most &unix; systems, root has omnipotent power. This promotes
-    insecurity. If an attacker were to gain root on a system, he would
-    have every function at his fingertips. In FreeBSD there are
-    sysctls which dilute the power of root, in order to minimize the
-    damage caused by an attacker. Specifically, one of these functions
-    is called secure levels. Similarly, another function which is
-    present from FreeBSD 4.0 and onward, is a utility called
-    &man.jail.8;. <application>Jail</application> chroots an
-    environment and sets certain restrictions on processes which are
-    forked from within. For example, a jailed process cannot affect
-    processes outside of the jail, utilize certain system calls, or
-    inflict any damage on the main computer.</para>
+  <para>On most &unix; systems, <literal>root</literal> has omnipotent power.
+    This promotes insecurity. If an attacker gained <literal>root</literal>
+    on a system, he would have every function at his fingertips. In FreeBSD
+    there are sysctls which dilute the power of <literal>root</literal>, in
+    order to minimize the damage caused by an attacker. Specifically, one of
+    these functions is called <literal>secure levels</literal>. Similarly,
+    another function, present from FreeBSD 4.0 onward, is a utility
+    called &man.jail.8;. <application>Jail</application> chroots an environment
+    and sets certain restrictions on processes which are forked within
+    the <application>jail</application>. For example, a jailed process
+    cannot affect processes outside the <application>jail</application>,
+    utilize certain system calls, or inflict any damage on the host
+    environment.</para>

  <para><application>Jail</application> is becoming the new security
    model. People are running potentially vulnerable servers such as
-    Apache, BIND, and sendmail within jails, so that if an attacker
-    gains root within the <application>Jail</application>, it is only
-    an annoyance, and not a devastation. This article focuses on the
-    internals (source code) of <application>Jail</application>.
-    It will also suggest improvements upon the jail code base which
-    are already being worked on. If you are looking for a how-to on
-    setting up a <application>Jail</application>, I suggest you look
-    at my other article in Sys Admin Magazine, May 2001, entitled
-    "Securing FreeBSD using <application>Jail</application>."</para>
+    <application>Apache</application>, <application>BIND</application>, and
+    <application>sendmail</application> within jails, so that if an attacker
+    gains <literal>root</literal> within the <application>jail</application>,
+    it is only an annoyance, and not a devastation. This article mainly
+    focuses on the internals (source code) of <application>jail</application>.
+    If you are looking for a how-to on setting up a
+    <application>jail</application>, I suggest you look at my other article
+    in Sys Admin Magazine, May 2001, entitled "Securing FreeBSD using
+    <application>Jail</application>."</para>

  <sect1 id="jail-arch">
    <title>Architecture</title>

    <para>
      <application>Jail</application> consists of two realms: the
-      user-space program, jail, and the code implemented within the
-      kernel: the <literal>jail</literal> system call and associated
-      restrictions. I will be discussing the user-space program and
-      then how jail is implemented within the kernel.</para>
+      userland program, &man.jail.8;, and the code implemented within
+      the kernel: the &man.jail.2; system call and associated
+      restrictions. I will be discussing the userland program and
+      then how <application>jail</application> is implemented within
+      the kernel.</para>

    <sect2>
-      <title>Userland code</title>
+      <title>Userland Code</title>

      <indexterm><primary>Jail</primary>
-	<secondary>userland program</secondary></indexterm>
+	<secondary>Userland Program</secondary></indexterm>

-      <para>The source for the user-land jail is located in
-        <filename>/usr/src/usr.sbin/jail</filename>, consisting of
-        one file, <filename>jail.c</filename>. The program takes these
-        arguments: the path of the jail, hostname, ip address, and the
-        command to be executed.</para>
+      <para>The source for the userland <application>jail</application>
+        is located in <filename>/usr/src/usr.sbin/jail</filename>,
+        consisting of one file, <filename>jail.c</filename>. The program
+        takes these arguments: the path of the <application>jail</application>,
+        hostname, IP address, and the command to be executed.</para>

      <sect3>
        <title>Data Structures</title>

        <para>In <filename>jail.c</filename>, the first thing I would
          note is the declaration of an important structure
-          <literal>struct jail j</literal>; which was included from
+          <literal>struct jail j;</literal> which was included from
          <filename>/usr/include/sys/jail.h</filename>.</para>

-        <para>The definition of the jail structure is:</para>
+        <para>The definition of the <literal>jail</literal> structure is:</para>

 <programlisting><filename>/usr/include/sys/jail.h</filename>: 

@@ -91,8 +94,8 @@ struct jail {
 };</programlisting>

        <para>As you can see, there is an entry for each of the
-          arguments passed to the jail program, and indeed, they are
-          set during its execution.</para>
+          arguments passed to the &man.jail.8; program, and indeed,
+          they are set during its execution.</para>

        <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
 char path[PATH_MAX];
@@ -111,10 +114,11 @@ j.hostname = argv[1];</programlisting>
      <sect3>
        <title>Networking</title>

-        <para>One of the arguments passed to the Jail program is an IP
-          address with which the jail can be accessed over the
-          network. Jail translates the ip address given into host
-          byte order and then stores it in j (the jail structure).</para>
+        <para>One of the arguments passed to the &man.jail.8; program is
+          an IP address with which the <application>jail</application>
+          can be accessed over the network. &man.jail.8; translates the
+          IP address given into host byte order and then stores it in
+          <literal>j</literal> (the <literal>jail</literal> structure).</para>

        <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>:
 struct in_addr in; 
@@ -123,37 +127,35 @@ if (inet_aton(argv[2], &amp;in) == 0)
    errx(1, "Could not make sense of ip-number: %s", argv[2]);
 j.ip_number = ntohl(in.s_addr);</programlisting>

-        <para>The
-          <citerefentry><refentrytitle>inet_aton</refentrytitle>
-          <manvolnum>3</manvolnum></citerefentry>
-          function "interprets the specified character string as an
-          Internet address, placing the address into the structure
-          provided." The ip number node in the jail structure is set
-          only when the ip address placed onto the in structure by
-          inet_aton is translated into host byte order by
-          <function>ntohl()</function>.</para>
+        <para>The &man.inet.aton.3; function "interprets the specified
+          character string as an Internet address, placing the address
+          into the structure provided." The <literal>ip_number</literal>
+          member in the <literal>jail</literal> structure is set only
+          after the IP address that &man.inet.aton.3; placed into the
+          <literal>in</literal> structure has been translated into host
+          byte order by &man.ntohl.3;.</para>

      </sect3>

      <sect3>
        <title>Jailing The Process</title>

-        <para>Finally, the userland program jails the process, and
-          executes the command specified. Jail now becomes an
-          imprisoned process itself and then executes the command
-          given using &man.execv.3;</para>
-
-        <programlisting><filename>/usr/src/sys/usr.sbin/jail/jail.c</filename>
+        <para>Finally, the userland program jails the process.
+          <application>Jail</application> now becomes an imprisoned
+          process itself and then executes the command given using
+          &man.execv.3;.</para>
+        <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
 i = jail(&amp;j); 
 ... 
 if (execv(argv[3], argv + 3) != 0)
    err(1, "execv: %s", argv[3]);</programlisting>

-        <para>As you can see, the jail function is being called, and
-          its argument is the jail structure which has been filled
-          with the arguments given to the program. Finally, the
-          program you specify is executed. I will now discuss how Jail
-          is implemented within the kernel.</para>
+        <para>As you can see, the <literal>jail()</literal> function is
+          called, and its argument is the <literal>jail</literal> structure
+          which has been filled with the arguments given to the program.
+          Finally, the program you specify is executed. I will now discuss
+          how <application>jail</application> is implemented within the
+          kernel.</para>
      </sect3>
    </sect2>

@@ -161,12 +163,12 @@ if (execv(argv[3], argv + 3) != 0)
      <title>Kernel Space</title>

      <indexterm><primary>Jail</primary>
-	<secondary>kernel architecture</secondary></indexterm>
+	<secondary>Kernel Architecture</secondary></indexterm>

      <para>We will now be looking at the file
        <filename>/usr/src/sys/kern/kern_jail.c</filename>.  This is
-        the file where the jail system call, appropriate sysctls, and
-        networking functions are defined.</para>
+        the file where the &man.jail.2; system call, appropriate sysctls,
+        and networking functions are defined.</para>

      <sect3>
        <title>sysctls</title>
@@ -186,7 +188,7 @@ SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW,
 int     jail_socket_unixiproute_only = 1;
 SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW,
    &amp;jail_socket_unixiproute_only, 0,
-    "Processes in jail are limited to creating &unix;/IPv4/route sockets only");
+    "Processes in jail are limited to creating UNIX/IPv4/route sockets only");

 int     jail_sysvipc_allowed = 0;
 SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW,
@@ -206,10 +208,15 @@ SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW,
 int    jail_chflags_allowed = 0;
 SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
    &amp;jail_chflags_allowed, 0,
-    "Processes in jail can alter system file flags");</programlisting>
+    "Processes in jail can alter system file flags");
+
+int     jail_mount_allowed = 0;
+SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW,
+    &amp;jail_mount_allowed, 0,
+    "Processes in jail can mount/unmount jail-friendly file systems");</programlisting>

        <para>Each of these sysctls can be accessed by the user
-          through the sysctl program. Throughout the kernel, these
+          through the &man.sysctl.8; program. Throughout the kernel, these
          specific sysctls are recognized by their name. For example,
          the name of the first sysctl is
          <literal>security.jail.set_hostname_allowed</literal>.</para>
@@ -221,18 +228,17 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
        <para>Like all system calls, the &man.jail.2; system call takes
          two arguments, <literal>struct thread *td</literal> and
          <literal>struct jail_args *uap</literal>.
-          <literal>td</literal> is a pointer to the thread
+          <literal>td</literal> is a pointer to the <literal>thread</literal>
          structure which describes the calling thread. In this
-          context, uap is a pointer to the structure which specifies the
-          arguments given to &man.jail.2; from the userland program
-          <filename>jail.c</filename>. When I described the userland
-          program before, you saw that the &man.jail.2; system call was
-          given a jail structure as its own argument.</para>
+          context, <literal>uap</literal> is a pointer to a structure
+          that contains a pointer to the <literal>jail</literal> structure
+          passed in by the userland <filename>jail.c</filename>.
+          When I described the userland program before, you saw that the
+          &man.jail.2; system call was given a <literal>jail</literal>
+          structure as its own argument.</para>

        <programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
 /*
- * MPSAFE
- *  
 * struct jail_args {
 *  struct jail *jail;
 * };
@@ -240,46 +246,48 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
 int 
 jail(struct thread *td, struct jail_args *uap)</programlisting>

-        <para>Therefore, <literal>uap-&gt;jail</literal> would access the
-          jail structure which was passed to the system call. Next,
-          the system call copies the jail structure into kernel space
-          using the <literal>copyin()</literal>
-          function. <literal>copyin()</literal> takes three arguments:
-          the data which is to be copied into kernel space,
+        <para>Therefore, <literal>uap-&gt;jail</literal> can be used to
+          access the <literal>jail</literal> structure which was passed
+          to the system call. Next, the system call copies the
+          <literal>jail</literal> structure into kernel space using
+          the &man.copyin.9; function. &man.copyin.9; takes three arguments:
+          the address of the data which is to be copied into kernel space,
          <literal>uap-&gt;jail</literal>, where to store it,
-          <literal>j</literal> and the size of the storage. The jail
-          structure <literal>uap-&gt;jail</literal> is copied into kernel
-          space and stored in another jail structure,
+          <literal>j</literal>, and the size of the storage. The
+          <literal>jail</literal> structure pointed to by
+          <literal>uap-&gt;jail</literal> is copied into kernel space and
+          is stored in another <literal>jail</literal> structure,
          <literal>j</literal>.</para>

        <programlisting><filename>/usr/src/sys/kern/kern_jail.c: </filename>
 error = copyin(uap-&gt;jail, &amp;j, sizeof(j));</programlisting>

        <para>There is another important structure defined in
-          jail.h. It is the prison structure
-          (<literal>pr</literal>). The prison structure is used
-          exclusively within kernel space. The &man.jail.2; system call
-          copies everything from the jail structure onto the prison
-          structure. Here is the definition of the prison structure.</para>
+          <filename>jail.h</filename>. It is the <literal>prison</literal>
+          structure. The <literal>prison</literal> structure is used
+          exclusively within kernel space. Here is the definition of the
+          <literal>prison</literal> structure.</para>

        <programlisting><filename>/usr/include/sys/jail.h</filename>:
 struct prison {
-    LIST_ENTRY(prison) pr_list;         /* (a) all prisons */
-    int      pr_id;             /* (c) prison id */
-    int      pr_ref;            /* (p) refcount */
-    char         pr_path[MAXPATHLEN];       /* (c) chroot path */
-    struct vnode    *pr_root;           /* (c) vnode to rdir */
-    char         pr_host[MAXHOSTNAMELEN];   /* (p) jail hostname */
-    u_int32_t    pr_ip;             /* (c) ip addr host */
-    void        *pr_linux;          /* (p) linux abi */
-    int      pr_securelevel;        /* (p) securelevel */
-    struct task  pr_task;           /* (d) destroy task */
-    struct mtx   pr_mtx;
+        LIST_ENTRY(prison) pr_list;                     /* (a) all prisons */
+        int              pr_id;                         /* (c) prison id */
+        int              pr_ref;                        /* (p) refcount */
+        char             pr_path[MAXPATHLEN];           /* (c) chroot path */
+        struct vnode    *pr_root;                       /* (c) vnode to rdir */
+        char             pr_host[MAXHOSTNAMELEN];       /* (p) jail hostname */
+        u_int32_t        pr_ip;                         /* (c) ip addr host */
+        void            *pr_linux;                      /* (p) linux abi */
+        int              pr_securelevel;                /* (p) securelevel */
+        struct task      pr_task;                       /* (d) destroy task */
+        struct mtx       pr_mtx;
+        void           **pr_slots;                      /* (p) additional data */
 };</programlisting>

-        <para>The jail() system call then allocates memory for a
-        pointer to a prison structure and copies data between the two
-        structures.</para>
+        <para>The &man.jail.2; system call then allocates memory for
+        a <literal>prison</literal> structure and copies data from the
+        <literal>jail</literal> structure into the <literal>prison</literal>
+        structure.</para>

        <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
 MALLOC(pr, struct prison *, sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO);
@@ -289,49 +297,86 @@ if (error)
    goto e_killmtx;
 ...
 error = copyinstr(j.hostname, &amp;pr-&gt;pr_host, sizeof(pr-&gt;pr_host), 0);
- if (error)
-     goto e_dropvnref;</programlisting>
-        <para>These next three lines in the source are very important,
-          as they specify how the kernel recognizes a process as
-          jailed. Each process on a &unix; system is described by its
-          own proc structure. You can see the whole proc structure in
-          <filename>/usr/include/sys/proc.h</filename>. For example,
-          the td argument in any system call is actually a pointer to
-          that calling thread's thread structure, as stated before. The
-          td-&gt;td_proc is a pointer to the calling process' process
-          structure.  The proc structure contains nodes which can describe
-          the owner's identity (<literal>p_ucred</literal>), the process
-          resource limits (<literal>p_limit</literal>), and so on. In the
-          definition of the ucred structure, there is a pointer to a
-          prison structure. (<literal>cr_prison</literal>).</para>
+if (error)
+     goto e_dropvnref;
+pr-&gt;pr_ip = j.ip_number;</programlisting>
+        <para>Next, we will discuss another important system call,
+          &man.jail.attach.2;, which puts a process into the
+          <application>jail</application>.</para>
+        <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
+/*
+ * struct jail_attach_args {
+ *      int jid;
+ * };
+ */
+int
+jail_attach(struct thread *td, struct jail_attach_args *uap)</programlisting>
+        <para>This system call makes the changes that distinguish
+          a jailed process from unjailed ones.
+          To understand what &man.jail.attach.2; does for us, certain
+          background information is needed.</para>
+        <para>
+          On FreeBSD, each kernel-visible thread is identified by its
+          <literal>thread</literal> structure, while each process is
+          described by its <literal>proc</literal> structure. You can
+          find the definitions of the <literal>thread</literal> and
+          <literal>proc</literal> structures in
+          <filename>/usr/include/sys/proc.h</filename>.
+          For example, the <literal>td</literal> argument in any system
+          call is actually a pointer to the calling thread's
+          <literal>thread</literal> structure, as stated before.
+          The <literal>td_proc</literal> member in the
+          <literal>thread</literal> structure pointed to by <literal>td</literal>
+          is a pointer to the <literal>proc</literal> structure which
+          represents the process that contains the thread represented by
+          <literal>td</literal>. The <literal>proc</literal> structure
+          contains members which can describe the owner's
+          identity (<literal>p_ucred</literal>), the process resource
+          limits (<literal>p_limit</literal>), and so on. In the
+          <literal>ucred</literal> structure pointed to by the
+          <literal>p_ucred</literal> member of the <literal>proc</literal>
+          structure, there is a pointer to the <literal>prison</literal>
+          structure (<literal>cr_prison</literal>).</para>

        <programlisting><filename>/usr/include/sys/proc.h: </filename>
+struct thread {
+    ...
+    struct proc *td_proc;
+    ...
+};
 struct proc { 
-...
-struct ucred *p_ucred; 
-...
+    ...
+    struct ucred *p_ucred; 
+    ...
 };
 <filename>/usr/include/sys/ucred.h</filename>
 struct ucred {
-...
-struct prison *cr_prison;
-...
+    ...
+    struct prison *cr_prison;
+    ...
 };</programlisting>

-        <para>In <filename>kern_jail.c</filename>, the function then
-          calls function jail_attach with a given jid. And the jail_attach
-          calls function change_root to change the root directory of the
-          calling process.  The jail_attach function then creates a new ucred
-          structure, and attaches the newly created ucred structure to the
-          calling process after it has successfully attaches the prison on the
-          cred structure. From then on, the calling process is recognized as
-          jailed. When calls function jailed with the newly created ucred
-          structure as the argument, it returns 1 to tell that the credential
-          is in a jail. The parent process of each process, forked within 
-          the jail, is the program jail itself, as it calls the &man.jail.2;
-          system call.  When the program is executed through execve, it
-          inherits the properties of its parent's ucred structure, therefore it
-          has the jailed ucred structure.</para>
+        <para>In <filename>kern_jail.c</filename>, the function
+          <literal>jail()</literal> then calls
+          <literal>jail_attach()</literal> with a given <literal>jid</literal>,
+          and <literal>jail_attach()</literal> calls
+          <literal>change_root()</literal> to change the root directory of the
+          calling process. <literal>jail_attach()</literal> then creates
+          a new <literal>ucred</literal> structure, and attaches the newly
+          created <literal>ucred</literal> structure to the calling process
+          after it has successfully attached the <literal>prison</literal>
+          structure to the <literal>ucred</literal> structure. From then on,
+          the calling process is recognized as jailed. When the kernel routine
+          <literal>jailed()</literal> is called with the newly
+          created <literal>ucred</literal> structure as its argument, it
+          returns 1 to indicate that the credential is associated
+          with a <application>jail</application>. The common ancestor process
+          of all the processes forked within the <application>jail</application>
+          is the process which runs &man.jail.8;, as it calls the
+          &man.jail.2; system call. When a program is executed through
+          &man.execve.2;, it inherits the jailed property of its parent's
+          <literal>ucred</literal> structure, and therefore has a jailed
+          <literal>ucred</literal> structure.</para>

        <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>
 int
@@ -363,14 +408,15 @@ jail_attach(struct thread *td, struct jail_attach_args *uap)
    p-&gt;p_ucred = newcred;
 ...
 }</programlisting>
-        <para>When a process is forked from a parent process, the
-          &man.fork.2; system call uses crhold to maintain the credential
-          for the newly forked process. It inherently keep the newly forked
-          child's credential consistent with its parent, so the child process
-          is also jailed.</para>
+        <para>When a process is forked from its parent process, the
+          &man.fork.2; system call uses <literal>crhold()</literal> to
+          take a reference on the parent's credential for the newly forked
+          process. This keeps the child's credential consistent with its
+          parent's, so the child process is also jailed.</para>

        <programlisting><filename>/usr/src/sys/kern/kern_fork.c</filename>:
 p2-&gt;p_ucred = crhold(td-&gt;td_ucred);
+...
 td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>

      </sect3>
@@ -381,12 +427,13 @@ td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>
    <title>Restrictions</title>

    <para>Throughout the kernel there are access restrictions relating
-      to jailed processes. Usually, these restrictions only check if
+      to jailed processes. Usually, these restrictions only check whether
      the process is jailed, and if so, returns an error. For
      example:</para>

-    <programlisting>if (jailed(td-&gt;td_ucred))
-        return (EPERM);</programlisting>
+    <programlisting>
+if (jailed(td-&gt;td_ucred))
+    return (EPERM);</programlisting>

    <sect2>
      <title>SysV IPC</title>
@@ -395,43 +442,46 @@ td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>

      <para>System V IPC is based on messages. Processes can send each
        other these messages which tell them how to act. The functions
-        which deal with messages are: <literal>msgsys</literal>,
-        <literal>msgctl</literal>, <literal>msgget</literal>,
-        <literal>msgsend</literal> and <literal>msgrcv</literal>.
+        which deal with messages are: 
+        &man.msgctl.3;, &man.msgget.3;, &man.msgsnd.3; and &man.msgrcv.3;.
        Earlier, I mentioned that there were certain sysctls you could
-        turn on or off in order to affect the behavior of Jail. One of
-        these sysctls was <literal>security.jail.sysvipc_allowed</literal>.
-        On most systems, this sysctl is set to 0. If it were set to 1,
-        it would defeat the whole purpose of having a jail; privileged
-        users from within the jail would be able to affect processes
-        outside of the environment. The difference between a message
-        and a signal is that the message only consists of the signal
-        number.</para>
+        turn on or off in order to affect the behavior of
+        <application>jail</application>. One of these sysctls was
+        <literal>security.jail.sysvipc_allowed</literal>.  By default,
+        this sysctl is set to 0. If it were set to 1, it would defeat the
+        whole purpose of having a <application>jail</application>; privileged
+        users from the <application>jail</application> would be able to
+        affect processes outside the jailed environment. The difference
+        between a message and a signal is that a signal consists only
+        of the signal number, while a message carries a type and a data
+        payload.</para>

      <para><filename>/usr/src/sys/kern/sysv_msg.c</filename>:</para>

      <itemizedlist>
-        <listitem> <para>&man.msgget.3;: msgget returns (and possibly
-        creates) a message descriptor that designates a message queue
-        for use in other system calls.</para></listitem>
+        <listitem> <para><literal>msgget(key, msgflg)</literal>:
+        <literal>msgget</literal> returns (and possibly creates) a message
+        descriptor that designates a message queue for use in other
+        functions.</para></listitem>

-        <listitem> <para>&man.msgctl.3;: Using this function, a process
-        can query the status of a message
+        <listitem> <para><literal>msgctl(msgid, cmd, buf)</literal>:
+        Using this function, a process can query the status of a message
        descriptor.</para></listitem>

-        <listitem> <para>&man.msgsnd.3;: msgsnd sends a message to a
+        <listitem> <para><literal>msgsnd(msgid, msgp, msgsz, msgflg)</literal>:
+        <literal>msgsnd</literal> sends a message to a
        process.</para></listitem>

-        <listitem> <para>&man.msgrcv.3;: a process receives messages using
+        <listitem> <para><literal>msgrcv(msgid, msgp, msgsz, msgtyp,
+        msgflg)</literal>: a process receives messages using
        this function</para></listitem>

      </itemizedlist>

-      <para>In each of these system calls, there is this
-        conditional:</para>
+      <para>In each of the system calls corresponding to these functions,
+        there is this conditional:</para>

      <programlisting><filename>/usr/src/sys/kern/sysv_msg.c</filename>:
-if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
+if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred))
    return (ENOSYS);</programlisting>

      <indexterm><primary>semaphores</primary></indexterm>
@@ -441,30 +491,30 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
        processes lock resources. However, process waiting on a
        semaphore, that is being used, will sleep until the resources
        are relinquished. The following semaphore system calls are
-        blocked inside a jail: <literal>semsys</literal>,
-        <literal>semget</literal>, <literal>semctl</literal> and
-        <literal>semop</literal>.</para>
+        blocked inside a <application>jail</application>: &man.semget.2;,
+        &man.semctl.2; and &man.semop.2;.</para>

      <para><filename>/usr/src/sys/kern/sysv_sem.c</filename>:</para>

      <itemizedlist>
        <listitem>
-          <para>&man.semctl.2;<literal>(id, num, cmd, arg)</literal>:
-            Semctl does the specified cmd on the semaphore queue
-            indicated by id.</para></listitem>
+          <para><literal>semctl(semid, semnum, cmd, ...)</literal>:
+            <literal>semctl</literal> performs the specified <literal>cmd</literal>
+            on the semaphore set indicated by
+            <literal>semid</literal>.</para></listitem>

        <listitem>
-           <para>&man.semget.2;<literal>(key, nsems, flag)</literal>:
-           Semget creates an array of semaphores, corresponding to
-           key.</para>
+           <para><literal>semget(key, nsems, flag)</literal>:
+            <literal>semget</literal> creates an array of semaphores,
+            corresponding to <literal>key</literal>.</para>

-          <para><literal>Key and flag take on the same meaning as they
+          <para><literal>key and flag take on the same meaning as they
          do in msgget.</literal></para></listitem>

-        <listitem><para>&man.semop.2;<literal>(semid, sops, nsops)</literal>:
-          Semop does the set of semaphore operations in the array of
-          structures ops, to the set of semaphores identified by
-          id.</para></listitem>
+        <listitem><para><literal>semop(semid, array, nops)</literal>:
+          <literal>semop</literal> performs the group of operations described
+          by <literal>array</literal> on the set of semaphores identified by
+          <literal>semid</literal>.</para></listitem>
      </itemizedlist>

      <indexterm><primary>shared memory</primary></indexterm>
@ -472,28 +522,29 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
        memory. Processes can communicate directly with each other by
        sharing parts of their virtual address space and then reading
        and writing data stored in the shared memory. These system
-        calls are blocked within a jailed environment: <literal>shmdt,
-        shmat, oshmctl, shmctl, shmget</literal>, and
-        <literal>shmsys</literal>.</para>
+        calls are blocked within a jailed environment: &man.shmdt.2;,
+        &man.shmat.2;, &man.shmctl.2; and &man.shmget.2;.</para>

      <para><filename>/usr/src/sys/kern/sysv_shm.c</filename>:</para>

      <itemizedlist>
-        <listitem><para>&man.shmctl.2;<literal>(shmid, cmd, buf)</literal>:
-        shmctl does various control operations on the shared memory
-        region identified by id.</para></listitem>
+        <listitem><para><literal>shmctl(shmid, cmd, buf)</literal>:
+        <literal>shmctl</literal> does various control operations on the
+        shared memory region identified by
+        <literal>shmid</literal>.</para></listitem>

-        <listitem><para>&man.shmget.2;<literal>(key, size,
-        shmflg)</literal>: shmget accesses or creates a shared memory
-        region of size bytes.</para></listitem>
+        <listitem><para><literal>shmget(key, size, flag)</literal>:
+        <literal>shmget</literal> accesses or creates a shared memory
+        region of <literal>size</literal> bytes.</para></listitem>

-        <listitem><para>&man.shmat.2;<literal>(shmid, shmaddr, shmflg)</literal>:
-        shmat attaches a shared memory region identified by id to the
-        address space of a process.</para></listitem>
+        <listitem><para><literal>shmat(shmid, addr, flag)</literal>:
+        <literal>shmat</literal> attaches a shared memory region identified
+        by <literal>shmid</literal> to the address space of a
+        process.</para></listitem>

-        <listitem><para>&man.shmdt.2;<literal>(shmaddr)</literal>: shmdt
-        detaches the shared memory region previously attached at
-        addr.</para></listitem>
+        <listitem><para><literal>shmdt(addr)</literal>:
+        <literal>shmdt</literal> detaches the shared memory region
+        previously attached at <literal>addr</literal>.</para></listitem>

      </itemizedlist>
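The four shared memory calls above fit together as shown in the following minimal sketch. `shm_roundtrip()` is a hypothetical name. The sketch attaches the same segment twice, once writable and once read-only, so that a value stored through one mapping can be read back through the other. It returns that value (42) or the negated `errno` of the first call that fails, which is what we would expect to see inside a jail that forbids System V IPC.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private shared memory segment, attach it twice, write through
 * one mapping and read through the other, then detach and remove it.
 * Returns the value read (42) or -errno on failure.
 */
int shm_roundtrip(void)
{
    int result;
    int *writer, *reader;
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);

    if (shmid == -1)
        return -errno;

    writer = shmat(shmid, NULL, 0);
    if (writer == (void *)-1) {
        result = -errno;
        goto remove;
    }
    reader = shmat(shmid, NULL, SHM_RDONLY);
    if (reader == (void *)-1) {
        result = -errno;
        goto detach_writer;
    }

    *writer = 42;       /* store through the writable mapping ... */
    result = *reader;   /* ... and observe it through the read-only one */

    shmdt(reader);
detach_writer:
    shmdt(writer);
remove:
    shmctl(shmid, IPC_RMID, NULL);
    return result;
}
```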
    </sect2>
@ -502,10 +553,10 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
      <title>Sockets</title>

      <indexterm><primary>sockets</primary></indexterm>
-      <para>Jail treats the &man.socket.2; system call and related
-        lower-level socket functions in a special manner. In order to
-        determine whether a certain socket is allowed to be created,
-        it first checks to see if the sysctl
+      <para><application>Jail</application> treats the &man.socket.2; system
+        call and related lower-level socket functions in a special manner.
+        In order to determine whether a certain socket is allowed to be
+        created, it first checks to see if the sysctl
        <literal>security.jail.socket_unixiproute_only</literal> is set. If
        set, sockets are only allowed to be created if the family
        specified is either <literal>PF_LOCAL</literal>,
@ -515,8 +566,8 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)

      <programlisting><filename>/usr/src/sys/kern/uipc_socket.c</filename>:
 int
-socreate(dom, aso, type, proto, cred, td)
-...
+socreate(int dom, struct socket **aso, int type, int proto,
+    struct ucred *cred, struct thread *td)
 {
    struct protosw *prp;
 ...
@ -537,11 +588,10 @@ socreate(dom, aso, type, proto, cred, td)
      <indexterm><primary>Berkeley Packet Filter</primary></indexterm>
      <indexterm><primary>data link layer</primary></indexterm>

-      <para>The Berkeley Packet Filter provides a raw interface to
-        data link layers in a protocol independent fashion. The
-        function <literal>bpfopen()</literal> opens an Ethernet
-        device. It's now controlled by the devfs whether can be used
-        in the jail.
+      <para>The <application>Berkeley Packet Filter</application> provides
+        a raw interface to data link layers in a protocol independent
+        fashion. Whether <application>BPF</application> can be used in a
+        jailed environment is now controlled by &man.devfs.8;.</para>

    </sect2>
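You can observe the devfs policy for BPF from inside a jail by trying to open the device node. The helper below, `probe_device()`, is hypothetical. We assume that a node hidden by the jail's devfs ruleset does not appear at all, so `open()` fails with `ENOENT`, whereas a visible but restricted node would more likely fail with `EACCES`. The device path is also an assumption. Depending on the FreeBSD version, it may be `/dev/bpf` or a numbered node such as `/dev/bpf0`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a device node read-only. Returns 0 if it could be opened,
 * or -errno otherwise (for example -ENOENT if the node is not visible).
 */
int probe_device(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -errno;
    close(fd);
    return 0;
}
```

For example, `probe_device("/dev/bpf")` run as <literal>root</literal> in a jail that uses a default-style ruleset would be expected to return `-ENOENT`.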

@ -554,17 +604,20 @@ socreate(dom, aso, type, proto, cred, td)
        TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the
        network layer 2. There are certain precautions which are
        taken in order to prevent a jailed process from binding a
-        protocol to a certain port only if the <literal>nam</literal>
-        parameter is set. nam is a pointer to a sockaddr structure,
+        protocol to an address outside its jail; the first of these checks
+        applies when the <literal>nam</literal> parameter is set.
+        <literal>nam</literal> is a pointer to a
+        <literal>sockaddr</literal> structure,
        which describes the address on which to bind the service. A
-        more exact definition is that sockaddr "may be used as a
-        template for referring to the identifying tag and length of
-        each address"[2]. In the function
-        <literal>in_pcbbind_setup</literal>, <literal>sin</literal> is a
-        pointer to a sockaddr_in structure, which contains the port,
-        address, length and domain family of the socket which is to be
-        bound. Basically, this disallows any processes from jail to be
-        able to specify the domain family.</para>
+        more exact definition is that <literal>sockaddr</literal> "may be
+        used as a template for referring to the identifying tag and length of
+        each address". In the function
+        <literal>in_pcbbind_setup()</literal>, <literal>sin</literal> is a
+        pointer to a <literal>sockaddr_in</literal> structure, which
+        contains the port, address, length and domain family of the socket
+        which is to be bound. In short, this prevents a process inside a
+        <application>jail</application> from specifying an address that does
+        not belong to the <application>jail</application> in which it
+        runs.</para>

      <programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>: 
 int
@ -577,30 +630,39 @@ in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
    if (nam) {
        sin = (struct sockaddr_in *)nam;
        ...
-#ifdef notdef
-        /*
-         * We should check the family, but old programs
-         * incorrectly fail to initialize it.
-         */
-        if (sin->sin_family != AF_INET)
-            return (EAFNOSUPPORT);
-#endif
        if (sin-&gt;sin_addr.s_addr != INADDR_ANY)
            if (prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
                return(EINVAL);
        ...
+        if (lport) {
+            ...
+            if (prison &amp;&amp; prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
+                return (EADDRNOTAVAIL);
+            ...
+        }
    }
+    if (lport == 0) {
+        ...
+        if (laddr.s_addr != INADDR_ANY)
+            if (prison_ip(cred, 0, &amp;laddr.s_addr))
+                return (EINVAL);
+        ...
+    }
+...
+    if (prison_ip(cred, 0, &amp;laddr.s_addr))
+        return (EINVAL);
 ...
 }</programlisting>

      <para>You might be wondering what function
-        <literal>prison_ip()</literal> does. prison_ip is given three
-        arguments, a pointer to the credential(represented by
-        <literal>cred</literal>), any flags, and an ip address. It
-        returns 1 if the ip address does NOT belong to the jail or
-        0 otherwise.  As you can see from the code, if it is indeed
-        an ip address not belonging to the jail, the protcol is
-        not allowed to bind to a certain port.</para>
+        <literal>prison_ip()</literal> does. <literal>prison_ip()</literal>
+        takes three arguments: a pointer to the credential (represented by
+        <literal>cred</literal>), a flag, and a pointer to an IP address. It
+        returns 1 if the IP address does not belong to the
+        <application>jail</application> and 0 otherwise. As the code shows,
+        if the address does not belong to the
+        <application>jail</application>, the protocol is not allowed to bind
+        to that address.</para>

      <programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
 int
@ -632,51 +694,59 @@ prison_ip(struct ucred *cred, int flag, u_int32_t *ip)
        return (1);
    return (0);
 }</programlisting>
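The following userland model makes the return convention of `prison_ip()` concrete. It is a sketch built on stated assumptions and is not the kernel code. `struct model_prison` and `model_prison_ip()` are hypothetical. The jail is assumed to own a single IPv4 address, as jails of this era did. Only the flag value 0 that `in_pcbbind_setup()` passes is modelled, so the address is taken in network byte order. We also assume that a wildcard address is rewritten to the jail's own address. That assumption is consistent with the unconditional `prison_ip()` call at the end of the `in_pcbbind_setup()` excerpt.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the jail state attached to a credential. */
struct model_prison {
    bool jailed;     /* is the calling process in a jail? */
    uint32_t ip;     /* the jail's single IPv4 address, host byte order */
};

/*
 * Model of prison_ip(cred, 0, addr): addr is in network byte order.
 * Returns 1 if the address does not belong to the jail, 0 otherwise.
 */
int model_prison_ip(const struct model_prison *pr, uint32_t *addr)
{
    uint32_t tmp;

    if (!pr->jailed)
        return 0;                  /* unjailed callers may use any address */
    tmp = ntohl(*addr);
    if (tmp == INADDR_ANY) {
        *addr = htonl(pr->ip);     /* assumption: wildcard becomes jail address */
        return 0;
    }
    return tmp != pr->ip ? 1 : 0;
}
```

A caller shaped like `in_pcbbind_setup()` would turn a return value of 1 into `EINVAL` or `EADDRNOTAVAIL`, as in the excerpt above.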
-
-      <para>Jailed users are not allowed to bind services to an ip
-        which does not belong to the jail. The restriction is also
-        written within the function <literal>in_pcbbind_setup</literal>:</para>
-
-      <programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>
-        if (nam) {
-               ... 
-               lport = sin-&gt;sin.port; 
-               ... if (lport) { 
-                         ... 
-                         if (jailed(cred))
-                                prison = 1; 
-                         ...
-                         if (prison &amp;&amp;
-                             prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
-			            return (EADDRNOTAVAIL);</programlisting>
-
    </sect2>

    <sect2>
      <title>Filesystem</title>

      <indexterm><primary>filesystem</primary></indexterm>
-      <para>Even root users within the jail are not allowed to set any
-        file flags, such as immutable, append, and no unlink flags, if
-        the securelevel is greater than 0.</para>
+      <para>Even <literal>root</literal> users within the
+        <application>jail</application> are not allowed to unset or modify
+        any file flags, such as immutable, append-only, and undeletable
+        flags, if the securelevel is greater than 0.</para>

-      <programlisting>/usr/src/sys/ufs/ufs/ufs_vnops.c:
+      <programlisting><filename>/usr/src/sys/ufs/ufs/ufs_vnops.c</filename>:
 static int
 ufs_setattr(ap)
    ...
 {
    ...
-        if (!suser_cred(cred,
-            jail_chflags_allowed ? SUSER_ALLOWJAIL : 0)) {
+        if (!priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0)) {
            if (ip-&gt;i_flags
                &amp; (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) {
-                error = securelevel_gt(cred, 0);
-                if (error)
-                    return (error);
+                error = securelevel_gt(cred, 0);
+                if (error)
+                    return (error);
            }
            ...
+        }
+}
+
+<filename>/usr/src/sys/kern/kern_priv.c</filename>:
+int
+priv_check_cred(struct ucred *cred, int priv, int flags)
+{
+    ...
+    error = prison_priv_check(cred, priv);
+    if (error)
+        return (error);
+    ...
+}
+
+<filename>/usr/src/sys/kern/kern_jail.c</filename>:
+int
+prison_priv_check(struct ucred *cred, int priv)
+{
+    ...
+    switch (priv) {
+    ...
+    case PRIV_VFS_SYSFLAGS:
+        if (jail_chflags_allowed)
+            return (0);
+        else
+            return (EPERM);
+    ...
+    }
+    ...
 }</programlisting>
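The combined decision made by `ufs_setattr()`, `priv_check_cred()` and `prison_priv_check()` for system flags can be summarised as a pure function. This is a simplified, hypothetical model with several assumptions. The `MODEL_SF_*` constants stand in for FreeBSD's `SF_*` flags. Non-root callers are simply refused, because that path is elided in the excerpt. Any per-jail securelevel is ignored in favour of a single `securelevel` argument.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FreeBSD's SF_* system file flags. */
#define MODEL_SF_IMMUTABLE 0x1u
#define MODEL_SF_APPEND    0x2u
#define MODEL_SF_NOUNLINK  0x4u

struct model_cred {
    bool root;      /* effective uid 0 */
    bool jailed;    /* process is inside a jail */
};

/* Models priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0) and prison_priv_check(). */
static int model_priv_sysflags(const struct model_cred *cred,
    bool jail_chflags_allowed)
{
    if (!cred->root)
        return EPERM;               /* assumption: only root may touch SF_* */
    if (cred->jailed && !jail_chflags_allowed)
        return EPERM;               /* PRIV_VFS_SYSFLAGS denied in the jail */
    return 0;
}

/* Returns 0 if the file's system flags may be changed, else an errno. */
int model_may_change_sysflags(const struct model_cred *cred,
    bool jail_chflags_allowed, int securelevel, unsigned int file_flags)
{
    int error = model_priv_sysflags(cred, jail_chflags_allowed);

    if (error)
        return error;
    /* Mirrors the securelevel_gt(cred, 0) test in ufs_setattr(). */
    if ((file_flags & (MODEL_SF_NOUNLINK | MODEL_SF_IMMUTABLE |
        MODEL_SF_APPEND)) && securelevel > 0)
        return EPERM;
    return 0;
}
```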
-
    </sect2>

  </sect1>