Update this chapter with respect to our current state of jail

infrastructure, plus several markup corrections.

Submitted by:	MQ <antinvidia at gmail dot com>
This commit is contained in:
Xin LI 2008-02-01 18:54:32 +00:00
parent 26fad0c4c1
commit 0b444add8d
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=31407

View file

@ -24,62 +24,65 @@
<indexterm><primary>Jail</primary></indexterm>
<indexterm><primary>root</primary></indexterm>
<para>On most &unix; systems, root has omnipotent power. This promotes
insecurity. If an attacker were to gain root on a system, he would
have every function at his fingertips. In FreeBSD there are
sysctls which dilute the power of root, in order to minimize the
damage caused by an attacker. Specifically, one of these functions
is called secure levels. Similarly, another function which is
present from FreeBSD 4.0 and onward, is a utility called
&man.jail.8;. <application>Jail</application> chroots an
environment and sets certain restrictions on processes which are
forked from within. For example, a jailed process cannot affect
processes outside of the jail, utilize certain system calls, or
inflict any damage on the main computer.</para>
<para>On most &unix; systems, <literal>root</literal> has omnipotent power.
This promotes insecurity. If an attacker gained <literal>root</literal>
on a system, he would have every function at his fingertips. In FreeBSD
there are sysctls which dilute the power of <literal>root</literal>, in
order to minimize the damage caused by an attacker. Specifically, one of
these functions is called <literal>secure levels</literal>. Similarly,
another function which is present from FreeBSD 4.0 and onward, is a utility
called &man.jail.8;. <application>Jail</application> chroots an environment
and sets certain restrictions on processes which are forked within
the <application>jail</application>. For example, a jailed process
cannot affect processes outside the <application>jail</application>,
utilize certain system calls, or inflict any damage on the host
environment.</para>
<para><application>Jail</application> is becoming the new security
model. People are running potentially vulnerable servers such as
Apache, BIND, and sendmail within jails, so that if an attacker
gains root within the <application>Jail</application>, it is only
an annoyance, and not a devastation. This article focuses on the
internals (source code) of <application>Jail</application>.
It will also suggest improvements upon the jail code base which
are already being worked on. If you are looking for a how-to on
setting up a <application>Jail</application>, I suggest you look
at my other article in Sys Admin Magazine, May 2001, entitled
"Securing FreeBSD using <application>Jail</application>."</para>
<application>Apache</application>, <application>BIND</application>, and
<application>sendmail</application> within jails, so that if an attacker
gains <literal>root</literal> within the <application>jail</application>,
it is only an annoyance, and not a devastation. This article mainly
focuses on the internals (source code) of <application>jail</application>.
If you are looking for a how-to on setting up a
<application>jail</application>, I suggest you look at my other article
in Sys Admin Magazine, May 2001, entitled "Securing FreeBSD using
<application>Jail</application>."</para>
<sect1 id="jail-arch">
<title>Architecture</title>
<para>
<application>Jail</application> consists of two realms: the
user-space program, jail, and the code implemented within the
kernel: the <literal>jail</literal> system call and associated
restrictions. I will be discussing the user-space program and
then how jail is implemented within the kernel.</para>
userland program, &man.jail.8;, and the code implemented within
the kernel: the &man.jail.2; system call and associated
restrictions. I will be discussing the userland program and
then how <application>jail</application> is implemented within
the kernel.</para>
<sect2>
<title>Userland code</title>
<title>Userland Code</title>
<indexterm><primary>Jail</primary>
<secondary>userland program</secondary></indexterm>
<secondary>Userland Program</secondary></indexterm>
<para>The source for the user-land jail is located in
<filename>/usr/src/usr.sbin/jail</filename>, consisting of
one file, <filename>jail.c</filename>. The program takes these
arguments: the path of the jail, hostname, ip address, and the
command to be executed.</para>
<para>The source for the userland <application>jail</application>
is located in <filename>/usr/src/usr.sbin/jail</filename>,
consisting of one file, <filename>jail.c</filename>. The program
takes these arguments: the path of the <application>jail</application>,
hostname, IP address, and the command to be executed.</para>
<sect3>
<title>Data Structures</title>
<para>In <filename>jail.c</filename>, the first thing I would
note is the declaration of an important structure
<literal>struct jail j</literal>; which was included from
<literal>struct jail j;</literal> which was included from
<filename>/usr/include/sys/jail.h</filename>.</para>
<para>The definition of the jail structure is:</para>
<para>The definition of the <literal>jail</literal> structure is:
</para>
<programlisting><filename>/usr/include/sys/jail.h</filename>:
@ -91,8 +94,8 @@ struct jail {
};</programlisting>
<para>As you can see, there is an entry for each of the
arguments passed to the jail program, and indeed, they are
set during its execution.</para>
arguments passed to the &man.jail.8; program, and indeed,
they are set during its execution.</para>
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
char path[PATH_MAX];
@ -111,10 +114,11 @@ j.hostname = argv[1];</programlisting>
<sect3>
<title>Networking</title>
<para>One of the arguments passed to the Jail program is an IP
address with which the jail can be accessed over the
network. Jail translates the ip address given into host
byte order and then stores it in j (the jail structure).</para>
<para>One of the arguments passed to the &man.jail.8; program is
an IP address with which the <application>jail</application>
can be accessed over the network. &man.jail.8; translates the
IP address given into host byte order and then stores it in
<literal>j</literal> (the <literal>jail</literal> structure).</para>
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>:
struct in_addr in;
@ -123,37 +127,35 @@ if (inet_aton(argv[2], &amp;in) == 0)
errx(1, "Could not make sense of ip-number: %s", argv[2]);
j.ip_number = ntohl(in.s_addr);</programlisting>
<para>The
<citerefentry><refentrytitle>inet_aton</refentrytitle>
<manvolnum>3</manvolnum></citerefentry>
function "interprets the specified character string as an
Internet address, placing the address into the structure
provided." The ip number node in the jail structure is set
only when the ip address placed onto the in structure by
inet_aton is translated into host byte order by
<function>ntohl()</function>.</para>
<para>The &man.inet.aton.3; function "interprets the specified
character string as an Internet address, placing the address
into the structure provided." The <literal>ip_number</literal>
member in the <literal>jail</literal> structure is set only
when the IP address placed onto the <literal>in</literal>
structure by &man.inet.aton.3; is translated into host byte
order by &man.ntohl.3;.</para>
</sect3>
<sect3>
<title>Jailing The Process</title>
<para>Finally, the userland program jails the process, and
executes the command specified. Jail now becomes an
imprisoned process itself and then executes the command
given using &man.execv.3;</para>
<programlisting><filename>/usr/src/sys/usr.sbin/jail/jail.c</filename>
<para>Finally, the userland program jails the process.
<application>Jail</application> now becomes an imprisoned
process itself and then executes the command given using
&man.execv.3;.</para>
<programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>
i = jail(&amp;j);
...
if (execv(argv[3], argv + 3) != 0)
err(1, "execv: %s", argv[3]);</programlisting>
<para>As you can see, the jail function is being called, and
its argument is the jail structure which has been filled
with the arguments given to the program. Finally, the
program you specify is executed. I will now discuss how Jail
is implemented within the kernel.</para>
<para>As you can see, the <literal>jail()</literal> function is
called, and its argument is the <literal>jail</literal> structure
which has been filled with the arguments given to the program.
Finally, the program you specify is executed. I will now discuss
how <application>jail</application> is implemented within the
kernel.</para>
</sect3>
</sect2>
@ -161,12 +163,12 @@ if (execv(argv[3], argv + 3) != 0)
<title>Kernel Space</title>
<indexterm><primary>Jail</primary>
<secondary>kernel architecture</secondary></indexterm>
<secondary>Kernel Architecture</secondary></indexterm>
<para>We will now be looking at the file
<filename>/usr/src/sys/kern/kern_jail.c</filename>. This is
the file where the jail system call, appropriate sysctls, and
networking functions are defined.</para>
the file where the &man.jail.2; system call, appropriate sysctls,
and networking functions are defined.</para>
<sect3>
<title>sysctls</title>
@ -186,7 +188,7 @@ SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW,
int jail_socket_unixiproute_only = 1;
SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW,
&amp;jail_socket_unixiproute_only, 0,
"Processes in jail are limited to creating &unix;/IPv4/route sockets only");
"Processes in jail are limited to creating UNIX/IPv4/route sockets only");
int jail_sysvipc_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW,
@ -206,10 +208,15 @@ SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW,
int jail_chflags_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
&amp;jail_chflags_allowed, 0,
"Processes in jail can alter system file flags");</programlisting>
"Processes in jail can alter system file flags");
int jail_mount_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW,
&amp;jail_mount_allowed, 0,
"Processes in jail can mount/unmount jail-friendly file systems");</programlisting>
<para>Each of these sysctls can be accessed by the user
through the sysctl program. Throughout the kernel, these
through the &man.sysctl.8; program. Throughout the kernel, these
specific sysctls are recognized by their name. For example,
the name of the first sysctl is
<literal>security.jail.set_hostname_allowed</literal>.</para>
@ -221,18 +228,17 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
<para>Like all system calls, the &man.jail.2; system call takes
two arguments, <literal>struct thread *td</literal> and
<literal>struct jail_args *uap</literal>.
<literal>td</literal> is a pointer to the thread
<literal>td</literal> is a pointer to the <literal>thread</literal>
structure which describes the calling thread. In this
context, uap is a pointer to the structure which specifies the
arguments given to &man.jail.2; from the userland program
<filename>jail.c</filename>. When I described the userland
program before, you saw that the &man.jail.2; system call was
given a jail structure as its own argument.</para>
context, <literal>uap</literal> is a pointer to the structure
in which a pointer to the <literal>jail</literal> structure
passed by the userland <filename>jail.c</filename> is contained.
When I described the userland program before, you saw that the
&man.jail.2; system call was given a <literal>jail</literal>
structure as its own argument.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
/*
* MPSAFE
*
* struct jail_args {
* struct jail *jail;
* };
@ -240,46 +246,48 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
int
jail(struct thread *td, struct jail_args *uap)</programlisting>
<para>Therefore, <literal>uap-&gt;jail</literal> would access the
jail structure which was passed to the system call. Next,
the system call copies the jail structure into kernel space
using the <literal>copyin()</literal>
function. <literal>copyin()</literal> takes three arguments:
the data which is to be copied into kernel space,
<para>Therefore, <literal>uap-&gt;jail</literal> can be used to
access the <literal>jail</literal> structure which was passed
to the system call. Next, the system call copies the
<literal>jail</literal> structure into kernel space using
the &man.copyin.9; function. &man.copyin.9; takes three arguments:
the address of the data which is to be copied into kernel space,
<literal>uap-&gt;jail</literal>, where to store it,
<literal>j</literal> and the size of the storage. The jail
structure <literal>uap-&gt;jail</literal> is copied into kernel
space and stored in another jail structure,
<literal>j</literal> and the size of the storage. The
<literal>jail</literal> structure pointed by
<literal>uap-&gt;jail</literal> is copied into kernel space and
is stored in another <literal>jail</literal> structure,
<literal>j</literal>.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c: </filename>
error = copyin(uap-&gt;jail, &amp;j, sizeof(j));</programlisting>
<para>There is another important structure defined in
jail.h. It is the prison structure
(<literal>pr</literal>). The prison structure is used
exclusively within kernel space. The &man.jail.2; system call
copies everything from the jail structure onto the prison
structure. Here is the definition of the prison structure.</para>
<filename>jail.h</filename>. It is the <literal>prison</literal>
structure. The <literal>prison</literal> structure is used
exclusively within kernel space. Here is the definition of the
<literal>prison</literal> structure.</para>
<programlisting><filename>/usr/include/sys/jail.h</filename>:
struct prison {
LIST_ENTRY(prison) pr_list; /* (a) all prisons */
int pr_id; /* (c) prison id */
int pr_ref; /* (p) refcount */
char pr_path[MAXPATHLEN]; /* (c) chroot path */
struct vnode *pr_root; /* (c) vnode to rdir */
char pr_host[MAXHOSTNAMELEN]; /* (p) jail hostname */
u_int32_t pr_ip; /* (c) ip addr host */
void *pr_linux; /* (p) linux abi */
int pr_securelevel; /* (p) securelevel */
struct task pr_task; /* (d) destroy task */
struct mtx pr_mtx;
LIST_ENTRY(prison) pr_list; /* (a) all prisons */
int pr_id; /* (c) prison id */
int pr_ref; /* (p) refcount */
char pr_path[MAXPATHLEN]; /* (c) chroot path */
struct vnode *pr_root; /* (c) vnode to rdir */
char pr_host[MAXHOSTNAMELEN]; /* (p) jail hostname */
u_int32_t pr_ip; /* (c) ip addr host */
void *pr_linux; /* (p) linux abi */
int pr_securelevel; /* (p) securelevel */
struct task pr_task; /* (d) destroy task */
struct mtx pr_mtx;
void **pr_slots; /* (p) additional data */
};</programlisting>
<para>The jail() system call then allocates memory for a
pointer to a prison structure and copies data between the two
structures.</para>
<para>The &man.jail.2; system call then allocates memory for
a <literal>prison</literal> structure and copies data between
the <literal>jail</literal> and <literal>prison</literal>
structure.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
MALLOC(pr, struct prison *, sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO);
@ -289,49 +297,86 @@ if (error)
goto e_killmtx;
...
error = copyinstr(j.hostname, &amp;pr-&gt;pr_host, sizeof(pr-&gt;pr_host), 0);
if (error)
goto e_dropvnref;</programlisting>
<para>These next three lines in the source are very important,
as they specify how the kernel recognizes a process as
jailed. Each process on a &unix; system is described by its
own proc structure. You can see the whole proc structure in
<filename>/usr/include/sys/proc.h</filename>. For example,
the td argument in any system call is actually a pointer to
that calling thread's thread structure, as stated before. The
td-&gt;td_proc is a pointer to the calling process' process
structure. The proc structure contains nodes which can describe
the owner's identity (<literal>p_ucred</literal>), the process
resource limits (<literal>p_limit</literal>), and so on. In the
definition of the ucred structure, there is a pointer to a
prison structure. (<literal>cr_prison</literal>).</para>
if (error)
goto e_dropvnref;
pr-&gt;pr_ip = j.ip_number;</programlisting>
<para>Next, we will discuss another important system call
&man.jail.attach.2;, which implements the function to put
a process into the <application>jail</application>.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>:
/*
* struct jail_attach_args {
* int jid;
* };
*/
int
jail_attach(struct thread *td, struct jail_attach_args *uap)</programlisting>
<para>This system call makes the changes that can distinguish
a jailed process from those unjailed ones.
To understand what &man.jail.attach.2; does for us, certain
background information is needed.</para>
<para>
On FreeBSD, each kernel visible thread is identified by its
<literal>thread</literal> structure, while the processes are
described by their <literal>proc</literal> structures. You can
find the definitions of the <literal>thread</literal> and
<literal>proc</literal> structure in
<filename>/usr/include/sys/proc.h</filename>.
For example, the <literal>td</literal> argument in any system
call is actually a pointer to the calling thread's
<literal>thread</literal> structure, as stated before.
The <literal>td_proc</literal> member in the
<literal>thread</literal> structure pointed by <literal>td</literal>
is a pointer to the <literal>proc</literal> structure which
represents the process that contains the thread represented by
<literal>td</literal>. The <literal>proc</literal> structure
contains members which can describe the owner's
identity(<literal>p_ucred</literal>), the process resource
limits(<literal>p_limit</literal>), and so on. In the
<literal>ucred</literal> structure pointed by
<literal>p_ucred</literal> member in the <literal>proc</literal>
structure, there is a pointer to the <literal>prison</literal>
structure(<literal>cr_prison</literal>).</para>
<programlisting><filename>/usr/include/sys/proc.h: </filename>
struct thread {
...
struct proc *td_proc;
...
};
struct proc {
...
struct ucred *p_ucred;
...
...
struct ucred *p_ucred;
...
};
<filename>/usr/include/sys/ucred.h</filename>
struct ucred {
...
struct prison *cr_prison;
...
...
struct prison *cr_prison;
...
};</programlisting>
<para>In <filename>kern_jail.c</filename>, the function then
calls function jail_attach with a given jid. And the jail_attach
calls function change_root to change the root directory of the
calling process. The jail_attach function then creates a new ucred
structure, and attaches the newly created ucred structure to the
calling process after it has successfully attaches the prison on the
cred structure. From then on, the calling process is recognized as
jailed. When calls function jailed with the newly created ucred
structure as the argument, it returns 1 to tell that the credential
is in a jail. The parent process of each process, forked within
the jail, is the program jail itself, as it calls the &man.jail.2;
system call. When the program is executed through execve, it
inherits the properties of its parent's ucred structure, therefore it
has the jailed ucred structure.</para>
<para>In <filename>kern_jail.c</filename>, the function
<literal>jail()</literal> then calls function
<literal>jail_attach()</literal> with a given <literal>jid</literal>.
And <literal>jail_attach()</literal> calls function
<literal>change_root()</literal> to change the root directory of the
calling process. The <literal>jail_attach()</literal> then creates
a new <literal>ucred</literal> structure, and attaches the newly
created <literal>ucred</literal> structure to the calling process
after it has successfully attached the <literal>prison</literal>
structure to the <literal>ucred</literal> structure. From then on,
the calling process is recognized as jailed. When the kernel routine
<literal>jailed()</literal> is called in the kernel with the newly
created <literal>ucred</literal> structure as its argument, it
returns 1 to tell that the credential is connected
with a <application>jail</application>. The public ancestor process
of all the process forked within the <application>jail</application>,
is the process which runs &man.jail.8;, as it calls the
&man.jail.2; system call. When a program is executed through
&man.execve.2;, it inherits the jailed property of its parent's
<literal>ucred</literal> structure, therefore it has a jailed
<literal>ucred</literal> structure.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>
int
@ -363,14 +408,15 @@ jail_attach(struct thread *td, struct jail_attach_args *uap)
p-&gt;p_ucred = newcred;
...
}</programlisting>
<para>When a process is forked from a parent process, the
&man.fork.2; system call uses crhold to maintain the credential
for the newly forked process. It inherently keep the newly forked
child's credential consistent with its parent, so the child process
is also jailed.</para>
<para>When a process is forked from its parent process, the
&man.fork.2; system call uses <literal>crhold()</literal> to
maintain the credential for the newly forked process. It inherently
keep the newly forked child's credential consistent with its parent,
so the child process is also jailed.</para>
<programlisting><filename>/usr/src/sys/kern/kern_fork.c</filename>:
p2-&gt;p_ucred = crhold(td-&gt;td_ucred);
...
td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>
</sect3>
@ -381,12 +427,13 @@ td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>
<title>Restrictions</title>
<para>Throughout the kernel there are access restrictions relating
to jailed processes. Usually, these restrictions only check if
to jailed processes. Usually, these restrictions only check whether
the process is jailed, and if so, returns an error. For
example:</para>
<programlisting>if (jailed(td-&gt;td_ucred))
return (EPERM);</programlisting>
<programlisting>
if (jailed(td-&gt;td_ucred))
return (EPERM);</programlisting>
<sect2>
<title>SysV IPC</title>
@ -395,43 +442,46 @@ td2-&gt;td_ucred = crhold(p2-&gt;p_ucred);</programlisting>
<para>System V IPC is based on messages. Processes can send each
other these messages which tell them how to act. The functions
which deal with messages are: <literal>msgsys</literal>,
<literal>msgctl</literal>, <literal>msgget</literal>,
<literal>msgsend</literal> and <literal>msgrcv</literal>.
which deal with messages are:
&man.msgctl.3;, &man.msgget.3;, &man.msgsnd.3; and &man.msgrcv.3;.
Earlier, I mentioned that there were certain sysctls you could
turn on or off in order to affect the behavior of Jail. One of
these sysctls was <literal>security.jail.sysvipc_allowed</literal>.
On most systems, this sysctl is set to 0. If it were set to 1,
it would defeat the whole purpose of having a jail; privileged
users from within the jail would be able to affect processes
outside of the environment. The difference between a message
and a signal is that the message only consists of the signal
number.</para>
turn on or off in order to affect the behavior of
<application>jail</application>. One of these sysctls was
<literal>security.jail.sysvipc_allowed</literal>. By default,
this sysctl is set to 0. If it were set to 1, it would defeat the
whole purpose of having a <application>jail</application>; privileged
users from the <application>jail</application> would be able to
affect processes outside the jailed environment. The difference
between a message and a signal is that the message only consists
of the signal number.</para>
<para><filename>/usr/src/sys/kern/sysv_msg.c</filename>:</para>
<itemizedlist>
<listitem> <para>&man.msgget.3;: msgget returns (and possibly
creates) a message descriptor that designates a message queue
for use in other system calls.</para></listitem>
<listitem> <para><literal>msgget(key, msgflg)</literal>:
<literal>msgget</literal> returns (and possibly creates) a message
descriptor that designates a message queue for use in other
functions.</para></listitem>
<listitem> <para>&man.msgctl.3;: Using this function, a process
can query the status of a message
<listitem> <para><literal>msgctl(msgid, cmd, buf)</literal>:
Using this function, a process can query the status of a message
descriptor.</para></listitem>
<listitem> <para>&man.msgsnd.3;: msgsnd sends a message to a
<listitem> <para><literal>msgsnd(msgid, msgp, msgsz, msgflg)</literal>:
<literal>msgsnd</literal> sends a message to a
process.</para></listitem>
<listitem> <para>&man.msgrcv.3;: a process receives messages using
<listitem> <para><literal>msgrcv(msgid, msgp, msgsz, msgtyp,
msgflg)</literal>: a process receives messages using
this function</para></listitem>
</itemizedlist>
<para>In each of these system calls, there is this
conditional:</para>
<para>In each of the system calls corresponding to these functions,
there is this conditional:</para>
<programlisting><filename>/usr/src/sys/kern/sysv_msg.c</filename>:
if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred))
return (ENOSYS);</programlisting>
<indexterm><primary>semaphores</primary></indexterm>
@ -441,30 +491,30 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
processes lock resources. However, process waiting on a
semaphore, that is being used, will sleep until the resources
are relinquished. The following semaphore system calls are
blocked inside a jail: <literal>semsys</literal>,
<literal>semget</literal>, <literal>semctl</literal> and
<literal>semop</literal>.</para>
blocked inside a <application>jail</application>: &man.semget.2;,
&man.semctl.2; and &man.semop.2;.</para>
<para><filename>/usr/src/sys/kern/sysv_sem.c</filename>:</para>
<itemizedlist>
<listitem>
<para>&man.semctl.2;<literal>(id, num, cmd, arg)</literal>:
Semctl does the specified cmd on the semaphore queue
indicated by id.</para></listitem>
<para><literal>semctl(semid, semnum, cmd, ...)</literal>:
<literal>semctl</literal> does the specified <literal>cmd</literal>
on the semaphore queue indicated by
<literal>semid</literal>.</para></listitem>
<listitem>
<para>&man.semget.2;<literal>(key, nsems, flag)</literal>:
Semget creates an array of semaphores, corresponding to
key.</para>
<para><literal>semget(key, nsems, flag)</literal>:
<literal>semget</literal> creates an array of semaphores,
corresponding to <literal>key</literal>.</para>
<para><literal>Key and flag take on the same meaning as they
<para><literal>key and flag take on the same meaning as they
do in msgget.</literal></para></listitem>
<listitem><para>&man.semop.2;<literal>(semid, sops, nsops)</literal>:
Semop does the set of semaphore operations in the array of
structures ops, to the set of semaphores identified by
id.</para></listitem>
<listitem><para><literal>semop(semid, array, nops)</literal>:
<literal>semop</literal> performs a group of operations indicated
by <literal>array</literal>, to the set of semaphores identified by
<literal>semid</literal>.</para></listitem>
</itemizedlist>
<indexterm><primary>shared memory</primary></indexterm>
@ -472,28 +522,29 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
memory. Processes can communicate directly with each other by
sharing parts of their virtual address space and then reading
and writing data stored in the shared memory. These system
calls are blocked within a jailed environment: <literal>shmdt,
shmat, oshmctl, shmctl, shmget</literal>, and
<literal>shmsys</literal>.</para>
calls are blocked within a jailed environment: &man.shmdt.2;,
&man.shmat.2;, &man.shmctl.2; and &man.shmget.2;.</para>
<para><filename>/usr/src/sys/kern/sysv_shm.c</filename>:</para>
<itemizedlist>
<listitem><para>&man.shmctl.2;<literal>(shmid, cmd, buf)</literal>:
shmctl does various control operations on the shared memory
region identified by id.</para></listitem>
<listitem><para><literal>shmctl(shmid, cmd, buf)</literal>:
<literal>shmctl</literal> does various control operations on the
shared memory region identified by
<literal>shmid</literal>.</para></listitem>
<listitem><para>&man.shmget.2;<literal>(key, size,
shmflg)</literal>: shmget accesses or creates a shared memory
region of size bytes.</para></listitem>
<listitem><para><literal>shmget(key, size, flag)</literal>:
<literal>shmget</literal> accesses or creates a shared memory
region of <literal>size</literal> bytes.</para></listitem>
<listitem><para>&man.shmat.2;<literal>(shmid, shmaddr, shmflg)</literal>:
shmat attaches a shared memory region identified by id to the
address space of a process.</para></listitem>
<listitem><para><literal>shmat(shmid, addr, flag)</literal>:
<literal>shmat</literal> attaches a shared memory region identified
by <literal>shmid</literal> to the address space of a
process.</para></listitem>
<listitem><para>&man.shmdt.2;<literal>(shmaddr)</literal>: shmdt
detaches the shared memory region previously attached at
addr.</para></listitem>
<listitem><para><literal>shmdt(addr)</literal>:
<literal>shmdt</literal> detaches the shared memory region
previously attached at <literal>addr</literal>.</para></listitem>
</itemizedlist>
</sect2>
@ -502,10 +553,10 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
<title>Sockets</title>
<indexterm><primary>sockets</primary></indexterm>
<para>Jail treats the &man.socket.2; system call and related
lower-level socket functions in a special manner. In order to
determine whether a certain socket is allowed to be created,
it first checks to see if the sysctl
<para><application>Jail</application> treats the &man.socket.2; system
call and related lower-level socket functions in a special manner.
In order to determine whether a certain socket is allowed to be
created, it first checks to see if the sysctl
<literal>security.jail.socket_unixiproute_only</literal> is set. If
set, sockets are only allowed to be created if the family
specified is either <literal>PF_LOCAL</literal>,
@ -515,8 +566,8 @@ if (!jail_sysvipc_allowed &amp;&amp; jailed(td-&gt;td_ucred)
<programlisting><filename>/usr/src/sys/kern/uipc_socket.c</filename>:
int
socreate(dom, aso, type, proto, cred, td)
...
socreate(int dom, struct socket **aso, int type, int proto,
struct ucred *cred, struct thread *td)
{
struct protosw *prp;
...
@ -537,11 +588,10 @@ socreate(dom, aso, type, proto, cred, td)
<indexterm><primary>Berkeley Packet Filter</primary></indexterm>
<indexterm><primary>data link layer</primary></indexterm>
<para>The Berkeley Packet Filter provides a raw interface to
data link layers in a protocol independent fashion. The
function <literal>bpfopen()</literal> opens an Ethernet
device. It's now controlled by the devfs whether can be used
in the jail.
<para>The <application>Berkeley Packet Filter</application> provides
a raw interface to data link layers in a protocol independent
fashion. <application>BPF</application> is now controlled by the
&man.devfs.8; whether it can be used in a jailed environment.</para>
</sect2>
@ -554,17 +604,20 @@ socreate(dom, aso, type, proto, cred, td)
TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the
network layer 2. There are certain precautions which are
taken in order to prevent a jailed process from binding a
protocol to a certain port only if the <literal>nam</literal>
parameter is set. nam is a pointer to a sockaddr structure,
protocol to a certain address only if the <literal>nam</literal>
parameter is set. <literal>nam</literal> is a pointer to a
<literal>sockaddr</literal> structure,
which describes the address on which to bind the service. A
more exact definition is that sockaddr "may be used as a
template for referring to the identifying tag and length of
each address"[2]. In the function
<literal>in_pcbbind_setup</literal>, <literal>sin</literal> is a
pointer to a sockaddr_in structure, which contains the port,
address, length and domain family of the socket which is to be
bound. Basically, this disallows any processes from jail to be
able to specify the domain family.</para>
more exact definition is that <literal>sockaddr</literal> "may be
used as a template for referring to the identifying tag and length of
each address". In the function
<literal>in_pcbbind_setup()</literal>, <literal>sin</literal> is a
pointer to a <literal>sockaddr_in</literal> structure, which
contains the port, address, length and domain family of the socket
which is to be bound. Basically, this disallows any processes from
<application>jail</application> to be able to specify the address
that doesn't belong to the <application>jail</application> in which
the calling process exists.</para>
<programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>:
int
@ -577,30 +630,39 @@ in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
if (nam) {
sin = (struct sockaddr_in *)nam;
...
#ifdef notdef
/*
* We should check the family, but old programs
* incorrectly fail to initialize it.
*/
if (sin->sin_family != AF_INET)
return (EAFNOSUPPORT);
#endif
if (sin-&gt;sin_addr.s_addr != INADDR_ANY)
if (prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
return(EINVAL);
...
if (lport) {
...
if (prison &amp;&amp; prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
return (EADDRNOTAVAIL);
...
}
}
if (lport == 0) {
...
if (laddr.s_addr != INADDR_ANY)
if (prison_ip(cred, 0, &amp;laddr.s_addr))
return (EINVAL);
...
}
...
if (prison_ip(cred, 0, &amp;laddr.s_addr))
return (EINVAL);
...
}</programlisting>
<para>You might be wondering what function
<literal>prison_ip()</literal> does. prison_ip is given three
arguments, a pointer to the credential(represented by
<literal>cred</literal>), any flags, and an ip address. It
returns 1 if the ip address does NOT belong to the jail or
0 otherwise. As you can see from the code, if it is indeed
an ip address not belonging to the jail, the protcol is
not allowed to bind to a certain port.</para>
<literal>prison_ip()</literal> does. <literal>prison_ip()</literal>
is given three arguments, a pointer to the credential(represented by
<literal>cred</literal>), any flags, and an IP address. It
returns 1 if the IP address does NOT belong to the
<application>jail</application> or 0 otherwise. As you can see
from the code, if it is indeed an IP address not belonging to the
<application>jail</application>, the protcol is not allowed to bind
to that address.</para>
<programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename>
int
@ -632,51 +694,59 @@ prison_ip(struct ucred *cred, int flag, u_int32_t *ip)
return (1);
return (0);
}</programlisting>
<para>Jailed users are not allowed to bind services to an ip
which does not belong to the jail. The restriction is also
written within the function <literal>in_pcbbind_setup</literal>:</para>
<programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>
if (nam) {
...
lport = sin-&gt;sin.port;
... if (lport) {
...
if (jailed(cred))
prison = 1;
...
if (prison &amp;&amp;
prison_ip(cred, 0, &amp;sin-&gt;sin_addr.s_addr))
return (EADDRNOTAVAIL);</programlisting>
</sect2>
<sect2>
<title>Filesystem</title>
<indexterm><primary>filesystem</primary></indexterm>
<para>Even root users within the jail are not allowed to set any
file flags, such as immutable, append, and no unlink flags, if
the securelevel is greater than 0.</para>
<para>Even <literal>root</literal> users within the
<application>jail</application> are not allowed to unset or modify
any file flags, such as immutable, append-only, and undeleteable
flags, if the securelevel is greater than 0.</para>
<programlisting>/usr/src/sys/ufs/ufs/ufs_vnops.c:
<programlisting><filename>/usr/src/sys/ufs/ufs/ufs_vnops.c:</filename>
static int
ufs_setattr(ap)
...
{
...
if (!suser_cred(cred,
jail_chflags_allowed ? SUSER_ALLOWJAIL : 0)) {
if (!priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0)) {
if (ip-&gt;i_flags
&amp; (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) {
error = securelevel_gt(cred, 0);
if (error)
return (error);
error = securelevel_gt(cred, 0);
if (error)
return (error);
}
...
}
}
<filename>/usr/src/sys/kern/kern_priv.c</filename>
int
priv_check_cred(struct ucred *cred, int priv, int flags)
{
...
error = prison_priv_check(cred, priv);
if (error)
return (error);
...
}
<filename>/usr/src/sys/kern/kern_jail.c</filename>
int
prison_priv_check(struct ucred *cred, int priv)
{
...
switch (priv) {
...
case PRIV_VFS_SYSFLAGS:
if (jail_chflags_allowed)
return (0);
else
return (EPERM);
...
}
...
}</programlisting>
</sect2>
</sect1>