Update this chapter with respect to our current state of jail
infrastructure, plus several markup corrections. Submitted by: MQ <antinvidia at gmail dot com>
This commit is contained in:
		
							parent
							
								
									26fad0c4c1
								
							
						
					
					
						commit
						0b444add8d
					
				
				
				Notes:
				
					svn2git
				
				2020-12-08 03:00:23 +00:00 
				
			
			svn path=/head/; revision=31407
					 1 changed files with 329 additions and 259 deletions
				
			
		|  | @ -24,62 +24,65 @@ | |||
|   <indexterm><primary>Jail</primary></indexterm> | ||||
|   <indexterm><primary>root</primary></indexterm> | ||||
| 
 | ||||
|   <para>On most &unix; systems, root has omnipotent power. This promotes | ||||
|     insecurity. If an attacker were to gain root on a system, he would | ||||
|     have every function at his fingertips. In FreeBSD there are | ||||
|     sysctls which dilute the power of root, in order to minimize the | ||||
|     damage caused by an attacker. Specifically, one of these functions | ||||
|     is called secure levels. Similarly, another function which is | ||||
|     present from FreeBSD 4.0 and onward, is a utility called | ||||
|     &man.jail.8;. <application>Jail</application> chroots an | ||||
|     environment and sets certain restrictions on processes which are | ||||
|     forked from within. For example, a jailed process cannot affect | ||||
|     processes outside of the jail, utilize certain system calls, or | ||||
|     inflict any damage on the main computer.</para> | ||||
|   <para>On most &unix; systems, <literal>root</literal> has omnipotent power. | ||||
|     This promotes insecurity. If an attacker gained <literal>root</literal> | ||||
|     on a system, he would have every function at his fingertips. In FreeBSD | ||||
|     there are sysctls which dilute the power of <literal>root</literal>, in | ||||
|     order to minimize the damage caused by an attacker. Specifically, one of | ||||
|     these functions is called <literal>secure levels</literal>. Similarly, | ||||
|     another function which is present from FreeBSD 4.0 and onward, is a utility | ||||
|     called &man.jail.8;. <application>Jail</application> chroots an environment | ||||
|     and sets certain restrictions on processes which are forked within | ||||
|     the <application>jail</application>. For example, a jailed process | ||||
|     cannot affect processes outside the <application>jail</application>, | ||||
|     utilize certain system calls, or inflict any damage on the host | ||||
|     environment.</para> | ||||
| 
 | ||||
|   <para><application>Jail</application> is becoming the new security | ||||
|     model. People are running potentially vulnerable servers such as | ||||
|     Apache, BIND, and sendmail within jails, so that if an attacker | ||||
|     gains root within the <application>Jail</application>, it is only | ||||
|     an annoyance, and not a devastation. This article focuses on the | ||||
|     internals (source code) of <application>Jail</application>. | ||||
|     It will also suggest improvements upon the jail code base which | ||||
|     are already being worked on. If you are looking for a how-to on | ||||
|     setting up a <application>Jail</application>, I suggest you look | ||||
|     at my other article in Sys Admin Magazine, May 2001, entitled | ||||
|     "Securing FreeBSD using <application>Jail</application>."</para> | ||||
|     <application>Apache</application>, <application>BIND</application>, and | ||||
|     <application>sendmail</application> within jails, so that if an attacker | ||||
|     gains <literal>root</literal> within the <application>jail</application>, | ||||
|     it is only an annoyance, and not a devastation. This article mainly | ||||
|     focuses on the internals (source code) of <application>jail</application>. | ||||
|     If you are looking for a how-to on setting up a | ||||
|     <application>jail</application>, I suggest you look at my other article | ||||
|     in Sys Admin Magazine, May 2001, entitled "Securing FreeBSD using | ||||
|     <application>Jail</application>."</para> | ||||
| 
 | ||||
|   <sect1 id="jail-arch"> | ||||
|     <title>Architecture</title> | ||||
| 
 | ||||
|     <para> | ||||
|       <application>Jail</application> consists of two realms: the | ||||
|       user-space program, jail, and the code implemented within the | ||||
|       kernel: the <literal>jail</literal> system call and associated | ||||
|       restrictions. I will be discussing the user-space program and | ||||
|       then how jail is implemented within the kernel.</para> | ||||
|       userland program, &man.jail.8;, and the code implemented within | ||||
|       the kernel: the &man.jail.2; system call and associated | ||||
|       restrictions. I will be discussing the userland program and | ||||
|       then how <application>jail</application> is implemented within | ||||
|       the kernel.</para> | ||||
| 
 | ||||
|     <sect2> | ||||
|       <title>Userland code</title> | ||||
|       <title>Userland Code</title> | ||||
| 
 | ||||
|       <indexterm><primary>Jail</primary> | ||||
| 	<secondary>userland program</secondary></indexterm> | ||||
| 	<secondary>Userland Program</secondary></indexterm> | ||||
| 
 | ||||
|       <para>The source for the user-land jail is located in | ||||
|         <filename>/usr/src/usr.sbin/jail</filename>, consisting of | ||||
|         one file, <filename>jail.c</filename>. The program takes these | ||||
|         arguments: the path of the jail, hostname, ip address, and the | ||||
|         command to be executed.</para> | ||||
|       <para>The source for the userland <application>jail</application> | ||||
|         is located in <filename>/usr/src/usr.sbin/jail</filename>, | ||||
|         consisting of one file, <filename>jail.c</filename>. The program | ||||
|         takes these arguments: the path of the <application>jail</application>, | ||||
|         hostname, IP address, and the command to be executed.</para> | ||||
| 
 | ||||
|       <sect3> | ||||
|         <title>Data Structures</title> | ||||
| 
 | ||||
|         <para>In <filename>jail.c</filename>, the first thing I would | ||||
|           note is the declaration of an important structure | ||||
|           <literal>struct jail j</literal>; which was included from | ||||
|           <literal>struct jail j;</literal> which was included from | ||||
|           <filename>/usr/include/sys/jail.h</filename>.</para> | ||||
| 
 | ||||
|         <para>The definition of the jail structure is:</para> | ||||
|         <para>The definition of the <literal>jail</literal> structure is: | ||||
| </para> | ||||
| 
 | ||||
| <programlisting><filename>/usr/include/sys/jail.h</filename>:  | ||||
| 
 | ||||
|  | @ -91,8 +94,8 @@ struct jail { | |||
| };</programlisting> | ||||
| 
 | ||||
|         <para>As you can see, there is an entry for each of the | ||||
|           arguments passed to the jail program, and indeed, they are | ||||
|           set during its execution.</para> | ||||
|           arguments passed to the &man.jail.8; program, and indeed, | ||||
|           they are set during its execution.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename> | ||||
| char path[PATH_MAX]; | ||||
|  | @ -111,10 +114,11 @@ j.hostname = argv[1];</programlisting> | |||
|       <sect3> | ||||
|         <title>Networking</title> | ||||
| 
 | ||||
|         <para>One of the arguments passed to the Jail program is an IP | ||||
|           address with which the jail can be accessed over the | ||||
|           network. Jail translates the ip address given into host | ||||
|           byte order and then stores it in j (the jail structure).</para> | ||||
|         <para>One of the arguments passed to the &man.jail.8; program is | ||||
|           an IP address with which the <application>jail</application> | ||||
|           can be accessed over the network. &man.jail.8; translates the | ||||
|           IP address given into host byte order and then stores it in | ||||
|           <literal>j</literal> (the <literal>jail</literal> structure).</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename>: | ||||
| struct in_addr in;  | ||||
|  | @ -123,37 +127,35 @@ if (inet_aton(argv[2], &in) == 0) | |||
|     errx(1, "Could not make sense of ip-number: %s", argv[2]); | ||||
| j.ip_number = ntohl(in.s_addr);</programlisting> | ||||
| 
 | ||||
|         <para>The | ||||
|           <citerefentry><refentrytitle>inet_aton</refentrytitle> | ||||
|           <manvolnum>3</manvolnum></citerefentry> | ||||
|           function "interprets the specified character string as an | ||||
|           Internet address, placing the address into the structure | ||||
|           provided." The ip number node in the jail structure is set | ||||
|           only when the ip address placed onto the in structure by | ||||
|           inet_aton is translated into host byte order by | ||||
|           <function>ntohl()</function>.</para> | ||||
|         <para>The &man.inet.aton.3; function "interprets the specified | ||||
|           character string as an Internet address, placing the address | ||||
|           into the structure provided." The <literal>ip_number</literal> | ||||
|           member in the <literal>jail</literal> structure is set only | ||||
|           when the IP address placed onto the <literal>in</literal> | ||||
|           structure by &man.inet.aton.3; is translated into host byte | ||||
|           order by &man.ntohl.3;.</para> | ||||
| 
 | ||||
|       </sect3> | ||||
| 
 | ||||
|       <sect3> | ||||
|         <title>Jailing The Process</title> | ||||
| 
 | ||||
|         <para>Finally, the userland program jails the process, and | ||||
|           executes the command specified. Jail now becomes an | ||||
|           imprisoned process itself and then executes the command | ||||
|           given using &man.execv.3;</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/usr.sbin/jail/jail.c</filename> | ||||
|         <para>Finally, the userland program jails the process. | ||||
|           <application>Jail</application> now becomes an imprisoned | ||||
|           process itself and then executes the command given using | ||||
|           &man.execv.3;.</para> | ||||
|         <programlisting><filename>/usr/src/usr.sbin/jail/jail.c</filename> | ||||
| i = jail(&j);  | ||||
| ...  | ||||
| if (execv(argv[3], argv + 3) != 0) | ||||
|     err(1, "execv: %s", argv[3]);</programlisting> | ||||
| 
 | ||||
|         <para>As you can see, the jail function is being called, and | ||||
|           its argument is the jail structure which has been filled | ||||
|           with the arguments given to the program. Finally, the | ||||
|           program you specify is executed. I will now discuss how Jail | ||||
|           is implemented within the kernel.</para> | ||||
|         <para>As you can see, the <literal>jail()</literal> function is | ||||
|           called, and its argument is the <literal>jail</literal> structure | ||||
|           which has been filled with the arguments given to the program. | ||||
|           Finally, the program you specify is executed. I will now discuss | ||||
|           how <application>jail</application> is implemented within the | ||||
|           kernel.</para> | ||||
|       </sect3> | ||||
|     </sect2> | ||||
| 
 | ||||
|  | @ -161,12 +163,12 @@ if (execv(argv[3], argv + 3) != 0) | |||
|       <title>Kernel Space</title> | ||||
| 
 | ||||
|       <indexterm><primary>Jail</primary> | ||||
| 	<secondary>kernel architecture</secondary></indexterm> | ||||
| 	<secondary>Kernel Architecture</secondary></indexterm> | ||||
| 
 | ||||
|       <para>We will now be looking at the file | ||||
|         <filename>/usr/src/sys/kern/kern_jail.c</filename>.  This is | ||||
|         the file where the jail system call, appropriate sysctls, and | ||||
|         networking functions are defined.</para> | ||||
|         the file where the &man.jail.2; system call, appropriate sysctls, | ||||
|         and networking functions are defined.</para> | ||||
| 
 | ||||
|       <sect3> | ||||
|         <title>sysctls</title> | ||||
|  | @ -186,7 +188,7 @@ SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW, | |||
| int     jail_socket_unixiproute_only = 1; | ||||
| SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, | ||||
|     &jail_socket_unixiproute_only, 0, | ||||
|     "Processes in jail are limited to creating &unix;/IPv4/route sockets only"); | ||||
|     "Processes in jail are limited to creating UNIX/IPv4/route sockets only"); | ||||
| 
 | ||||
| int     jail_sysvipc_allowed = 0; | ||||
| SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, | ||||
|  | @ -206,10 +208,15 @@ SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW, | |||
| int    jail_chflags_allowed = 0; | ||||
| SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW, | ||||
|     &jail_chflags_allowed, 0, | ||||
|     "Processes in jail can alter system file flags");</programlisting> | ||||
|     "Processes in jail can alter system file flags"); | ||||
| 
 | ||||
| int     jail_mount_allowed = 0; | ||||
| SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW, | ||||
|     &jail_mount_allowed, 0, | ||||
|     "Processes in jail can mount/unmount jail-friendly file systems");</programlisting> | ||||
| 
 | ||||
|         <para>Each of these sysctls can be accessed by the user | ||||
|           through the sysctl program. Throughout the kernel, these | ||||
|           through the &man.sysctl.8; program. Throughout the kernel, these | ||||
|           specific sysctls are recognized by their name. For example, | ||||
|           the name of the first sysctl is | ||||
|           <literal>security.jail.set_hostname_allowed</literal>.</para> | ||||
|  | @ -221,18 +228,17 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW, | |||
|         <para>Like all system calls, the &man.jail.2; system call takes | ||||
|           two arguments, <literal>struct thread *td</literal> and | ||||
|           <literal>struct jail_args *uap</literal>. | ||||
|           <literal>td</literal> is a pointer to the thread | ||||
|           <literal>td</literal> is a pointer to the <literal>thread</literal> | ||||
|           structure which describes the calling thread. In this | ||||
|           context, uap is a pointer to the structure which specifies the | ||||
|           arguments given to &man.jail.2; from the userland program | ||||
|           <filename>jail.c</filename>. When I described the userland | ||||
|           program before, you saw that the &man.jail.2; system call was | ||||
|           given a jail structure as its own argument.</para> | ||||
|           context, <literal>uap</literal> is a pointer to the structure | ||||
|           in which a pointer to the <literal>jail</literal> structure | ||||
|           passed by the userland <filename>jail.c</filename> is contained. | ||||
|           When I described the userland program before, you saw that the | ||||
|           &man.jail.2; system call was given a <literal>jail</literal> | ||||
|           structure as its own argument.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename> | ||||
| /* | ||||
|  * MPSAFE | ||||
|  *   | ||||
|  * struct jail_args { | ||||
|  *  struct jail *jail; | ||||
|  * }; | ||||
|  | @ -240,46 +246,48 @@ SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW, | |||
| int  | ||||
| jail(struct thread *td, struct jail_args *uap)</programlisting> | ||||
| 
 | ||||
|         <para>Therefore, <literal>uap->jail</literal> would access the | ||||
|           jail structure which was passed to the system call. Next, | ||||
|           the system call copies the jail structure into kernel space | ||||
|           using the <literal>copyin()</literal> | ||||
|           function. <literal>copyin()</literal> takes three arguments: | ||||
|           the data which is to be copied into kernel space, | ||||
|         <para>Therefore, <literal>uap->jail</literal> can be used to | ||||
|           access the <literal>jail</literal> structure which was passed | ||||
|           to the system call. Next, the system call copies the | ||||
|           <literal>jail</literal> structure into kernel space using | ||||
|           the &man.copyin.9; function. &man.copyin.9; takes three arguments: | ||||
|           the address of the data which is to be copied into kernel space, | ||||
|           <literal>uap->jail</literal>, where to store it, | ||||
|           <literal>j</literal> and the size of the storage. The jail | ||||
|           structure <literal>uap->jail</literal> is copied into kernel | ||||
|           space and stored in another jail structure, | ||||
|           <literal>j</literal> and the size of the storage. The | ||||
|           <literal>jail</literal> structure pointed by | ||||
|           <literal>uap->jail</literal> is copied into kernel space and | ||||
|           is stored in another <literal>jail</literal> structure, | ||||
|           <literal>j</literal>.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_jail.c: </filename> | ||||
| error = copyin(uap->jail, &j, sizeof(j));</programlisting> | ||||
| 
 | ||||
|         <para>There is another important structure defined in | ||||
|           jail.h. It is the prison structure | ||||
|           (<literal>pr</literal>). The prison structure is used | ||||
|           exclusively within kernel space. The &man.jail.2; system call | ||||
|           copies everything from the jail structure onto the prison | ||||
|           structure. Here is the definition of the prison structure.</para> | ||||
|           <filename>jail.h</filename>. It is the <literal>prison</literal> | ||||
|           structure. The <literal>prison</literal> structure is used | ||||
|           exclusively within kernel space. Here is the definition of the | ||||
|           <literal>prison</literal> structure.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/include/sys/jail.h</filename>: | ||||
| struct prison { | ||||
|     LIST_ENTRY(prison) pr_list;         /* (a) all prisons */ | ||||
|     int      pr_id;             /* (c) prison id */ | ||||
|     int      pr_ref;            /* (p) refcount */ | ||||
|     char         pr_path[MAXPATHLEN];       /* (c) chroot path */ | ||||
|     struct vnode    *pr_root;           /* (c) vnode to rdir */ | ||||
|     char         pr_host[MAXHOSTNAMELEN];   /* (p) jail hostname */ | ||||
|     u_int32_t    pr_ip;             /* (c) ip addr host */ | ||||
|     void        *pr_linux;          /* (p) linux abi */ | ||||
|     int      pr_securelevel;        /* (p) securelevel */ | ||||
|     struct task  pr_task;           /* (d) destroy task */ | ||||
|     struct mtx   pr_mtx; | ||||
|         LIST_ENTRY(prison) pr_list;                     /* (a) all prisons */ | ||||
|         int              pr_id;                         /* (c) prison id */ | ||||
|         int              pr_ref;                        /* (p) refcount */ | ||||
|         char             pr_path[MAXPATHLEN];           /* (c) chroot path */ | ||||
|         struct vnode    *pr_root;                       /* (c) vnode to rdir */ | ||||
|         char             pr_host[MAXHOSTNAMELEN];       /* (p) jail hostname */ | ||||
|         u_int32_t        pr_ip;                         /* (c) ip addr host */ | ||||
|         void            *pr_linux;                      /* (p) linux abi */ | ||||
|         int              pr_securelevel;                /* (p) securelevel */ | ||||
|         struct task      pr_task;                       /* (d) destroy task */ | ||||
|         struct mtx       pr_mtx; | ||||
|       void            **pr_slots;                     /* (p) additional data */ | ||||
| };</programlisting> | ||||
| 
 | ||||
|         <para>The jail() system call then allocates memory for a | ||||
|         pointer to a prison structure and copies data between the two | ||||
|         structures.</para> | ||||
|         <para>The &man.jail.2; system call then allocates memory for | ||||
|         a <literal>prison</literal> structure and copies data between | ||||
|         the <literal>jail</literal> and <literal>prison</literal> | ||||
|         structure.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>: | ||||
| MALLOC(pr, struct prison *, sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO); | ||||
|  | @ -289,49 +297,86 @@ if (error) | |||
|     goto e_killmtx; | ||||
| ... | ||||
| error = copyinstr(j.hostname, &pr->pr_host, sizeof(pr->pr_host), 0); | ||||
|  if (error) | ||||
|      goto e_dropvnref;</programlisting> | ||||
|         <para>These next three lines in the source are very important, | ||||
|           as they specify how the kernel recognizes a process as | ||||
|           jailed. Each process on a &unix; system is described by its | ||||
|           own proc structure. You can see the whole proc structure in | ||||
|           <filename>/usr/include/sys/proc.h</filename>. For example, | ||||
|           the td argument in any system call is actually a pointer to | ||||
|           that calling thread's thread structure, as stated before. The | ||||
|           td->td_proc is a pointer to the calling process' process | ||||
|           structure.  The proc structure contains nodes which can describe | ||||
|           the owner's identity (<literal>p_ucred</literal>), the process | ||||
|           resource limits (<literal>p_limit</literal>), and so on. In the | ||||
|           definition of the ucred structure, there is a pointer to a | ||||
|           prison structure. (<literal>cr_prison</literal>).</para> | ||||
| if (error) | ||||
|      goto e_dropvnref; | ||||
| pr->pr_ip = j.ip_number;</programlisting> | ||||
|         <para>Next, we will discuss another important system call | ||||
|           &man.jail.attach.2;, which implements the function to put | ||||
|           a process into the <application>jail</application>.</para> | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename>: | ||||
| /* | ||||
|  * struct jail_attach_args { | ||||
|  *      int jid; | ||||
|  * }; | ||||
|  */ | ||||
| int | ||||
| jail_attach(struct thread *td, struct jail_attach_args *uap)</programlisting> | ||||
|         <para>This system call makes the changes that can distinguish | ||||
|           a jailed process from those unjailed ones. | ||||
|           To understand what &man.jail.attach.2; does for us, certain | ||||
|           background information is needed.</para> | ||||
|         <para> | ||||
|           On FreeBSD, each kernel visible thread is identified by its | ||||
|           <literal>thread</literal> structure, while the processes are | ||||
|           described by their <literal>proc</literal> structures. You can | ||||
|           find the definitions of the <literal>thread</literal> and | ||||
|           <literal>proc</literal> structure in | ||||
|           <filename>/usr/include/sys/proc.h</filename>. | ||||
|           For example, the <literal>td</literal> argument in any system | ||||
|           call is actually a pointer to the calling thread's | ||||
|           <literal>thread</literal> structure, as stated before. | ||||
|           The <literal>td_proc</literal> member in the | ||||
|           <literal>thread</literal> structure pointed by <literal>td</literal> | ||||
|           is a pointer to the <literal>proc</literal> structure which | ||||
|           represents the process that contains the thread represented by | ||||
|           <literal>td</literal>. The <literal>proc</literal> structure | ||||
|           contains members which can describe the owner's | ||||
|           identity(<literal>p_ucred</literal>), the process resource | ||||
|           limits(<literal>p_limit</literal>), and so on. In the | ||||
|           <literal>ucred</literal> structure pointed by | ||||
|           <literal>p_ucred</literal> member in the <literal>proc</literal> | ||||
|           structure, there is a pointer to the <literal>prison</literal> | ||||
|           structure(<literal>cr_prison</literal>).</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/include/sys/proc.h: </filename> | ||||
| struct thread { | ||||
|     ... | ||||
|     struct proc *td_proc; | ||||
|     ... | ||||
| }; | ||||
| struct proc {  | ||||
| ... | ||||
| struct ucred *p_ucred;  | ||||
| ... | ||||
|     ... | ||||
|     struct ucred *p_ucred;  | ||||
|     ... | ||||
| }; | ||||
| <filename>/usr/include/sys/ucred.h</filename> | ||||
| struct ucred { | ||||
| ... | ||||
| struct prison *cr_prison; | ||||
| ... | ||||
|     ... | ||||
|     struct prison *cr_prison; | ||||
|     ... | ||||
| };</programlisting> | ||||
| 
 | ||||
|         <para>In <filename>kern_jail.c</filename>, the function then | ||||
|           calls function jail_attach with a given jid. And the jail_attach | ||||
|           calls function change_root to change the root directory of the | ||||
|           calling process.  The jail_attach function then creates a new ucred | ||||
|           structure, and attaches the newly created ucred structure to the | ||||
|           calling process after it has successfully attaches the prison on the | ||||
|           cred structure. From then on, the calling process is recognized as | ||||
|           jailed. When calls function jailed with the newly created ucred | ||||
|           structure as the argument, it returns 1 to tell that the credential | ||||
|           is in a jail. The parent process of each process, forked within  | ||||
|           the jail, is the program jail itself, as it calls the &man.jail.2; | ||||
|           system call.  When the program is executed through execve, it | ||||
|           inherits the properties of its parent's ucred structure, therefore it | ||||
|           has the jailed ucred structure.</para> | ||||
|         <para>In <filename>kern_jail.c</filename>, the function | ||||
|           <literal>jail()</literal> then calls function | ||||
|           <literal>jail_attach()</literal> with a given <literal>jid</literal>. | ||||
|           And <literal>jail_attach()</literal> calls function | ||||
|           <literal>change_root()</literal> to change the root directory of the | ||||
|           calling process. The <literal>jail_attach()</literal> then creates | ||||
|           a new <literal>ucred</literal> structure, and attaches the newly | ||||
|           created <literal>ucred</literal> structure to the calling process | ||||
|           after it has successfully attached the <literal>prison</literal> | ||||
|           structure to the <literal>ucred</literal> structure. From then on, | ||||
|           the calling process is recognized as jailed. When the kernel routine | ||||
|           <literal>jailed()</literal> is called in the kernel with the newly | ||||
|           created <literal>ucred</literal> structure as its argument, it | ||||
|           returns 1 to tell that the credential is connected | ||||
|           with a <application>jail</application>. The public ancestor process | ||||
|           of all the process forked within the <application>jail</application>, | ||||
|           is the process which runs &man.jail.8;, as it calls the | ||||
|           &man.jail.2; system call. When a program is executed through | ||||
|           &man.execve.2;, it inherits the jailed property of its parent's | ||||
|           <literal>ucred</literal> structure, therefore it has a jailed | ||||
|           <literal>ucred</literal> structure.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_jail.c</filename> | ||||
| int | ||||
|  | @ -363,14 +408,15 @@ jail_attach(struct thread *td, struct jail_attach_args *uap) | |||
|     p->p_ucred = newcred; | ||||
| ... | ||||
| }</programlisting> | ||||
|         <para>When a process is forked from a parent process, the | ||||
|           &man.fork.2; system call uses crhold to maintain the credential | ||||
|           for the newly forked process. It inherently keep the newly forked | ||||
|           child's credential consistent with its parent, so the child process | ||||
|           is also jailed.</para> | ||||
|         <para>When a process is forked from its parent process, the | ||||
|           &man.fork.2; system call uses <literal>crhold()</literal> to | ||||
|           maintain the credential for the newly forked process. It inherently | ||||
|           keep the newly forked child's credential consistent with its parent, | ||||
|           so the child process is also jailed.</para> | ||||
| 
 | ||||
|         <programlisting><filename>/usr/src/sys/kern/kern_fork.c</filename>: | ||||
| p2->p_ucred = crhold(td->td_ucred); | ||||
| ... | ||||
| td2->td_ucred = crhold(p2->p_ucred);</programlisting> | ||||
| 
 | ||||
|       </sect3> | ||||
|  | @ -381,12 +427,13 @@ td2->td_ucred = crhold(p2->p_ucred);</programlisting> | |||
|     <title>Restrictions</title> | ||||
| 
 | ||||
|     <para>Throughout the kernel there are access restrictions relating | ||||
|       to jailed processes. Usually, these restrictions only check if | ||||
|       to jailed processes. Usually, these restrictions only check whether | ||||
|       the process is jailed, and if so, returns an error. For | ||||
|       example:</para> | ||||
| 
 | ||||
|     <programlisting>if (jailed(td->td_ucred)) | ||||
|         return (EPERM);</programlisting> | ||||
|     <programlisting> | ||||
| if (jailed(td->td_ucred)) | ||||
|     return (EPERM);</programlisting> | ||||
| 
 | ||||
|     <sect2> | ||||
|       <title>SysV IPC</title> | ||||
|  | @ -395,43 +442,46 @@ td2->td_ucred = crhold(p2->p_ucred);</programlisting> | |||
| 
 | ||||
|       <para>System V IPC is based on messages. Processes can send each | ||||
|         other these messages which tell them how to act. The functions | ||||
|         which deal with messages are: <literal>msgsys</literal>, | ||||
|         <literal>msgctl</literal>, <literal>msgget</literal>, | ||||
|         <literal>msgsend</literal> and <literal>msgrcv</literal>. | ||||
|         which deal with messages are:  | ||||
|         &man.msgctl.3;, &man.msgget.3;, &man.msgsnd.3; and &man.msgrcv.3;. | ||||
|         Earlier, I mentioned that there were certain sysctls you could | ||||
|         turn on or off in order to affect the behavior of Jail. One of | ||||
|         these sysctls was <literal>security.jail.sysvipc_allowed</literal>. | ||||
|         On most systems, this sysctl is set to 0. If it were set to 1, | ||||
|         it would defeat the whole purpose of having a jail; privileged | ||||
|         users from within the jail would be able to affect processes | ||||
|         outside of the environment. The difference between a message | ||||
|         and a signal is that the message only consists of the signal | ||||
|         number.</para> | ||||
|         turn on or off in order to affect the behavior of | ||||
|         <application>jail</application>. One of these sysctls was | ||||
|         <literal>security.jail.sysvipc_allowed</literal>.  By default, | ||||
|         this sysctl is set to 0. If it were set to 1, it would defeat the | ||||
|         whole purpose of having a <application>jail</application>; privileged | ||||
|         users from the <application>jail</application> would be able to | ||||
|         affect processes outside the jailed environment. The difference | ||||
|         between a message and a signal is that the message only consists | ||||
|         of the signal number.</para> | ||||
| 
 | ||||
|       <para><filename>/usr/src/sys/kern/sysv_msg.c</filename>:</para> | ||||
| 
 | ||||
|       <itemizedlist> | ||||
|         <listitem> <para>&man.msgget.3;: msgget returns (and possibly | ||||
|         creates) a message descriptor that designates a message queue | ||||
|         for use in other system calls.</para></listitem> | ||||
|         <listitem> <para><literal>msgget(key, msgflg)</literal>: | ||||
|         <literal>msgget</literal> returns (and possibly creates) a message | ||||
|         descriptor that designates a message queue for use in other | ||||
|         functions.</para></listitem> | ||||
| 
 | ||||
|         <listitem> <para>&man.msgctl.3;: Using this function, a process | ||||
|         can query the status of a message | ||||
|         <listitem> <para><literal>msgctl(msgid, cmd, buf)</literal>: | ||||
|         Using this function, a process can query the status of a message | ||||
|         descriptor.</para></listitem> | ||||
| 
 | ||||
|         <listitem> <para>&man.msgsnd.3;: msgsnd sends a message to a | ||||
|         <listitem> <para><literal>msgsnd(msgid, msgp, msgsz, msgflg)</literal>: | ||||
|         <literal>msgsnd</literal> sends a message to a | ||||
|         process.</para></listitem> | ||||
| 
 | ||||
|         <listitem> <para>&man.msgrcv.3;: a process receives messages using | ||||
|         <listitem> <para><literal>msgrcv(msgid, msgp, msgsz, msgtyp, | ||||
|         msgflg)</literal>: a process receives messages using | ||||
|         this function</para></listitem> | ||||
| 
 | ||||
|       </itemizedlist> | ||||
| 
 | ||||
|       <para>In each of these system calls, there is this | ||||
|         conditional:</para> | ||||
|       <para>In each of the system calls corresponding to these functions, | ||||
|         there is this conditional:</para> | ||||
| 
 | ||||
|       <programlisting><filename>/usr/src/sys/kern/sysv_msg.c</filename>: | ||||
| if (!jail_sysvipc_allowed && jailed(td->td_ucred) | ||||
| if (!jail_sysvipc_allowed && jailed(td->td_ucred)) | ||||
|     return (ENOSYS);</programlisting> | ||||
| 
 | ||||
|       <indexterm><primary>semaphores</primary></indexterm> | ||||
|  | @ -441,30 +491,30 @@ if (!jail_sysvipc_allowed && jailed(td->td_ucred) | |||
|         processes lock resources. However, process waiting on a | ||||
|         semaphore, that is being used, will sleep until the resources | ||||
|         are relinquished. The following semaphore system calls are | ||||
|         blocked inside a jail: <literal>semsys</literal>, | ||||
|         <literal>semget</literal>, <literal>semctl</literal> and | ||||
|         <literal>semop</literal>.</para> | ||||
|         blocked inside a <application>jail</application>: &man.semget.2;, | ||||
|         &man.semctl.2; and &man.semop.2;.</para> | ||||
| 
 | ||||
|       <para><filename>/usr/src/sys/kern/sysv_sem.c</filename>:</para> | ||||
| 
 | ||||
|       <itemizedlist> | ||||
|         <listitem> | ||||
|           <para>&man.semctl.2;<literal>(id, num, cmd, arg)</literal>: | ||||
|             Semctl does the specified cmd on the semaphore queue | ||||
|             indicated by id.</para></listitem> | ||||
|           <para><literal>semctl(semid, semnum, cmd, ...)</literal>: | ||||
|             <literal>semctl</literal> does the specified <literal>cmd</literal> | ||||
|             on the semaphore queue indicated by | ||||
|             <literal>semid</literal>.</para></listitem> | ||||
| 
 | ||||
|         <listitem> | ||||
|            <para>&man.semget.2;<literal>(key, nsems, flag)</literal>: | ||||
|            Semget creates an array of semaphores, corresponding to | ||||
|            key.</para> | ||||
|            <para><literal>semget(key, nsems, flag)</literal>: | ||||
|             <literal>semget</literal> creates an array of semaphores, | ||||
|             corresponding to <literal>key</literal>.</para> | ||||
| 
 | ||||
|           <para><literal>Key and flag take on the same meaning as they | ||||
|           <para><literal>key and flag take on the same meaning as they | ||||
|           do in msgget.</literal></para></listitem> | ||||
| 
 | ||||
|         <listitem><para>&man.semop.2;<literal>(semid, sops, nsops)</literal>: | ||||
|           Semop does the set of semaphore operations in the array of | ||||
|           structures ops, to the set of semaphores identified by | ||||
|           id.</para></listitem> | ||||
|         <listitem><para><literal>semop(semid, array, nops)</literal>: | ||||
|           <literal>semop</literal> performs a group of operations indicated | ||||
|           by <literal>array</literal>, to the set of semaphores identified by | ||||
|           <literal>semid</literal>.</para></listitem> | ||||
|       </itemizedlist> | ||||
| 
 | ||||
|       <indexterm><primary>shared memory</primary></indexterm> | ||||
|  | @ -472,28 +522,29 @@ if (!jail_sysvipc_allowed && jailed(td->td_ucred) | |||
|         memory. Processes can communicate directly with each other by | ||||
|         sharing parts of their virtual address space and then reading | ||||
|         and writing data stored in the shared memory. These system | ||||
|         calls are blocked within a jailed environment: <literal>shmdt, | ||||
|         shmat, oshmctl, shmctl, shmget</literal>, and | ||||
|         <literal>shmsys</literal>.</para> | ||||
|         calls are blocked within a jailed environment: &man.shmdt.2;, | ||||
|         &man.shmat.2;, &man.shmctl.2; and &man.shmget.2;.</para> | ||||
| 
 | ||||
|       <para><filename>/usr/src/sys/kern/sysv_shm.c</filename>:</para> | ||||
| 
 | ||||
|       <itemizedlist> | ||||
|         <listitem><para>&man.shmctl.2;<literal>(shmid, cmd, buf)</literal>: | ||||
|         shmctl does various control operations on the shared memory | ||||
|         region identified by id.</para></listitem> | ||||
|         <listitem><para><literal>shmctl(shmid, cmd, buf)</literal>: | ||||
|         <literal>shmctl</literal> does various control operations on the | ||||
|         shared memory region identified by | ||||
|         <literal>shmid</literal>.</para></listitem> | ||||
| 
 | ||||
|         <listitem><para>&man.shmget.2;<literal>(key, size, | ||||
|         shmflg)</literal>: shmget accesses or creates a shared memory | ||||
|         region of size bytes.</para></listitem> | ||||
|         <listitem><para><literal>shmget(key, size, flag)</literal>: | ||||
|         <literal>shmget</literal> accesses or creates a shared memory | ||||
|         region of <literal>size</literal> bytes.</para></listitem> | ||||
| 
 | ||||
|         <listitem><para>&man.shmat.2;<literal>(shmid, shmaddr, shmflg)</literal>: | ||||
|         shmat attaches a shared memory region identified by id to the | ||||
|         address space of a process.</para></listitem> | ||||
|         <listitem><para><literal>shmat(shmid, addr, flag)</literal>: | ||||
|         <literal>shmat</literal> attaches a shared memory region identified | ||||
|         by <literal>shmid</literal> to the address space of a | ||||
|         process.</para></listitem> | ||||
| 
 | ||||
|         <listitem><para>&man.shmdt.2;<literal>(shmaddr)</literal>: shmdt | ||||
|         detaches the shared memory region previously attached at | ||||
|         addr.</para></listitem> | ||||
|         <listitem><para><literal>shmdt(addr)</literal>: | ||||
|         <literal>shmdt</literal> detaches the shared memory region | ||||
|         previously attached at <literal>addr</literal>.</para></listitem> | ||||
| 
 | ||||
|       </itemizedlist> | ||||
|     </sect2> | ||||
|  | @ -502,10 +553,10 @@ if (!jail_sysvipc_allowed && jailed(td->td_ucred) | |||
|       <title>Sockets</title> | ||||
| 
 | ||||
|       <indexterm><primary>sockets</primary></indexterm> | ||||
|       <para>Jail treats the &man.socket.2; system call and related | ||||
|         lower-level socket functions in a special manner. In order to | ||||
|         determine whether a certain socket is allowed to be created, | ||||
|         it first checks to see if the sysctl | ||||
|       <para><application>Jail</application> treats the &man.socket.2; system | ||||
|         call and related lower-level socket functions in a special manner. | ||||
|         In order to determine whether a certain socket is allowed to be | ||||
|         created, it first checks to see if the sysctl | ||||
|         <literal>security.jail.socket_unixiproute_only</literal> is set. If | ||||
|         set, sockets are only allowed to be created if the family | ||||
|         specified is either <literal>PF_LOCAL</literal>, | ||||
|  | @ -515,8 +566,8 @@ if (!jail_sysvipc_allowed && jailed(td->td_ucred) | |||
| 
 | ||||
|       <programlisting><filename>/usr/src/sys/kern/uipc_socket.c</filename>: | ||||
| int | ||||
| socreate(dom, aso, type, proto, cred, td) | ||||
| ... | ||||
| socreate(int dom, struct socket **aso, int type, int proto, | ||||
|     struct ucred *cred, struct thread *td) | ||||
| { | ||||
|     struct protosw *prp; | ||||
| ... | ||||
|  | @ -537,11 +588,10 @@ socreate(dom, aso, type, proto, cred, td) | |||
|       <indexterm><primary>Berkeley Packet Filter</primary></indexterm> | ||||
|       <indexterm><primary>data link layer</primary></indexterm> | ||||
| 
 | ||||
|       <para>The Berkeley Packet Filter provides a raw interface to | ||||
|         data link layers in a protocol independent fashion. The | ||||
|         function <literal>bpfopen()</literal> opens an Ethernet | ||||
|         device. It's now controlled by the devfs whether can be used | ||||
|         in the jail. | ||||
|       <para>The <application>Berkeley Packet Filter</application> provides | ||||
|         a raw interface to data link layers in a protocol independent | ||||
|         fashion. <application>BPF</application> is now controlled by the | ||||
|         &man.devfs.8; whether it can be used in a jailed environment.</para> | ||||
| 
 | ||||
|     </sect2> | ||||
| 
 | ||||
|  | @ -554,17 +604,20 @@ socreate(dom, aso, type, proto, cred, td) | |||
|         TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the | ||||
|         network layer 2. There are certain precautions which are | ||||
|         taken in order to prevent a jailed process from binding a | ||||
|         protocol to a certain port only if the <literal>nam</literal> | ||||
|         parameter is set. nam is a pointer to a sockaddr structure, | ||||
|         protocol to a certain address only if the <literal>nam</literal> | ||||
|         parameter is set. <literal>nam</literal> is a pointer to a | ||||
|         <literal>sockaddr</literal> structure, | ||||
|         which describes the address on which to bind the service. A | ||||
|         more exact definition is that sockaddr "may be used as a | ||||
|         template for referring to the identifying tag and length of | ||||
|         each address"[2]. In the function | ||||
|         <literal>in_pcbbind_setup</literal>, <literal>sin</literal> is a | ||||
|         pointer to a sockaddr_in structure, which contains the port, | ||||
|         address, length and domain family of the socket which is to be | ||||
|         bound. Basically, this disallows any processes from jail to be | ||||
|         able to specify the domain family.</para> | ||||
|         more exact definition is that <literal>sockaddr</literal> "may be | ||||
|         used as a template for referring to the identifying tag and length of | ||||
|         each address". In the function | ||||
|         <literal>in_pcbbind_setup()</literal>, <literal>sin</literal> is a | ||||
|         pointer to a <literal>sockaddr_in</literal> structure, which | ||||
|         contains the port, address, length and domain family of the socket | ||||
|         which is to be bound. Basically, this disallows any processes from | ||||
|         <application>jail</application> to be able to specify the address | ||||
|         that doesn't belong to the <application>jail</application> in which | ||||
|         the calling process exists.</para> | ||||
| 
 | ||||
|       <programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename>:  | ||||
| int | ||||
|  | @ -577,30 +630,39 @@ in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp, | |||
|     if (nam) { | ||||
|         sin = (struct sockaddr_in *)nam; | ||||
|         ... | ||||
| #ifdef notdef | ||||
|         /* | ||||
|          * We should check the family, but old programs | ||||
|          * incorrectly fail to initialize it. | ||||
|          */ | ||||
|         if (sin->sin_family != AF_INET) | ||||
|             return (EAFNOSUPPORT); | ||||
| #endif | ||||
|         if (sin->sin_addr.s_addr != INADDR_ANY) | ||||
|             if (prison_ip(cred, 0, &sin->sin_addr.s_addr)) | ||||
|                 return(EINVAL); | ||||
|         ... | ||||
|         if (lport) { | ||||
|             ... | ||||
|             if (prison && prison_ip(cred, 0, &sin->sin_addr.s_addr)) | ||||
|                 return (EADDRNOTAVAIL); | ||||
|             ... | ||||
|         } | ||||
|     } | ||||
|     if (lport == 0) { | ||||
|         ... | ||||
|         if (laddr.s_addr != INADDR_ANY) | ||||
|             if (prison_ip(cred, 0, &laddr.s_addr)) | ||||
|                 return (EINVAL); | ||||
|         ... | ||||
|     } | ||||
| ... | ||||
|     if (prison_ip(cred, 0, &laddr.s_addr)) | ||||
|         return (EINVAL); | ||||
| ... | ||||
| }</programlisting> | ||||
| 
 | ||||
|       <para>You might be wondering what function | ||||
|         <literal>prison_ip()</literal> does. prison_ip is given three | ||||
|         arguments, a pointer to the credential(represented by | ||||
|         <literal>cred</literal>), any flags, and an ip address. It | ||||
|         returns 1 if the ip address does NOT belong to the jail or | ||||
|         0 otherwise.  As you can see from the code, if it is indeed | ||||
|         an ip address not belonging to the jail, the protcol is | ||||
|         not allowed to bind to a certain port.</para> | ||||
|         <literal>prison_ip()</literal> does. <literal>prison_ip()</literal> | ||||
|         is given three arguments, a pointer to the credential(represented by | ||||
|         <literal>cred</literal>), any flags, and an IP address. It | ||||
|         returns 1 if the IP address does NOT belong to the | ||||
|         <application>jail</application> or 0 otherwise.  As you can see | ||||
|         from the code, if it is indeed an IP address not belonging to the | ||||
|         <application>jail</application>, the protcol is not allowed to bind | ||||
|         to that address.</para> | ||||
| 
 | ||||
|       <programlisting><filename>/usr/src/sys/kern/kern_jail.c:</filename> | ||||
| int | ||||
|  | @ -632,51 +694,59 @@ prison_ip(struct ucred *cred, int flag, u_int32_t *ip) | |||
|         return (1); | ||||
|     return (0); | ||||
| }</programlisting> | ||||
| 
 | ||||
|       <para>Jailed users are not allowed to bind services to an ip | ||||
|         which does not belong to the jail. The restriction is also | ||||
|         written within the function <literal>in_pcbbind_setup</literal>:</para> | ||||
| 
 | ||||
|       <programlisting><filename>/usr/src/sys/netinet/in_pcb.c</filename> | ||||
|         if (nam) { | ||||
|                ...  | ||||
|                lport = sin->sin.port;  | ||||
|                ... if (lport) {  | ||||
|                          ...  | ||||
|                          if (jailed(cred)) | ||||
|                                 prison = 1;  | ||||
|                          ... | ||||
|                          if (prison && | ||||
|                              prison_ip(cred, 0, &sin->sin_addr.s_addr)) | ||||
| 			            return (EADDRNOTAVAIL);</programlisting> | ||||
| 
 | ||||
|     </sect2> | ||||
| 
 | ||||
|     <sect2> | ||||
|       <title>Filesystem</title> | ||||
| 
 | ||||
|       <indexterm><primary>filesystem</primary></indexterm> | ||||
|       <para>Even root users within the jail are not allowed to set any | ||||
|         file flags, such as immutable, append, and no unlink flags, if | ||||
|         the securelevel is greater than 0.</para> | ||||
|       <para>Even <literal>root</literal> users within the | ||||
|         <application>jail</application> are not allowed to unset or modify | ||||
|         any file flags, such as immutable, append-only, and undeleteable | ||||
|         flags, if the securelevel is greater than 0.</para> | ||||
| 
 | ||||
|       <programlisting>/usr/src/sys/ufs/ufs/ufs_vnops.c: | ||||
|       <programlisting><filename>/usr/src/sys/ufs/ufs/ufs_vnops.c:</filename> | ||||
| static int | ||||
| ufs_setattr(ap) | ||||
|     ... | ||||
| { | ||||
|     ... | ||||
|         if (!suser_cred(cred, | ||||
|             jail_chflags_allowed ? SUSER_ALLOWJAIL : 0)) { | ||||
|         if (!priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0)) { | ||||
|             if (ip->i_flags | ||||
|                 & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) { | ||||
|                 error = securelevel_gt(cred, 0); | ||||
|                 if (error) | ||||
|                     return (error); | ||||
|                     error = securelevel_gt(cred, 0); | ||||
|                     if (error) | ||||
|                         return (error); | ||||
|             } | ||||
|             ... | ||||
|         } | ||||
| } | ||||
| <filename>/usr/src/sys/kern/kern_priv.c</filename> | ||||
| int | ||||
| priv_check_cred(struct ucred *cred, int priv, int flags) | ||||
| { | ||||
|     ... | ||||
|     error = prison_priv_check(cred, priv); | ||||
|     if (error) | ||||
|         return (error); | ||||
|     ... | ||||
| } | ||||
| <filename>/usr/src/sys/kern/kern_jail.c</filename> | ||||
| int | ||||
| prison_priv_check(struct ucred *cred, int priv) | ||||
| { | ||||
|     ... | ||||
|     switch (priv) { | ||||
|     ... | ||||
|     case PRIV_VFS_SYSFLAGS: | ||||
|         if (jail_chflags_allowed) | ||||
|             return (0); | ||||
|         else | ||||
|             return (EPERM); | ||||
|     ... | ||||
|     } | ||||
|     ... | ||||
| }</programlisting> | ||||
| 
 | ||||
|     </sect2> | ||||
| 
 | ||||
|   </sect1> | ||||
|  |  | |||
		Loading…
	
	Add table
		Add a link
		
	
		Reference in a new issue