<!--
|
|
The FreeBSD Documentation Project
|
|
|
|
$FreeBSD$
|
|
-->
|
|
|
|
<chapter id="ipv6">
|
|
<title>IPv6 Internals</title>
|
|
|
|
<sect1 id="ipv6-implementation">
|
|
<title>IPv6/IPsec Implementation</title>
|
|
|
|
<para><emphasis>Contributed by &a.shin;, 5 March
|
|
2000.</emphasis></para>
|
|
|
|
<para>This section explains the internals of the IPv6 and IPsec
implementation. This functionality is derived from the <ulink
url="http://www.kame.net/">KAME project</ulink>.</para>
|
|
|
|
<sect2 id="ipv6details">
|
|
<title>IPv6</title>
|
|
|
|
<sect3>
|
|
<title>Conformance</title>
|
|
|
|
<para>The IPv6-related functions conform, or try to conform, to
the latest set of IPv6 specifications. For future reference we list
some of the relevant documents below (<emphasis>NOTE</emphasis>: this
is not a complete list, as a complete one would be too hard to
maintain).</para>
|
|
|
|
<para>For details, please refer to the specific chapter in this document,
the RFCs, the manual pages, or the comments in the source code.</para>
|
|
|
|
<para>Conformance tests have been performed on the KAME STABLE kit
at the TAHI project. Results can be viewed at <ulink
url="http://www.tahi.org/report/KAME/">http://www.tahi.org/report/KAME/</ulink>.
We also attended the University of New Hampshire IOL tests (<ulink
url="http://www.iol.unh.edu/">http://www.iol.unh.edu/</ulink>) in the
past, with earlier snapshots.</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>RFC1639: FTP Operation Over Big Address Records
|
|
(FOOBAR)</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>RFC2428 is preferred over RFC1639. FTP clients will
first try RFC2428, then fall back to RFC1639 if that fails.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC1886: DNS Extensions to support IPv6</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC1933: Transition Mechanisms for IPv6 Hosts and
|
|
Routers</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>IPv4 compatible addresses are not supported.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Automatic tunneling (described in 4.3 of this RFC) is not
supported.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>The &man.gif.4; interface implements IPv[46]-over-IPv[46]
tunnels in a generic way, and it covers the "configured tunnel"
described in the spec. See <link linkend="gif">23.5.1.5</link>
in this document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC1981: Path MTU Discovery for IPv6</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2080: RIPng for IPv6</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>usr.sbin/route6d supports this.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2292: Advanced Sockets API for IPv6</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>For supported library functions/kernel APIs, see
|
|
<filename>sys/netinet6/ADVAPI</filename>.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2362: Protocol Independent Multicast-Sparse
|
|
Mode (PIM-SM)</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>RFC2362 defines packet formats for PIM-SM.
|
|
<filename>draft-ietf-pim-ipv6-01.txt</filename> is
|
|
written based on this.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2373: IPv6 Addressing Architecture</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Supports node required addresses, and conforms to
the scope requirement.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2374: An IPv6 Aggregatable Global Unicast Address
|
|
Format</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Supports a 64-bit Interface ID length.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2375: IPv6 Multicast Address Assignments</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Userland applications use the well-known addresses
|
|
assigned in the RFC.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2428: FTP Extensions for IPv6 and NATs</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>RFC2428 is preferred over RFC1639. FTP clients will
first try RFC2428, then fall back to RFC1639 if that fails.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2460: IPv6 specification</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2461: Neighbor discovery for IPv6</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>See <link linkend="neighbor-discovery">23.5.1.2</link>
|
|
in this document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2462: IPv6 Stateless Address Autoconfiguration</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>See <link linkend="ipv6-pnp">23.5.1.4</link> in this
|
|
document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2463: ICMPv6 for IPv6 specification</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>See <link linkend="icmpv6">23.5.1.9</link> in this
|
|
document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2464: Transmission of IPv6 Packets over Ethernet
|
|
Networks</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2465: MIB for IPv6: Textual Conventions and General
|
|
Group</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Necessary statistics are gathered by the kernel. Actual
|
|
IPv6 MIB support is provided as a patchkit for ucd-snmp.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2466: MIB for IPv6: ICMPv6 group</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Necessary statistics are gathered by the kernel. Actual
IPv6 MIB support is provided as a patchkit for ucd-snmp.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2467: Transmission of IPv6 Packets over FDDI
|
|
Networks</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2497: Transmission of IPv6 Packets over ARCnet
Networks</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2553: Basic Socket Interface Extensions for IPv6</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>IPv4 mapped address (3.7) and special behavior of IPv6
|
|
wildcard bind socket (3.8) are supported. See <link
|
|
linkend="ipv6-wildcard-socket">23.5.1.12</link>
|
|
in this document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2675: IPv6 Jumbograms</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>See <link linkend="ipv6-jumbo">23.5.1.7</link> in
|
|
this document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2710: Multicast Listener Discovery for IPv6</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RFC2711: IPv6 router alert option</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-ietf-ipngwg-router-renum-08</filename>: Router
|
|
renumbering for IPv6</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-ietf-ipngwg-icmp-namelookups-02</filename>:
|
|
IPv6 Name Lookups Through ICMP</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-ietf-ipngwg-icmp-name-lookups-03</filename>:
|
|
IPv6 Name Lookups Through ICMP</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-ietf-pim-ipv6-01.txt</filename>:
|
|
PIM for IPv6</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>&man.pim6dd.8; implements dense mode. &man.pim6sd.8;
|
|
implements sparse mode.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-itojun-ipv6-tcp-to-anycast-00</filename>:
|
|
Disconnecting TCP connection toward IPv6 anycast address</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-yamamoto-wideipv6-comm-model-00</filename>
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>See <link linkend="ipv6-sas">23.5.1.6</link> in this
|
|
document for details.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>draft-ietf-ipngwg-scopedaddr-format-00.txt</filename>:
An Extension of Format for IPv6 Scoped Addresses</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect3>
|
|
|
|
<sect3 id="neighbor-discovery">
|
|
<title>Neighbor Discovery</title>
|
|
|
|
<para>Neighbor Discovery is fairly stable. Currently Address
Resolution, Duplicate Address Detection (DAD), and Neighbor
Unreachability Detection are supported. In the near future we will be
adding Proxy Neighbor Advertisement support in the kernel and an
Unsolicited Neighbor Advertisement transmission command as an
administration tool.</para>
|
|
|
|
<para>If DAD fails, the address will be marked "duplicated" and a
message will be sent to syslog (and usually to the console). The
"duplicated" mark can be checked with &man.ifconfig.8;. It is the
administrator's responsibility to check for and recover from DAD
failures. This behavior should be improved in the near future.</para>
|
|
|
|
<para>Some network drivers loop multicast packets back to themselves,
even if instructed not to do so (especially in promiscuous mode).
In such cases DAD may fail, because the DAD engine sees an inbound NS
packet (actually from the node itself) and considers it a sign of a
duplicate. You may want to look at the #if condition marked
"heuristics" in sys/netinet6/nd6_nbr.c:nd6_dad_timer() as a workaround
(note that the code fragment in the "heuristics" section is not spec
conformant).</para>
|
|
|
|
<para>The Neighbor Discovery specification (RFC2461) does not describe
neighbor cache handling in the following cases:</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>When there is no neighbor cache entry and the node
receives an unsolicited RS/NS/NA/redirect packet without a
link-layer address</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Neighbor cache handling on a medium without link-layer
addresses (we need a neighbor cache entry for the IsRouter bit)</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<para>For the first case, we implemented a workaround based on
discussions on the IETF ipngwg mailing list. For more details, see the
comments in the source code and the email thread starting from
(IPng 7155), dated Feb 6 1999.</para>
|
|
|
|
<para>The IPv6 on-link determination rule (RFC2461) is quite different
from the assumptions in the BSD network code. At this moment, the
on-link determination rule for the case where the default router list
is empty is not supported (RFC2461, section 5.2, last sentence in the
2nd paragraph; note that the spec misuses the words "host" and "node"
in several places in that section).</para>
|
|
|
|
<para>To avoid possible DoS attacks and infinite loops, only 10
options per ND packet are accepted now. Therefore, if you have 20
prefix options attached to an RA, only the first 10 prefixes will be
recognized. If this troubles you, please ask about it on the
FREEBSD-CURRENT mailing list and/or modify nd6_maxndopt in
<filename>sys/netinet6/nd6.c</filename>. If there is high demand,
we may provide a sysctl knob for the variable.</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipv6-scope-index">
|
|
<title>Scope Index</title>
|
|
|
|
<para>IPv6 uses scoped addresses. Therefore, it is very important to
specify a scope index (an interface index for a link-local address, or
a site index for a site-local address) with an IPv6 address. Without
a scope index, a scoped IPv6 address is ambiguous to the kernel, and
the kernel will not be able to determine the outbound interface for a
packet.</para>
|
|
|
|
<para>Ordinary userland applications should use the advanced API
(RFC2292) to specify the scope index, or interface index. For a similar
purpose, the sin6_scope_id member of the sockaddr_in6 structure is
defined in RFC2553. However, the semantics of sin6_scope_id are rather
vague. If you care about the portability of your application, we
suggest that you use the advanced API rather than sin6_scope_id.</para>
|
|
|
|
<para>In the kernel, the interface index for a link-local scoped
address is embedded into the 2nd 16-bit word (3rd and 4th bytes) of
the IPv6 address. For example, you may see something like:
</para>
|
|
|
|
<screen> fe80:1::200:f8ff:fe01:6317
|
|
</screen>
|
|
|
|
<para>in the routing table and the interface address structure (struct
in6_ifaddr). The address above is a link-local unicast address
which belongs to the network interface whose interface index is 1.
The embedded index enables us to identify IPv6 link-local
addresses over multiple interfaces effectively and with only a
little code change.</para>
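
<para>As a rough illustration of the embedded form described above, the
following Python sketch stores an interface index in the 3rd and 4th
bytes of a link-local address and takes it out again. The names
<literal>embed_scope</literal> and <literal>split_scope</literal> are
hypothetical and exist only for this example. The kernel does the
equivalent in C on its own structures, so treat this as a sketch of
the byte layout rather than as the implementation.</para>

<programlisting>import ipaddress

def embed_scope(addr, ifindex):
    """Return addr with ifindex stored in its 3rd and 4th bytes."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    packed[2:4] = ifindex.to_bytes(2, "big")  # the 2nd 16-bit word
    return ipaddress.IPv6Address(bytes(packed))

def split_scope(addr):
    """Return (standard form address, embedded index)."""
    packed = bytearray(ipaddress.IPv6Address(addr).packed)
    ifindex = int.from_bytes(packed[2:4], "big")
    packed[2:4] = bytes(2)  # clear the embedded index again
    return ipaddress.IPv6Address(bytes(packed)), ifindex

print(embed_scope("fe80::200:f8ff:fe01:6317", 1))
print(split_scope("fe80:1::200:f8ff:fe01:6317"))</programlisting>

<para>With interface index 1, <literal>embed_scope</literal> turns
fe80::200:f8ff:fe01:6317 into fe80:1::200:f8ff:fe01:6317, the form
shown in the routing table.</para>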
|
|
|
|
<para>Routing daemons and configuration programs, like &man.route6d.8;
and &man.ifconfig.8;, will need to manipulate the "embedded" scope
index. These programs use routing sockets and ioctls (like
SIOCGIFADDR_IN6), and the kernel API will return IPv6 addresses with
the 2nd 16-bit word filled in. These APIs are for manipulating kernel
internal structures. Programs that use them have to be prepared
for differences between kernels anyway.</para>
|
|
|
|
<para>When you specify a scoped address on the command line, NEVER
write the embedded form (such as ff02:1::1 or fe80:2::fedc). This is
not supposed to work. Always use the standard form, like ff02::1 or
fe80::fedc, with a command line option for specifying the interface
(like <command>ping6 -I ne0 ff02::1</command>). In general, if a
command does not have a command line option to specify the outgoing
interface, that command is not ready to accept scoped addresses. This
may seem to contradict IPv6's premise of supporting the "dentist
office" situation. We believe that the specifications need some
improvements here.</para>
|
|
|
|
<para>Some of the userland tools support the extended numeric IPv6
syntax, as documented in
<filename>draft-ietf-ipngwg-scopedaddr-format-00.txt</filename>. You
can specify the outgoing link by using the name of the outgoing
interface, like "fe80::1%ne0". This way you will be able to specify a
link-local scoped address without much trouble.</para>
|
|
|
|
<para>To use this extension in your program, you will need to use
&man.getaddrinfo.3;, and &man.getnameinfo.3; with NI_WITHSCOPEID.
The implementation currently assumes a 1-to-1 relationship between a
link and an interface, which is stronger than what the specs say.</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipv6-pnp">
|
|
<title>Plug and Play</title>
|
|
|
|
<para>Most of the IPv6 stateless address autoconfiguration is implemented
in the kernel. Neighbor Discovery functions are implemented in the
kernel as a whole. Router Advertisement (RA) input for hosts is
implemented in the kernel. Router Solicitation (RS) output for
end hosts, RS input for routers, and RA output for routers are
implemented in userland.</para>
|
|
|
|
<sect4>
|
|
<title>Assignment of link-local, and special addresses</title>
|
|
|
|
<para>An IPv6 link-local address is generated from the IEEE802 address
(Ethernet MAC address). Each interface is assigned an IPv6
link-local address automatically when the interface comes up
(IFF_UP). Also, a direct route for the link-local address is added
to the routing table.</para>
|
|
|
|
<para>Here is example output from the netstat command:</para>
|
|
|
|
<screen>Internet6:
|
|
Destination Gateway Flags Netif Expire
|
|
fe80:1::%ed0/64 link#1 UC ed0
|
|
fe80:2::%ep0/64 link#2 UC ep0</screen>
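
<para>The derivation of a link-local address from an IEEE802 address
can be sketched in a few lines of Python. The sketch assumes the
usual modified EUI-64 procedure for Ethernet (invert the
universal/local bit of the first octet and insert ff:fe in the
middle). The function name <literal>link_local_from_mac</literal> is
hypothetical, and this is not the kernel code.</para>

<programlisting>import ipaddress

def link_local_from_mac(mac):
    """Build the fe80::/64 address for a MAC such as 'aa:bb:cc:dd:ee:ff'."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # invert the universal/local bit
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80 followed by 48 zero bits, then the 64-bit interface ID
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return ipaddress.IPv6Address(packed)

print(link_local_from_mac("00:00:f8:01:63:17"))</programlisting>

<para>For the MAC address 00:00:f8:01:63:17 this prints
fe80::200:f8ff:fe01:6317.</para>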
|
|
|
|
<para>Interfaces that have no IEEE802 address (pseudo interfaces
like tunnel interfaces, or ppp interfaces) will borrow an IEEE802
address from other interfaces, such as Ethernet interfaces,
whenever possible. If there is no IEEE802 hardware attached, a
last-resort pseudorandom value, derived from MD5(hostname), will
be used as the source of the link-local address. If that is not
suitable for your usage, you will need to configure the link-local
address manually.</para>
|
|
|
|
<para>If an interface is not capable of handling IPv6 (for example,
because it lacks multicast support), a link-local address will not be
assigned to that interface. See section 2 for details.</para>
|
|
|
|
<para>Each interface joins the solicited-node multicast address and the
link-local all-nodes multicast address (e.g. ff02::1:ff01:6317
and ff02::1, respectively, on the link to which the interface is
attached). In addition to a link-local address, the loopback address
(::1) will be assigned to the loopback interface. Also, ::1/128 and
ff01::/32 are automatically added to the routing table, and the
loopback interface joins the node-local multicast group ff01::1.</para>
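
<para>The solicited-node multicast address can be computed from a
unicast address as in the Python sketch below. It assumes the
definition in RFC2373, where the low-order 24 bits of the unicast
address are appended to the prefix ff02::1:ff00:0/104. The function
name <literal>solicited_node</literal> is hypothetical.</para>

<programlisting>import ipaddress

def solicited_node(addr):
    """Return the solicited-node multicast address for a unicast address."""
    unicast = ipaddress.IPv6Address(addr).packed
    prefix = ipaddress.IPv6Address("ff02::1:ff00:0").packed
    # keep the first 104 bits (13 bytes) of the prefix and
    # the last 24 bits (3 bytes) of the unicast address
    return ipaddress.IPv6Address(prefix[:13] + unicast[13:])

print(solicited_node("fe80::200:f8ff:fe01:6317"))</programlisting>

<para>For fe80::200:f8ff:fe01:6317 this prints ff02::1:ff01:6317.</para>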
|
|
</sect4>
|
|
|
|
<sect4>
|
|
<title>Stateless address autoconfiguration on hosts</title>
|
|
|
|
<para>In the IPv6 specification, nodes are separated into two categories:
<emphasis>routers</emphasis> and <emphasis>hosts</emphasis>. Routers
forward packets addressed to others; hosts do not forward
packets. net.inet6.ip6.forwarding defines whether this node is a
router or a host (router if it is 1, host if it is 0).</para>
|
|
|
|
<para>When a host hears a Router Advertisement from a router, the host
may autoconfigure itself by stateless address autoconfiguration.
This behavior can be controlled by net.inet6.ip6.accept_rtadv (the host
autoconfigures itself if it is set to 1). Through autoconfiguration,
the network address prefix for the receiving interface (usually a global
address prefix) is added. The default route is also configured.
Routers periodically generate Router Advertisement packets. To
request that an adjacent router generate an RA packet, a host can
transmit a Router Solicitation. To generate an RS packet at any time,
use the <emphasis>rtsol</emphasis> command. The &man.rtsold.8; daemon is
also available. &man.rtsold.8; generates Router Solicitations whenever
necessary, and it works well for nomadic usage (notebooks/laptops).
To ignore Router Advertisements, use sysctl to set
net.inet6.ip6.accept_rtadv to 0.</para>
|
|
|
|
<para>To generate Router Advertisement from a router, use the
|
|
&man.rtadvd.8; daemon.</para>
|
|
|
|
<para>Note that the IPv6 specification assumes the following items, and
nonconforming cases are left unspecified:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>Only hosts will listen to router advertisements</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Hosts have a single network interface (except loopback)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Therefore, it is unwise to enable net.inet6.ip6.accept_rtadv
on routers or multi-interface hosts. A misconfigured node can
behave strangely (the nonconforming configuration is allowed for those
who would like to do some experiments).</para>
|
|
|
|
<para>To summarize the sysctl knobs:</para>
|
|
|
|
<screen> accept_rtadv  forwarding  role of the node
 ---           ---         ---
 0             0           host (to be manually configured)
 0             1           router
 1             0           autoconfigured host
                           (spec assumes that host has single
                           interface only, autoconfigured host
                           with multiple interface is
                           out-of-scope)
 1             1           invalid, or experimental
                           (out-of-scope of spec)</screen>
|
|
|
|
<para>RFC2462 has a validation rule for the incoming RA prefix
information option, in 5.5.3 (e). This is to protect hosts from
malicious (or misconfigured) routers that advertise a very short
prefix lifetime. There was an update from Jim Bound on the ipngwg
mailing list (look for "(ipng 6712)" in the archive), and Jim's
update is implemented.</para>
|
|
|
|
<para>See <link linkend="neighbor-discovery">23.5.1.2</link> in
this document for the relationship between DAD and
autoconfiguration.</para>
|
|
</sect4>
|
|
</sect3>
|
|
|
|
<sect3 id="gif">
|
|
<title>Generic tunnel interface</title>
|
|
|
|
<para>GIF (Generic InterFace) is a pseudo interface for configured
tunnels. Details are described in &man.gif.4;. Currently</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>v6 in v6</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>v6 in v4</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>v4 in v6</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>v4 in v4</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>are available. Use &man.gifconfig.8; to assign the physical (outer)
source and destination addresses to gif interfaces. A configuration that
uses the same address family for the inner and outer IP headers (v4 in
v4, or v6 in v6) is dangerous. It is very easy to configure interfaces
and routing tables to perform an infinite level of tunneling.
<emphasis>Please be warned</emphasis>.</para>
|
|
|
|
<para>gif can be configured to be ECN-friendly. See <link
linkend="ipsec-ecn">23.5.4.5</link> for the ECN-friendliness of
tunnels, and &man.gif.4; for how to configure it.</para>
|
|
|
|
<para>If you would like to configure an IPv4-in-IPv6 tunnel with a gif
interface, read &man.gif.4; carefully. You will need to
remove the IPv6 link-local address automatically assigned to the gif
interface.</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipv6-sas">
|
|
<title>Source Address Selection</title>
|
|
|
|
<para>The current source selection rule is scope-oriented (there are
some exceptions, see below). For a given destination, a source IPv6
address is selected by the following rules:</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>If the source address is explicitly specified by
|
|
the user (e.g. via the advanced API), the specified address
|
|
is used.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>If there is an address assigned to the outgoing
|
|
interface (which is usually determined by looking up the
|
|
routing table) that has the same scope as the destination
|
|
address, the address is used.</para>
|
|
|
|
<para>This is the most typical case.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>If there is no address that satisfies the above
|
|
condition, choose a global address assigned to one of
|
|
the interfaces on the sending node.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>If there is no address that satisfies the above condition,
and the destination address has site-local scope, choose a site-local
address assigned to one of the interfaces on the sending node.
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>If there is no address that satisfies the above condition,
|
|
choose the address associated with the routing table entry for the
|
|
destination. This is the last resort, which may cause scope
|
|
violation.</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<para>For instance, ::1 is selected for ff01::1, and
fe80:1::200:f8ff:fe01:6317 for fe80:1::2a0:24ff:feab:839b (note
that the embedded interface index, described in <link
linkend="ipv6-scope-index">23.5.1.3</link>, helps us
choose the right source address. Those embedded indices will not
be on the wire). If the outgoing interface has multiple addresses for
the scope, a source is selected on a longest-match basis (rule 3).
Suppose 3ffe:501:808:1:200:f8ff:fe01:6317 and
3ffe:2001:9:124:200:f8ff:fe01:6317 are given to the outgoing
interface. 3ffe:501:808:1:200:f8ff:fe01:6317 is chosen as the source
for the destination 3ffe:501:800::1.</para>
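
<para>The longest-match choice in this example can be reproduced with
the short Python sketch below. It is only an illustration: it assumes
the candidate addresses have already been filtered by scope, the names
<literal>common_prefix_len</literal> and <literal>pick_source</literal>
are hypothetical, and it is not the kernel's selection code.</para>

<programlisting>import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv6 addresses."""
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    return 128 - diff.bit_length()

def pick_source(candidates, destination):
    """Pick the candidate sharing the longest prefix with destination."""
    return max(candidates, key=lambda src: common_prefix_len(src, destination))

candidates = [
    "3ffe:501:808:1:200:f8ff:fe01:6317",
    "3ffe:2001:9:124:200:f8ff:fe01:6317",
]
print(pick_source(candidates, "3ffe:501:800::1"))</programlisting>

<para>The first candidate shares 44 leading bits with the destination
and the second only 18, so the sketch prints
3ffe:501:808:1:200:f8ff:fe01:6317.</para>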
|
|
|
|
<para>Note that the above rule is not documented in the IPv6 spec.
It is considered an "up to implementation" item. There are some cases
where we do not use the above rule. One example is a connected TCP
session, where we use the address kept in the tcb as the source. Another
example is the source address for a Neighbor Advertisement. Under the
spec (RFC2461 7.2.2) the NA's source should be the target address of
the corresponding NS. In this case we follow the spec rather
than the above longest-match rule.</para>
|
|
|
|
<para>For new connections (when rule 1 does not apply), deprecated
addresses (addresses with preferred lifetime = 0) will not be chosen
as the source address if other choices are available. If no other
choices are available, a deprecated address will be used as a last
resort. If there are multiple deprecated addresses to choose from,
the above scope rule will be used to choose among them. If you
would like to prohibit the use of deprecated addresses for some reason,
set net.inet6.ip6.use_deprecated to 0. The issue related to
deprecated addresses is described in RFC2462 5.5.4 (NOTE: there is
some debate underway in IETF ipngwg on how to use "deprecated"
addresses).</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipv6-jumbo">
|
|
<title>Jumbo Payload</title>
|
|
|
|
<para>The Jumbo Payload hop-by-hop option is implemented and can
|
|
be used to send IPv6 packets with payloads longer than 65,535 octets.
|
|
But currently no physical interface whose MTU is more than 65,535 is
|
|
supported, so such payloads can be seen only on the loopback
|
|
interface (i.e. lo0).</para>
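
<para>To make the option concrete, the following Python sketch builds
the 8-byte hop-by-hop header that carries a Jumbo Payload option. It
assumes the layout from RFC2675 (option type 0xC2, option data length
4, followed by a 32-bit length). The function name
<literal>jumbo_hop_by_hop</literal> is hypothetical, and the sketch
only shows the wire format, not how the kernel builds it.</para>

<programlisting>import struct

JUMBO_OPTION_TYPE = 0xC2  # Jumbo Payload option type (RFC2675)

def jumbo_hop_by_hop(next_header, jumbo_len):
    """Return a hop-by-hop header holding one Jumbo Payload option."""
    if jumbo_len not in range(65536, 2 ** 32):
        raise ValueError("length must exceed 65,535 and fit in 32 bits")
    # next header, header extension length (0 means 8 bytes in total),
    # option type, option data length, 32-bit jumbo payload length
    return struct.pack("!BBBBI", next_header, 0, JUMBO_OPTION_TYPE, 4, jumbo_len)

print(jumbo_hop_by_hop(58, 68000).hex())</programlisting>

<para>With ICMPv6 (58) as the next header and a length of 68000, the
header bytes are 3a00c204000109a0.</para>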
|
|
|
|
<para>If you want to try jumbo payloads, you first have to reconfigure
|
|
the kernel so that the MTU of the loopback interface is more than
|
|
65,535 bytes; add the following to the kernel configuration file:</para>
|
|
|
|
<para><literal>
|
|
options "LARGE_LOMTU" #To test jumbo payload
|
|
</literal></para>
|
|
|
|
<para>and recompile the new kernel.</para>
|
|
|
|
<para>Then you can test jumbo payloads with the &man.ping6.8; command
and its -b and -s options. The -b option must be specified to enlarge
the size of the socket buffer, and the -s option specifies the length
of the packet, which should be more than 65,535. For example,
type the following:</para>
|
|
|
|
<para><userinput>
|
|
&prompt.user; <command>ping6 -b 70000 -s 68000 ::1</command>
|
|
</userinput></para>
|
|
|
|
<para>The IPv6 specification requires that the Jumbo Payload option
must not be used in a packet that carries a fragment header. If
this condition is broken, an ICMPv6 Parameter Problem message must
be sent to the sender. This specification is followed, but you cannot
usually see an ICMPv6 error caused by this requirement.</para>
|
|
|
|
<para>When an IPv6 packet is received, the frame length is checked and
compared to the length specified in the payload length field of the
IPv6 header or in the value of the Jumbo Payload option, if any. If
the former is shorter than the latter, the packet is discarded and
statistics are incremented. You can see the statistics in the output of
the &man.netstat.8; command with the `-s -p ip6' option:</para>
|
|
|
|
<screen> &prompt.user; <command>netstat -s -p ip6</command>
|
|
ip6:
|
|
(snip)
|
|
1 with data size &lt; data length</screen>
|
|
|
|
<para>So, the kernel does not send an ICMPv6 error unless the erroneous
packet is an actual Jumbo Payload, that is, its packet size is more
than 65,535 bytes. As described above, currently no physical interface
with such a huge MTU is supported, so the kernel rarely returns an
ICMPv6 error.</para>
|
|
|
|
<para>TCP/UDP over jumbogram is not supported at this moment. This
|
|
is because we have no medium (other than loopback) to test this.
|
|
Contact us if you need this.</para>
|
|
|
|
<para>IPsec does not work on jumbograms. This is due to some
specification twists in supporting AH with jumbograms (the AH header
size influences the payload length, and this makes it really hard to
authenticate an inbound packet that carries both the jumbo payload
option and AH).
</para>
|
|
|
|
<para>There are fundamental issues in *BSD support for jumbograms.
|
|
We would like to address those, but we need more time to finalize
|
|
these. To name a few:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>The mbuf pkthdr.len field is typed as "int" in 4.4BSD, so
it cannot hold a jumbogram with len &gt; 2G on 32-bit architecture
CPUs. To support jumbograms properly, the field
must be expanded to hold 4G + IPv6 header + link-layer header.
Therefore, it must be expanded to at least int64_t
(u_int32_t is NOT enough).</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>We mistakenly use "int" to hold the packet length in many
places. We need to convert these to a larger integral type.
This needs great care, as we may experience overflow during
packet length computation.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>We mistakenly check the ip6_plen field of the IPv6 header
for the packet payload length in various places. We should be
checking mbuf pkthdr.len instead. ip6_input() will perform a
sanity check on the jumbo payload option on input, and we can
safely use mbuf pkthdr.len afterwards.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>The TCP code needs a careful update in a bunch of places, of
course.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Loop prevention in header processing</title>
|
|
|
|
<para>The IPv6 specification allows an arbitrary number of extension
headers to be placed onto packets. If we implemented the IPv6 packet
processing code in the way the BSD IPv4 code is implemented, the kernel
stack might overflow due to a long function call chain. The
sys/netinet6 code is carefully designed to avoid kernel stack overflow.
Because of this, the sys/netinet6 code defines its own protocol switch
structure, "struct ip6protosw" (see
<filename>netinet6/ip6protosw.h</filename>). No such update was made
to the IPv4 part (sys/netinet), for compatibility, but a small
change was made to its pr_input() prototype, so "struct ipprotosw"
is also defined. Because of this, if you receive an IPsec-over-IPv4
packet with a massive number of IPsec headers, the kernel stack may
blow up. IPsec-over-IPv6 is okay. (Of course, for all those IPsec
headers to be processed, each such IPsec header must pass each
IPsec check. So an anonymous attacker will not be able to mount such an
attack.)</para>
|
|
</sect3>
|
|
|
|
<sect3 id="icmpv6">
|
|
<title>ICMPv6</title>
|
|
|
|
<para>After RFC2463 was published, IETF ipngwg decided to
disallow ICMPv6 error packets in response to ICMPv6 redirects, to
prevent an ICMPv6 storm on a network medium. This is already
implemented in the kernel.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Applications</title>
|
|
|
|
<para>For userland programming, we support the IPv6 socket API as
specified in RFC2553, RFC2292 and upcoming Internet drafts.</para>
|
|
|
|
<para>TCP/UDP over IPv6 is available and quite stable. You can
enjoy &man.telnet.1;, &man.ftp.1;, &man.rlogin.1;, &man.rsh.1;,
&man.ssh.1;, etc. These applications are protocol independent.
That is, they automatically choose IPv4 or IPv6 according to DNS.
</para>
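
<para>The protocol-independent style mentioned above usually comes
down to iterating over the results of getaddrinfo. The Python sketch
below shows the pattern with the standard <literal>socket</literal>
module. The function name <literal>connect_any</literal> is
hypothetical, and the sketch is a simplified illustration, not the
code used by the applications listed above.</para>

<programlisting>import socket

def connect_any(host, port):
    """Connect over IPv6 or IPv4, whichever getaddrinfo offers first."""
    last_error = None
    # AF_UNSPEC asks the resolver for both address families
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as error:
            last_error = error  # remember the failure, try the next address
            sock.close()
    raise last_error or OSError("no usable address for host")</programlisting>

<para>The caller never names an address family, so the same code works
whether the name resolves to an IPv6 address, an IPv4 address, or
both.</para>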
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Kernel Internals</title>
|
|
|
|
<para>While ip_forward() calls ip_output(), ip6_forward() directly
|
|
calls if_output() since routers must not divide IPv6 packets into
|
|
fragments.</para>
|
|
|
|
<para>An ICMPv6 error message should contain as much of the original
packet as possible, up to 1280 bytes. A UDP6/IP6 port unreachable
message, for instance, should contain all extension headers and the
*unchanged* UDP6 and IP6 headers. So, all IP6 functions except TCP
never convert network byte order into host byte order, in order to
preserve the original packet.</para>
|
|
|
|
<para>tcp_input(), udp6_input() and icmp6_input() cannot assume that
the IP6 header immediately precedes the transport header, because
extension headers may sit between them. So, in6_cksum() was
implemented to handle packets whose IP6 header and transport header
are not contiguous. Neither TCP/IP6 nor UDP6/IP6 header structures
exist for checksum calculation.</para>
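<para>To illustrate why no combined header structure is needed, here
is a userland sketch of a checksum computed piece by piece: first the
IPv6 pseudo-header fields, then the transport data, wherever it
happens to live. This is not in6_cksum() itself. The names
sum_bytes() and pseudo6_cksum() are hypothetical, the data is taken
from plain byte arrays rather than mbufs, and payloads are assumed to
be at most 65535 bytes long.</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running one's complement sum (big-endian words). */
static uint32_t
sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;   /* pad the odd trailing byte */
    return (sum);
}

/*
 * Checksum over the IPv6 pseudo-header (source, destination, length,
 * next header) followed by the transport header and data. The
 * addresses and the payload are separate arguments, so they need not
 * be adjacent in memory.
 */
uint16_t
pseudo6_cksum(const uint8_t src[16], const uint8_t dst[16],
    uint8_t next_header, const uint8_t *payload, uint32_t len)
{
    uint8_t tail[8];
    uint32_t sum = 0;

    /* 32-bit upper-layer length, three zero bytes, next header value. */
    tail[0] = (uint8_t)(len >> 24);
    tail[1] = (uint8_t)(len >> 16);
    tail[2] = (uint8_t)(len >> 8);
    tail[3] = (uint8_t)len;
    tail[4] = tail[5] = tail[6] = 0;
    tail[7] = next_header;

    sum = sum_bytes(sum, src, 16);
    sum = sum_bytes(sum, dst, 16);
    sum = sum_bytes(sum, tail, sizeof(tail));
    sum = sum_bytes(sum, payload, len);

    while (sum >> 16)                 /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}
```

<para>A receiver can run the same function over a packet whose
checksum field is already filled in; a result of 0 means the checksum
verifies.</para>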
|
|
|
|
<para>To process the IP6 header, extension headers and transport
headers easily, network drivers are now required to store packets in
one internal mbuf or in one or more external mbufs. A typical old
driver prepares two internal mbufs for 96 to 204 bytes of data; now
such packet data is stored in one external mbuf.</para>
|
|
|
|
<para><command>netstat -s -p ip6</command> tells you whether or not
your driver conforms to this requirement. In the following example,
"cce0" violates the requirement. (For more information, refer to
Section 2.)</para>
|
|
|
|
<screen>Mbuf statistics:
        317 one mbuf
        two or more mbuf::
                lo0 = 8
                cce0 = 10
        3282 one ext mbuf
        0 two or more ext mbuf
</screen>
|
|
|
|
<para>Each input function calls IP6_EXTHDR_CHECK at the beginning to
check whether the region between the IP6 header and its own header is
contiguous. IP6_EXTHDR_CHECK calls m_pullup() only if the mbuf has
the M_LOOP flag, that is, the packet comes from the loopback
interface. m_pullup() is never called for packets coming from
physical network interfaces.
</para>
|
|
|
|
<para>Neither the IP nor the IP6 reassembly functions ever call m_pullup().</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipv6-wildcard-socket">
|
|
<title>IPv4 mapped address and IPv6 wildcard socket</title>
|
|
|
|
<para>RFC2553 describes the IPv4 mapped address (3.7) and the special
behavior of an IPv6 wildcard bind socket (3.8). The spec allows you
to:</para>
<itemizedlist>
<listitem>
<para>Accept IPv4 connections on an AF_INET6 wildcard bind
socket.</para>
</listitem>
<listitem>
<para>Transmit IPv4 packets over an AF_INET6 socket by using a
special form of address such as ::ffff:10.1.1.1.</para>
</listitem>
</itemizedlist>

<para>However, the spec itself is very complicated and does not
specify how the socket layer should behave. Here we call the former
the "listening side" and the latter the "initiating side", for
reference purposes.</para>
|
|
|
|
<para>You can perform a wildcard bind on both address families, on
the same port.</para>

<para>The following table shows the behavior of FreeBSD 4.x.</para>
|
|
|
|
<screen>                listening side          initiating side
                (AF_INET6 wildcard      (connection to ::ffff:10.1.1.1)
                socket gets IPv4 conn.)
                ---                     ---
FreeBSD 4.x     configurable            supported
                default: enabled
</screen>
|
|
|
|
<para>The following sections give more details and explain how you
can configure the behavior.</para>
|
|
|
|
<para>Comments on listening side:</para>
|
|
|
|
<para>It looks like RFC2553 says too little about the wildcard bind
issue, especially about the port space issue, failure modes and the
relationship between AF_INET/INET6 wildcard binds. There can be
several separate interpretations of this RFC which conform to it but
behave differently. So, to implement a portable application you
should assume nothing about the behavior in the kernel. Using
&man.getaddrinfo.3; is the safest way. Port number space and
wildcard bind issues were discussed in detail on the ipv6imp mailing
list in mid March 1999, and it looks like there is no concrete
consensus (that is, it is up to implementers). You may want to check
the mailing list archives.</para>
|
|
|
|
<para>If a server application would like to accept both IPv4 and IPv6
connections, there are two alternatives.</para>

<para>One is to use an AF_INET and an AF_INET6 socket (you will need
two sockets). Use &man.getaddrinfo.3; with AI_PASSIVE set in
ai_flags, and call &man.socket.2; and &man.bind.2; for each of the
addresses returned. By opening multiple sockets, you can accept
connections on the socket with the proper address family. IPv4
connections will be accepted by the AF_INET socket, and IPv6
connections will be accepted by the AF_INET6 socket.</para>
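<para>A minimal sketch of this first alternative follows. The
function name open_listeners() is hypothetical, error reporting is
omitted, and the backlog of 5 is an arbitrary choice. The sketch also
sets IPV6_V6ONLY where it is defined. That is the option name later
standardized in RFC3493 and is used here as an assumption about the
build platform; this document otherwise describes the system
dependent IPV6_BINDV6ONLY.</para>

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

/*
 * Open one listening TCP socket per address returned by getaddrinfo().
 * Stores up to maxfds descriptors in fds[] and returns how many were
 * opened, or -1 if getaddrinfo() itself failed.
 */
int
open_listeners(const char *service, int *fds, int maxfds)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* do not hardcode the family */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;       /* wildcard addresses for bind() */

    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return (-1);

    for (ai = res; ai != NULL && n < maxfds; ai = ai->ai_next) {
        int s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

        if (s < 0)
            continue;                  /* family not supported here */
#ifdef IPV6_V6ONLY
        if (ai->ai_family == AF_INET6) {
            int on = 1;

            /* Keep IPv4 traffic on the AF_INET socket (best effort). */
            (void)setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY,
                &on, sizeof(on));
        }
#endif
        if (bind(s, ai->ai_addr, ai->ai_addrlen) < 0 ||
            listen(s, 5) < 0) {
            close(s);
            continue;
        }
        fds[n++] = s;
    }
    freeaddrinfo(res);
    return (n);
}
```

<para>The caller would then wait on all returned descriptors, for
example with &man.select.2;, and call &man.accept.2; on whichever one
becomes readable.</para>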
|
|
|
|
<para>The other way is to use one AF_INET6 wildcard bind socket. Use
&man.getaddrinfo.3; with AI_PASSIVE set in ai_flags and AF_INET6 in
ai_family, and set the first argument, hostname, to NULL. Then call
&man.socket.2; and &man.bind.2; on the address returned (which should
be the IPv6 unspecified address). You can accept both IPv4 and IPv6
packets via this one socket.</para>
|
|
|
|
<para>To support only IPv6 traffic on an AF_INET6 wildcard bound
socket portably, always check the peer address when a connection is
made to the AF_INET6 listening socket. If the address is an IPv4
mapped address, you may want to reject the connection. You can check
this condition with the IN6_IS_ADDR_V4MAPPED() macro.</para>
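<para>The peer address check can be sketched as follows. The function
names peer_is_v4mapped() and accept_v6only() are hypothetical, and
rejecting the connection by simply closing it is only one possible
policy.</para>

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Return 1 if sa holds an IPv4 mapped IPv6 address, 0 otherwise. */
int
peer_is_v4mapped(const struct sockaddr *sa)
{
    const struct sockaddr_in6 *sin6;

    if (sa->sa_family != AF_INET6)
        return (0);
    sin6 = (const struct sockaddr_in6 *)sa;
    return (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) ? 1 : 0);
}

/*
 * Accept a connection on listening socket ls, but drop it if the peer
 * turns out to be an IPv4 node seen through a mapped address.
 * Returns the connected descriptor, or -1.
 */
int
accept_v6only(int ls)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);
    int s = accept(ls, (struct sockaddr *)&ss, &len);

    if (s >= 0 && peer_is_v4mapped((struct sockaddr *)&ss)) {
        close(s);
        return (-1);
    }
    return (s);
}
```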
|
|
|
|
<para>To resolve this issue more easily, there is a system dependent
&man.setsockopt.2; option, IPV6_BINDV6ONLY, used as shown
below.</para>

<screen>        int on = 1;

        if (setsockopt(s, IPPROTO_IPV6, IPV6_BINDV6ONLY,
                       (char *)&amp;on, sizeof (on)) &lt; 0)
                perror("setsockopt");
</screen>

<para>When this call succeeds, the socket only receives IPv6
packets.</para>
|
|
|
|
<para>Comments on initiating side:</para>
|
|
|
|
<para>Advice to application implementers: to implement a portable
IPv6 application (one which works on multiple IPv6 kernels), we
believe that the following points are the key to success:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>NEVER hardcode AF_INET or AF_INET6.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Use &man.getaddrinfo.3; and &man.getnameinfo.3;
throughout the system. Never use gethostby*(), getaddrby*(),
inet_*() or getipnodeby*(). (To update existing applications
to be IPv6 aware easily, getipnodeby*() will sometimes be
useful. But if possible, try to rewrite the code to use
&man.getaddrinfo.3; and &man.getnameinfo.3;.)</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>If you would like to connect to a destination, use
&man.getaddrinfo.3; and try all the destinations returned,
as &man.telnet.1; does.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Some IPv6 stacks are shipped with a buggy
&man.getaddrinfo.3;. Ship a minimal working version with
your application and use that as a last resort.</para>
|
|
</listitem>
|
|
</itemizedlist>
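<para>The advice to try every destination returned by
&man.getaddrinfo.3; can be sketched as follows. The function name
connect_any() is hypothetical, and timeouts and error reporting are
left out to keep the example short.</para>

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve host/service and try each returned address in turn.
 * Returns a connected TCP socket, or -1 if every attempt failed.
 */
int
connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return (-1);

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0)
            continue;                  /* family unsupported, try next */
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* connected */
        close(s);
        s = -1;
    }
    freeaddrinfo(res);
    return (s);
}
```

<para>Because the address family comes from each result rather than
from the source code, the same function works unchanged for IPv4 only,
IPv6 only and dual stack destinations.</para>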
|
|
|
|
<para>If you would like to use an AF_INET6 socket for both IPv4 and
IPv6 outgoing connections, you will need to use
&man.getipnodebyname.3;. This approach might be chosen when you would
like to update an existing application to be IPv6 aware with minimal
effort. But please note that it is a temporary solution, because
&man.getipnodebyname.3; itself is not recommended as it does not
handle scoped IPv6 addresses at all. For IPv6 name resolution,
&man.getaddrinfo.3; is the preferred API. So you should rewrite your
application to use &man.getaddrinfo.3; when you get the time to do
it.</para>
|
|
|
|
<para>When writing applications that make outgoing connections,
things become much simpler if you treat AF_INET and AF_INET6 as
totally separate address families. The {set,get}sockopt issues and
the DNS issues both become simpler. We do not recommend relying on
IPv4 mapped addresses.</para>
|
|
|
|
<sect4>
|
|
<title>unified tcp and inpcb code</title>
|
|
|
|
<para>FreeBSD 4.x uses TCP code shared between IPv4 and IPv6
(from sys/netinet/tcp*) and separate udp4/6 code. It uses a
unified inpcb structure.</para>

<para>The platform can be configured to support IPv4 mapped
addresses. Kernel configuration is summarized as follows:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>By default, an AF_INET6 socket will grab IPv4
connections under certain conditions, and can initiate a
connection to an IPv4 destination embedded in an IPv4 mapped
IPv6 address.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>You can disable this for the entire system with sysctl, as
shown below.</para>
|
|
|
|
<para>
|
|
<command>sysctl net.inet6.ip6.mapped_addr=0</command>
|
|
</para>
|
|
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<sect5>
|
|
<title>listening side</title>
|
|
|
|
<para>Each socket can be configured to support the special AF_INET6
wildcard bind (enabled by default). You can disable it on a
per-socket basis with &man.setsockopt.2;, as shown below.</para>

<screen>        int on = 1;

        if (setsockopt(s, IPPROTO_IPV6, IPV6_BINDV6ONLY,
                       (char *)&amp;on, sizeof (on)) &lt; 0)
                perror("setsockopt");
</screen>

<para>A wildcard AF_INET6 socket grabs an IPv4 connection if and only
if the following conditions are satisfied:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>there is no AF_INET socket that matches the IPv4
|
|
connection</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>the AF_INET6 socket is configured to accept IPv4
|
|
traffic, i.e. getsockopt(IPV6_BINDV6ONLY) returns 0.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>There is no problem with open/close ordering.</para>
|
|
</sect5>
|
|
|
|
<sect5>
|
|
<title>initiating side</title>
|
|
|
|
<para>FreeBSD 4.x supports outgoing connections to an IPv4 mapped
address (::ffff:10.1.1.1), if the node is configured to support
IPv4 mapped addresses.</para>
|
|
</sect5>
|
|
</sect4>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>sockaddr_storage</title>
|
|
|
|
<para>When RFC2553 was about to be finalized, there was discussion on
how the members of struct sockaddr_storage should be named. One
proposal was to prepend "__" to the members (like "__ss_len"), as they
should not be touched. The other proposal was not to prepend it (like
"ss_len"), as we need to touch those members directly. There was no
clear consensus.</para>
|
|
|
|
<para>As a result, RFC2553 defines struct sockaddr_storage as
|
|
follows:</para>
|
|
|
|
<screen> struct sockaddr_storage {
|
|
u_char __ss_len; /* address length */
|
|
u_char __ss_family; /* address family */
|
|
/* and bunch of padding */
|
|
};
|
|
</screen>
|
|
|
|
<para>In contrast, the XNET draft defines it as follows:</para>
|
|
|
|
<screen> struct sockaddr_storage {
|
|
u_char ss_len; /* address length */
|
|
u_char ss_family; /* address family */
|
|
/* and bunch of padding */
|
|
};
|
|
</screen>
|
|
|
|
<para>In December 1999, it was agreed that RFC2553bis should pick
|
|
the latter (XNET) definition.</para>
|
|
|
|
<para>The current implementation conforms to the XNET definition,
based on the RFC2553bis discussion.</para>

<para>If you look at multiple IPv6 implementations, you will see both
definitions. As a userland programmer, the most portable way of
dealing with it is to:</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>ensure ss_family and/or ss_len are available on the
|
|
platform, by using GNU autoconf,</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>have -Dss_family=__ss_family to unify all occurrences
|
|
(including header file) into __ss_family, or</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>never touch __ss_family; cast to sockaddr * and use sa_family,
like:</para>

<screen>        struct sockaddr_storage ss;
        family = ((struct sockaddr *)&amp;ss)->sa_family;
</screen>
|
|
|
|
</listitem>
|
|
</orderedlist>
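<para>The third approach can be wrapped in small helpers so that the
rest of a program never names the disputed members. This is a sketch
under the assumption that only AF_INET and AF_INET6 need to be
handled; the helper names storage_family() and storage_len() are
hypothetical.</para>

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Read the address family without naming ss_family or __ss_family. */
int
storage_family(const struct sockaddr_storage *ss)
{
    return (((const struct sockaddr *)ss)->sa_family);
}

/*
 * Return the size of the concrete sockaddr held in ss, or 0 if the
 * family is not one this sketch knows about. Useful on platforms
 * where no ss_len member is available.
 */
socklen_t
storage_len(const struct sockaddr_storage *ss)
{
    switch (storage_family(ss)) {
    case AF_INET:
        return (sizeof(struct sockaddr_in));
    case AF_INET6:
        return (sizeof(struct sockaddr_in6));
    default:
        return (0);
    }
}
```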
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Network Drivers</title>
|
|
|
|
<para>The following two items are now required to be supported by
standard drivers:</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>The mbuf clustering requirement. In this stable release, we
changed MINCLSIZE to MHLEN+1 for all the operating systems
in order to make all the drivers behave as we expect.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Multicast. If &man.ifmcstat.8; yields no multicast group for
an interface, that interface has to be patched.</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<para>If a driver does not support these requirements, it cannot be
used for IPv6 and/or IPsec communication. If you find any problem
with your card when using IPv6/IPsec, please report it to the
&a.bugs;.</para>

<para>(NOTE: In the past we required all PCMCIA drivers to have a
call to in6_ifattach(). We no longer have such a requirement.)</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Translator</title>
|
|
|
|
<para>We categorize IPv4/IPv6 translators into 4 types:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para><emphasis>Translator A</emphasis> --- It is used in the early
|
|
stage of transition to make it possible to establish a
|
|
connection from an IPv6 host in an IPv6 island to an IPv4 host
|
|
in the IPv4 ocean.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><emphasis>Translator B</emphasis> --- It is used in the early
|
|
stage of transition to make it possible to establish a connection
|
|
from an IPv4 host in the IPv4 ocean to an IPv6 host in an
|
|
IPv6 island.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><emphasis>Translator C</emphasis> --- It is used in the late
|
|
stage of transition to make it possible to establish a
|
|
connection from an IPv4 host in an IPv4 island to an IPv6 host
|
|
in the IPv6 ocean.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><emphasis>Translator D</emphasis> --- It is used in the late
|
|
stage of transition to make it possible to establish a
|
|
connection from an IPv6 host in the IPv6 ocean to an IPv4 host
|
|
in an IPv4 island.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>A TCP relay translator for category A is supported. This is
called "FAITH". We also provide an IP header translator for
category A. (The latter is not yet included in FreeBSD 4.x.)</para>
|
|
|
|
<sect3>
|
|
<title>FAITH TCP relay translator</title>
|
|
|
|
<para>The FAITH system uses a TCP relay daemon called &man.faithd.8;,
with help from the kernel. FAITH reserves an IPv6 address prefix, and
relays TCP connections toward that prefix to the IPv4
destination.</para>

<para>For example, if the reserved IPv6 prefix is
3ffe:0501:0200:ffff::, and the IPv6 destination for a TCP connection
is 3ffe:0501:0200:ffff::163.221.202.12, the connection will be
relayed toward the IPv4 destination 163.221.202.12.</para>
|
|
|
|
<screen>        destination IPv4 node (163.221.202.12)
          ^
          | IPv4 tcp toward 163.221.202.12
        FAITH-relay dual stack node
          ^
          | IPv6 TCP toward 3ffe:0501:0200:ffff::163.221.202.12
        source IPv6 node
</screen>
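<para>The address mapping in this example can be sketched in userland
C. This is not &man.faithd.8; code: the function name
faith_embedded_v4() is hypothetical, and it assumes, as in the example
above, that the IPv4 destination occupies the last 32 bits of the IPv6
address. A real relay would also check that the address falls under
the reserved prefix.</para>

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Recover the IPv4 destination carried in the low 32 bits of an IPv6
 * address given in text form. Writes the dotted-quad result to out.
 * Returns 0 on success, -1 on a parse or formatting error.
 */
int
faith_embedded_v4(const char *v6text, char *out, socklen_t outlen)
{
    struct in6_addr a6;
    struct in_addr a4;

    if (inet_pton(AF_INET6, v6text, &a6) != 1)
        return (-1);
    memcpy(&a4, &a6.s6_addr[12], sizeof(a4));  /* last 4 of 16 bytes */
    if (inet_ntop(AF_INET, &a4, out, outlen) == NULL)
        return (-1);
    return (0);
}
```

<para>Both address forms are already in network byte order, so the
four bytes can be copied without any conversion.</para>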
|
|
|
|
<para>&man.faithd.8; must be invoked on the FAITH-relay dual stack
node.</para>

<para>For more details, consult
<filename>src/usr.sbin/faithd/README</filename>.</para>
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2 id="ipsec-implementation">
|
|
<title>IPsec</title>
|
|
|
|
<para>IPsec is mainly organized into three components.</para>
|
|
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>Policy Management</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>Key Management</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>AH and ESP handling</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
|
|
<sect3>
|
|
<title>Policy Management</title>
|
|
|
|
<para>The kernel implements experimental policy management code.
There are two ways to manage security policy. One is to configure a
per-socket policy using &man.setsockopt.2;. In this case, policy
configuration is described in &man.ipsec.set.policy.3;. The other
is to configure a kernel packet filter-based policy using the PF_KEY
interface, via &man.setkey.8;.</para>

<para>Policy entries are not re-ordered by their indexes, so the
order in which you add entries is very significant.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Key Management</title>
|
|
|
|
<para>The key management code implemented in this kit (sys/netkey)
is a home-brew PFKEY v2 implementation. It conforms to RFC2367.
</para>

<para>The home-brew IKE daemon, "racoon", is included in the
kit (kame/kame/racoon). Basically you will need to run racoon as a
daemon, then set up a policy that requires keys (like
<command>ping -P 'out ipsec esp/transport//use'</command>).
The kernel will contact the racoon daemon as necessary to exchange
keys.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>AH and ESP handling</title>
|
|
|
|
<para>The IPsec module is implemented as "hooks" into the standard
IPv4/IPv6 processing. When sending a packet, ip{,6}_output() checks
whether ESP/AH processing is required by checking whether a matching
SPD (Security Policy Database) entry is found. If ESP/AH is needed,
{esp,ah}{4,6}_output() will be called and the mbuf will be updated
accordingly. When a packet is received, {esp,ah}4_input() will be
called based on the protocol number, i.e. (*inetsw[proto])().
{esp,ah}4_input() will decrypt the packet or check its authenticity,
and strip off the daisy-chained header and padding for ESP/AH. It is
safe to strip off the ESP/AH header on packet reception, since we
will never use the received packet in "as is" form.</para>

<para>When ESP/AH is used, the TCP4/6 effective data segment size is
affected by the extra daisy-chained headers inserted by ESP/AH. Our
code takes care of this case.</para>
|
|
|
|
<para>Basic crypto functions can be found in the directory
"sys/crypto". ESP/AH transforms are listed in {esp,ah}_core.c with
wrapper functions. If you wish to add an algorithm, add a wrapper
function in {esp,ah}_core.c, and add your crypto algorithm code to
sys/crypto.</para>
|
|
|
|
<para>Tunnel mode is partially supported in this release, with the
|
|
following restrictions:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>The IPsec tunnel is not combined with the GIF generic tunneling
interface. This needs great care because we may create an
infinite loop between ip_output() and tunnelifp->if_output().
Opinions vary on whether it is better to unify them or not.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>MTU and Don't Fragment bit (IPv4) considerations need more
checking, but basically work fine.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>The authentication model for AH tunnels must be revisited.
We will eventually need to improve the policy management
engine.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Conformance to RFCs and IDs</title>
|
|
|
|
<para>The IPsec code in the kernel conforms (or, tries to conform)
|
|
to the following standards:</para>
|
|
|
|
<para>"old IPsec" specification documented in
|
|
<filename>rfc182[5-9].txt</filename></para>
|
|
|
|
<para>"new IPsec" specification documented in
|
|
<filename>rfc240[1-6].txt</filename>,
|
|
<filename>rfc241[01].txt</filename>, <filename>rfc2451.txt</filename>
|
|
and <filename>draft-mcdonald-simple-ipsec-api-01.txt</filename>
|
|
(draft expired, but you can get it from <ulink
url="ftp://ftp.kame.net/pub/internet-drafts/">
ftp://ftp.kame.net/pub/internet-drafts/</ulink>).
(NOTE: the IKE specifications, <filename>rfc240[7-9].txt</filename>,
are implemented in userland, as the "racoon" IKE daemon.)</para>
|
|
|
|
<para>Currently supported algorithms are:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>old IPsec AH</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>null crypto checksum (no document, just for
|
|
debugging)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>keyed MD5 with 128bit crypto checksum
|
|
(<filename>rfc1828.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>keyed SHA1 with 128bit crypto checksum
|
|
(no document)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>HMAC MD5 with 128bit crypto checksum
|
|
(<filename>rfc2085.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>HMAC SHA1 with 128bit crypto checksum
|
|
(no document)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>old IPsec ESP</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>null encryption (no document, similar to
|
|
<filename>rfc2410.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>DES-CBC mode (<filename>rfc1829.txt</filename>)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>new IPsec AH</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>null crypto checksum (no document,
|
|
just for debugging)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>keyed MD5 with 96bit crypto checksum
|
|
(no document)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>keyed SHA1 with 96bit crypto checksum
|
|
(no document)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>HMAC MD5 with 96bit crypto checksum
|
|
(<filename>rfc2403.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>HMAC SHA1 with 96bit crypto checksum
|
|
(<filename>rfc2404.txt</filename>)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>new IPsec ESP</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>null encryption
|
|
(<filename>rfc2410.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>DES-CBC with derived IV
|
|
(<filename>draft-ietf-ipsec-ciph-des-derived-01.txt</filename>,
|
|
draft expired)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>DES-CBC with explicit IV
|
|
(<filename>rfc2405.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>3DES-CBC with explicit IV
|
|
(<filename>rfc2451.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>BLOWFISH CBC
|
|
(<filename>rfc2451.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>CAST128 CBC
|
|
(<filename>rfc2451.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>RC5 CBC
|
|
(<filename>rfc2451.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>each of the above can be combined with:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>ESP authentication with HMAC-MD5(96bit)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>ESP authentication with HMAC-SHA1(96bit)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The following algorithms are NOT supported:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
|
|
<para>old IPsec AH</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>HMAC MD5 with 128bit crypto checksum + 64bit
|
|
replay prevention (<filename>rfc2085.txt</filename>)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>keyed SHA1 with 160bit crypto checksum + 32bit padding
|
|
(<filename>rfc1852.txt</filename>)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>IPsec (in the kernel) and IKE (in userland, as "racoon") have
been tested at several interoperability test events, and are known to
interoperate well with many other implementations. Also, the current
IPsec implementation has quite wide coverage of the IPsec crypto
algorithms documented in RFCs (we cover only algorithms without
intellectual property issues).</para>
|
|
</sect3>
|
|
|
|
<sect3 id="ipsec-ecn">
|
|
<title>ECN consideration on IPsec tunnels</title>
|
|
|
|
<para>An ECN-friendly IPsec tunnel is supported, as described in
<filename>draft-ipsec-ecn-00.txt</filename>.</para>

<para>The normal IPsec tunnel is described in RFC2401. On
encapsulation, the IPv4 TOS field (or IPv6 traffic class field) is
copied from the inner IP header to the outer IP header. On
decapsulation, the outer IP header is simply dropped. The
decapsulation rule is not compatible with ECN, since the ECN bit in
the outer IP TOS/traffic class field will be lost.</para>
|
|
|
|
<para>To make the IPsec tunnel ECN-friendly, we should modify the
encapsulation and decapsulation procedures. This is described in <ulink
url="http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt">
http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt</ulink>,
chapter 3.</para>

<para>The IPsec tunnel implementation can give you three behaviors,
selected by setting net.inet.ipsec.ecn (or
net.inet6.ipsec6.ecn):</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>RFC2401: no consideration for ECN (sysctl value -1)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>ECN forbidden (sysctl value 0)</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>ECN allowed (sysctl value 1)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Note that the behavior is configurable per node, not per SA
(draft-ipsec-ecn-00 wants per-SA configuration, but that looks like
too much to me).</para>
|
|
|
|
<para>The behavior is summarized as follows (see source code for
|
|
more detail):</para>
|
|
|
|
<screen>
                encapsulate                     decapsulate
                ---                             ---
RFC2401         copy all TOS bits               drop TOS bits on outer
                from inner to outer.            (use inner TOS bits as is)

ECN forbidden   copy TOS bits except for ECN    drop TOS bits on outer
                (masked with 0xfc) from inner   (use inner TOS bits as is)
                to outer.  set ECN bits to 0.

ECN allowed     copy TOS bits except for ECN    use inner TOS bits with some
                CE (masked with 0xfe) from      change.  if outer ECN CE bit
                inner to outer.                 is 1, enable ECN CE bit on
                set ECN CE bit to 0.            the inner.

</screen>
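<para>The table can be restated as two small functions. This is a
sketch of the rules as summarized above, not the code in
src/sys/netinet6. The names ecn_mode, ecn_encap() and ecn_decap() are
hypothetical, and the CE bit is taken to be 0x01, as implied by the
0xfe mask in the table.</para>

```c
#include <stdint.h>

/* The values mirror the sysctl settings listed in the text. */
enum ecn_mode {
    ECN_RFC2401 = -1,
    ECN_FORBIDDEN = 0,
    ECN_ALLOWED = 1
};

#define EX_ECN_CE  0x01   /* ECN CE bit, the one cleared by mask 0xfe */

/* Compute the outer TOS/traffic class byte on encapsulation. */
uint8_t
ecn_encap(enum ecn_mode mode, uint8_t inner)
{
    switch (mode) {
    case ECN_FORBIDDEN:
        return (inner & 0xfc);    /* clear both ECN bits */
    case ECN_ALLOWED:
        return (inner & 0xfe);    /* clear only the CE bit */
    default:
        return (inner);           /* RFC2401: copy all bits */
    }
}

/* Compute the inner TOS/traffic class byte after decapsulation. */
uint8_t
ecn_decap(enum ecn_mode mode, uint8_t outer, uint8_t inner)
{
    if (mode == ECN_ALLOWED && (outer & EX_ECN_CE) != 0)
        inner |= EX_ECN_CE;       /* carry congestion mark inward */
    return (inner);               /* otherwise the outer bits are dropped */
}
```

<para>Only the "ECN allowed" mode lets a congestion mark set by a
router inside the tunnel reach the inner packet; the other two modes
return the inner byte untouched.</para>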
|
|
|
|
<para>General strategy for configuration is as follows:</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>if both IPsec tunnel endpoints are capable of ECN-friendly
behavior, you had better configure both ends to <quote>ECN allowed</quote>
(sysctl value 1).</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>if the other end is very strict about the TOS bits, use "RFC2401"
(sysctl value -1).</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>in other cases, use "ECN forbidden" (sysctl value 0).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The default behavior is "ECN forbidden" (sysctl value 0).</para>
|
|
|
|
<para>For more information, please refer to:</para>
|
|
|
|
<para><ulink
|
|
url="http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt">
|
|
http://www.aciri.org/floyd/papers/draft-ipsec-ecn-00.txt</ulink>,
|
|
RFC2481 (Explicit Congestion Notification),
|
|
src/sys/netinet6/{ah,esp}_input.c</para>
|
|
|
|
<para>(Thanks go to Kenjiro Cho <email>kjc@csl.sony.co.jp</email>
for the detailed analysis.)</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Interoperability</title>
|
|
|
|
<para>Here are (some of) the platforms against which the KAME code
has tested IPsec/IKE interoperability in the past. Note that both
ends may have modified their implementations since then, so use the
following list just for reference purposes.</para>
|
|
|
|
<para>Altiga, Ashley-laurent (vpcom.com), Data Fellows (F-Secure),
|
|
Ericsson ACC, FreeS/WAN, HITACHI, IBM AIX, IIJ, Intel,
|
|
Microsoft WinNT, NIST (linux IPsec + plutoplus), Netscreen, OpenBSD,
|
|
RedCreek, Routerware, SSH, Secure Computing, Soliton, Toshiba,
|
|
VPNet, Yamaha RT100i</para>
|
|
</sect3>
|
|
</sect2>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<!--
Local Variables:
mode: sgml
sgml-declaration: "../chapter.decl"
sgml-indent-data: t
sgml-omittag: nil
sgml-always-quote-attributes: t
sgml-parent-document: ("../book.sgml" "part" "chapter")
End:
-->