doc/en_US.ISO8859-1/articles/5-roadmap/article.sgml
2012-09-14 17:47:48 +00:00


<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook XML V4.2-Based Extension//EN"
"../../../share/sgml/freebsd42.dtd" [
<!ENTITY % entities PUBLIC "-//FreeBSD//ENTITIES DocBook FreeBSD Entity Set//EN" "../../share/sgml/entities.ent">
%entities;
<!ENTITY t.releng.3 "<literal>RELENG_3</literal>">
<!ENTITY t.releng.4 "<literal>RELENG_4</literal>">
<!ENTITY t.releng.5 "<literal>RELENG_5</literal>">
<!ENTITY t.releng.5.1 "<literal>RELENG_5_1</literal>">
<!ENTITY t.releng.5.2 "<literal>RELENG_5_2</literal>">
<!ENTITY t.releng.5.3 "<literal>RELENG_5_3</literal>">
<!ENTITY t.releng.head "<literal>HEAD</literal>">
]>
<article lang='en'>
<articleinfo>
<title>The Road Map for 5-STABLE</title>
<authorgroup>
<corpauthor>The &os; Release Engineering Team</corpauthor>
</authorgroup>
<copyright>
<year>2003</year>
<holder role="mailto:re@FreeBSD.org">The &os; Release
Engineering Team</holder>
</copyright>
<legalnotice id="trademarks" role="trademarks">
&tm-attrib.freebsd;
&tm-attrib.ieee;
&tm-attrib.intel;
&tm-attrib.sparc;
&tm-attrib.sun;
&tm-attrib.opengroup;
&tm-attrib.general;
</legalnotice>
<pubdate>$FreeBSD$</pubdate>
<releaseinfo>$FreeBSD$</releaseinfo>
<abstract>
<para>This document is now mostly of historical value. It
presented a roadmap for the development of &os;'s &t.releng.5;
branch. It was originally written in February 2003 (between
the 5.0 and 5.1 releases), and was intended to provide a plan
for making the &t.releng.5; branch <quote>stable</quote>, both
in terms of code quality and finalization of various
APIs/ABIs. For a different perspective, the article
<ulink url="&url.articles.version-guide;">
<quote>Choosing the &os; Version That Is Right For You</quote>
</ulink>
may be of interest. The version-guide article was written in August
2005 (two and a half years later), and it contains a section
discussing how these plans and events actually unfolded, as well as
some lessons learned.</para>
</abstract>
</articleinfo>
<sect1 id="intro">
<title>Introduction and Background</title>
<para>After nearly three years of work, &os; 5.0 was released in January
of 2003. Features like the GEOM block layer, Mandatory Access Controls,
ACPI, &sparc64; and ia64 platform support, UFS snapshots, background
filesystem checks, and 64-bit inode sizes make it an exciting operating
system for both desktop and enterprise users. However, some important
features are not complete. The foundations for fine-grained locking
and preemption in the kernel exist, but much more work is left to be
done. Performance and stability compared to &os;
4.<replaceable>X</replaceable> have declined and must be restored and
surpassed.</para>
<para>This is somewhat similar to the situation that &os; faced in the
3.<replaceable>X</replaceable> series. Work on 3-CURRENT trudged along
seemingly forever, and finally a cry was made to <quote>just ship
it</quote> and clean up later. This decision resulted in the 3.0 and
3.1 releases being very unsatisfying for most, and it was not until 3.2
that the series was considered <quote>stable</quote>. To make matters
worse, the &t.releng.3; branch was created along with the 3.0 release,
and the &t.releng.head; branch was allowed to advance immediately
towards 4-CURRENT. This resulted in a quick divergence between
&t.releng.head; and &t.releng.3;, making maintenance of the &t.releng.3;
branch very difficult. &os; 2.2.8 was left for quite a while as the
last production-quality version of &os;.</para>
<para>Our intent is to avoid repeating that scenario with &os; 5.<replaceable>X</replaceable>.
Delaying the &t.releng.5; branch until it is stable and production quality
will ensure that it stays maintainable and provides a compelling reason
to upgrade from 4.<replaceable>X</replaceable>. To do this, we must
identify the current areas of weakness and set clear goals for
resolving them. This document contains what we as the release
engineering team feel are the milestones and issues that must be
resolved for the &t.releng.5; branch. It does not dictate every aspect of
&os; development, and we welcome further input. Nothing that follows
is meant to be a slight against any person or group, or to trivialize
any work that has been done. There are some significant issues,
though, that need decisive and unbiased action.</para>
</sect1>
<sect1 id="major-issues">
<title>Major issues</title>
<para>The success of the 5.<replaceable>X</replaceable> series hinges on
the ability to deliver fine-grained threading and re-entrancy in the
kernel (also known as SMPng) and kernel-supported POSIX threads in
userland, while not sacrificing overall system stability or
performance.</para>
<sect2 id="SMPng">
<title>SMPng</title>
<para>The state of SMPng and kernel lockdown is the biggest concern for
5.<replaceable>X</replaceable>. To date, few major systems have come
out from under the kernel-wide mutex known as <quote>Giant</quote>.
The SMP status page at <ulink url="&url.base;/smp"></ulink>
provides a comprehensive breakdown
of the overall SMPng status. Status specific to SMPng progress in
device drivers can be found at
<ulink url="&url.base;/projects/busdma"></ulink>.
In summary:</para>
<itemizedlist>
<listitem>
<para>VM: Kernel malloc is locked and free of Giant. The UMA zone
allocator is also free of Giant. vm_object locking is in progress
and is an important step to making the buffer/cache free of
Giant. Pmap locking remains to be started.</para>
</listitem>
<listitem>
<para>GEOM: The GEOM block layer was designed to run free of Giant
and allow GEOM modules and underlying block drivers to run free
of Giant. Currently, only the &man.ata.4; and &man.aac.4; drivers
are locked and run without Giant. Work on other block drivers is
in progress. Locking the CAM subsystem is required for nearly all
SCSI drivers to run without Giant; this work has not started
yet.</para>
<para>Additionally, GEOM has the potential to suffer performance loss
due to its upcall and downcall data paths happening in kernel threads.
Improved lightweight context switches might help this.</para>
</listitem>
<listitem>
<para>Network: Work has restarted on locking the network stack.
Routing tables, ARP, bridge, IPFW, Fast-Forward, TCP, UDP, IP,
Fast IPSEC, and interface layers are being targeted initially, along
with several Ethernet device drivers. The socket layer, IPv6, and
other protocol layers will be targeted later. The primary goal
of this work is to regain the performance found in
&os; 4.<replaceable>X</replaceable>. The cost of context switching
to the device driver ithreads and the netisr is still hampering
performance.</para>
</listitem>
<listitem>
<para>VFS: Initial pre-cleanup started.</para>
</listitem>
<listitem>
<para>buffer/cache: Initial work complete on locking the buffer.</para>
</listitem>
<listitem>
<para>Proc: Initial proc locking is in place; further progress is
expected for &os; 5.2.</para>
</listitem>
<listitem>
<para>CAM: No significant work has occurred on the CAM SCSI
layer.</para>
</listitem>
<listitem>
<para>Newbus: Some work has started on locking down the device_t
structure.</para>
</listitem>
<listitem>
<para>Pipes: complete.</para>
</listitem>
<listitem>
<para>File descriptors: complete.</para>
</listitem>
<listitem>
<para>Process accounting: jails, credentials, MAC labels, and
scheduler are out from under Giant.</para>
</listitem>
<listitem>
<para>MAC Framework: complete.</para>
</listitem>
<listitem>
<para>Timekeeping: complete.</para>
</listitem>
<listitem>
<para>Kernel encryption: crypto drivers and core &man.crypto.4;
framework are Giant-free. KAME IPsec has not been locked.</para>
</listitem>
<listitem>
<para>Sound subsystem: complete, but lock order reversal problems seem
to persist.</para>
</listitem>
<listitem>
<para>Kernel preemption: preemption for interrupt threads is enabled.
However, contention due to Giant covering much of the kernel and
most of the device driver interrupt routines causes excessive
context switches and might actually be hurting performance. Work
is underway to explore ways to make preemption
conditional.</para>
</listitem>
</itemizedlist>
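<para>To illustrate why moving subsystems out from under Giant matters,
the following is a minimal user-space sketch, not &os; kernel code.
All names in it (<literal>toy_lock</literal>,
<literal>toy_kernel</literal>, <literal>subsys_enter</literal>, and
so on) are hypothetical. It also assumes a try-lock that fails
immediately, whereas a real kernel mutex would make the caller wait;
the failure simply marks the point where contention would occur.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: acquire fails instead of blocking. */
struct toy_lock {
    atomic_flag held;
};

static bool
toy_trylock(struct toy_lock *l)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&l->held,
        memory_order_acquire);
}

static void
toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum subsys { SUBSYS_VM, SUBSYS_NET, SUBSYS_COUNT };

/* A toy "kernel" that either uses one global lock or one per subsystem. */
struct toy_kernel {
    bool fine_grained;
    struct toy_lock giant;
    struct toy_lock sub[SUBSYS_COUNT];
};

static void
toy_kernel_init(struct toy_kernel *k, bool fine_grained)
{
    k->fine_grained = fine_grained;
    atomic_flag_clear(&k->giant.held);
    for (int i = 0; i < SUBSYS_COUNT; i++)
        atomic_flag_clear(&k->sub[i].held);
}

/* Pick the lock that protects a subsystem under the current policy. */
static struct toy_lock *
lock_for(struct toy_kernel *k, enum subsys s)
{
    assert(s < SUBSYS_COUNT);
    return k->fine_grained ? &k->sub[s] : &k->giant;
}

static bool
subsys_enter(struct toy_kernel *k, enum subsys s)
{
    return toy_trylock(lock_for(k, s));
}

static void
subsys_exit(struct toy_kernel *k, enum subsys s)
{
    toy_unlock(lock_for(k, s));
}
```
]]></programlisting>
<para>With <literal>fine_grained</literal> set to false, entering
<literal>SUBSYS_NET</literal> fails while <literal>SUBSYS_VM</literal>
is held, because both map to the single global lock. With it set to
true, the two subsystems can be entered independently, and only a
second entry into the same subsystem contends.</para>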
</sect2>
<sect2 id="interrupts">
<title>Interrupt latency and servicing</title>
<para>SMPng introduced the concept of dedicating kernel threads, known as
ithreads, to servicing interrupts. With this, driver interrupt
service routines are allowed to block for mutexes, memory allocations,
etc. While this makes writing drivers easier, it introduces considerable
latency into the system due to the complete process context switch which must
be performed in order to service the ithread. This is aggravated by the
extensive coverage over the kernel by the Giant mutex, and often results
in multiple sleeps and context switches in order to service an interrupt.
Drivers that register their interrupt as INTR_MPSAFE are less likely to
feel these aggravating effects, but the overhead of doing a context
switch remains. Interrupt service routines that are registered as
INTR_FAST are run directly from the interrupt context and do not suffer
these problems at all. However, the INTR_FAST property forces the
interrupt line to be exclusive; no sharing can occur on it. The
proliferation of shared interrupts on PC systems makes this
undesirable.</para>
<para>Several ideas have been proposed to help combat this problem:</para>
<itemizedlist>
<listitem>
<para>Special casing ithreads to be lightweight is a possibility. This
might involve reducing the amount of saved context for the ithread,
stack-borrowing from another kthread, and/or creating a new fast-path
to avoid the mi_switch() routine.</para>
</listitem>
<listitem>
<para>A new interrupt model can be introduced to allow drivers to
register an 'interrupt filter' along with a normal service routine.
This would be similar to the Mac OS X model in use today. Interrupt
filter routines would allow the driver to determine if it is
interested in servicing the interrupt, allow it to squelch the
interrupt source, and possibly determine and schedule service
actions. It would run in the same context as the low-level interrupt
service routine, so sleeping would be strictly forbidden. If actions
that result in sleeping or blocking for long periods are required,
the filter would signal to the caller that its normal ithread routine
should be scheduled.</para>
</listitem>
</itemizedlist>
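<para>The interrupt filter proposal above can be sketched as a small
user-space simulation. This is only an illustration under stated
assumptions, not &os; kernel code and not a proposed API: the types
<literal>toy_dev</literal> and <literal>toy_handler</literal>, the
functions, and the three verdict values are all hypothetical. The
filter runs first and never sleeps; it either declines the interrupt,
finishes the work itself, or squelches the source and asks for the
ithread routine to be scheduled.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter verdicts; the names are illustrative only. */
enum filt_result {
    FILT_NOT_MINE,      /* this device did not raise the interrupt */
    FILT_DONE,          /* fully handled in interrupt context */
    FILT_NEED_THREAD    /* source squelched; run the ithread routine */
};

/* Toy device state standing in for hardware registers. */
struct toy_dev {
    bool irq_pending;   /* device is asserting the interrupt */
    bool irq_masked;    /* source squelched by the filter */
    bool needs_sleep;   /* servicing may block (mutex, allocation) */
    int  serviced;      /* number of completed service passes */
};

struct toy_handler {
    enum filt_result (*filter)(struct toy_dev *);
    void (*ithread)(struct toy_dev *);
    struct toy_dev *dev;
    bool thread_scheduled;
};

/* Runs in low-level interrupt context: must never sleep. */
static enum filt_result
toy_filter(struct toy_dev *dev)
{
    if (!dev->irq_pending)
        return FILT_NOT_MINE;
    if (dev->needs_sleep) {
        dev->irq_masked = true;     /* squelch until the ithread runs */
        return FILT_NEED_THREAD;
    }
    dev->irq_pending = false;       /* cheap case: finish right here */
    dev->serviced++;
    return FILT_DONE;
}

/* Runs later in thread context, where blocking is allowed. */
static void
toy_ithread(struct toy_dev *dev)
{
    dev->irq_pending = false;
    dev->irq_masked = false;        /* re-enable the source */
    dev->serviced++;
}

/* Deliver one interrupt on a shared line; returns how many claimed it. */
static int
dispatch_shared_line(struct toy_handler *h, size_t n)
{
    int claimed = 0;

    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (h[i].filter(h[i].dev)) {
        case FILT_NOT_MINE:
            break;
        case FILT_DONE:
            claimed++;
            break;
        case FILT_NEED_THREAD:
            h[i].thread_scheduled = true;
            claimed++;
            break;
        }
    }
    return claimed;
}

/* Stand-in for the scheduler running every ithread marked runnable. */
static void
run_scheduled_threads(struct toy_handler *h, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (h[i].thread_scheduled) {
            h[i].thread_scheduled = false;
            h[i].ithread(h[i].dev);
        }
    }
}
```
]]></programlisting>
<para>In this sketch, a device whose work is cheap is serviced without
any context switch, a device that needs to block pays for the ithread
only when required, and a device that did not interrupt simply
declines, so several handlers can share one line.</para>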
</sect2>
<sect2 id="KSE">
<title>Kernel-supported application threads</title>
<para>The &os; 5.1 development cycle saw the KSE package jump into a
highly usable state. THR, an alternate threading package based on some
of the KSE kernel primitives but implementing purely 1:1 scheduling
semantics, also appeared and is in a similarly experimental but usable
state. Users may interchange these two libraries along with the legacy
libc_r library by relinking their applications or by using the new libmap
feature of the runtime linker. This excellent progress must be driven
to completion before the &t.releng.5; branch point so that the libc_r
package can be deprecated.</para>
<itemizedlist>
<listitem>
<para>The kernel and userland components for KSE and THR must be
completed for all Tier-1 platforms. The decision on which thread
package to sanction as the default will likely be made on a
per-platform basis depending on the stability and completeness of
each package.</para>
<para>
<table id="KSETable">
<title>KSE Status</title>
<tgroup cols="4">
<thead>
<row>
<entry>Platform</entry>
<entry>Kernel</entry>
<entry>Userland</entry>
<entry>Works?</entry>
</row>
</thead>
<tbody>
<row>
<entry>i386</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
<row>
<entry>alpha</entry>
<entry>NO</entry>
<entry>YES</entry>
<entry>NO</entry>
</row>
<row>
<entry>sparc64</entry>
<entry>YES</entry>
<entry>NO</entry>
<entry>NO</entry>
</row>
<row>
<entry>ia64</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
<row>
<entry>amd64</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
</tbody>
</tgroup>
</table>
<table id="THRTable">
<title>THR Status</title>
<tgroup cols="4">
<thead>
<row>
<entry>Platform</entry>
<entry>Kernel</entry>
<entry>Userland</entry>
<entry>Works?</entry>
</row>
</thead>
<tbody>
<row>
<entry>i386</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
<row>
<entry>alpha</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
<row>
<entry>sparc64</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>NO</entry>
</row>
<row>
<entry>ia64</entry>
<entry>YES</entry>
<entry>YES</entry>
<entry>YES</entry>
</row>
<row>
<entry>amd64</entry>
<entry>NO</entry>
<entry>NO</entry>
<entry>NO</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
</listitem>
<listitem>
<para>KSE must pass the ACE test suite on all Tier-1 platforms.
Additional real-world testing must also be performed to ensure
that the libraries are indeed useful. At a minimum, the following
packages should be tested:</para>
<itemizedlist>
<listitem><para>OpenOffice</para></listitem>
<listitem><para>KDE Desktop</para></listitem>
<listitem><para>Apache 2.x</para></listitem>
<listitem><para>BIND 9.2.x</para></listitem>
<listitem><para>MySQL</para></listitem>
<listitem><para>&java; 1.4.x</para></listitem>
</itemizedlist>
</listitem>
</itemizedlist>
</sect2>
</sect1>
<sect1 id="goals">
<title>Requirements for 5-STABLE</title>
<para>The &t.releng.5; branch must offer users the same stability and
performance that is currently enjoyed in the &t.releng.4; branch.
While the goal of SMPng is to allow performance to far exceed what
is found in &t.releng.4; and its sibling BSDs, regaining performance
to the basic level is of the utmost importance. The branch must also
be mature enough to avoid ABI and API changes while still allowing
potential problems to be resolved.</para>
<sect2 id="API">
<title>ABI/API/Infrastructure stability</title>
<para>Enough infrastructure must be in place and stable to allow
fixes from &t.releng.head; to easily and safely be merged into
&t.releng.5;. Also, we must draw a line as to what subsystems are
to be locked down when we go into 5-STABLE.</para>
<itemizedlist>
<listitem>
<para>KSE: Both kernel and userland components must
reach the same level of functionality for all Tier-1 platforms,
in both UP and SMP configurations. The definition of <quote>Tier-1
platforms</quote> can be found in
<ulink url="&url.articles.committers-guide;/archs.html"></ulink>.
Continued testing against the ACE test
suite must be made as the &t.releng.5; branch draws near. KSE
must pose no functional regressions for the ongoing &java;
certification program. Common desktop and server applications
must run seamlessly under KSE. A policy must be set on
which platforms will enable KSE as the default threading
package, how to allow the user to switch threading packages, and
how third-party packages will be made aware of these choices.</para>
</listitem>
<listitem>
<para>busdma interface and drivers: architectures like PAE/&i386; and
sparc64 which do not have a direct mapping between host memory
address space and expansion bus address space require the
elimination of vtophys() and friends. The busdma interface was
created to handle exactly this problem, but many drivers do not use
it yet. The busdma project at
<ulink url="&url.base;/projects/busdma"></ulink>
tracks the
progress of this and should be used to determine which drivers
must be converted for &t.releng.5; and which can be left behind.
No new storage or network drivers shall be allowed into the
&os; source tree. Exceptions for other classes of drivers must
be justified in public discussion.</para>
</listitem>
<listitem>
<para>PCI resource allocation: PC2003 compliance requires that x86
systems no longer configure PCI devices from the system BIOS,
leaving this task solely to the OS. &os; must gain the ability to
manage and allocate PCI memory resources on its own. Implementing
this should take into account cardbus, PCI-HotPlug, and laptop
docking-station requirements. This feature will become increasingly
critical through the lifetime of &t.releng.5;, and therefore is a
requirement for the &t.releng.5; branch.</para>
</listitem>
</itemizedlist>
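<para>As a rough illustration of what OS-side PCI memory allocation
involves, the sketch below assigns base addresses to a set of memory
BAR requests inside one address window. It is a simplified user-space
example under stated assumptions, not the &os; implementation: the
names <literal>bar_req</literal> and <literal>assign_bars</literal>
are hypothetical, only 32-bit memory BARs are considered, and
hot-plug, bridges, and docking stations are ignored. It relies on the
fact that a PCI BAR decodes a power-of-two sized region aligned to its
own size, and it places the largest requests first to limit alignment
padding.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request for one 32-bit memory BAR. */
struct bar_req {
    uint32_t size;   /* region size in bytes; must be a power of two */
    uint32_t addr;   /* assigned base address, filled in on success */
};

/* qsort comparator: larger sizes sort first. */
static int
cmp_size_desc(const void *a, const void *b)
{
    const struct bar_req *x = a;
    const struct bar_req *y = b;

    return (x->size < y->size) - (x->size > y->size);
}

/*
 * Assign naturally aligned addresses from [win_base, win_base + win_size).
 * The array is reordered by descending size. Returns 0 on success, or -1
 * if the window is too small (earlier entries may already be assigned).
 */
static int
assign_bars(struct bar_req *bars, size_t n, uint32_t win_base,
    uint32_t win_size)
{
    uint64_t next = win_base;
    uint64_t end = (uint64_t)win_base + win_size;

    qsort(bars, n, sizeof(*bars), cmp_size_desc);
    for (size_t i = 0; i < n; i++) {
        uint64_t size = bars[i].size;
        uint64_t base;

        assert(size != 0 && (size & (size - 1)) == 0);
        base = (next + size - 1) & ~(size - 1);   /* round up to size */
        if (base + size > end)
            return -1;
        bars[i].addr = (uint32_t)base;
        next = base + size;
    }
    return 0;
}
```
]]></programlisting>
<para>A real allocator would also have to honor ranges already claimed
by firmware or by bridges, which this sketch does not attempt.</para>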
</sect2>
<sect2 id="performance">
<title>Performance</title>
<para>Performance hinges on the progress of SMPng infrastructure in
the following areas:</para>
<itemizedlist>
<listitem>
<para>Storage: The GEOM block layer allows storage drivers to
run without Giant. All drivers that interface directly with
GEOM (as opposed to sitting underneath CAM or another middleware)
must be locked and free of Giant in both their strategy and
completion paths. Their interrupt handlers must also run free
of Giant.</para>
</listitem>
<listitem>
<para>Network: The layers in the IPv4 path below the socket layer
must be locked and free of Giant. This includes the protocol,
routing, bridging, filtering, and hardware layers. Allowances must
be made for protocols that are not locked, especially IPv6.
Testing must also be performed to ensure stability, correctness,
and performance.</para>
</listitem>
<listitem>
<para>Interrupt and context switching: As discussed above, interrupt
latency and context switching have a severe impact on performance.
Context switching for ithreads and kthreads must be improved.
New interrupt handling models that allow for faster and
more flexible handling of both traditional and MSI interrupts must
be investigated and implemented.</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="benchmarks">
<title>Benchmarks and performance testing</title>
<para>Having a source of reliable and useful benchmarks is essential
to identifying performance problems and guarding against performance
regressions. A <quote>performance team</quote> that is made up of
people and resources for formulating, developing, and executing
benchmark tests should be put into place soon. Comparisons should
be made against both &os; 4.<replaceable>X</replaceable> and Linux
2.4/2.6. Tests to consider are:</para>
<itemizedlist>
<listitem>
<para>the classic <quote>worldstone</quote></para>
</listitem>
<listitem>
<para>webstone: <filename role="package">www/webstone</filename></para>
</listitem>
<listitem>
<para>Fstress: <ulink url="http://www.cs.duke.edu/ari/fstress/"></ulink></para>
</listitem>
<listitem>
<para>ApacheBench: <filename role="package">www/p5-ApacheBench</filename></para>
</listitem>
<listitem>
<para>netperf: <filename role="package">benchmarks/netperf</filename></para>
</listitem>
<listitem>
<para>Web Polygraph: <ulink url="http://www.web-polygraph.org/"></ulink>
Note: does not compile with gcc 3.x yet.</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="features">
<title>Features</title>
<itemizedlist>
<listitem>
<para>NEWCARD/OLDCARD: The NEWCARD subsystem was made the default
for &os;&nbsp;5.0. Unfortunately, it does not include support for
non-Cardbus bridges and falls victim to interrupt routing
problems on some laptops. The classic 16-bit bridge support,
OLDCARD, still exists and can be compiled in, but this is highly
inconvenient for users of older laptops. If OLDCARD cannot be
completely deprecated for &t.releng.5;, then provisions must be made
to allow users to easily install an OLDCARD-enabled kernel.
Documentation should be written to help transition users from
OLDCARD to NEWCARD and from &man.pccardd.8; to
&man.devd.8;. The power management and
<quote>dumpcis</quote> functionality of &man.pccardc.8; needs to be
brought forward to work with NEWCARD, along with the ability to
load CIS quirk entries. Most of this functionality can be
integrated into &man.devd.8; and
&man.devctl.4;.</para>
</listitem>
<listitem>
<para>New scheduler framework: The new scheduler framework is in
place, and users can select between the classic 4BSD scheduler
and the new ULE scheduler. A scheduler that demonstrates
processor affinity, HyperThreading and KSE awareness, and no
regressions in performance or interactivity characteristics must
be available for &t.releng.5;.</para>
</listitem>
<listitem>
<para>GDB: GDB in the base system must work for sparc64, and
must also understand KSE thread semantics. GDB 5.3 is available
and is reported to address the sparc64 issues.</para>
</listitem>
</itemizedlist>
</sect2>
<sect2 id="documentation">
<title>Documentation</title>
<itemizedlist>
<listitem>
<para>The manual pages, Handbook, and FAQ should be free from
content specific to &os; 4.<replaceable>X</replaceable>, i.e., all
text should be equally applicable to &os;
5.<replaceable>X</replaceable>. The installation section of the
Handbook needs the most work in this area.</para>
</listitem>
<listitem>
<para>The release documentation needs to be complete and accurate
for all Tier-1 architectures. The hardware notes and
installation guides need specific attention.</para>
</listitem>
</itemizedlist>
</sect2>
</sect1>
<sect1 id="future">
<title>Post &t.releng.5; direction</title>
<para>The focus should be bug fixes and incremental improvements, as with
all the -STABLE development branches. Following the usual procedure,
everything should be vetted through the &t.releng.head; branch first and
committed to &t.releng.5; with caution. New device drivers, incremental
features, etc., will be welcome in the branch once they have been tested
in &t.releng.head; and found stable enough.</para>
<para>Further SMPng lockdowns will be divided into two categories: driver
and subsystem. The only subsystem that will be sufficiently locked
down for &t.releng.5; will be GEOM, so incrementally locking down device
drivers under it is a worthy goal for the branch. Full subsystem
lockdowns will have to be fully tested and proven in &t.releng.head; before
consideration will be given to merging them into &t.releng.5;.</para>
</sect1>
</article>