<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook XML V4.2-Based Extension//EN"
	"../../../share/sgml/freebsd42.dtd" [
<!ENTITY % entities PUBLIC "-//FreeBSD//ENTITIES DocBook FreeBSD Entity Set//EN" "../../share/sgml/entities.ent">
%entities;

<!ENTITY t.releng.3 "<literal>RELENG_3</literal>">
<!ENTITY t.releng.4 "<literal>RELENG_4</literal>">
<!ENTITY t.releng.5 "<literal>RELENG_5</literal>">
<!ENTITY t.releng.5.1 "<literal>RELENG_5_1</literal>">
<!ENTITY t.releng.5.2 "<literal>RELENG_5_2</literal>">
<!ENTITY t.releng.5.3 "<literal>RELENG_5_3</literal>">
<!ENTITY t.releng.head "<literal>HEAD</literal>">
]>

<article lang='en'>
  <articleinfo>
    <title>The Road Map for 5-STABLE</title>

    <authorgroup>
      <corpauthor>The &os; Release Engineering Team</corpauthor>
    </authorgroup>

    <copyright>
      <year>2003</year>
      <holder role="mailto:re@FreeBSD.org">The &os; Release
        Engineering Team</holder>
    </copyright>

    <legalnotice id="trademarks" role="trademarks">
      &tm-attrib.freebsd;
      &tm-attrib.ieee;
      &tm-attrib.intel;
      &tm-attrib.sparc;
      &tm-attrib.sun;
      &tm-attrib.opengroup;
      &tm-attrib.general;
    </legalnotice>

    <pubdate>$FreeBSD$</pubdate>

    <releaseinfo>$FreeBSD$</releaseinfo>

    <abstract>
      <para> This document is now mostly of historical value.  It
	presented a roadmap for the development of &os;'s &t.releng.5;
	branch.  It was originally written in February 2003 (between
	the 5.0 and 5.1 releases), and was intended to provide a plan
	for making the &t.releng.5; branch <quote>stable</quote>, both
	in terms of code quality and finalization of various
	APIs/ABIs.  For a different perspective, the article
	<ulink url="&url.articles.version-guide;">
	  <quote>Choosing the &os; Version That Is Right For You</quote>
	</ulink>
	may be of interest.  The version-guide article was written in August
	2005 (two and a half years later), and it contains a section
	discussing how these plans and events actually unfolded, as well as
	some lessons learned.</para>
    </abstract>

  </articleinfo>

  <sect1 id="intro">
    <title>Introduction and Background</title>

    <para>After nearly three years of work, &os; 5.0 was released in January
      of 2003.  Features like the GEOM block layer, Mandatory Access Controls,
      ACPI, &sparc64; and ia64 platform support, and UFS snapshots, background
      filesystem checks, and 64-bit inode sizes make it an exciting operating
      system for both desktop and enterprise users.  However, some important
      features are not complete.  The foundations for fine-grained locking
      and preemption in the kernel exist, but much more work is left to be
      done.  Performance and stability compared to &os;
      4.<replaceable>X</replaceable> has declined and must be restored and
      surpassed.</para>

    <para>This is somewhat similar to the situation that &os; faced in the
      3.<replaceable>X</replaceable> series.  Work on 3-CURRENT trudged along
      seemingly forever, and finally a cry was made to <quote>just ship
      it</quote> and clean up later.  This decision resulted in the 3.0 and
      3.1 releases being very unsatisfying for most, and it was not until 3.2
      that the series was considered <quote>stable</quote>.  To make matters
      worse, the &t.releng.3; branch was created along with the 3.0 release,
      and the &t.releng.head; branch was allowed to advance immediately
      towards 4-CURRENT.  This resulted in a quick divergence between
      &t.releng.head; and &t.releng.3;, making maintenance of the &t.releng.3;
      branch very difficult.  &os; 2.2.8 was left for quite a while as the
      last production-quality version of &os;.</para>

    <para>Our intent is to avoid repeating that scenario with &os; 5.X.
      Delaying the &t.releng.5; branch until it is stable and production quality
      will ensure that it stays maintainable and provides a compelling reason
      to upgrade from 4.<replaceable>X</replaceable>.  To do this, we must
      identify the current areas of weakness and set clear goals for
      resolving them.  This document contains what we as the release
      engineering team feel are the milestones and issues that must be
      resolved for the &t.releng.5; branch.  It does not dictate every aspect of
      &os; development, and we welcome further input.  Nothing that follows
      is meant to be a sleight against any person or group, or to trivialize
      any work that has been done.  There are some significant issues,
      though, that need decisive and unbiased action.</para>
  </sect1>

  <sect1 id="major-issues">
    <title>Major issues</title>

    <para>The success of the 5.<replaceable>X</replaceable> series hinges on
      the ability to deliver fine-grained threading and re-entrancy in the
      kernel (also known as SMPng) and kernel-supported POSIX threads in
      userland, while not sacrificing overall system stability or
      performance.</para>

  <sect2 id="SMPng">
    <title>SMPng</title>

    <para>The state of SMPng and kernel lockdown is the biggest concern for
      5.<replaceable>X</replaceable>.  To date, few major systems have come
      out from under the kernel-wide mutex known as <quote>Giant</quote>.
      The SMP status page at <ulink url="&url.base;/smp"></ulink>
      provides a comprehensive breakdown
      of the overall SMPng status.  Status specific to SMPng progress in
      device drivers can be found at
      <ulink url="&url.base;/projects/busdma"></ulink>.
      In summary:</para>

    <itemizedlist>
      <listitem>
        <para>VM: Kernel malloc is locked and free of Giant.  The UMA zone
          allocator is also free of Giant.  vm_object locking is in progress
          and is an important step to making the buffer/cache free of
	  Giant.  Pmap locking remains to be started.</para>
      </listitem>

      <listitem>
        <para>GEOM: The GEOM block layer was designed to run free of Giant
	  and allow GEOM modules and underlying block drivers to run free
	  of Giant.  Currently, only the &man.ata.4; and &man.aac.4; drivers
	  are locked and run without Giant.  Work on other block drivers is
	  in progress.  Locking the CAM subsystem is required for nearly all
	  SCSI drivers to run without Giant; this work has not started
	  yet.</para>
	<para>Additionally, GEOM has the potential to suffer performance loss
	  due to its upcall and downcall data paths happening in kernel threads.
	  Improved lightweight context switches might help this.</para>
      </listitem>

      <listitem>
        <para>Network: Work has restarted on locking the network stack.
	  Routing tables, ARP, bridge, IPFW, Fast-Forward, TCP, UDP, IP,
	  Fast IPSEC, and interface layers are being targeted initially, along
          with several Ethernet device drivers.  The socket layer, IPv6, and
          other protocol layers will be targeted later.  The primary goal
          of this work is to regain the performance found in
          &os; 4.<replaceable>X</replaceable>.  The cost of context switching
          to the device driver ithreads and the netisr is still hampering
          performance.</para>
      </listitem>

      <listitem>
        <para>VFS: Initial pre-cleanup started.</para>
      </listitem>

      <listitem>
        <para>buffer/cache: Initial work complete on locking the buffer.</para>
      </listitem>

      <listitem>
        <para>Proc: Initial proc locking is in place, further progress is
	  expected for &os; 5.2.</para>
      </listitem>

      <listitem>
        <para>CAM: No significant work has occurred on the CAM SCSI
          layer.</para>
      </listitem>

      <listitem>
        <para>Newbus: some work has started on locking down the device_t
          structure.</para>
      </listitem>

      <listitem>
        <para>Pipes: complete</para>
      </listitem>

      <listitem>
        <para>File descriptors: complete.</para>
      </listitem>

      <listitem>
        <para>Process accounting: jails, credentials, MAC labels, and
          scheduler are out from under Giant.</para>
      </listitem>

      <listitem>
        <para>MAC Framework: complete</para>
      </listitem>

      <listitem>
         <para>Timekeeping: complete</para>
      </listitem>

      <listitem>
        <para>kernel encryption: crypto drivers and core &man.crypto.4;
	  framework are Giant-free.  KAME IPsec has not been locked.</para>
      </listitem>

      <listitem>
        <para>Sound subsystem: complete, but lock order reversal problems seem
	  to persist.</para>
      </listitem>

      <listitem>
        <para>kernel preemption: preemption for interrupt threads is enabled.
          However, contention due to Giant covering much of the kernel and
          most of the device driver interrupt routines causes excessive
          context switches and might actually be hurting performance.  Work
          is underway to explore ways to make preemption be
          conditional.</para>
      </listitem>
    </itemizedlist>

  </sect2>

  <sect2 id="interrupts">
    <title>Interrupt latency and servicing</title>

    <para>SMPng introduced the concept of dedicating kernel threads, known as
      ithreads, to servicing interrupts.  With this, driver interrupt
      service routines are allowed to block for mutexes, memory allocations,
      etc.  While this makes writing drivers easier, it introduces considerable
      latency into the system due to the complete process context switch which must
      be performed in order to service the ithread.  This is aggravated by the
      extensive coverage over the kernel by the Giant mutex, and often results
      in multiple sleeps and context switches in order to service an interrupt.
      Drivers that register their interrupt as INTR_MPSAFE are less likely to
      feel these aggravating effects, but the overhead of doing a context
      switch remains.  Interrupt service routines that are registered as
      INTR_FAST are run directly from the interrupt context and do not suffer
      these problems at all.  However, the INTR_FAST property forces the
      interrupt line to be exclusive; no sharing can occur on it.  The
      proliferation of shared interrupts on PC systems makes this
      undesirable.</para>

    <para>Several ideas have been proposed to help combat this problem:</para>
    <itemizedlist>
      <listitem>
        <para>Special casing ithreads to be lightweight is a possibility.  This
          might involve reducing the amount of saved context for the ithread,
          stack-borrowing from another kthread, and/or creating a new fast-path
          to avoid the mi_switch() routine.</para>
      </listitem>

      <listitem>
        <para>A new interrupt model can be introduced to allow drivers to
          register an 'interrupt filter' along with a normal service routine.
          This would be similar to the Mac OS X model in use today.  Interrupt
          filter routines would allow the driver to determine if it is
          interested in servicing the interrupt, allow it to squelch the
          interrupt source, and possibly determine and schedule service
          actions.  It would run in the same context as the low-level interrupt
          service routine, so sleeping would be strictly forbidden.  If actions
          that result in sleeping or blocking for long periods are required,
          the filter would signal to the caller that its normal ithread routine
          should be scheduled.</para>
      </listitem>
    </itemizedlist>

  </sect2>

  <sect2 id="KSE">
    <title>Kernel-supported application threads</title>

    <para>The FreeBSD 5.1 development cycle saw the KSE package jump into a
      highly usable state. THR, an alternate threading package based on some
      of the KSE kernel primitives but implementing purely 1:1 scheduling
      semantics also appeared and is in a similarly experimental but usable
      state. Users may interchange these two libraries along with the legacy
      libc_r library via relinking their apps or by using the new libmap
      feature of the runtime linker. This excellent progress must be driven
      to completion before the &t.releng.5; branch point so that the libc_r
      package can be deprecated.</para>

    <itemizedlist>
      <listitem>
        <para>The kernel and userland components for KSE and THR must be
          completed for all Tier-1 platforms.  The decision on which thread
          package to sanction as the default will likely be made on a
          per-platform basis depending on the stability and completeness of
          each package.</para>

        <para>
          <table id="KSETable">
            <title>KSE Status</title>
            <tgroup cols="4">
              <thead>
                <row>
                  <entry>Platform</entry>
                  <entry>Kernel</entry>
                  <entry>Userland</entry>
                  <entry>Works?</entry>
                </row>
              </thead>

              <tbody>
                <row>
                  <entry>i386</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>

                <row>
                  <entry>alpha</entry>
                  <entry>NO</entry>
                  <entry>YES</entry>
                  <entry>NO</entry>
                </row>

                <row>
                  <entry>sparc64</entry>
                  <entry>YES</entry>
                  <entry>NO</entry>
                  <entry>NO</entry>
                </row>

                <row>
                  <entry>ia64</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>

                <row>
                  <entry>amd64</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>
              </tbody>
            </tgroup>
          </table>

          <table id="THRTable">
            <title>THR Status</title>
            <tgroup cols="4">
              <thead>
                <row>
                  <entry>Platform</entry>
                  <entry>Kernel</entry>
                  <entry>Userland</entry>
                  <entry>Works?</entry>
                </row>
              </thead>

              <tbody>
                <row>
                  <entry>i386</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>

                <row>
                  <entry>alpha</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>

                <row>
                  <entry>sparc64</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>NO</entry>
                </row>

                <row>
                  <entry>ia64</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                  <entry>YES</entry>
                </row>

                <row>
                  <entry>amd64</entry>
                  <entry>NO</entry>
                  <entry>NO</entry>
                  <entry>NO</entry>
                </row>
              </tbody>
            </tgroup>
          </table>
        </para>
      </listitem>

      <listitem>
        <para>KSE must pass the ACE test suite on all Tier-1 platforms.
          Additional real-world testing must also be performed to ensure
          that the libraries are indeed useful.  At a minimum, the following
          packages should be tested:</para>
        <itemizedlist>
          <listitem><para>OpenOffice</para></listitem>
          <listitem><para>KDE Desktop</para></listitem>
          <listitem><para>Apache 2.x</para></listitem>
          <listitem><para>BIND 9.2.x</para></listitem>
          <listitem><para>MySQL</para></listitem>
          <listitem><para>&java; 1.4.x</para></listitem>
        </itemizedlist>
      </listitem>

    </itemizedlist>

  </sect2>
  </sect1>

  <sect1 id="goals">
    <title>Requirements for 5-STABLE</title>

    <para>The &t.releng.5; branch must offer users the same stability and
      performance that is currently enjoyed in the &t.releng.4; branch.
      While the goal of SMPng is to allow performance to far exceed what
      is found in &t.releng.4; and its sibling BSDs, regaining performance
      to the basic level is of the utmost importance.  The branch must also
      be mature enough to avoid ABI and API changes while still allowing
      potential problems to be resolved.</para>

    <sect2 id="API">
      <title>ABI/API/Infrastructure stability</title>
        <para>Enough infrastructure must be in place and stable to allow
          fixes from &t.releng.head; to easily and safely be merged into
          &t.releng.5;.  Also, we must draw a line as to what subsystems are
          to be locked down when we go into 5-STABLE.</para>

      <itemizedlist>
        <listitem>
          <para>KSE:  Both kernel and userland components must
            reach the same level of functionality for all Tier-1 platforms,
            in both UP and SMP configurations.  The definition of <quote>Tier-1
            platforms</quote> can be found in
	    <ulink url="&url.articles.committers-guide;/archs.html"></ulink>.
	    Continued testing against the ACE test
            suite must be made as the &t.releng.5; branch draws near.  KSE
            must pose no functional regressions for the ongoing &java;
            certification program.  Common desktop and server applications
            must run seamlessly under KSE.  A policy must be decided on as
            to which platforms will enable KSE as the default threading
            package, how to allow the user to switch threading packages, and
            how third-party packages will be made aware of these choices.</para>
        </listitem>

        <listitem>
          <para>busdma interface and drivers: architectures like PAE/&i386; and
            sparc64 which do not have a direct mapping between host memory
            address space and expansion bus address space require the
            elimination of vtophys() and friends.  The busdma interface was
            created to handle exactly this problem, but many drivers do not use
            it yet.  The busdma project at
            <ulink url="&url.base;/projects/busdma"></ulink>
            tracks the
            progress of this and should be used to determine which drivers
            must be converted for &t.releng.5; and which can be left behind.
            No new storage or network drivers shall be allowed into the
            &os; source tree.  Exceptions for other classes of drivers must
            be justified in public discussion.</para>
        </listitem>

        <listitem>
          <para>PCI resource allocation: PC2003 compliance requires that x86
            systems no longer configure PCI devices from the system BIOS,
            leaving this task solely to the OS.  &os; must gain the ability to
            manage and allocate PCI memory resources on its own.  Implementing
            this should take into account cardbus, PCI-HotPlug, and laptop
            docking-station requirements.  This feature will become increasingly
            critical through the lifetime of &t.releng.5;, and therefore is a
            requirement for the &t.releng.5; branch.</para>
        </listitem>
      </itemizedlist>
    </sect2>

    <sect2 id="performance">
    <title>Performance</title>
      <para>Performance hinges on the progress of SMPng infrastructure in
        the following areas:</para>

      <itemizedlist>
        <listitem>
          <para>Storage:  The GEOM block layer allows storage drivers to
            run without Giant.  All drivers that interface directly with
            GEOM (as opposed to sitting underneath CAM or another middleware)
            must be locked and free of Giant in both their strategy and
            completion paths.  Their interrupt handlers must also run free
            of Giant.</para>
        </listitem>

        <listitem>
          <para>Network:  The layers in the IPv4 path below the socket layer
            must be locked and free of Giant. This includes the protocol,
            routing, bridging, filtering, and hardware layers. Allowances must
            be made for protocols that are not locked, especially IPv6.
            Testing must also be performed to ensure stability, correctness,
            and performance.</para>
        </listitem>

        <listitem>
          <para>Interrupt and context switching:  As discussed above, interrupt
            latency and context switching have a severe impact of performance.
            Context switching for ithreads and kthreads must be improved.
            New interrupt handling models that allow for faster and
            more flexible handling of both traditional and MSI interrupts must
            be investigated and implemented.</para>
        </listitem>
      </itemizedlist>
    </sect2>

    <sect2 id="benchmarks">
    <title>Benchmarks and performance testing</title>
      <para>Having a source of reliable and useful benchmarks is essential
        to identifying performance problems and guarding against performance
        regressions.  A <quote>performance team</quote> that is made up of
        people and resources for formulating, developing, and executing
        benchmark tests should be put into place soon.  Comparisons should
        be made against both &os; 4.<replaceable>X</replaceable> and Linux
        2.4/2.6.  Tests to consider are:</para>

      <itemizedlist>
        <listitem>
          <para>the classic <quote>worldstone</quote></para>
        </listitem>

        <listitem>
          <para>webstone: <filename role="package">www/webstone</filename></para>
        </listitem>

        <listitem>
          <para>Fstress: <ulink url="http://www.cs.duke.edu/ari/fstress/"></ulink></para>
        </listitem>

        <listitem>
          <para>ApacheBench: <filename role="package">www/p5-ApacheBench</filename></para>
        </listitem>

        <listitem>
          <para>netperf: <filename role="package">benchmarks/netperf</filename></para>
        </listitem>

        <listitem>
          <para>Web Polygraph: <ulink url="http://www.web-polygraph.org/"></ulink>
            Note: does not compile with gcc 3.x yet.</para>
        </listitem>
      </itemizedlist>
    </sect2>

    <sect2 id="features">
    <title>Features:</title>

      <itemizedlist>
        <listitem>
          <para>NEWCARD/OLDCARD: The NEWCARD subsystem was made the default
            for &os;&nbsp;5.0.  Unfortunately, it does not include support for
            non-Cardbus bridges and falls victim to interrupt routing
            problems on some laptops.  The classic 16-bit bridge support,
            OLDCARD, still exists and can be compiled in, but this is highly
            inconvenient for users of older laptops.  If OLDCARD cannot be
            completely deprecated for &t.releng.5;, then provisions must be made
            to allow users to easily install an OLDCARD-enabled kernel.
            Documentation should be written to help transition users from
            OLDCARD to NEWCARD and from &man.pccardd.8; to
            &man.devd.8;.  The power management and
            <quote>dumpcis</quote> functionality of &man.pccardc.8; needs to be
            brought forward to work with NEWCARD, along with the ability to
            load CIS quirk entries.  Most of this functionality can be
            integrated into &man.devd.8; and
            &man.devctl.4;.</para>
        </listitem>

        <listitem>
          <para>New scheduler framework: The new scheduler framework is in
            place, and users can select between the classic 4BSD scheduler
            and the new ULE scheduler.  A scheduler that demonstrates
            processor affinity, HyperThreading and KSE awareness, and no
            regressions in performance or interactivity characteristics must
            be available for &t.releng.5;.</para>
        </listitem>

        <listitem>
           <para>GDB: GDB in the base system must work for sparc64, and
             must also understand KSE thread semantics.  GDB 5.3 is available
             and is reported to address the sparc64 issues.</para>
        </listitem>

      </itemizedlist>
    </sect2>

    <sect2 id="documentation">
    <title>Documentation:</title>

      <itemizedlist>
        <listitem>
          <para>The manual pages, Handbook, and FAQ should be free from
            content specific to &os; 4.<replaceable>X</replaceable>, i.e. all
            text should be equally applicable to &os;
            5.<replaceable>X</replaceable>.  The installation section of the
            handbook needs the most work in this area.</para>
        </listitem>

        <listitem>
          <para>The release documentation needs to be complete and accurate
            for all Tier-1 architectures.  The hardware notes and
            installation guides need specific attention.</para>
        </listitem>
      </itemizedlist>
    </sect2>
  </sect1>

  <sect1 id="future">
    <title>Post &t.releng.5; direction</title>

    <para>The focus should be bug fixes and incremental improvements, as with
      all the -STABLE development branches.  Following the usual procedure,
      everything should be vetted through the &t.releng.head; branch first and
      committed to &t.releng.5; with caution.  New device drivers, incremental
      features, etc, will be welcome in the branch once they have been tested
      in &t.releng.head; and found stable enough.</para>

    <para>Further SMPng lockdowns will be divided into two categories: driver
      and subsystem.  The only subsystem that will be sufficiently locked
      down for &t.releng.5; will be GEOM, so incrementally locking down device
      drivers under it is a worthy goal for the branch.  Full subsystem
      lockdowns will have to be fully tested and proven in &t.releng.head; before
      consideration will be given to merging them into &t.releng.5;.</para>
  </sect1>
</article>