Add an initial web page for the netperf project, talking a bit about
the approaches that are being taken in the network performance work for 5.x/6.x.
svn path=/www/; revision=22923
2004-11-11 22:21:42 +00:00 · 2020-12-08 03:00:23 +00:00
commit 9d2a62b399
parent 27a7295915
3 changed files with 251 additions and 0 deletions
--- a/en/projects/netperf/Makefile
+++ b/en/projects/netperf/Makefile
@@ -0,0 +1,17 @@
 # Summary for netperf project status
 #
 # $FreeBSD: www/en/projects/busdma/Makefile,v 1.1 2002/12/09 21:36:29 rwatson Exp $
 MAINTAINER=	rwatson
 .if exists(../Makefile.conf)
 .include "../Makefile.conf"
 .endif
 .if exists(../Makefile.inc)
 .include "../Makefile.inc"
 .endif
 DOCS=	index.sgml
 DATA=	style.css
 .include "${WEB_PREFIX}/share/mk/web.site.mk"
--- a/en/projects/netperf/index.sgml
+++ b/en/projects/netperf/index.sgml
@@ -0,0 +1,196 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
 <!ENTITY base CDATA "../..">
 <!ENTITY date "$FreeBSD$">
 <!ENTITY title "FreeBSD Network Performance Project (netperf)">
 <!ENTITY email 'mux'>
 <!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;
 <!ENTITY status.na "<font color=green>N/A</font>">
 <!ENTITY status.done "<font color=green>Done</font>">
 <!ENTITY status.wip "<font color=blue>In progress</font>">
 <!ENTITY status.untested "<font color=yellow>Needs testing</font>">
 <!ENTITY status.new "<font color=red>New task</font>">
 <!ENTITY status.unknown "<font color=red>Unknown</font>">
 <!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;
 ]>
 <html>
  &header;
    <h2>Contents</h2>
    <ul>
      <li><a href="#goal">Project Goal</a></li>
      <li><a href="#strategies">Project Strategies</a></li>
      <li><a href="#tasks">Project Tasks</a></li>
      <li><a href="#links">Links</a></li>
    </ul>
    <a name="goal"></a>
    <h2>Project Goal</h2>
    <p>The netperf project is working to enhance the performance of the
      FreeBSD network stack.  This work grew out of the
      <a href="../smp">SMPng Project</a>, which moved the FreeBSD kernel from
      a "Giant Lock" to more fine-grained locking and multi-threading.  SMPng
      offered both performance improvement and degradation for the network
      stack, improving parallelism and preemption, but substantially
      increasing per-packet processing costs.  The netperf project is
      primarily focussed on further improving parallelism in network
      processing, while reducing the SMP synchronization overhead.  This in
      turn will lead to higher processing throughput and lower processing
      latency.</p>
    <a name="strategies"></a>
    <h2>Project Strategies</h2>
    <p>Robert Watson</p>
     <p>The two primary goals of this work are to increase parallelism and
       to decrease overhead.  Several activities are under way that
       work towards these goals:</p>
    <ul>
      <li><p>Complete locking work to make sure all components of the stack
 	are able to run without the Giant lock.  While most of the network
 	stack, especially mainstream protocols, runs without Giant, some
 	components require Giant to be placed back over the stack if compiled
 	into the kernel, reducing parallelism.</p></li>
      <li><p>Optimize locking strategies to find better balances between
 	locking granularity and locking overhead.  In the first-cut locking
 	work on the kernel, the goal was to adopt a medium-grained locking
 	approach based on data locking.  This approach identifies critical
 	data structures, and inserts new locks and locking operations to
 	protect those data structures.  Depending on the data model of the
 	code being protected, this may lead to the introduction of a
 	substantial number of locks offering unnecessary granularity, where
 	the overhead of locking overwhelms the benefits of available
 	parallelism and preemption.  By selectively reducing granularity, it
 	is possible to improve performance by decreasing locking overhead.
 	</p></li>
      <li><p>Amortize the cost of locking by processing queues of packets or
 	events.  While the cost of individual synchronization operations may
 	be high, it is possible to amortize the cost of synchronization
 	operations by grouping processing of similar data (packets, events)
 	under the same protection.  This approach focuses on identifying
 	places where similar locking occurs frequently in succession, and
 	introducing queueing or coalescing of lock operations across the
 	body of the work.  For example, when a series of packets is inserted
 	into an outgoing interface queue, a basic locking approach would
 	lock the queue for each insert operation, unlock it, and hand off to
 	the interface driver to begin the send, repeating this sequence as
 	required.  With a coalesced approach, the caller would instead pass off
 	a whole queue of packets, reducing the locking overhead; and because
 	that queue is thread-local until handoff, building it requires no
 	synchronization.  This approach can be applied at several levels in the
 	stack, and is particularly applicable at lower levels of the stack
 	where streams of packets require almost identical processing.
 	</p></li>
      <li><p>Introduce new synchronization strategies with reduced overhead
 	relative to traditional strategies.  Most traditional strategies
 	employ a combination of interrupt disabling and atomic operations to
 	achieve mutual exclusion and non-preemption guarantees.  However,
 	these operations are expensive on modern CPUs, leading to the desire
 	for cheaper primitives with weaker semantics.  Examples include the
 	application of uni-processor primitives where synchronization is
 	required only on a single processor, and optimizations to critical
 	section primitives to avoid the need for interrupt disabling.
 	</p></li>
      <li><p>Modify synchronization strategies to take advantage of
 	additional, non-locking, synchronization primitives.  This approach
 	might take the form of making increased use of per-CPU or per-thread
 	data structures, which require little or no synchronization.  For
 	example, through the use of critical sections, it is possible to
 	synchronize access to per-CPU caches and queues.  Through the use of
 	per-thread queues, data can be handed off between stack layers
 	without the use of synchronization.</p></li>
      <li><p>Increase the opportunities for parallelism through increased
 	threading in the network stack.  The current network stack model
 	offers the opportunity for substantial parallelism, with outbound
 	processing typically taking place in the context of the sending
 	thread in kernel, crypto occurring in crypto worker threads, and
 	receive processing taking place in a combination of the receiving
 	ithread and dispatched netisr thread.  While handoffs between
 	threads introduce overhead (synchronization, context switching),
 	there is the opportunity to increase parallelism in some workloads
 	through introducing additional worker threads.  Identifying work
 	that may be relocated to new threads must be done carefully to
 	balance overhead and latency concerns, but can pay off by
 	increasing effective CPU utilization and hence throughput.  For
 	example, introducing additional netisr threads capable of running on
 	more than one CPU at a time can increase input parallelism, subject
 	to maintaining desirable packet ordering.</p></li>
    </ul>
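     <p>The amortization strategy in the list above (handing off a queue of
       packets under one lock acquisition) can be sketched in user-space C.
       This is a minimal illustration, not the FreeBSD implementation: the
       names struct pkt, struct pktq, struct ifq, ifq_enqueue_one() and
       ifq_enqueue_batch() are hypothetical, a POSIX mutex stands in for the
       kernel's interface queue lock, and the lock_ops counter exists only to
       make the difference in lock traffic visible.</p>

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet type; stands in for an mbuf in this sketch. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical unlocked packet queue, suitable for thread-local use. */
struct pktq {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Hypothetical shared interface queue protected by a mutex. */
struct ifq {
	pthread_mutex_t lock;
	struct pktq q;
	unsigned long lock_ops;	/* lock acquisitions, for illustration only */
};

static void
pktq_init(struct pktq *q)
{
	q->head = q->tail = NULL;
	q->len = 0;
}

/* Append one packet; no locking, the caller owns the queue. */
static void
pktq_append(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
}

/* Move every packet from src to the end of dst in O(1); src ends up empty. */
static void
pktq_concat(struct pktq *dst, struct pktq *src)
{
	if (src->head == NULL)
		return;
	if (dst->tail == NULL)
		dst->head = src->head;
	else
		dst->tail->next = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	pktq_init(src);
}

static void
ifq_init(struct ifq *ifq)
{
	pthread_mutex_init(&ifq->lock, NULL);
	pktq_init(&ifq->q);
	ifq->lock_ops = 0;
}

/* Basic approach: one lock/unlock pair per packet. */
static void
ifq_enqueue_one(struct ifq *ifq, struct pkt *p)
{
	pthread_mutex_lock(&ifq->lock);
	ifq->lock_ops++;
	pktq_append(&ifq->q, p);
	pthread_mutex_unlock(&ifq->lock);
}

/* Coalesced approach: one lock/unlock pair for the whole batch. */
static void
ifq_enqueue_batch(struct ifq *ifq, struct pktq *batch)
{
	pthread_mutex_lock(&ifq->lock);
	ifq->lock_ops++;
	pktq_concat(&ifq->q, batch);
	pthread_mutex_unlock(&ifq->lock);
}
```

     <p>In this sketch, enqueuing N packets with ifq_enqueue_one() takes the
       lock N times, while building the same N packets on a thread-local
       struct pktq with pktq_append() and calling ifq_enqueue_batch() once
       takes it a single time.  The concatenation is constant-time, so the
       lock is also held only briefly.</p>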
    <a name="tasks"></a>
    <h2>Project Tasks</h2>
    <table border=3>
      <tr>
 	<th> Task </th>
 	<th> Responsible </th>
 	<th> Last updated </th>
 	<th> Status </th>
 	<th> Notes </th>
      </tr>
      <tr>
 	<td> Mbuf queue library </td>
 	<td> &a.rwatson; </td>
 	<td> 20041106 </td>
 	<td> &status.wip; </td>
 	<td> In order to facilitate passing off queues of packets between
 	  network stack components, create an mbuf queue primitive, struct
 	  mbufqueue.  The initial implementation is complete, and the
 	  primitive is now being applied in several sample cases to determine
 	  whether it offers the desired semantics and benefits.  The
 	  implementation can be found in the rwatson_dispatch Perforce
 	  branch.</td>
      </tr>
      <tr>
 	<td> Employ queued dispatch in interface send API </td>
 	<td> &a.rwatson; </td>
 	<td> 20041106 </td>
 	<td> &status.wip; </td>
 	<td> An experimental if_start_mbufqueue() interface to struct ifnet
 	  has been added, which passes an mbuf queue to the device driver for
 	  processing, avoiding redundant synchronization against the
 	  interface queue, even in the event that additional queueing is
 	  required.  This has not yet been benchmarked. </td>
      </tr>
      <tr>
 	<td> Employ queued dispatch in the interface receive API </td>
 	<td> &a.rwatson; </td>
 	<td> 20041106 </td>
 	<td> &status.new; </td>
 	<td> Similar to if_start_mbufqueue, allow input of a queue of mbufs
 	  from the device driver into the lowest protocol layers, such as
 	  ether_input_mbufqueue. </td>
      </tr>
    </table>
    <a name="links"></a>
    <h2>Links</h2>
    <p>Some useful links relating to the netperf work:</p>
    <ul>
      <li><p><a href="../smp/">SMPng Project</a> -- Project to introduce
 	finer grained locking in the FreeBSD kernel.</p></li>
      <li><p><a href="http://www.watson.org/~robert/freebsd/netperf">Robert
 	Watson's netperf web page</a> -- Web page that includes a change log
 	and performance measurement/debugging information.</p></li>
    </ul>
  &footer;
  </body>
 </html>
--- a/en/projects/netperf/style.css
+++ b/en/projects/netperf/style.css
@@ -0,0 +1,38 @@
 BODY {
 }
 BODY TD {
 	font-size: 13px;
 }
 BODY SMALL {
 	width: 615px;
 	font-size: 11px;
 }
 .heading {
 	font-size: 15px;
 	background-color: #cbd2ec;
 }
 .section {
 	font-size: 15px;
 	font-weight: bold;
 	background-color: #e7e9f7;
 }
 .notes {
 	font-size: 13px;
 	font-weight: normal;
 }
 .main {
 	width: 615px;
 	height: auto;
 	text-align: justify;
 }
 .list {
 	width: 550px;
 	height: auto;
 }