Add an initial web page for the netperf project, talking a bit about
the approaches that are being taken in the network performance work for 5.x/6.x.
svn path=/www/; revision=22923
2004-11-11 22:21:42 +00:00 · 2004-11-11 22:21:42 +00:00 · 9d2a62b399 · 2020-12-08 03:00:23 +00:00
commit 9d2a62b399
parent 27a7295915
3 changed files with 251 additions and 0 deletions
--- a/en/projects/netperf/Makefile
+++ b/en/projects/netperf/Makefile
@@ -0,0 +1,17 @@
+# Summary for netperf project status
+#
+# $FreeBSD: www/en/projects/busdma/Makefile,v 1.1 2002/12/09 21:36:29 rwatson Exp $
+
+MAINTAINER=	rwatson
+
+.if exists(../Makefile.conf)
+.include "../Makefile.conf"
+.endif
+.if exists(../Makefile.inc)
+.include "../Makefile.inc"
+.endif
+
+DOCS=	index.sgml
+DATA=	style.css
+
+.include "${WEB_PREFIX}/share/mk/web.site.mk"
--- a/en/projects/netperf/index.sgml
+++ b/en/projects/netperf/index.sgml
@@ -0,0 +1,196 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
+<!ENTITY base CDATA "../..">
+<!ENTITY date "$FreeBSD$">
+<!ENTITY title "FreeBSD Network Performance Project (netperf)">
+<!ENTITY email 'mux'>
+<!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;
+
+<!ENTITY status.na "<font color=green>N/A</font>">
+<!ENTITY status.done "<font color=green>Done</font>">
+<!ENTITY status.wip "<font color=blue>In progress</font>">
+<!ENTITY status.untested "<font color=yellow>Needs testing</font>">
+<!ENTITY status.new "<font color=red>New task</font>">
+<!ENTITY status.unknown "<font color=red>Unknown</font>">
+
+<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;
+
+]>
+
+<html>
+  &header;
+
+    <h2>Contents</h2>
+    <ul>
+      <li><a href="#goal">Project Goal</a></li>
+      <li><a href="#strategies">Project Strategies</a></li>
+      <li><a href="#tasks">Project Tasks</a></li>
+      <li><a href="#links">Links</a></li>
+    </ul>
+
+    <a name="goal"></a>
+    <h2>Project Goal</h2>
+
+    <p>The netperf project is working to enhance the performance of the
+      FreeBSD network stack.  This work grew out of the
+      <a href="../smp">SMPng Project</a>, which moved the FreeBSD kernel from
+      a "Giant Lock" to more fine-grained locking and multi-threading.  SMPng
+      offered both performance improvement and degradation for the network
+      stack, improving parallelism and preemption, but substantially
+      increasing per-packet processing costs.  The netperf project is
+      primarily focussed on further improving parallelism in network
+      processing, while reducing the SMP synchronization overhead.  This in
+      turn will lead to higher processing throughput and lower processing
+      latency.</p>
+
+    <a name="strategies"></a>
+    <h2>Project Strategies</h2>
+    <p>Robert Watson</p>
+
+    <p>The two primary goals of this work are to increase parallelism
+      and to decrease overhead.  Several activities are under way that
+      work towards these goals:</p>
+
+    <ul>
+      <li><p>Complete locking work to make sure all components of the stack
+	are able to run without the Giant lock.  While most of the network
+	stack, especially mainstream protocols, runs without Giant, some
+	components require Giant to be placed back over the stack if compiled
+	into the kernel, reducing parallelism.</p></li>
+
+      <li><p>Optimize locking strategies to find better balances between
+	locking granularity and locking overhead.  In the first cut locking
+	work on the kernel, the goal was to adopt a medium-grained locking
+	approach based on data locking.  This approach identifies critical
+	data structures, and inserts new locks and locking operations to
+	protect those data structures.  Depending on the data model of the
+	code being protected, this may lead to the introduction of a
+	substantial number of locks offering unnecessary granularity, where
+	the overhead of locking overwhelms the benefits of available
+	parallelism and preemption.  By selectively reducing granularity, it
+	is possible to improve performance by decreasing locking overhead.
+	</p></li>
+
+      <li><p>Amortize the cost of locking by processing queues of packets or
+	events.  While the cost of individual synchronization operations may
+	be high, it is possible to amortize the cost of synchronization
+	operations by grouping processing of similar data (packets, events)
+	under the same protection.  This approach focuses on identifying
+	places where similar locking occurs frequently in succession, and
+	introducing queueing or coalescing of lock operations across the
+	body of the work.  For example, when a series of packets is inserted
+	into an outgoing interface queue, a basic locking approach would
+	lock the queue for each insert operation, unlock it, and hand off to
+	the interface driver to begin the send, repeating this sequence as
+	required.  With a coalesced approach, the caller would pass off a
+	queue of packets in order to reduce the locking overhead, as well as
+	eliminate unnecessary synchronization due to the queue being
+	thread-local.  This approach can be applied at several levels in the
+	stack, and is particularly applicable at lower levels of the stack
+	where streams of packets require almost identical processing.
+	</p></li>
+
+      <li><p>Introduce new synchronization strategies with reduced overhead
+	relative to traditional strategies.  Most traditional strategies
+	employ a combination of interrupt disabling and atomic operations to
+	achieve mutual exclusion and non-preemption guarantees.  However,
+	these operations are expensive on modern CPUs, leading to the desire
+	for cheaper primitives with weaker semantics.  Examples include
+	applying uni-processor primitives where synchronization is
+	required only on a single processor, and optimizing critical
+	section primitives to avoid the need for interrupt disabling.
+	</p></li>
+
+      <li><p>Modify synchronization strategies to take advantage of
+	additional, non-locking, synchronization primitives.  This approach
+	might take the form of making increased use of per-CPU or per-thread
+	data structures, which require little or no synchronization.  For
+	example, through the use of critical sections, it is possible to
+	synchronize access to per-CPU caches and queues.  Through the use of
+	per-thread queues, data can be handed off between stack layers
+	without the use of synchronization.</p></li>
+
+      <li><p>Increase the opportunities for parallelism through increased
+	threading in the network stack.  The current network stack model
+	offers the opportunity for substantial parallelism, with outbound
+	processing typically taking place in the context of the sending
+	thread in kernel, crypto occurring in crypto worker threads, and
+	receive processing taking place in a combination of the receiving
+	ithread and dispatched netisr thread.  While handoffs between
+	threads introduce overhead (synchronization, context switching),
+	there is the opportunity to increase parallelism in some workloads
+	through introducing additional worker threads.  Identifying work
+	that may be relocated to new threads must be done carefully to
+	balance overhead and latency concerns, but can pay off by
+	increasing effective CPU utilization and hence throughput.  For
+	example, introducing additional netisr threads capable of running on
+	more than one CPU at a time can increase input parallelism, subject
+	to maintaining desirable packet ordering.</p></li>
+    </ul>
+
+    <a name="tasks"></a>
+    <h2>Project Tasks</h2>
+
+    <table border=3>
+      <tr>
+	<th> Task </th>
+	<th> Responsible </th>
+	<th> Last updated </th>
+	<th> Status </th>
+	<th> Notes </th>
+      </tr>
+
+      <tr>
+	<td> Mbuf queue library </td>
+	<td> &a.rwatson; </td>
+	<td> 20041106 </td>
+	<td> &status.wip; </td>
+	<td> In order to facilitate passing off queues of packets between
+	  network stack components, create an mbuf queue primitive, struct
+	  mbufqueue.  The initial implementation is complete, and the
+	  primitive is now being applied in several sample cases to determine
+	  whether it offers the desired semantics and benefits.  The
+	  implementation can be found in the rwatson_dispatch Perforce
+	  branch.</td>
+      </tr>
+
+      <tr>
+	<td> Employ queued dispatch in interface send API </td>
+	<td> &a.rwatson; </td>
+	<td> 20041106 </td>
+	<td> &status.wip; </td>
+	<td> An experimental if_start_mbufqueue() interface to struct ifnet
+	  has been added, which passes an mbuf queue to the device driver for
+	  processing, avoiding redundant synchronization against the
+	  interface queue, even in the event that additional queueing is
+	  required.  This has not yet been benchmarked. </td>
+      </tr>
+
+      <tr>
+	<td> Employ queued dispatch in the interface receive API </td>
+	<td> &a.rwatson; </td>
+	<td> 20041106 </td>
+	<td> &status.new; </td>
+	<td> Similar to if_start_mbufqueue, allow input of a queue of mbufs
+	  from the device driver into the lowest protocol layers, such as
+	  ether_input_mbufqueue. </td>
+      </tr>
+
+    </table>
+
+    <a name="links"></a>
+    <h2>Links</h2>
+
+    <p>Some useful links relating to the netperf work:</p>
+
+    <ul>
+      <li><p><a href="../smp/">SMPng Project</a> -- Project to introduce
+	finer-grained locking in the FreeBSD kernel.</p></li>
+
+      <li><p><a href="http://www.watson.org/~robert/freebsd/netperf">Robert
+	Watson's netperf web page</a> -- Web page that includes a change log
+	and performance measurement/debugging information.</p></li>
+    </ul>
+
+  &footer;
+  </body>
+</html>
--- a/en/projects/netperf/style.css
+++ b/en/projects/netperf/style.css
@@ -0,0 +1,38 @@
+BODY {
+}
+
+BODY TD {
+	font-size: 13px;
+}
+
+BODY SMALL {
+	width: 615px;
+	font-size: 11px;
+}
+
+.heading {
+	font-size: 15px;
+	background-color: #cbd2ec;
+}
+
+.section {
+	font-size: 15px;
+	font-weight: bold;
+	background-color: #e7e9f7;
+}
+
+.notes {
+	font-size: 13px;
+	font-weight: normal;
+}
+
+.main {
+	width: 615px;
+	height: auto;
+	text-align: justify;
+}
+
+.list {
+	width: 550px;
+	height: auto;
+}
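
Illustration (not part of the commit): the "amortize the cost of locking" strategy described in index.sgml can be sketched in userland C. This is a minimal sketch under stated assumptions, not the code from the rwatson_dispatch branch. The names struct pkt, struct pktq, struct ifq, ifq_enqueue_one() and ifq_enqueue_batch() are hypothetical stand-ins for struct mbuf, struct mbufqueue and the interface queue; a pthread mutex stands in for the kernel lock, and lock_ops exists only to make the number of lock acquisitions visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer; not the kernel's struct mbuf. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Unlocked FIFO of packets, intended to be thread-local while it is built. */
struct pktq {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Shared interface queue; lock_ops counts how often the mutex is taken. */
struct ifq {
	pthread_mutex_t mtx;
	struct pktq q;
	unsigned long lock_ops;
};

static void pktq_init(struct pktq *q)
{
	q->head = q->tail = NULL;
	q->len = 0;
}

/* Append one packet; no locking, the caller owns the queue. */
static void pktq_append(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/* Splice all of src onto the end of dst in constant time; src ends up empty. */
static void pktq_concat(struct pktq *dst, struct pktq *src)
{
	if (src->head == NULL)
		return;
	if (dst->tail != NULL)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	pktq_init(src);
}

static void ifq_init(struct ifq *ifq)
{
	pthread_mutex_init(&ifq->mtx, NULL);
	pktq_init(&ifq->q);
	ifq->lock_ops = 0;
}

/* Basic approach: one lock/unlock pair for every packet. */
static void ifq_enqueue_one(struct ifq *ifq, struct pkt *p)
{
	pthread_mutex_lock(&ifq->mtx);
	ifq->lock_ops++;
	pktq_append(&ifq->q, p);
	pthread_mutex_unlock(&ifq->mtx);
}

/* Coalesced approach: one lock/unlock pair for a whole queue of packets. */
static void ifq_enqueue_batch(struct ifq *ifq, struct pktq *batch)
{
	pthread_mutex_lock(&ifq->mtx);
	ifq->lock_ops++;
	pktq_concat(&ifq->q, batch);
	pthread_mutex_unlock(&ifq->mtx);
}
```

In this sketch a caller sending N packets through ifq_enqueue_one() takes the mutex N times, while a caller that first builds a local struct pktq with pktq_append() and then calls ifq_enqueue_batch() takes it once, whatever the batch size. The local queue needs no lock of its own because only one thread touches it until it is handed off.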