Add an initial web page for the netperf project, talking a bit about
the approaches that are being taken in the network performance work for 5.x/6.x.
parent 27a7295915
commit 9d2a62b399
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/www/; revision=22923
3 changed files with 251 additions and 0 deletions
17
en/projects/netperf/Makefile
Normal file
@@ -0,0 +1,17 @@
# Summary for busdma project status
#
# $FreeBSD: www/en/projects/busdma/Makefile,v 1.1 2002/12/09 21:36:29 rwatson Exp $

MAINTAINER= rwatson

.if exists(../Makefile.conf)
.include "../Makefile.conf"
.endif
.if exists(../Makefile.inc)
.include "../Makefile.inc"
.endif

DOCS= index.sgml
DATA= style.css

.include "${WEB_PREFIX}/share/mk/web.site.mk"
196
en/projects/netperf/index.sgml
Normal file
@@ -0,0 +1,196 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
<!ENTITY base CDATA "../..">
<!ENTITY date "$FreeBSD$">
<!ENTITY title "FreeBSD Network Performance Project (netperf)">
<!ENTITY email 'mux'>
<!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;

<!ENTITY status.na "<font color=green>N/A</font>">
<!ENTITY status.done "<font color=green>Done</font>">
<!ENTITY status.wip "<font color=blue>In progress</font>">
<!ENTITY status.untested "<font color=yellow>Needs testing</font>">
<!ENTITY status.new "<font color=red>New task</font>">
<!ENTITY status.unknown "<font color=red>Unknown</font>">

<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;

]>

<html>
&header;

<h2>Contents</h2>
<ul>
<li><a href="#goal">Project Goal</a></li>
<li><a href="#strategies">Project Strategies</a></li>
<li><a href="#tasks">Project Tasks</a></li>
<li><a href="#links">Links</a></li>
</ul>

<a name="goal"></a>
<h2>Project Goal</h2>

<p>The netperf project is working to enhance the performance of the
FreeBSD network stack. This work grew out of the
<a href="../smp">SMPng Project</a>, which moved the FreeBSD kernel from
a "Giant Lock" to more fine-grained locking and multi-threading. SMPng
both improved and degraded network stack performance: it improved
parallelism and preemption, but substantially increased per-packet
processing costs. The netperf project is primarily focused on further
improving parallelism in network processing, while reducing the SMP
synchronization overhead. This in turn will lead to higher processing
throughput and lower processing latency.</p>

<a name="strategies"></a>
<h2>Project Strategies</h2>
<p>Robert Watson</p>

<p>The two primary goals of this work are to increase parallelism and
to decrease overhead. Several activities are under way that work
towards these goals:</p>

<ul>
<li><p>Complete locking work to make sure all components of the stack
are able to run without the Giant lock. While most of the network
stack, especially mainstream protocols, runs without Giant, some
components require Giant to be placed back over the stack if compiled
into the kernel, reducing parallelism.</p></li>

<li><p>Optimize locking strategies to find better balances between
locking granularity and locking overhead. In the first-cut locking
work on the kernel, the goal was to adopt a medium-grained locking
approach based on data locking. This approach identifies critical
data structures, and inserts new locks and locking operations to
protect those data structures. Depending on the data model of the
code being protected, this may lead to the introduction of a
substantial number of locks offering unnecessary granularity, where
the overhead of locking overwhelms the benefits of available
parallelism and preemption. By selectively reducing granularity, it
is possible to improve performance by decreasing locking overhead.
</p></li>

<li><p>Amortize the cost of locking by processing queues of packets or
events. While the cost of individual synchronization operations may
be high, it is possible to amortize the cost of synchronization
operations by grouping processing of similar data (packets, events)
under the same protection. This approach focuses on identifying
places where similar locking occurs frequently in succession, and
introducing queueing or coalescing of lock operations across the
body of the work. For example, when a series of packets is inserted
into an outgoing interface queue, a basic locking approach would
lock the queue for each insert operation, unlock it, and hand off to
the interface driver to begin the send, repeating this sequence as
required. With a coalesced approach, the caller would hand off a
queue of packets in a single operation, reducing the locking
overhead and eliminating unnecessary synchronization because the
caller's queue is thread-local. This approach can be applied at
several levels in the stack, and is particularly applicable at lower
levels of the stack where streams of packets require almost identical
processing.
</p></li>

<li><p>Introduce new synchronization strategies with reduced overhead
relative to traditional strategies. Most traditional strategies
employ a combination of interrupt disabling and atomic operations to
achieve mutual exclusion and non-preemption guarantees. However,
these operations are expensive on modern CPUs, leading to the desire
for cheaper primitives with weaker semantics. Examples include
applying uniprocessor primitives where synchronization is required
only on a single processor, and optimizing critical section
primitives to avoid the need for interrupt disabling.
</p></li>

<li><p>Modify synchronization strategies to take advantage of
additional, non-locking, synchronization primitives. This approach
might take the form of making increased use of per-CPU or per-thread
data structures, which require little or no synchronization. For
example, through the use of critical sections, it is possible to
synchronize access to per-CPU caches and queues. Through the use of
per-thread queues, data can be handed off between stack layers
without the use of synchronization.</p></li>

<li><p>Increase the opportunities for parallelism through increased
threading in the network stack. The current network stack model
offers the opportunity for substantial parallelism, with outbound
processing typically taking place in the context of the sending
thread in kernel, crypto occurring in crypto worker threads, and
receive processing taking place in a combination of the receiving
ithread and dispatched netisr thread. While handoffs between
threads introduce overhead (synchronization, context switching),
there is the opportunity to increase parallelism in some workloads
through introducing additional worker threads. Identifying work
that may be relocated to new threads must be done carefully to
balance overhead and latency concerns, but can pay off by
increasing effective CPU utilization and hence throughput. For
example, introducing additional netisr threads capable of running on
more than one CPU at a time can increase input parallelism, subject
to maintaining desirable packet ordering.</p></li>
</ul>
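
<p>The lock amortization strategy above can be sketched in userspace C.
This is only an illustration under stated assumptions, not the code in
the rwatson_dispatch branch: struct pkt, struct pktq, struct ifq and
every function name below are hypothetical, a C11 atomic_flag spin lock
stands in for a kernel mutex, and the lock_ops counter exists only to
make the difference in lock acquisitions visible.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet type; a stand-in for an mbuf chain. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Singly linked packet queue with a tail pointer for O(1) append. */
struct pktq {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Shared interface queue guarded by a simple spin lock (assumption). */
struct ifq {
	atomic_flag lock;
	struct pktq q;
	unsigned long lock_ops;	/* lock acquisitions, for comparison only */
};

static void
pktq_init(struct pktq *q)
{
	q->head = q->tail = NULL;
	q->len = 0;
}

/* Append one packet; the caller provides any locking that is needed. */
static void
pktq_append(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
}

/* Move every packet from src to the end of dst in O(1); src ends up empty. */
static void
pktq_concat(struct pktq *dst, struct pktq *src)
{
	assert(dst != src);
	if (src->head == NULL)
		return;
	if (dst->tail == NULL)
		dst->head = src->head;
	else
		dst->tail->next = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	pktq_init(src);
}

static void
ifq_init(struct ifq *ifq)
{
	atomic_flag_clear(&ifq->lock);
	pktq_init(&ifq->q);
	ifq->lock_ops = 0;
}

static void
ifq_lock(struct ifq *ifq)
{
	while (atomic_flag_test_and_set_explicit(&ifq->lock,
	    memory_order_acquire))
		;	/* spin until the lock is released */
	ifq->lock_ops++;
}

static void
ifq_unlock(struct ifq *ifq)
{
	atomic_flag_clear_explicit(&ifq->lock, memory_order_release);
}

/* Basic approach: one lock/unlock pair per packet. */
static void
ifq_enqueue_one(struct ifq *ifq, struct pkt *p)
{
	ifq_lock(ifq);
	pktq_append(&ifq->q, p);
	ifq_unlock(ifq);
}

/* Coalesced approach: one lock/unlock pair for a whole thread-local queue. */
static void
ifq_enqueue_batch(struct ifq *ifq, struct pktq *local)
{
	ifq_lock(ifq);
	pktq_concat(&ifq->q, local);
	ifq_unlock(ifq);
}
```

<p>In this sketch a caller builds a struct pktq on its own stack with
pktq_append(), which needs no locking because no other thread can see
the queue, and then calls ifq_enqueue_batch() once. Enqueueing N
packets with ifq_enqueue_one() takes the lock N times; the batched path
takes it once regardless of N.</p>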

<a name="tasks"></a>
<h2>Project Tasks</h2>

<table border=3>
<tr>
<th> Task </th>
<th> Responsible </th>
<th> Last updated </th>
<th> Status </th>
<th> Notes </th>
</tr>

<tr>
<td> Mbuf queue library </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.wip; </td>
<td> In order to facilitate passing off queues of packets between
network stack components, create an mbuf queue primitive, struct
mbufqueue. The initial implementation is complete, and the
primitive is now being applied in several sample cases to determine
whether it offers the desired semantics and benefits. The
implementation can be found in the rwatson_dispatch Perforce
branch.</td>
</tr>

<tr>
<td> Employ queued dispatch in interface send API </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.wip; </td>
<td> An experimental if_start_mbufqueue() interface to struct ifnet
has been added, which passes an mbuf queue to the device driver for
processing, avoiding redundant synchronization against the
interface queue, even in the event that additional queueing is
required. This has not yet been benchmarked. </td>
</tr>

<tr>
<td> Employ queued dispatch in the interface receive API </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.new; </td>
<td> Similar to if_start_mbufqueue, allow input of a queue of mbufs
from the device driver into the lowest protocol layers, such as
ether_input_mbufqueue. </td>
</tr>

</table>

<a name="links"></a>
<h2>Links</h2>

<p>Some useful links relating to the netperf work:</p>

<ul>
<li><p><a href="../smp/">SMPng Project</a> -- Project to introduce
finer grained locking in the FreeBSD kernel.</p></li>

<li><p><a href="http://www.watson.org/~robert/freebsd/netperf">Robert
Watson's netperf web page</a> -- Web page that includes a change log
and performance measurement/debugging information.</p></li>
</ul>

&footer;
</body>
</html>
38
en/projects/netperf/style.css
Normal file
@@ -0,0 +1,38 @@
BODY {
}

BODY TD {
font-size: 13px;
}

BODY SMALL {
width: 615px;
font-size: 11px;
}

.heading {
font-size: 15px;
background-color: #cbd2ec;
}

.section {
font-size: 15px;
font-weight: bold;
background-color: #e7e9f7;
}

.notes {
font-size: 13px;
font-weight: normal;
}

.main {
width: 615px;
height: auto;
text-align: justify;
}

.list {
width: 550px;
height: auto;
}