Add an initial web page for the netperf project, talking a bit about
the approaches that are being taken in the network performance work for 5.x/6.x.
parent 27a7295915
commit 9d2a62b399
Notes (svn2git, 2020-12-08 03:00:23 +00:00):
svn path=/www/; revision=22923
3 changed files with 251 additions and 0 deletions
en/projects/netperf/Makefile (Normal file, +17 lines)
@@ -0,0 +1,17 @@
# Summary for busdma project status
#
# $FreeBSD: www/en/projects/busdma/Makefile,v 1.1 2002/12/09 21:36:29 rwatson Exp $

MAINTAINER= rwatson

.if exists(../Makefile.conf)
.include "../Makefile.conf"
.endif
.if exists(../Makefile.inc)
.include "../Makefile.inc"
.endif

DOCS= index.sgml
DATA= style.css

.include "${WEB_PREFIX}/share/mk/web.site.mk"
en/projects/netperf/index.sgml (Normal file, +196 lines)
@@ -0,0 +1,196 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
<!ENTITY base CDATA "../..">
<!ENTITY date "$FreeBSD$">
<!ENTITY title "FreeBSD Network Performance Project (netperf)">
<!ENTITY email 'mux'>
<!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;

<!ENTITY status.na "<font color=green>N/A</font>">
<!ENTITY status.done "<font color=green>Done</font>">
<!ENTITY status.wip "<font color=blue>In progress</font>">
<!ENTITY status.untested "<font color=yellow>Needs testing</font>">
<!ENTITY status.new "<font color=red>New task</font>">
<!ENTITY status.unknown "<font color=red>Unknown</font>">

<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;

]>

<html>
&header;

<h2>Contents</h2>
<ul>
<li><a href="#goal">Project Goal</a></li>
<li><a href="#strategies">Project Strategies</a></li>
<li><a href="#tasks">Project Tasks</a></li>
<li><a href="#links">Links</a></li>
</ul>

<a name="goal"></a>
<h2>Project Goal</h2>

<p>The netperf project is working to enhance the performance of the
FreeBSD network stack. This work grew out of the
<a href="../smp">SMPng Project</a>, which moved the FreeBSD kernel from
a "Giant Lock" to more fine-grained locking and multi-threading. SMPng
offered both performance improvement and degradation for the network
stack, improving parallelism and preemption, but substantially
increasing per-packet processing costs. The netperf project is
primarily focused on further improving parallelism in network
processing, while reducing the SMP synchronization overhead. This in
turn will lead to higher processing throughput and lower processing
latency.</p>

<a name="strategies"></a>
<h2>Project Strategies</h2>
<p>Robert Watson</p>

<p>The two primary goals of this work are to increase parallelism and
to decrease overhead. Several activities are under way that will work
towards these goals:</p>

<ul>
<li><p>Complete locking work to make sure all components of the stack
are able to run without the Giant lock. While most of the network
stack, especially mainstream protocols, runs without Giant, some
components require Giant to be placed back over the stack if compiled
into the kernel, reducing parallelism.</p></li>

<li><p>Optimize locking strategies to find better balances between
locking granularity and locking overhead. In the first-cut locking
work on the kernel, the goal was to adopt a medium-grained locking
approach based on data locking. This approach identifies critical
data structures and inserts new locks and locking operations to
protect those data structures. Depending on the data model of the
code being protected, this may lead to the introduction of a
substantial number of locks offering unnecessary granularity, where
the overhead of locking overwhelms the benefits of available
parallelism and preemption. By selectively reducing granularity, it
is possible to improve performance by decreasing locking overhead.
</p></li>

<li><p>Amortize the cost of locking by processing queues of packets or
events. While the cost of individual synchronization operations may
be high, it is possible to amortize the cost of synchronization
operations by grouping processing of similar data (packets, events)
under the same protection. This approach focuses on identifying
places where similar locking occurs frequently in succession, and
introducing queueing or coalescing of lock operations across the
body of the work. For example, when a series of packets is inserted
into an outgoing interface queue, a basic locking approach would
lock the queue for each insert operation, unlock it, and hand off to
the interface driver to begin the send, repeating this sequence as
required. With a coalesced approach, the caller would instead pass
off an entire queue of packets at once, reducing the locking
overhead; because that queue is thread-local, building it requires
no synchronization at all. This approach can be applied at several
levels in the stack, and is particularly applicable at lower levels
of the stack where streams of packets require almost identical
processing.
</p></li>

<li><p>Introduce new synchronization strategies with reduced overhead
relative to traditional strategies. Most traditional strategies
employ a combination of interrupt disabling and atomic operations to
achieve mutual exclusion and non-preemption guarantees. However,
these operations are expensive on modern CPUs, leading to the desire
for cheaper primitives with weaker semantics. Examples include
applying uni-processor primitives where synchronization is required
only on a single processor, and optimizing critical section
primitives to avoid the need for interrupt disabling.
</p></li>

<li><p>Modify synchronization strategies to take advantage of
additional non-locking synchronization primitives. This approach
might take the form of making increased use of per-CPU or per-thread
data structures, which require little or no synchronization. For
example, through the use of critical sections, it is possible to
synchronize access to per-CPU caches and queues. Through the use of
per-thread queues, data can be handed off between stack layers
without the use of synchronization.</p></li>

<li><p>Increase the opportunities for parallelism through increased
threading in the network stack. The current network stack model
offers the opportunity for substantial parallelism, with outbound
processing typically taking place in the context of the sending
thread in the kernel, crypto occurring in crypto worker threads, and
receive processing taking place in a combination of the receiving
ithread and the dispatched netisr thread. While handoffs between
threads introduce overhead (synchronization, context switching),
there is an opportunity to increase parallelism in some workloads
through introducing additional worker threads. Identifying work
that may be relocated to new threads must be done carefully to
balance overhead and latency concerns, but can pay off by
increasing effective CPU utilization and hence throughput. For
example, introducing additional netisr threads capable of running on
more than one CPU at a time can increase input parallelism, subject
to maintaining desirable packet ordering.</p></li>
</ul>
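The coalesced approach described in the list above can be illustrated with a small user-space sketch in C. It is a hedged illustration, not kernel code. `struct pkt`, `struct pktq`, `struct ifsendq`, `send_each()` and `send_batch()` are hypothetical names, and a C11 `atomic_flag` spin lock stands in for a kernel mutex. The lock-acquisition counter makes the amortization visible: enqueuing n packets one at a time takes the shared lock n times, while building a thread-local queue first takes it once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet and queue types (not struct mbuf or struct ifqueue). */
struct pkt {
	struct pkt *next;
};

struct pktq {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Hypothetical shared interface send queue guarded by one spin lock. */
struct ifsendq {
	atomic_flag lock;
	struct pktq q;
	unsigned long lock_acquisitions;	/* updated only while locked */
};

static void
sq_init(struct ifsendq *sq)
{
	atomic_flag_clear(&sq->lock);
	sq->q.head = sq->q.tail = NULL;
	sq->q.len = 0;
	sq->lock_acquisitions = 0;
}

static void
sq_lock(struct ifsendq *sq)
{
	while (atomic_flag_test_and_set_explicit(&sq->lock,
	    memory_order_acquire))
		;	/* spin until the holder releases the lock */
	sq->lock_acquisitions++;
}

static void
sq_unlock(struct ifsendq *sq)
{
	atomic_flag_clear_explicit(&sq->lock, memory_order_release);
}

/* Append one packet; the caller provides any needed synchronization. */
static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
}

/* Move all of src onto the tail of dst in O(1), leaving src empty. */
static void
pktq_concat(struct pktq *dst, struct pktq *src)
{
	assert(dst != src);
	if (src->head == NULL)
		return;
	if (dst->tail == NULL)
		dst->head = src->head;
	else
		dst->tail->next = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	src->head = src->tail = NULL;
	src->len = 0;
}

/* Basic approach: one lock/unlock round trip per packet. */
static void
send_each(struct ifsendq *sq, struct pkt *pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sq_lock(sq);
		pktq_enqueue(&sq->q, &pkts[i]);
		sq_unlock(sq);
	}
}

/* Coalesced approach: build a thread-local queue with no locking,
 * then hand the whole queue off under a single lock acquisition. */
static void
send_batch(struct ifsendq *sq, struct pkt *pkts, size_t n)
{
	struct pktq local = { NULL, NULL, 0 };

	for (size_t i = 0; i < n; i++)
		pktq_enqueue(&local, &pkts[i]);
	sq_lock(sq);
	pktq_concat(&sq->q, &local);
	sq_unlock(sq);
}
```

For eight packets, `send_each()` leaves `lock_acquisitions` at 8, and `send_batch()` leaves it at 1. The trade-off is that the driver sees the batch only after the whole local queue has been built, which can add latency for the first packet in it.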
|
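The last strategy notes that extra netisr threads must preserve packet ordering. One widely used way to do that is flow hashing, shown here only as a hedged sketch and not as the approach this page commits to. A deterministic hash of each packet's addresses and ports picks the worker, so all packets of one flow land in the same FIFO while different flows can be spread across workers. `struct flow`, `struct workq`, `dispatch()`, `NWORKERS`, `QDEPTH` and the use of FNV-1a are illustrative assumptions, not FreeBSD interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NWORKERS 4	/* hypothetical number of netisr-like workers */
#define QDEPTH 16	/* hypothetical per-worker queue capacity */

/* Hypothetical flow identifiers for an IPv4 TCP/UDP packet. */
struct flow {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct pkt {
	struct flow f;
	int seq;	/* arrival order, used only to check ordering */
};

/* One FIFO per worker; in a kernel each would be drained by its own thread. */
struct workq {
	struct pkt *slot[QDEPTH];
	size_t len;
};

/* FNV-1a over the flow fields. Any hash works as long as it depends only
 * on the flow, so that a given flow always maps to the same worker. */
static uint32_t
flow_hash(const struct flow *f)
{
	uint32_t words[3] = { f->src, f->dst,
	    ((uint32_t)f->sport << 16) | f->dport };
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 3; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xff;
			h *= 16777619u;
		}
	}
	return (h);
}

/* Returns the chosen worker index, or -1 if that worker's queue is full. */
static int
dispatch(struct workq q[NWORKERS], struct pkt *p)
{
	unsigned w = flow_hash(&p->f) % NWORKERS;

	if (q[w].len == QDEPTH)
		return (-1);	/* a real stack would drop or apply backpressure */
	q[w].slot[q[w].len++] = p;
	return ((int)w);
}
```

Ordering is guaranteed only per flow, not globally. Packets of different flows may be processed out of order relative to each other, and a skewed traffic mix can still load one worker much more heavily than the others.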
<a name="tasks"></a>
<h2>Project Tasks</h2>

<table border=3>
<tr>
<th> Task </th>
<th> Responsible </th>
<th> Last updated </th>
<th> Status </th>
<th> Notes </th>
</tr>

<tr>
<td> Mbuf queue library </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.wip; </td>
<td> In order to facilitate passing off queues of packets between
network stack components, create an mbuf queue primitive, struct
mbufqueue. The initial implementation is complete, and the
primitive is now being applied in several sample cases to determine
whether it offers the desired semantics and benefits. The
implementation can be found in the rwatson_dispatch Perforce
branch.</td>
</tr>

<tr>
<td> Employ queued dispatch in the interface send API </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.wip; </td>
<td> An experimental if_start_mbufqueue() interface to struct ifnet
has been added, which passes an mbuf queue to the device driver for
processing, avoiding redundant synchronization against the
interface queue, even in the event that additional queueing is
required. This has not yet been benchmarked. </td>
</tr>

<tr>
<td> Employ queued dispatch in the interface receive API </td>
<td> &a.rwatson; </td>
<td> 20041106 </td>
<td> &status.new; </td>
<td> Similar to if_start_mbufqueue(), allow input of a queue of mbufs
from the device driver into the lowest protocol layers, for example
via an ether_input_mbufqueue() routine. </td>
</tr>

</table>
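The queued send dispatch described in the table can be sketched as follows. This is an assumption-laden illustration, not the if_start_mbufqueue() code in the rwatson_dispatch branch. `struct drv`, `drv_start_queue()`, the four-slot ring (`TXRING_SLOTS`) and the software backlog are hypothetical, and a C11 `atomic_flag` spin lock stands in for the driver mutex. The sketch shows how a driver that receives a whole queue can take its lock once, fill free transmit slots, and append any overflow to its own backlog under the same lock acquisition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TXRING_SLOTS 4	/* hypothetical hardware transmit ring size */

/* Hypothetical packet and queue types (not struct mbuf or struct mbufqueue). */
struct pkt {
	struct pkt *next;
};

struct pktq {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

/* Hypothetical driver state: a transmit ring plus a software backlog,
 * both protected by a single driver lock. */
struct drv {
	atomic_flag lock;
	struct pkt *txring[TXRING_SLOTS];
	size_t txring_used;
	struct pktq backlog;
	unsigned long lock_acquisitions;	/* updated only while locked */
};

static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
}

static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p = q->head;

	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
		q->len--;
	}
	return (p);
}

static void
drv_init(struct drv *sc)
{
	atomic_flag_clear(&sc->lock);
	sc->txring_used = 0;
	sc->backlog.head = sc->backlog.tail = NULL;
	sc->backlog.len = 0;
	sc->lock_acquisitions = 0;
}

/*
 * Queued start: the caller hands over its whole (thread-local) queue.
 * The driver lock is taken once for the entire batch.
 */
static void
drv_start_queue(struct drv *sc, struct pktq *q)
{
	struct pkt *p;

	while (atomic_flag_test_and_set_explicit(&sc->lock,
	    memory_order_acquire))
		;	/* spin until the lock is free */
	sc->lock_acquisitions++;
	while ((p = pktq_dequeue(q)) != NULL) {
		/* Once anything is backlogged, queue behind it to keep order. */
		if (sc->backlog.len == 0 && sc->txring_used < TXRING_SLOTS)
			sc->txring[sc->txring_used++] = p;
		else
			pktq_enqueue(&sc->backlog, p);
	}
	atomic_flag_clear_explicit(&sc->lock, memory_order_release);
	assert(q->len == 0);
}
```

In this sketch, handing six packets to a driver with four free slots places the first four in the ring and the last two on the backlog, all under one lock acquisition. The "additional queueing" case therefore costs no extra synchronization.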

<a name="links"></a>
<h2>Links</h2>

<p>Some useful links relating to the netperf work:</p>

<ul>
<li><p><a href="../smp/">SMPng Project</a> -- Project to introduce
finer-grained locking in the FreeBSD kernel.</p></li>

<li><p><a href="http://www.watson.org/~robert/freebsd/netperf">Robert
Watson's netperf web page</a> -- Web page that includes a change log
and performance measurement/debugging information.</p></li>
</ul>

&footer;
</body>
</html>
en/projects/netperf/style.css (Normal file, +38 lines)
@@ -0,0 +1,38 @@
BODY {
}

BODY TD {
font-size: 13px;
}

BODY SMALL {
width: 615px;
font-size: 11px;
}

.heading {
font-size: 15px;
background-color: #cbd2ec;
}

.section {
font-size: 15px;
font-weight: bold;
background-color: #e7e9f7;
}

.notes {
font-size: 13px;
font-weight: normal;
}

.main {
width: 615px;
height: auto;
text-align: justify;
}

.list {
width: 550px;
height: auto;
}