diff --git a/en/projects/netperf/index.sgml b/en/projects/netperf/index.sgml
index c1d0093173..d4bcf1c119 100644
--- a/en/projects/netperf/index.sgml
+++ b/en/projects/netperf/index.sgml
@@ -1,6 +1,6 @@
-
+
%includes;
@@ -144,7 +144,7 @@
Prefer file descriptor reference counts to socket reference
counts for system calls. |
&a.rwatson; |
- 20041024 |
+ 20041124 |
&status.done; |
Sockets and file descriptors both have reference counts in order
to prevent these objects from being free'd while in use. However,
@@ -155,14 +155,14 @@
thus avoiding the synchronized operations necessary to modify the
socket reference count, an approach also taken in the VFS code.
This change has been made for most socket system calls, and has
- been committed to HEAD (6.x). It will be merged to 5.x in the
- near future. |
+ been committed to HEAD (6.x). It has also been merged to RELENG_5
+ for inclusion in 5.4. |
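For illustration, a minimal sketch of the system call pattern described
above: hold the file descriptor's reference via fget()/fdrop() rather
than manipulating the socket reference count. The function name
sketch_sosend_fd() is hypothetical; only fget(), fdrop(), and sosend()
are existing kernel interfaces, and this is not the committed change
itself.

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/file.h>
#include <sys/socketvar.h>
#include <sys/uio.h>

/* Hypothetical example of the fd-reference pattern. */
static int
sketch_sosend_fd(struct thread *td, int fd, struct uio *uio)
{
        struct file *fp;
        struct socket *so;
        int error;

        /* fget() takes a reference on the descriptor's struct file. */
        error = fget(td, fd, &fp);
        if (error != 0)
                return (error);
        so = fp->f_data;

        /*
         * The held file reference keeps the socket from being freed
         * for the duration of the call, so no soref()/sorele() pair,
         * and no synchronized update of the socket reference count,
         * is required here.
         */
        error = sosend(so, NULL, uio, NULL, NULL, 0, td);

        fdrop(fp, td);          /* release the file reference */
        return (error);
}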
Mbuf queue library |
&a.rwatson; |
- 20041106 |
+ 20041124 |
&status.prototyped; |
In order to facilitate passing off queues of packets between
network stack components, create an mbuf queue primitive, struct
@@ -170,7 +170,9 @@
primitive is now being applied in several sample cases to determine
whether it offers the desired semantics and benefits. The
implementation can be found in the rwatson_dispatch Perforce
- branch. |
+ branch. Additional work must also be done to explore the
+ performance impact of "queues" versus arrays of mbuf pointers, as
+ arrays are likely to behave better from a caching perspective. |
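As a rough illustration of what such a primitive might look like, a
hypothetical singly-linked packet queue built on the existing m_nextpkt
linkage; the structure and function names here are invented and may
differ from the rwatson_dispatch branch.

#include <sys/param.h>
#include <sys/mbuf.h>

struct mbq {
        struct mbuf     *mbq_head;      /* first packet in the queue */
        struct mbuf     *mbq_tail;      /* last packet in the queue */
        u_int            mbq_len;       /* number of packets queued */
};

static __inline void
mbq_init(struct mbq *q)
{
        q->mbq_head = q->mbq_tail = NULL;
        q->mbq_len = 0;
}

static __inline void
mbq_enqueue(struct mbq *q, struct mbuf *m)
{
        /* Append the packet using the mbuf's own m_nextpkt linkage. */
        m->m_nextpkt = NULL;
        if (q->mbq_tail != NULL)
                q->mbq_tail->m_nextpkt = m;
        else
                q->mbq_head = m;
        q->mbq_tail = m;
        q->mbq_len++;
}

static __inline struct mbuf *
mbq_dequeue(struct mbq *q)
{
        struct mbuf *m;

        m = q->mbq_head;
        if (m != NULL) {
                q->mbq_head = m->m_nextpkt;
                if (q->mbq_head == NULL)
                        q->mbq_tail = NULL;
                m->m_nextpkt = NULL;
                q->mbq_len--;
        }
        return (m);
}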
@@ -185,7 +187,7 @@
required. This has not yet been benchmarked. A subset change to
dispatch a single mbuf to a driver has also been prototyped, and
benchmarked at a several percentage point improvement in packet send
- rates from user space.
+ rates from user space.
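To illustrate the distinction being benchmarked, a hypothetical
comparison of the conventional queue-based hand-off against a direct
single-mbuf dispatch; if_start_mbuf() is an assumed name standing in
for the prototyped interface, not an existing ifnet method.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/* Assumed prototype for the direct dispatch entry point. */
int     if_start_mbuf(struct ifnet *ifp, struct mbuf *m);

static int
handoff_queued(struct ifnet *ifp, struct mbuf *m)
{
        int error;

        /* Conventional path: lock and enqueue on if_snd, then if_start. */
        IFQ_HANDOFF(ifp, m, error);
        return (error);
}

static int
handoff_direct(struct ifnet *ifp, struct mbuf *m)
{
        /* Direct path: skip the send queue and its lock operations. */
        return (if_start_mbuf(ifp, m));
}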
@@ -201,19 +203,20 @@
Employ queued dispatch across netisr dispatch API |
&a.rwatson; |
- 20041113 |
- &status.new; |
- Similar to if_start_mbufqueue(), allow dispatch of queues of
- mbufs into the netisr interface, avoiding multiple wakeups when a
- netisr thread is already in execution. Wakeups are expensive
- operations even when there are no threads waiting. |
+ 20041124 |
+ &status.prototyped; |
+ Pull all of the mbufs out of the netisr ifqueue into a
+ thread-local mbuf queue, avoiding repeated lock operations to
+ access the queue. Also test for the presence of queued packets
+ using lock-free operations. This has been prototyped in the
+ rwatson_netperf branch. |
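A minimal sketch of the dequeue pattern described above, using the
stock struct ifqueue and IF_LOCK()/IF_UNLOCK(); the handler callback
and function name are illustrative rather than the rwatson_netperf
code.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

static void
netisr_drain_sketch(struct ifqueue *inq, void (*handler)(struct mbuf *))
{
        struct mbuf *head, *m;

        /* Lock-free check: nothing to do if the queue appears empty. */
        if (inq->ifq_head == NULL)
                return;

        /* One lock round trip pulls the whole chain out of the queue. */
        IF_LOCK(inq);
        head = inq->ifq_head;
        inq->ifq_head = inq->ifq_tail = NULL;
        inq->ifq_len = 0;
        IF_UNLOCK(inq);

        /* Process the thread-local chain without further locking. */
        while ((m = head) != NULL) {
                head = m->m_nextpkt;
                m->m_nextpkt = NULL;
                handler(m);
        }
}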
Modify UMA allocator to use critical sections not mutexes for
per-CPU caches. |
&a.rwatson; |
- 20041111 |
+ 20041124 |
&status.prototyped; |
The mutexes protecting per-CPU caches require atomic operations
on SMP systems; as they are per-CPU objects, the cost of
@@ -222,13 +225,14 @@
has been implemented in the rwatson_percpu branch, but is waiting
on critical section performance optimizations that will prevent
this change from negatively impacting uniprocessor performance.
- |
+ The critical section optimizations from John Baldwin have been
+ posted for public review. |
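A simplified sketch of the protection model being adopted, with a
made-up per-CPU cache structure standing in for UMA's: a critical
section pins the thread to its CPU, so the cache can be touched
without any atomic instruction.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/pcpu.h>

struct pcpu_cache_sketch {
        void    *items[64];     /* cached free items */
        int      count;
};

static struct pcpu_cache_sketch cache_sketch[MAXCPU];

static void *
cache_alloc_sketch(void)
{
        struct pcpu_cache_sketch *cc;
        void *item;

        /*
         * critical_enter() blocks preemption and migration, so the
         * per-CPU cache is exclusively ours while inside the section,
         * with no mutex and no atomic operation on the common path.
         */
        critical_enter();
        cc = &cache_sketch[PCPU_GET(cpuid)];
        item = (cc->count > 0) ? cc->items[--cc->count] : NULL;
        critical_exit();
        return (item);
}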
Optimize critical section performance |
&a.jhb; |
- 20041111 |
+ 20041124 |
&status.prototyped; |
Critical sections prevent preemption of a thread on a CPU, as
well as preventing migration of that thread to another CPU, and
@@ -245,7 +249,8 @@
cost as a mutex, meaning that optimizations on SMP to use critical
sections instead of mutexes will not harm UP performance. A
prototype of this change is present in the jhb_lock Perforce
- branch. |
+ branch, and patches have been posted to per-architecture mailing
+ lists for review. |
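As a rough sketch of the direction of this optimization, not the
jhb_lock code itself: make critical section entry and exit a plain
per-thread nesting-count update on the common path, with no interrupt
masking or atomic operations, deferring any pending preemption work to
the outermost exit.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>

static void
critical_enter_sketch(struct thread *td)
{
        /* Cheap common case: bump the nesting count, nothing else. */
        td->td_critnest++;
}

static void
critical_exit_sketch(struct thread *td)
{
        td->td_critnest--;
        if (td->td_critnest == 0) {
                /*
                 * Only the outermost exit needs to check whether a
                 * preemption or deferred interrupt was held off while
                 * the section was active (handling elided here).
                 */
        }
}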