diff --git a/en/projects/netperf/index.sgml b/en/projects/netperf/index.sgml
index c1d0093173..d4bcf1c119 100644
--- a/en/projects/netperf/index.sgml
+++ b/en/projects/netperf/index.sgml
@@ -1,6 +1,6 @@
-
+
 %includes;
@@ -144,7 +144,7 @@
 Prefer file descriptor reference counts to socket reference
 counts for system calls.
 &a.rwatson;
- 20041024
+ 20041124
 &status.done;
 Sockets and file descriptors both have reference counts in order
 to prevent these objects from being free'd while in use. However,
@@ -155,14 +155,14 @@
 thus avoiding the synchronized operations necessary to modify the
 socket reference count, an approach also taken in the VFS code.
 This change has been made for most socket system calls, and has
- been committed to HEAD (6.x). It will be merged to 5.x in the
- near future.
+ been committed to HEAD (6.x). It has also been merged to RELENG_5
+ for inclusion in 5.4.
 Mbuf queue library
 &a.rwatson;
- 20041106
+ 20041124
 &status.prototyped;
 In order to facilitate passing off queues of packets between
 network stack components, create an mbuf queue primitive, struct
@@ -170,7 +170,9 @@
 primitive is now being applied in several sample cases to
 determine whether it offers the desired semantics and benefits.
 The implementation can be found in the rwatson_dispatch Perforce
- branch.
+ branch. Additional work must also be done to explore the
+ performance impact of "queues" vs arrays of mbuf pointers, which
+ are likely to behave better from a caching perspective.
@@ -185,7 +187,7 @@
 required. This has not yet been benchmarked. A subset change to
 dispatch a single mbuf to a driver has also been prototyped, and
 benchmarked at a several percentage point improvement in packet send
- rates from user space.
+ rates from user space.
@@ -201,19 +203,20 @@
 Employ queued dispatch across netisr dispatch API
 &a.rwatson;
- 20041113
- &status.new;
- Similar to if_start_mbufqueue(), allow dispatch of queues of
- mbufs into the netisr interface, avoiding multiple wakeups when a
- netisr thread is already in execution. Wakeups are expensive
- operations even when there are no threads waiting.
+ 20041124
+ &status.prototyped;
+ Pull all of the mbufs in the netisr ifqueue out of the ifqueue
+ into a thread-local mbuf queue to avoid repeated lock operations
+ to access the queue. Also use lock-free operations to test for
+ queue contents being present. This has been prototyped in the
+ rwatson_netperf branch.
 Modify UMA allocator to use critical sections not mutexes for
 per-CPU caches.
 &a.rwatson;
- 20041111
+ 20041124
 &status.prototyped;
 The mutexes protecting per-CPU caches require atomic operations
 on SMP systems; as they are per-CPU objects, the cost of
@@ -222,13 +225,14 @@
 has been implemented in the rwatson_percpu branch, but is waiting
 on critical section performance optimizations that will prevent
 this change from negatively impacting uniprocessor performance.
-
+ The critical section operations from John Baldwin have been posted
+ for public review.
 Optimize critical section performance
 &a.jhb;
- 20041111
+ 20041124
 &status.prototyped;
 Critical sections prevent preemption of a thread on a CPU, as
 well as preventing migration of that thread to another CPU, and
@@ -245,7 +249,8 @@
 cost as a mutex, meaning that optimizations on SMP to use critical
 sections instead of mutexes will not harm UP performance. A
 prototype of this change is present in the jhb_lock Perforce
- branch.
+ branch, and patches have been posted to per-architecture mailing
+ lists for review.
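
The first entry above describes having socket system calls hold the file
descriptor's reference rather than bumping the socket's own reference count.
The fragment below is a minimal sketch of that calling pattern using the stock
fget()/fdrop() interface of that era; sosyscall_sketch() and its elided body
are illustrative names only and are not taken from the committed change.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/file.h>
#include <sys/socketvar.h>

static int
sosyscall_sketch(struct thread *td, int fd)
{
	struct file *fp;
	struct socket *so;
	int error;

	error = fget(td, fd, &fp);	/* one reference on the file */
	if (error != 0)
		return (error);
	so = fp->f_data;	/* the socket stays valid while fp is held */

	/* ... perform the socket operation on 'so' here ... */

	fdrop(fp, td);		/* drop the file reference on the way out */
	return (0);
}

Because the file reference pins the socket for the duration of the call, no
soref()/sorele() pair (and no socket lock round trip) is needed just to keep
the object alive.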
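
The "Mbuf queue library" entry names a structure whose definition lives in the
rwatson_dispatch branch and is cut off in the hunk above, so the sketch below
uses invented names (mbufq_sketch and friends) purely to illustrate the kind of
primitive described: a head/tail packet queue with O(1) enqueue and O(1)
hand-off of an entire queue between stack components.

#include <sys/param.h>
#include <sys/mbuf.h>

struct mbufq_sketch {
	struct mbuf	*mq_head;	/* first packet in the queue */
	struct mbuf	*mq_tail;	/* last packet, for O(1) append */
	int		 mq_len;	/* number of packets queued */
};

static void
mbufq_sketch_enqueue(struct mbufq_sketch *q, struct mbuf *m)
{
	m->m_nextpkt = NULL;
	if (q->mq_tail == NULL)
		q->mq_head = m;
	else
		q->mq_tail->m_nextpkt = m;
	q->mq_tail = m;
	q->mq_len++;
}

/* Hand an entire queue to another component, leaving 'from' empty. */
static void
mbufq_sketch_concat(struct mbufq_sketch *to, struct mbufq_sketch *from)
{
	if (from->mq_head == NULL)
		return;
	if (to->mq_tail == NULL)
		to->mq_head = from->mq_head;
	else
		to->mq_tail->m_nextpkt = from->mq_head;
	to->mq_tail = from->mq_tail;
	to->mq_len += from->mq_len;
	from->mq_head = from->mq_tail = NULL;
	from->mq_len = 0;
}

A queue of this shape is also what an if_start_mbufqueue()-style driver entry
point would consume: one call and one lock acquisition per batch of packets
rather than one per mbuf.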
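
The netisr entry is about taking the inbound queue lock once per batch rather
than once per packet. Below is a minimal sketch of that pattern, assuming the
stock struct ifqueue and its IF_LOCK()/IF_UNLOCK() macros; netisr_drain_sketch()
and the deliver callback are illustrative only and are not the code in the
rwatson_netperf branch.

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

static void
netisr_drain_sketch(struct ifqueue *inq, void (*deliver)(struct mbuf *))
{
	struct mbuf *m, *local_head;

	/* Unlocked emptiness test: cheap when there is no pending work. */
	if (inq->ifq_head == NULL)
		return;

	IF_LOCK(inq);
	local_head = inq->ifq_head;		/* steal the whole queue */
	inq->ifq_head = inq->ifq_tail = NULL;
	inq->ifq_len = 0;
	IF_UNLOCK(inq);

	/* Deliver the stolen packets with the queue lock released. */
	while ((m = local_head) != NULL) {
		local_head = m->m_nextpkt;
		m->m_nextpkt = NULL;
		deliver(m);
	}
}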
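
The UMA entry relies on the fact that critical_enter()/critical_exit() keep a
thread on its current CPU, so state that is only ever touched by that CPU needs
neither a mutex nor atomic operations. The sketch below shows the pattern with
a made-up per-CPU cache (pcpu_cache_sketch); it is not the UMA implementation
in the rwatson_percpu branch.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/pcpu.h>

#define	PCPU_CACHE_SIZE	64	/* illustrative bucket size */

struct pcpu_cache_sketch {
	void	*items[PCPU_CACHE_SIZE];	/* free items cached for one CPU */
	int	 count;
};

static struct pcpu_cache_sketch pcpu_caches_sketch[MAXCPU];

static void *
pcpu_cache_alloc_sketch(void)
{
	struct pcpu_cache_sketch *c;
	void *item = NULL;

	critical_enter();		/* was: mtx_lock() on a per-cache mutex */
	c = &pcpu_caches_sketch[PCPU_GET(cpuid)];	/* no migration inside the section */
	if (c->count > 0)
		item = c->items[--c->count];
	critical_exit();		/* was: mtx_unlock() */

	return (item);
}

This only wins if entering and leaving a critical section is cheaper than a
mutex, which is exactly what the "Optimize critical section performance" entry
from &a.jhb; is meant to guarantee, including on uniprocessor kernels.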