Add an idea for multi-queue BPF support.

This commit is contained in:
Robert Watson 2010-03-21 16:52:29 +00:00
parent f56ebb2e54
commit 5b09c7fb45
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/www/; revision=35545


@@ -15,7 +15,7 @@ Ideas//EN"
<ideas>
<cvs:keywords xmlns:cvs="http://www.FreeBSD.org/XML/CVS" version="1.0">
<cvs:keyword name="freebsd">
$FreeBSD: www/en/projects/ideas/ideas.xml,v 1.151 2010/03/21 16:20:58 rwatson Exp $
$FreeBSD: www/en/projects/ideas/ideas.xml,v 1.152 2010/03/21 16:37:18 rwatson Exp $
</cvs:keyword>
</cvs:keywords>
@@ -1161,6 +1161,51 @@ href="http://info.iet.unipi.it/~luigi/FreeBSD/linux_bsd_kld.html">here</a>.
</desc>
</idea>
<idea id="multiqbpf" class="soc">
<title>Multiqueue BPF support and other BPF features</title>
<desc><p><strong>Technical contact</strong>: <a
href="mailto:rwatson@FreeBSD.org">Robert Watson</a></p>
<p>The Berkeley Packet Filter (BPF) allows packet capture filters to
be compiled into a bytecode that is either interpreted by a kernel
virtual machine, or compiled into native machine code via a JIT and
executed in-kernel. Historically, the origin of packets has
been the network interface, with each (synthetic) BPF device
attached to exactly one NIC as requested by the application (for
example, tcpdump). However, network interfaces have become
significantly more complicated, and BPF has had to grow to support
new features, such as Data Link Types (DLTs), in which BPF devices
can tap network processing at different layers. This task would
involve teaching BPF about a further dimension in network interface
complexity: multiple input and output queues.</p>
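<p>As a point of reference, a minimal sketch of today's
single-interface model might look like the following; the "em0"
interface name and the trivial accept-only-IPv4 filter are
placeholders chosen purely for illustration:</p>
<pre>
/*
 * Sketch of the existing single-interface model: open a BPF device,
 * attach it to one NIC, and install a tiny filter program that the
 * kernel interprets (or, with the BPF JIT compiler, runs natively)
 * for each packet.  The "em0" interface name is only a placeholder.
 */
#include <sys/types.h>
#include <sys/ioctl.h>

#include <net/if.h>
#include <net/ethernet.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	/* Classic BPF bytecode: accept IPv4 frames, drop everything else. */
	struct bpf_insn insns[] = {
		BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETHERTYPE_IP, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, (u_int)-1),
		BPF_STMT(BPF_RET | BPF_K, 0),
	};
	struct bpf_program prog = { sizeof(insns) / sizeof(insns[0]), insns };
	struct ifreq ifr;
	u_int dlt;
	int fd;

	/* /dev/bpf clones a fresh descriptor on every open(2). */
	fd = open("/dev/bpf", O_RDWR);
	if (fd == -1)
		err(1, "open(/dev/bpf)");

	/* Bind the descriptor to exactly one interface, as tcpdump does. */
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");

	/* Query the data link type (DLT) of this tap point. */
	if (ioctl(fd, BIOCGDLT, &dlt) == -1)
		err(1, "BIOCGDLT");
	printf("em0 attached, DLT %u\n", dlt);

	/* Install the filter; reads on fd now return matching packets. */
	if (ioctl(fd, BIOCSETF, &prog) == -1)
		err(1, "BIOCSETF");

	close(fd);
	return (0);
}
</pre>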
<p>Modern 10gbps, and even 1gbps, network cards support multiple
queues for various reasons: initially for quality of service (QoS)
differentiation in processing, but now especially to improve
parallelism in network processing by distributing work across many
CPUs on input. The same technique can also accelerate packet capture,
but BPF is currently unaware of these queues. In this project, BPF would be
enhanced to be aware of individual input and output queues on a NIC,
which means providing network stack abstractions for these concepts,
visible today only within device drivers. Userspace threads might
then use enhanced ioctl(2)s to query the set of available queues
and attach to one or more. Applications seeking maximum parallelism
could open (n) devices, attach each to a queue, and execute with
appropriate CPU affinity. Ideally this would involve neither
lock nor cache line contention throughout the entire stack, from
device driver to userspace delivery.</p>
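<p>To make the proposed extension concrete, here is a rough sketch of
what the enhanced interface could look like. The BIOCGRXQUEUES and
BIOCSRXQUEUE ioctls do not exist today; their names, numbers, and
semantics are invented for illustration, as are the "igb0" interface
name and the naive queue-to-CPU mapping. Only BIOCSETIF and
cpuset_setaffinity(2) are existing interfaces:</p>
<pre>
/*
 * Hypothetical multi-queue interface: BIOCGRXQUEUES and BIOCSRXQUEUE
 * do not exist in FreeBSD today; they only illustrate the style of
 * ioctl(2) this project might add.  A real application would open one
 * such descriptor per worker thread, each pinned near the CPU that
 * services its queue.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/ioctl.h>

#include <net/if.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define	BIOCGRXQUEUES	_IOR('B', 200, u_int)	/* hypothetical */
#define	BIOCSRXQUEUE	_IOW('B', 201, u_int)	/* hypothetical */

static int
open_queue(const char *ifname, u_int queue)
{
	struct ifreq ifr;
	cpuset_t mask;
	u_int nqueues;
	int fd;

	fd = open("/dev/bpf", O_RDWR);
	if (fd == -1)
		err(1, "open(/dev/bpf)");
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");

	/* Hypothetical: how many input queues does this NIC expose? */
	if (ioctl(fd, BIOCGRXQUEUES, &nqueues) == -1)
		err(1, "BIOCGRXQUEUES");
	if (queue >= nqueues)
		errx(1, "queue %u out of range (%u queues)", queue, nqueues);

	/* Hypothetical: deliver only this queue's packets to this descriptor. */
	if (ioctl(fd, BIOCSRXQUEUE, &queue) == -1)
		err(1, "BIOCSRXQUEUE");

	/* Pin the calling thread; a naive queue-to-CPU mapping for the sketch. */
	CPU_ZERO(&mask);
	CPU_SET(queue, &mask);
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask) == -1)
		err(1, "cpuset_setaffinity");
	return (fd);
}

int
main(void)
{
	int fd;

	/* Each worker thread would handle one queue; queue 0 shown here. */
	fd = open_queue("igb0", 0);
	printf("queue 0 descriptor: %d\n", fd);
	return (0);
}
</pre>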
<ul>
<li>Strong knowledge of C.</li>
<li>Experience with multi-threaded programming.</li>
<li>Experience with kernel programming.</li>
<li>Knowledge of the TCP/IP protocol suite.</li>
</ul>
</desc>
</idea>
</category>
<category>