Break out PREEMPTION and SMP stability parts of the general stability
task into two separate items now that the resolutions are becoming
more clear.  Note the current condition of pth.

Move the in6_pcbnotify() item from "show stopper" to "testing" since
the fix is merged.

Add a TODO item for the routing socket netisr concerns.  Note that some
changes have been committed, but will need to be merged after testing.
Robert Watson 2004-08-22 21:56:28 +00:00
parent 8b2c953922
commit 8a4063e29b
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/www/; revision=22059


@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" [
<!ENTITY base CDATA "../..">
<!ENTITY email 'freebsd-qa'>
<!ENTITY date "$FreeBSD: www/en/releases/5.3R/todo.sgml,v 1.42 2004/08/21 17:42:49 rwatson Exp $">
<!ENTITY date "$FreeBSD: www/en/releases/5.3R/todo.sgml,v 1.43 2004/08/22 12:41:39 blackend Exp $">
<!ENTITY title "FreeBSD 5.3 Open Issues">
<!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;
<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;
@@ -30,16 +30,23 @@
</tr>
<tr>
<td>General instability and lockups under high load</td>
<td>&status.new;</td>
<td>PREEMPTION-related hangs involving threads</td>
<td>&status.wip;</td>
<td>&a.scottl;, &a.julian;</td>
<td>Problems persist with crashes and hangs under heavy load, especially
under SMP. The recent introduction of full-scale preemption exacerbated
the problem, though preemption has been turned off temporarily while
this problem is debugged. Speculation on the source of the problem
centers on issues in the scheduler that appear to be common to both
the 4BSD and ULE schedulers. This needs to be driven to root cause
and fixed in order for 5.3 to be considered STABLE.</td>
<td>PREEMPTION appears to increase the chances of triggering a race
condition in the thread context management and scheduling code.
Patches to mitigate the problem have been developed, and work is
ongoing to arrive at the correct solution prior to 5.3.</td>
</tr>
<tr>
<td>SMP instability under load</td>
<td>&status.wip;</td>
<td>&a.dwhite;, &a.alc;</td>
<td>High load on SMP systems appears to result in a hard hang related
to VM IPIs. &a.dwhite; has prepared a candidate patch that appears to
resolve this instability; it is currently in testing for merge to
the CVS HEAD.</td>
</tr>
<tr>
@@ -59,15 +66,6 @@
correctly as of the improved NFS support for disconnection changes.</td>
</tr>
<tr>
<td>in6_pcbnotify() panic with TCP</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>&a.kuriyama; has reported a failed locking assertion with IPv6
TCP notifications. A patch has been committed to the CVS HEAD and
will be merged to RELENG_5 after testing.</td>
</tr>
<tr>
<td>poll()/select() application wedge reports with debug.mpsafenet=1</td>
<td>&status.wip;</td>
@@ -200,6 +198,20 @@
interactivity for taps and button press events for some users.</td>
</tr>
<tr>
<td>Increased and configurable netisr queue max depth for routing
sockets</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>As part of the MPSAFE network stack work, delivery of routing socket
messages was moved from direct dispatch in the routing code to queued
dispatch via netisr. However, the risk of routing daemons losing routing
messages is high; to address this, the maximum queue depth for routing
sockets has been raised from the default interface queue depth of 50 to
128, and it has been made user-configurable. This change is in CVS HEAD
and needs to be merged to RELENG_5 after testing.</td>
</tr>
</table>
<h3>Desired features for 5.3-RELEASE</h3>
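
To illustrate why the queue depth in the routing socket item above matters,
the following is a minimal userland C sketch using hypothetical names; it is
not the actual netisr or routing socket code. A bounded dispatch queue
silently drops messages once its maximum depth is reached, so raising the
limit beyond the interface default of 50 and making it tunable reduces the
chance that a routing daemon misses messages during a burst.

/*
 * Illustrative sketch only; not the FreeBSD netisr implementation.
 * Models a bounded dispatch queue that drops messages once its
 * configurable maximum depth is reached.
 */
#include <stdio.h>
#include <stdlib.h>

struct msg {
	struct msg *next;
	int seq;
};

struct msg_queue {
	struct msg *head;
	struct msg *tail;
	int depth;
	int max_depth;		/* in the kernel this would be a tunable */
};

/* Enqueue a message; returns 0 on success, -1 if the queue is full. */
static int
queue_enqueue(struct msg_queue *q, struct msg *m)
{
	if (q->depth >= q->max_depth)
		return (-1);	/* dropped: the consumer never sees it */
	m->next = NULL;
	if (q->tail != NULL)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	q->depth++;
	return (0);
}

int
main(void)
{
	struct msg_queue q = { NULL, NULL, 0, 50 };	/* old default depth */
	int i, dropped = 0;

	/* Simulate a burst of 200 routing-socket-style messages. */
	for (i = 0; i < 200; i++) {
		struct msg *m = malloc(sizeof(*m));

		if (m == NULL)
			break;
		m->seq = i;
		if (queue_enqueue(&q, m) != 0) {
			dropped++;
			free(m);
		}
	}
	printf("max_depth=%d: dropped %d of 200 messages\n",
	    q.max_depth, dropped);
	return (0);
}

Raising max_depth to 128 in the sketch reduces the drops for the same burst;
the committed change additionally exposes the limit as a user-configurable
setting rather than a compile-time constant.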
@@ -520,6 +532,15 @@
testing is needed.</td>
</tr>
<tr>
<td>in6_pcbnotify() panic with TCP</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>&a.kuriyama; has reported a failed locking assertion with IPv6
TCP notifications. A patch has been committed to the CVS HEAD and
will be merged to RELENG_5 after testing.</td>
</tr>
</table>
&footer;