Break out PREEMPTION and SMP stability parts of the general stability
task into two separate items now that the resolutions are becoming
more clear.  Note the current condition of pth.

Move the in6_pcbnotify() item from "show stopper" to "testing" since
the fix is merged.

Add a TODO item for the routing socket netisr concerns.  Note that some
changes have been committed, but will need to be merged after testing.
Robert Watson 2004-08-22 21:56:28 +00:00
parent 8b2c953922
commit 8a4063e29b
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/www/; revision=22059


@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" [
<!ENTITY base CDATA "../..">
<!ENTITY email 'freebsd-qa'>
<!ENTITY date "$FreeBSD: www/en/releases/5.3R/todo.sgml,v 1.42 2004/08/21 17:42:49 rwatson Exp $">
<!ENTITY date "$FreeBSD: www/en/releases/5.3R/todo.sgml,v 1.43 2004/08/22 12:41:39 blackend Exp $">
<!ENTITY title "FreeBSD 5.3 Open Issues">
<!ENTITY % includes SYSTEM "../../includes.sgml"> %includes;
<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;
@@ -30,16 +30,23 @@
</tr>
<tr>
<td>General instability and lockups under high load</td>
<td>&status.new;</td>
<td>PREEMPTION-related hangs involving threads</td>
<td>&status.wip;</td>
<td>&a.scottl;, &a.julian;</td>
<td>Problems persist with crashes and hangs under heavy load, especially
under SMP. The recent introduction of full-scale preemption exacerbated
the problem, though preemption has been turned off temporarily while
this problem is debugged. Speculation on the source of the problem
centers on issues in the scheduler that appear to be common to both
the 4BSD and ULE schedulers. This needs to be driven to root cause
and fixed in order for 5.3 to be considered STABLE.</td>
<td>PREEMPTION appears to increase the chances of triggering a race
condition in the thread context management and scheduling code.
Patches to mitigate the problem have been developed, and work is
ongoing to arrive at the correct solution prior to 5.3.</td>
</tr>
<tr>
<td>SMP instability under load</td>
<td>&status.wip;</td>
<td>&a.dwhite;, &a.alc;</td>
<td>High load on SMP systems appears to result in a hard hang related
to VM IPIs. &a.dwhite; has prepared a candidate patch that appears to
resolve this instability; it is currently in testing for merge to
the CVS HEAD.</td>
</tr>
<tr>
@@ -59,15 +66,6 @@
correctly as of the improved NFS support for disconnection changes.</td>
</tr>
<tr>
<td>in6_pcbnotify() panic with TCP</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>&a.kuriyama; has reported a failed locking assertion with IPv6
TCP notifications. A patch has been committed to the CVS HEAD and
will be merged to RELENG_5 after testing.</td>
</tr>
<tr>
<td>poll()/select() application wedge reports with debug.mpsafenet=1</td>
<td>&status.wip;</td>
@@ -200,6 +198,20 @@
interactivity for taps and button press events for some users.</td>
</tr>
<tr>
<td>Increased and configurable netisr queue max depth for routing
sockets</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>As part of the MPSAFE network stack work, delivery of routing socket
messages was moved from direct dispatch in the routing code to queued
dispatch via netisr. However, the risk of routing daemons losing routing
messages is high; to address this, the maximum queue depth for routing
sockets has been raised from the default interface queue depth of 50 to
128, and it has been made user-configurable. This change is in CVS HEAD
and needs to be merged to RELENG_5 after testing.</td>
</tr>
</table>
<h3>Desired features for 5.3-RELEASE</h3>
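
To illustrate why the queue depth in the routing socket item above matters,
the following is a minimal userland C sketch using hypothetical names; it is
not the actual netisr or routing socket code. A bounded dispatch queue
silently drops messages once its maximum depth is reached, so raising the
limit beyond the interface default of 50 and making it tunable reduces the
chance that a routing daemon misses messages during a burst.

/*
 * Illustrative sketch only; not the FreeBSD netisr implementation.
 * Models a bounded dispatch queue that drops messages once its
 * configurable maximum depth is reached.
 */
#include <stdio.h>
#include <stdlib.h>

struct msg {
	struct msg *next;
	int seq;
};

struct msg_queue {
	struct msg *head;
	struct msg *tail;
	int depth;
	int max_depth;		/* in the kernel this would be a tunable */
};

/* Enqueue a message; returns 0 on success, -1 if the queue is full. */
static int
queue_enqueue(struct msg_queue *q, struct msg *m)
{
	if (q->depth >= q->max_depth)
		return (-1);	/* dropped: the consumer never sees it */
	m->next = NULL;
	if (q->tail != NULL)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	q->depth++;
	return (0);
}

int
main(void)
{
	struct msg_queue q = { NULL, NULL, 0, 50 };	/* old default depth */
	int i, dropped = 0;

	/* Simulate a burst of 200 routing-socket-style messages. */
	for (i = 0; i < 200; i++) {
		struct msg *m = malloc(sizeof(*m));

		if (m == NULL)
			break;
		m->seq = i;
		if (queue_enqueue(&q, m) != 0) {
			dropped++;
			free(m);
		}
	}
	printf("max_depth=%d: dropped %d of 200 messages\n",
	    q.max_depth, dropped);
	return (0);
}

Raising max_depth to 128 in the sketch reduces the drops for the same burst;
the committed change additionally exposes the limit as a user-configurable
setting rather than a compile-time constant.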
@@ -520,6 +532,15 @@
testing is needed.</td>
</tr>
<tr>
<td>in6_pcbnotify() panic with TCP</td>
<td>&status.wip;</td>
<td>&a.rwatson;</td>
<td>&a.kuriyama; has reported a failed locking assertion with IPv6
TCP notifications. A patch has been committed to the CVS HEAD and
will be merged to RELENG_5 after testing.</td>
</tr>
</table>
&footer;