With input from kris, substantially overhaul this article. Reflect the

discontinuation of 4.X and alpha packages; document the few times that
full builds are done; document what you need to do with incomplete or
failed builds; better document how to interrupt a build; add writeups
for the new status scripts; and add a section on Dealing With Build Errors.
Mark Linimon 2007-06-21 04:03:56 +00:00
parent 32f6b041de
commit 96eb6af541
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=30311


@ -18,6 +18,7 @@
<year>2004</year>
<year>2005</year>
<year>2006</year>
<year>2007</year>
<holder role="mailto:portmgr@FreeBSD.org">The &os; Ports
Management Team</holder>
</copyright>
@ -34,32 +35,55 @@
<title>Introduction and Conventions</title>
<para>In order to provide pre-compiled binaries of third-party
applications for &os;, the ports collection is regularly
applications for &os;, the Ports Collection is regularly
built on one of the <quote>Package Building Clusters.</quote>
Currently, there are two such clusters:
<hostid role="fqdn">pointyhat.FreeBSD.org</hostid> and
<hostid role="fqdn">dosirak.kr.FreeBSD.org</hostid>.</para>
Currently, the main cluster in use is at
<ulink url="http://pointyhat.FreeBSD.org"></ulink>.</para>
<para>Most of the package building magic occurs under the
<filename>/var/portbuild</filename> directory. Unless
otherwise specified, all paths will be relative to
this location. <replaceable>${arch}</replaceable> will
be used to specify one of the package architectures
(&i386;, alpha, &sparc64;, ia64, and amd64), and
(amd64, &i386;, ia64, and &sparc64;), and
<replaceable>${branch}</replaceable> will be used
to specify the build branch (4, 5, 5-exp, 6, 6-exp, 7).
to specify the build branch (5, 5-exp, 6, 6-exp, 7).
</para>
<note>
<para>Packages are no longer built for Release 4, nor
for the alpha architecture.</para>
</note>
<para>The scripts that control all of this live in
<filename>/var/portbuild/scripts/</filename>. These are the
checked-out copies from
<filename>/usr/ports/Tools/portbuild/scripts/</filename>.</para>
<para>Typically, incremental builds are done that use previous
packages as dependencies; this takes less time, and puts less
load on the mirrors. Full builds are usually only done:</para>
<itemizedlist>
<listitem><para>right after release time, for the
<literal>-STABLE</literal> branches</para></listitem>
<listitem><para>every month or so, for <literal>-CURRENT</literal>
</para></listitem>
<listitem><para>for experimental builds</para></listitem>
</itemizedlist>
</sect1>
<sect1 id="management">
<title>Build Client Management</title>
<para>The &i386;, alpha, amd64, and two &sparc64; clients currently
netboot from <hostid>pointyhat</hostid>; the other sparc64 client
and ia64 clients are self-hosted. In all cases they set themselves
<para>The &i386; clients currently
netboot from <hostid>pointyhat</hostid>; the other clients
are self-hosted. In all cases they set themselves
up at boot-time to prepare to build packages.</para>
<para>In the latest round of portbuild updates,
<para>Although connected nodes are supported,
<replaceable>disconnected</replaceable> cluster node support has
been added. A disconnected node is
one that does not mount the cluster master via NFS. It could be
@ -71,8 +95,14 @@
<para>The
<username>ports-<replaceable>${arch}</replaceable></username>
user can &man.ssh.1; as <username>root</username> onto
each of the <replaceable>${arch}</replaceable> nodes.</para>
user can &man.ssh.1; as <username>root</username>
(not as <username>ports-<replaceable>${arch}</replaceable></username>)
onto
each of the <replaceable>${arch}</replaceable> nodes
which are connected nodes. In general, this is not true of the
disconnected nodes. Use <command>sudo</command> and check the
<hostid>portbuild.<replaceable>hostname</replaceable>.conf</hostid>
for the user and access details.</para>
<para>The <command>scripts/allgohans</command> script can
be used to run a command on all of the
@ -83,7 +113,7 @@
build cluster, and schedule which nodes build which ports.
This script is not very robust, and has a tendency to die.
It is best to start up this script on the build master
(either <hostid>pointyhat</hostid> or <hostid>dosirak</hostid>)
(e.g. <hostid>pointyhat</hostid>)
after boot time using a &man.while.1; loop.
</para>
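<para>The restart loop mentioned above could be sketched as follows.
This is only an illustration, not part of the portbuild scripts:
<literal>keep_running</literal>, <literal>MAX_RESTARTS</literal>, and
<literal>RESTART_DELAY</literal> are hypothetical names, and the command
to supervise is passed in as arguments.</para>

```shell
#!/bin/sh
# Sketch: rerun a command every time it exits, pausing between runs.
# keep_running, MAX_RESTARTS and RESTART_DELAY are hypothetical names,
# not something the portbuild scripts provide.
keep_running() {
    max=${MAX_RESTARTS:-0}      # 0 means restart forever
    delay=${RESTART_DELAY:-30}  # seconds to wait before restarting
    runs=0
    while :; do
        "$@"                    # run the supervised command
        status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status" >&2
        if [ "$max" -gt 0 ] && [ "$runs" -ge "$max" ]; then
            return "$status"
        fi
        sleep "$delay"
    done
}
```

<para>On the build master this would be invoked as
<command>keep_running <replaceable>script</replaceable>
<replaceable>arguments</replaceable></command>, with
<literal>MAX_RESTARTS</literal> left unset so that the loop never
gives up.</para>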
</sect1>
@ -147,12 +177,6 @@
are used to perform the builds. Most useful are:</para>
<itemizedlist>
<listitem>
<para><command>dopackages.4</command> - Perform a
4.X build
</para>
</listitem>
<listitem>
<para><command>dopackages.5</command> - Perform a
5.X build
@ -186,6 +210,12 @@
</listitem>
</itemizedlist>
<note>
<para>As of early 2007, the 5-exp branch is just another 6-exp
branch in disguise. For instance, it will have
<filename>INDEX-6</filename>.</para>
</note>
<para>These are wrappers around <command>dopackages</command>,
and are all symlinked to <command>dopackages.wrapper</command>.
New branch wrapper scripts can be created by symlinking
@ -278,7 +308,7 @@
was spent rebuilding things that were going to
fail anyway. Conversely, the other clusters
are slow enough that it would be a waste of time
to try and build <literal>BROKEN</literal> ports.
to try and build <literal>BROKEN</literal> ports).
</para>
</listitem>
@ -296,13 +326,6 @@
</para>
</listitem>
<listitem>
<para><literal>-nodoccvs</literal> - Do not
<command>cvs update</command> the
<literal>doc</literal> tree during preprocessing. (obsolete)
</para>
</listitem>
<listitem>
<para><literal>-norestr</literal> - Do not attempt to build
<literal>RESTRICTED</literal> ports.
@ -333,6 +356,26 @@
</listitem>
</itemizedlist>
<para>If the last build finished cleanly you don't need to delete
anything; if it was interrupted you just need to run
<literal>dosetupnodes</literal> on all clients for
the relevant branch. <filename>errors/</filename>,
<filename>logs/</filename>, <filename>packages/</filename>, and so
forth, are cleaned by the scripts. If you are short of space,
you can also clean out <filename>ports/distfiles/</filename>.
Leave the <filename>latest/</filename> directory alone; it is
a symlink for the webserver.</para>
<note>
<para><literal>dosetupnodes</literal> is supposed to be run from
the <literal>dopackages</literal> script in the
<literal>-restart</literal> case, but it can be a good idea to
run it by hand and then verify that the clients all have the
expected job load. Sometimes,
<literal>dosetupnode</literal> cannot clean up a build and you
need to do it by hand. (This is a bug.)</para>
</note>
<para>Make sure the <replaceable>${arch}</replaceable> build
is run as the ports-<replaceable>${arch}</replaceable> user
or it will complain loudly.</para>
@ -340,7 +383,7 @@
<note><para>The actual package build itself occurs in two
identical phases. The reason for this is that sometimes
transient problems (e.g. NFS failures, FTP sites being
unreachable, etc.) may halt the build. Doing things
unreachable, etc.) may halt a build. Doing things
in two phases is a workaround for these types of
problems.</para></note>
@ -367,6 +410,24 @@
<filename>Makefile</filename> with no <makevar>SUBDIR</makevar>s
in it. This is probably a bug.</para>
</note>
<example>
<title>Update the i386-6 tree and do a complete build</title>
<para><command>dopackages.6 i386 -nocvs -norestr -nofinish</command></para>
</example>
<example>
<title>Restart an interrupted amd64-5 build without updating</title>
<para><command>dopackages.5 amd64 -nocvs -noportscvs -norestr -continue -noindex -noduds -nofinish</command></para>
</example>
<example>
<title>Post-process a completed sparc64-7 tree</title>
<para><command>dopackages.7 sparc64 -finish</command></para>
</example>
</sect1>
<sect1 id="anatomy">
@ -458,39 +519,121 @@
<sect1 id="interrupting">
<title>Interrupting a Build</title>
<para>Sending a <literal>HUP</literal> signal to the
<command>dopackages*</command> shell processes or to any
<command>make</command> process invoked by those scripts
is usually sufficient to interrupt the build. The
<para>Interrupting a build is a bit messy. First you need to
identify the tty in which it's running (either record the output
of &man.tty.1; when you start the build, or use <command>ps x</command>
to identify it). You need to make sure that nothing else important
is running in this tty, e.g. <command>ps -t p1</command> or whatever.
Then send <command>kill -HUP</command> to everything in there, e.g.
<command>ps -t p1 | awk '{print $1}' | xargs kill -HUP</command>. Replace
<replaceable>p1</replaceable> by whatever the tty is, of course.</para>
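<para>One small refinement of the pipeline above, offered as a sketch:
&man.ps.1; prints a header line, so the word <literal>PID</literal>
also reaches <command>kill</command>, which is harmless but noisy.
The helper name <literal>ps_pids</literal> is hypothetical; it only
filters text and kills nothing by itself.</para>

```shell
#!/bin/sh
# Sketch: read ps(1) output on stdin and print only the PID column,
# skipping the header line. ps_pids is a hypothetical helper name.
ps_pids() {
    awk 'NR > 1 { print $1 }'
}
```

<para>It would be used as
<command>ps -t p1 | ps_pids | xargs kill -HUP</command>, after first
running <command>ps -t p1</command> alone to check what is about to be
signalled.</para>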
<para>The
package builds dispatched by <command>make</command> to
the client machines will clean themselves up after a
few minutes (check with <command>ps x</command> until they
all go away). The following command usually does the trick:</para>
all go away).</para>
<screen>&prompt.user; <userinput>killall -HUP sh ssh make</userinput></screen>
<para>If you don't kill &man.make.1;, then it will spawn more jobs.
If you don't kill <command>dopackages</command>, then it will restart
the entire build. If you don't kill the <command>pdispatch</command>
processes, they'll keep going (or respawn) until they've built their
package.</para>
<para>Remove the
<para>To free up resources, you will need to clean up by running
<command>dosetupnode</command> on each client machine. For example,
in &man.csh.1;:
<screen>&prompt.user; <userinput>cd ~/loads</userinput>
&prompt.user; <userinput>foreach i (*)</userinput>
foreach? <userinput>/var/portbuild/scripts/dosetupnode i386 5-exp $i -norsync &</userinput>
foreach? <userinput>end</userinput></screen>
The <literal>-norsync</literal> option says not to bother resyncing the
entire build data (ports tree, etc.) on any remote machines; it
will just clean up old <literal>chroot</literal>s and then reset the
build queue for that machine.</para>
<para>If you forget to do this, then the old build
<literal>chroot</literal>s won't be cleaned up for 24 hours, and no
new jobs will be dispatched in their place since
<hostid>pointyhat</hostid> thinks the job slot is still occupied.</para>
<para>To check, <command>cat ~/loads/*</command> to display the
status of client machines; the first column is the number of jobs
it thinks is running, and this should be roughly concordant
with the load average. <literal>loads</literal> is refreshed
every 2 minutes. If you do <command>ps x | grep pdispatch</command>
and it's less than the number of jobs that <literal>loads</literal>
thinks are in use, you're in trouble.</para>
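<para>The comparison described above can be sketched in &man.sh.1;.
The function names are hypothetical, and the sketch assumes that every
line of every file in the <filename>loads</filename> directory starts
with the job count, as the text describes for the first column.</para>

```shell
#!/bin/sh
# Sketch: compare the job slots claimed in the loads files with the
# number of pdispatch processes. total_jobs and check_slots are
# hypothetical names.

# Sum the first column across all files in the loads directory ($1).
total_jobs() {
    cat "$1"/* 2>/dev/null | awk '{ sum += $1 } END { print sum + 0 }'
}

# $1: loads directory, $2: number of pdispatch processes observed.
# Returns non-zero when fewer dispatchers exist than claimed slots.
check_slots() {
    claimed=$(total_jobs "$1")
    if [ "$2" -lt "$claimed" ]; then
        echo "stale: loads claims $claimed jobs, only $2 pdispatch running"
        return 1
    fi
    echo "ok: loads claims $claimed jobs, $2 pdispatch running"
}
```

<para>A possible invocation is
<command>check_slots ~/loads "$(ps x | grep -c '[p]dispatch')"</command>;
the bracket in the pattern keeps <command>grep</command> from counting
itself.</para>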
<para>You may have problems with the <command>umount</command>
commands hanging. If so, you are going to have to use the
<command>allgohans</command> script to run an &man.ssh.1;
command across all clients for that buildenv. For example:
<screen>ssh -l root gohan24 df</screen>
will get you a df, and
<screen>allgohans "umount -f pointyhat.freebsd.org:/var/portbuild/i386/6-exp/ports"
allgohans "umount -f pointyhat.freebsd.org:/var/portbuild/i386/6-exp/src"</screen>
are supposed to get rid of the hanging mounts. You will have to
keep doing them since there can be multiple mounts.</para>
<note>
<para>Ignore the following:
<screen>umount: pointyhat.freebsd.org:/var/portbuild/i386/6-exp/ports: statfs: No such file or directory
umount: pointyhat.freebsd.org:/var/portbuild/i386/6-exp/ports: unknown file system
umount: Cleanup of /x/tmp/6-exp/chroot/53837/compat/linux/proc failed!
/x/tmp/6-exp/chroot/53837/compat/linux/proc: not a file system root directory</screen>
The first two mean that the client did not have those mounted;
the last two are a bug.</para>
<para>You may also see messages about <literal>procfs</literal>.</para>
</note>
<para>After you have done all the above, remove the
<filename><replaceable>${arch}</replaceable>/lock</filename>
file before trying to restart the build.
file before trying to restart the build. If you don't,
<filename>dopackages</filename> will simply exit.
</para>
<para>If you have to do a <command>cvs update</command> before
restarting, you may have to rebuild either <filename>duds</filename>,
<filename>INDEX</filename>, or both. If you are doing the latter
manually, you will also have to rebuild
<filename>packages/All/Makefile</filename> via the
<command>makeparallel</command> script.</para>
</sect1>
<sect1 id="monitoring">
<title>Monitoring the Build</title>
<para>The
<para>You can use either
<command>showrunning</command> or <command>python straslivy.py</command>
to show the packages currently being built. The
<command>scripts/stats <replaceable>${branch}</replaceable></command>
command counts the number of packages currently built.</para>
command shows the number of packages already built.</para>
<para>Running <command>cat /var/portbuild/*/loads/*</command>
shows the client loads and number of concurrent builds in
progress.</para>
progress. The files that have been recently updated are the clients
that are online; the others are the offline clients.</para>
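<para>Since recently updated files indicate online clients, the
offline ones can be listed by age. The following is a sketch only:
<literal>stale_loads</literal> is a hypothetical name, and the default
threshold of 5 minutes is an assumption chosen to be comfortably longer
than the refresh interval.</para>

```shell
#!/bin/sh
# Sketch: print the names of load files that have not been modified
# recently, i.e. clients that are probably offline.
# stale_loads is a hypothetical name; the 5 minute default is an
# assumption, not a value taken from the portbuild scripts.
stale_loads() {
    # $1: a loads directory, $2: age threshold in minutes
    find "$1" -type f -mmin +"${2:-5}" -exec basename {} \;
}
```

<para>For example,
<command>stale_loads /var/portbuild/i386/loads</command> would list the
&i386; clients whose load file is older than the threshold.</para>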
<note>
<para>The <command>pdispatch</command> command does the dispatching
of work onto the client, and post-processing.
<command>ptimeout.host</command> is a watchdog that kills a build
after timeouts. So, having 50 <command>pdispatch</command>
processes but only 4 &man.ssh.1; processes means 46
<command>pdispatch</command>es are idle, waiting to get an
idle node.</para>
</note>
<para>Running <command>tail -f <replaceable>${arch}</replaceable>/<replaceable>${branch}</replaceable>/build.log</command>
shows the overall build progress.</para>
<para>If a build is failing, and it is not immediately obvious
from the port build log as to why, you can preserve the
<para>If a port build is failing, and it is not immediately obvious
from the log as to why, you can preserve the
<literal>WRKDIR</literal> for further analysis. To do this,
touch a file called <filename>.keep</filename> in the port's
directory. The next time the cluster tries to build this port,
@ -503,6 +646,79 @@
<filename>/var/portbuild</filename> file system becomes full
then <trademark>Bad Things</trademark> happen.
</para>
<para>The status of all current builds is generated twice an hour
and posted to
<ulink url="http://pointyhat.FreeBSD.org/errorlogs/packagestats.html"></ulink>.
For each <literal>buildenv</literal>, the following is displayed:</para>
<itemizedlist>
<listitem>
<para><literal>cvs date</literal> is the contents of
<filename>cvsdone</filename>. This is why we recommend that you
update <filename>cvsdone</filename> for <literal>-exp</literal>
runs (see below).</para>
</listitem>
<listitem>
<para>date of <literal>latest log</literal></para>
</listitem>
<listitem>
<para>number of lines in <literal>INDEX</literal></para>
</listitem>
<listitem>
<para>the number of current <literal>build logs</literal></para>
</listitem>
<listitem>
<para>the number of completed <literal>packages</literal></para>
</listitem>
<listitem>
<para>the number of <literal>errors</literal></para>
</listitem>
<listitem>
<para>the number of duds (shown as <literal>skipped</literal>)</para>
</listitem>
<listitem>
<para><literal>missing</literal> shows the difference between
<filename>INDEX</filename> and the other columns. If you have
restarted a run after a <command>cvs update</command>, there
will likely be duplicates in the build log and error columns,
and this column will be meaningless. (The script is naive.)</para>
</listitem>
<listitem>
<para><literal>running</literal> and <literal>completed</literal>
are guesses based on a &man.grep.1; of <filename>build.log</filename>.
</para>
</listitem>
</itemizedlist>
</sect1>
<sect1 id="errors">
<title>Dealing With Build Errors</title>
<para>The easiest way to track build failures is to receive
the emailed logs and sort them to a folder, so you can maintain a
running list of current failures and detect new ones easily.
To do this, add an email address to
<filename><replaceable>${branch}</replaceable>/portbuild.conf</filename>.
You can easily bounce the new ones to maintainers.</para>
<para>After a port appears broken on every build combination
multiple times, it is time to mark it <literal>BROKEN</literal>.
Two weeks' notification for the maintainers seems fair.</para>
<note>
<para>To avoid build errors with ports that need to be manually
fetched, put the distfiles into
<filename>~ftp/pub/FreeBSD/distfiles</filename>.</para>
</note>
</sect1>
<sect1 id="release">
@ -637,15 +853,37 @@
<literal>bsd.port.mk</literal>), or to test large sweeping
upgrades. The current experimental patches branch is
<literal>6-exp</literal> on the &i386;
architecture.</para>
architecture. (For the moment, we are also using
<literal>5-exp</literal> as a secondary branch, but
actually using the bits from <literal>RELENG_6</literal>.)</para>
<para>In general, an experimental patches build is run the same
way as any other build. However, before running the
<literal>dopackages</literal> script, you must apply the required
patches to the ports tree. It is always a good idea to save
way as any other build, except that you should first update the
ports tree to the latest version and then apply your patches.
To do the former, you can use the following:
<screen>&prompt.user; <userinput>cvs -R update -dP > update.out</userinput>
&prompt.user; <userinput>date > cvsdone</userinput></screen>
This will most closely simulate what the <literal>dopackages</literal>
script does. (While <filename>cvsdone</filename> is merely
informative, it can be a help.)</para>
<para>You will need to edit <filename>update.out</filename> to look
for lines beginning with <literal>^M</literal>, <literal>^C</literal>,
or <literal>^?</literal> and then deal with them.</para>
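<para>Rather than reading <filename>update.out</filename> by eye, the
interesting lines can be filtered out. This sketch assumes that the
<literal>^</literal> above is regular-expression notation for
<quote>start of line</quote>, and that each status letter is followed
by a space as in normal <command>cvs update</command> output. The
function name <literal>needs_attention</literal> is hypothetical.</para>

```shell
#!/bin/sh
# Sketch: show the lines of saved `cvs update` output that report
# locally modified (M), conflicting (C), or unknown (?) files.
# needs_attention is a hypothetical name.
needs_attention() {
    # $1: file holding the saved output, e.g. update.out
    grep -E '^[MC?] ' "$1"
}
```

<para>Running <command>needs_attention update.out</command> prints
nothing, and returns a non-zero status, when the update was
clean.</para>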
<para>It is always a good idea to save
original copies of all changed files, as well as a list of what
you are changing. You can then look back on this list when doing
the final commit.</para>
the final commit, to make sure you are committing exactly what you
tested.</para>
<para>Since the machine is shared, someone else may delete your
changes by mistake, so keep a copy of them in e.g. your home
directory on <hostid>freefall</hostid>. Don't use
<filename>tmp/</filename>; since <hostid>pointyhat</hostid>
itself runs some version of <literal>-CURRENT</literal>, you
can expect reboots (if nothing else, for updates).</para>
<para>In order to have a good control case with which to compare
failures, you should first do a package build of the branch on
@ -733,7 +971,7 @@
<screen>&prompt.user; <userinput>cd /var/portbuild/i386/6/ports</userinput></screen>
<note><para>Be sure to cvs update this tree to the same date as
<note><para>Be sure to <literal>cvs update</literal> this tree to the same date as
the experimental patches tree.</para></note>
<para>The following command will set up the control branch for
@ -764,7 +1002,9 @@
of their dependencies.</para>
<para>You can check the progress of this
partial build the same way you would a regular build. Once all
partial build the same way you would a regular build.</para>
<para>Once all
the errors have been resolved, you can commit the package set.
After committing, it is customary to send a <literal>HEADS
UP</literal> email to <ulink