This will be the location of the DocBook version of the FreeBSD Handbook,
which will eventually obsolete the version currently in doc/handbook/.

Interested parties should examine

    <URL:http://www.nothing-going-on.demon.co.uk/FreeBSD/docbook-migration.html>

and get in touch with Nik Clayton (either at nik@FreeBSD.ORG or via the
FreeBSD-doc mailing list) if they have specific questions.

All the scripts mentioned here can also be downloaded by going to

    <URL:http://www.freebsd.org/~nik/script_name>

for example,

    <URL:http://www.freebsd.org/~nik/entity-cdata.pl>

------------------------------------------------------------------------
The Handbook is midway through the conversion process. It will almost
certainly not convert to other formats cleanly.
------------------------------------------------------------------------

Actions

This list explains what's been done so far, so the Japanese team can
track my changes. All actions took place on freefall.

1. Initial conversion to DocBook

Checked out a copy of the doc repository to ~/cvs/. Then used 2 scripts
to convert the handbook to its initial DocBook format. The 2 scripts are
2docbook.sh and entity-cdata.pl, both of which can be found in ~nik/bin/.

2docbook.sh calls entity-cdata.pl as necessary.

    % cd ~/cvs/doc/handbook
    % 2docbook.sh

This created handbook-db.sgml in ~/cvs/doc/handbook. This file contains
syntactically valid (but quite ugly) SGML. This file was then moved to
the doc/en/handbook directory and renamed to handbook.sgml. The
conversion process left a few spurious changes in the old handbook files
which I don't want to commit, so I removed them and updated the
repository.

The new file was then committed.

    % mv handbook-db.sgml ~/cvs/doc/en/handbook/handbook.sgml
    % rm *.sgml
    % cvs update
    % cd ~/cvs/doc/en/handbook
    % cvs add handbook.sgml
    % cvs commit

2. handbook.sgml was loaded into XEmacs 20.30 (straight from the ports
collection) and sgml-mode was turned on. My .emacs file contains the
following hook:

    (add-hook 'sgml-mode-hook
              (function
               (lambda ()
                 (setq sgml-omittag nil)
                 (setq sgml-indent-data t))))

This configures psgml to not omit any tags that the DTD lists as
omittable, and to indent data in the same way that markup is indented.

The following function was pasted into the *scratch* buffer, and then
"M-x eval-current-buffer" was run.

    (defun sgml-indent-buffer ()
      "Indents the current buffer, one line at a time"
      (interactive "*")
      (save-excursion
        (goto-char (point-min))
        (while (= (forward-line 1) 0)
          (sgml-indent-or-tab))))

In the handbook.sgml buffer, the point was placed on the first character
of the first line, and "M-x sgml-indent-buffer" was run.

The changes were then committed.

3. Refilled the Handbook -- this rewraps the lines as necessary. This was
done by placing the point on the first <book> tag, and running
"M-x sgml-fill-element".

This takes about 10 minutes to run.

It also reformats some sections that should not be reformatted, including
examples of text on the screen, PGP key blocks and so on. They will be
fixed in a later commit.

4. Removed spurious markup. The conversion process has left a lot of

    <para></para>

entries in the handbook, and they need to be removed. There are a
number of places this happens, and the rules are slightly different
each time.

For example,

    ====================================================================
    Original markup                     Changed markup
    --------------------------------------------------------------
    <listitem>                          <listitem>
      <para></para>                       <para>A real paragraph</para>

      <para>A real paragraph</para>
    --------------------------------------------------------------
      <para>A real paragraph</para>       <para>A real paragraph</para>
      <para></para>                     </listitem>

    </listitem>
    --------------------------------------------------------------
    <para>A real paragraph</para>       <para>A real paragraph</para>

    <para></para>                       <para>Another paragraph</para>

    <para>Another paragraph</para>
    ====================================================================

Notice the last example. It's not enough to simply put together a
regexp that matches (all whitespace)<para></para>(all whitespace)
and removes it, since that would leave you with

    <para>A real paragraph</para>
    <para>Another paragraph</para>

In the end I got bored of trying to write this using Emacs regexps,
and knocked together a quick Perl script to do it. It's ~nik/bin/para.pl
on freefall.

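para.pl itself is not reproduced here. As a rough guide to the rule the three examples imply, here is a minimal Python sketch (not the original script; the names `EMPTY_PARA` and `remove_empty_paras` are hypothetical): drop any line holding only an empty <para></para>, plus at most one blank line straight after it, so the surrounding lines are never joined together.

```python
import re

# A line holding nothing but an empty <para></para> (plus whitespace).
EMPTY_PARA = re.compile(r"^\s*<para>\s*</para>\s*$")


def remove_empty_paras(text: str) -> str:
    """Drop empty <para></para> lines and at most one blank line after each."""
    out = []
    skip_blank = False
    for line in text.split("\n"):
        if EMPTY_PARA.match(line):
            # Remove the whole line, so the lines around it are never joined.
            skip_blank = True
            continue
        if skip_blank and not line.strip():
            # Swallow the blank line the removed element leaves behind.
            skip_blank = False
            continue
        skip_blank = False
        out.append(line)
    return "\n".join(out)


before = "<para>A real paragraph</para>\n\n<para></para>\n\n<para>Another paragraph</para>"
print(remove_empty_paras(before))
```

Working line by line, rather than with one whitespace-eating regexp, is what keeps the blank line between the two real paragraphs in the last example.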
5. Got halfway through looking for filenames, and marking them up as such.

There are a lot ( :-( ) of filenames in the Handbook. The conversion
process did a pretty good job of marking them as <filename>...</filename>
but it wasn't perfect.

I'm halfway through the Handbook (at line 16704), eyeballing each line
and changing things like <emphasis remap="tt">...</emphasis> to
<filename>...</filename> where appropriate.

The remainder will follow tomorrow evening.

6. Finished the first sweep marking up filenames.

If it looked like a filename (but wasn't a command for the user to type
in) it's been marked up with <filename>...</filename>.

If it had already been marked up as <filename>...</filename> but wasn't
a filename, the markup was changed to <emphasis remap="tt">...</emphasis>.

PSGML and XEmacs are very useful for this, using "C-c =" to change
existing markup.

Synchronising with changes 5 and 6 will involve examining the diffs
and changing by hand, I'm afraid. It could not be automated.

7. Start replacing `` and '' with <quote> and </quote>. Don't change
things indiscriminately, but look at the context to see if the change is
appropriate. There are still many `` and '' occurrences which should be
changed to some other element.

This was done using a regexp search/replace, looking for the regexp

    ``\([^']+\)''

and replacing with

    <quote>\1</quote>

Not all the `` '' pairs were changed, since in some cases they delimit
filenames, options and so on.

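For readers who prefer to see the quote conversion outside Emacs, here is a rough Python translation of the search/replace (the function name is hypothetical). It uses `[^']+` so that the match spans more than one character. It substitutes every match, whereas the change itself was made case by case, so it is only safe on text already checked by eye.

```python
import re

# ``text'' -> <quote>text</quote>. [^']+ stops at the first apostrophe,
# so quoted text that itself contains an apostrophe is left untouched.
TEX_QUOTES = re.compile(r"``([^']+)''")


def tex_quotes_to_quote(text: str) -> str:
    return TEX_QUOTES.sub(r"<quote>\1</quote>", text)
```

Because of the apostrophe restriction, something like ``don't'' has to be fixed by hand.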
8. As with change 7, but replace with <command> ... </command> as
necessary.

9. Remove the `` and '' from options.

    ``<option>...</option>'' becomes <option>...</option>

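The option cleanup is a single pattern, so a hypothetical Python equivalent (the name `unquote_options` is illustrative only, not from the migration scripts) can look like this:

```python
import re

# ``<option>...</option>'' -> <option>...</option>
QUOTED_OPTION = re.compile(r"``(<option>[^<]*</option>)''")


def unquote_options(text: str) -> str:
    # Keep the <option> element, drop the TeX-style quotes around it.
    return QUOTED_OPTION.sub(r"\1", text)
```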
10. Converted appropriate occurrences of

    <emphasis remap=tt>...</emphasis>

to

    <filename>...</filename>

11. As above, but changing to <command>...</command>. Modified the Emacs
regexp slightly to search for

    <emphasis[ \n\t]+remap=tt>\([^<]+\)</emphasis>

which matches elements spread over two lines.

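The same pattern serves changes 10 and 11. Only the target element differs. The sketch below translates the regexp to Python, where Emacs `\(`...`\)` groups become plain parentheses. The helper `emphasis_tt_to` is hypothetical. Like the other sketches, it rewrites every match, which the hand-checked Emacs run did not.

```python
import re

# [ \n\t]+ lets the start tag break across two lines.
EMPHASIS_TT = re.compile(r"<emphasis[ \n\t]+remap=tt>([^<]+)</emphasis>")


def emphasis_tt_to(element: str, text: str) -> str:
    """Rewrite every <emphasis remap=tt>...</emphasis> as <element>...</element>."""
    return EMPHASIS_TT.sub(lambda m: f"<{element}>{m.group(1)}</{element}>", text)
```

Calling it with "filename" gives change 10, and with "command" gives change 11.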
12. Looked for explanatory notes in the text (typically prefixed by "note",
"Note:" or "<para><blockquote><para><emphasis role=bf>Note:</emphasis>")
and marked them up as 'note' elements.

This change involves markup changes *and* text changes. This is because
text like

    <para>Note: The foo file is only used once, and can be deleted.</para>

became

    <note>
      <para>The foo file is only used once, and can be deleted.</para>
    </note>

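For the simplest of those prefixes, the markup-plus-text change can be sketched in Python. This sketch assumes the note is a single `<para>Note: ...</para>` and uses two-space indentation inside `<note>`. The function name is hypothetical, and the blockquote and lower-case variants listed above are not handled.

```python
import re

# Only the plain "<para>Note: ...</para>" form. DOTALL lets the
# paragraph run over several lines; .*? stops at the first </para>.
NOTE_PARA = re.compile(r"<para>Note:\s*(.*?)</para>", re.DOTALL)


def para_note_to_note(text: str) -> str:
    # Drop the "Note:" text and wrap the remaining paragraph in <note>.
    return NOTE_PARA.sub(r"<note>\n  <para>\1</para>\n</note>", text)
```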
13. Look for text marked up as an acronym and alter as necessary. The
automatic conversion tended to mark any string of upper case letters
as acronyms, which is not always right.

The difference between an <acronym> and <abbrev> is subtle -- in a
nutshell, an acronym is pronounceable, an abbreviation isn't.

14. Another sweep for "`" and "``" (and their closing equivalents),
replacing them with the right markup (since most of the time they're
used to 'delimit' filenames or options from the surrounding text).

The only quotes left now are either around items for which I'm not 100%
sure which element to use, or in literal blocks as part of commands the
user types in.

15. Look for double quotes not used in attributes and alter to the
appropriate markup (or remove as necessary). A useful Emacs regexp
when doing the search and replace is

    \([^=]\)"\([^ \t\n]+[^"]+\)"\([^>]\)

and replace with

    \1<quote>\2</quote>\3

or whatever the replacement element is.

Converted '"' into <quote>, <literal>, <command>, <application>,
<filename>, <emphasis>, <option> or removed it as necessary.

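The double-quote regexp translates almost directly to Python, as the sketch below shows (helper name hypothetical). The `[^=]` and `[^>]` guards are what keep attribute values such as `url="..."` out of the match. The pattern also needs at least two characters between the quotes and one character after the closing quote, so a one-letter string or a quote at the very end of the text is not matched.

```python
import re

# Emacs: \([^=]\)"\([^ \t\n]+[^"]+\)"\([^>]\)
DOUBLE_QUOTED = re.compile(r'([^=])"([^ \t\n]+[^"]+)"([^>])')


def double_quotes_to(element: str, text: str) -> str:
    # Re-emit the guard characters around the new element.
    return DOUBLE_QUOTED.sub(
        lambda m: f"{m.group(1)}<{element}>{m.group(2)}</{element}>{m.group(3)}",
        text,
    )
```

Passing "quote", "literal", "command" and so on matches the "or whatever the replacement element is" step, but picking the element still needs a human.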
16. A general cleanup to get it to validate. The original conversion
process left some <sect?>'s with just a title, which is invalid;
they must contain a <para> or similar element.

Also fixed a couple of typos in the tags. The document should now
validate, save for the undefined external entities.

17. Created a new FreeBSD Doc. Project DTD in the ../../sgml directory.
Changed the declaration at the top of the handbook to use this new
DTD.

18. Yet more things that should be filenames marked up as such.

19. Use the new <hostid> element to mark up hostnames, IP addresses and
such. The markup choice is as follows:

    <hostid>...</hostid>                   is a simple hostname.
    <hostid role="ipaddr">...</hostid>     is an IP address.
    <hostid role="domainname">...</hostid> is a domain name.
    <hostid role="fqdn">...</hostid>       is a fully qualified domain name.
    <hostid role="netmask">...</hostid>    is a netmask.
    <hostid role="mac">...</hostid>        is a network card MAC address.

These might migrate to being separate elements in the future. However,
if they do then changing the markup can be done automatically.

20. Convert <emphasis remap=it>...</emphasis> to plain <emphasis> in some
cases. I'm pretty certain that all the <emphasis>...</emphasis>
markup is correct now, which makes searching for markup that does
need changing much easier.

21. Replace the last few occurrences of curly quoted items (`` and '')
with the right markup.

22. Almost the last lot. I missed a diff I'd done at home. There's a
section in the handbook that talks about kernel options, where the
quoted options are quoted with `` and ''. Fix them so that standard
double quotes are used (so they can be cut and pasted).

23. Start working on <emphasis remap=bf>...</emphasis>.

Convert the first lot to <command>...</command>.

24. Fixed manual page references to use the right markup, which is

    <citerefentry>
      <refentrytitle>page_name</refentrytitle>
      <manvolnum>number</manvolnum>
    </citerefentry>

Did this with a regexp search for

    \([a-z-_\.]+\)(\([1-9]\))

and replacing with

    <citerefentry><refentrytitle>\1</refentrytitle><manvolnum>\2</manvolnum></citerefentry>

Since most of the page references had <command>, <emphasis>, or
<ulink> elements wrapped around them, you then have to sweep through the
file looking for "><cite" and using C-c C-k to kill the markup
immediately before and after.

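The manual page conversion can also be written in Python. In the sketch below, the escaping is reversed compared with Emacs: `\(` and `\)` are the literal parentheses around the section number, and bare parentheses form the groups. The `-` is moved to the end of the character class so it is unambiguously literal. The function name is hypothetical.

```python
import re

# name(section): lower-case letters, '_', '.', '-', then a section from 1 to 9.
MANREF = re.compile(r"([a-z_.-]+)\(([1-9])\)")


def manrefs_to_citerefentry(text: str) -> str:
    return MANREF.sub(
        r"<citerefentry><refentrytitle>\1</refentrytitle>"
        r"<manvolnum>\2</manvolnum></citerefentry>",
        text,
    )
```

This does not remove the <command>/<emphasis>/<ulink> wrappers. That sweep stays manual.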
25. <emphasis remap=..>...</emphasis> -> <literal>...</literal>

26. <emphasis remap=..>...</emphasis> -> <makevar>...</makevar>

27. <emphasis remap=..>...</emphasis> -> <maketarget>...</maketarget>

28. Fix up some uses of <screen> and the use of <emphasis> elements within
and near it. Most of the time this consisted of replacing the <emphasis>
with <replaceable> or <userinput>.

29. Fixed up more references to manpages that used <ulink> to use
<citerefentry>. These were missed at step 24 because they didn't
include a section number. No references to man.cgi now exist in
handbook.sgml.

30. Create two entities, prompt.root and prompt.user. Use these anywhere
the OS prompt is displayed, depending on whether the user should be
a normal user or root.

Also mark up other prompts (e.g., the DOS prompt C:\> that occurs in
some places) as <prompt>s.

31. Reviewing the use of <informalexample> and <screen>.

In some cases <informalexample> wasn't appropriate, and the markup was
changed to <programlisting> or another element.

In some cases there were spurious <para> elements before and after the
<informalexample>. These were removed.

Reformatted text within <screen> elements because the whitespace *is*
significant.

Added <prompt> and <userinput> elements within <screen> where necessary.

If I spotted inappropriate use of markup within the immediate vicinity
of the <informalexample> elements then I fixed that (mostly the use of
<emphasis remap="...">).

This is part one of these changes -- there's a load of them, and this
goes up to line 11,284 or thereabouts, roughly one third of the way
through.

32. Continuing the work from the previous commit. This takes us up to
the beginning of Chapter 16, "The Cutting Edge".

33. Finished sweep. If it's white space sensitive (examples, program
listings, PGP signatures...) the white space is now correct. I may
have missed one or two on the way; I'll catch them later.

34. Removed repeated spaces after stops (. , ! ? : ;).

Some parts of the handbook had single spaces after stops, some had double
or triple. While the typographical convention for monospaced fonts may
be to use double spaces after them, that doesn't apply here. TeX
collapses them into a single space, as does HTML. If we need a plain
text version of the Handbook then the stylesheet / conversion mechanism
can insert them as necessary.

Searching for

    _\([;:!\.\?,]\) +_

in Emacs and replacing with

    _\1 _

(ignore the '_', they're just to delineate the regexps) does the job
quite nicely. However, you can't do this everywhere, since some of the
double spaces might be in program listings or other literal sections
(e.g., the BSD Copyright), so you need to sit and bounce on the 'y' or
'n' key as appropriate for each occurrence of a stop.

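The stop-spacing regexp can be expressed as a short Python function. In this hypothetical sketch, the "bounce on y or n" decision is replaced by the assumption that the caller passes in only ordinary prose. The function name is illustrative.

```python
import re

# Emacs: \([;:!\.\?,]\) +  ->  \1 followed by exactly one space
STOP_SPACES = re.compile(r"([;:!.?,]) +")


def single_space_after_stops(text: str) -> str:
    # Unlike the interactive Emacs session, this rewrites every match, so it
    # must only be fed ordinary prose, never <programlisting> or <screen> text.
    return STOP_SPACES.sub(r"\1 ", text)
```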
35. Some paragraphs have leading space(s). E.g.,

    <para> There is some leading space here.</para>

Get rid of it with an Emacs search/replace for

    <para> +\([^ ]\)

and replacing with

    <para>\1

This can be done globally.

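Because this substitution is safe to apply globally, it also works as a plain function. The following is a small Python translation with a hypothetical name. The captured `([^ ])` is the first real character of the paragraph, which is put back after the tag.

```python
import re

PARA_LEADING_SPACE = re.compile(r"<para> +([^ ])")


def strip_para_leading_space(text: str) -> str:
    return PARA_LEADING_SPACE.sub(r"<para>\1", text)
```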
36. A lot of </para> tags have whitespace immediately before them. Remove
it. Do this (in Emacs) by searching for

    \s-+</para>

and replacing with

    </para>

Do this for all occurrences *except* where the element immediately before
the </para> is one of <itemizedlist>, <orderedlist>, <variablelist>,
<procedure>. The <para>...</para> wrapping these elements is mostly
redundant, and will be removed later.

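The exception list can be folded into the pattern itself. This Python sketch is an assumption-laden translation, not what was run. It optionally captures a closing list tag in front of the whitespace, and it leaves the match unchanged when that tag is present. Python's `\s` stands in for Emacs's `\s-`, and the function name is hypothetical.

```python
import re

LIST_BEFORE_PARA_END = re.compile(
    r"(</(?:itemizedlist|orderedlist|variablelist|procedure)>)?\s+</para>"
)


def trim_before_para_end(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Leave the whitespace alone when a list closes right before </para>.
        return m.group(0) if m.group(1) else "</para>"

    return LIST_BEFORE_PARA_END.sub(repl, text)
```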
37. With the agreement of the Japanese team, a change in the way I'm doing
things.

I'm now working through from the beginning of the handbook to the end,
correcting as I go. I'll commit in chunks of 5,000 lines (or
thereabouts).

Most of the changes fall into the following categories.

    * <emphasis remap=bf> --> <emphasis>

    * Spurious <para>s around <*list>s deleted (but not reformatted)

      "C-c -" in Emacs SGML mode (when the point is on an element's start
      or end tag) will delete that element's start and end tags.

    * Marked smileys with <!-- smiley --> for possible future deletion

    * Deleting <emphasis> around <term> contents,

        <term><emphasis>...</emphasis></term> -> <term>...</term>

    * Fine tuning markup choices in some cases

        - <filename>C:</filename> -> <devicename>C:</devicename>

    * Extra <note>s here and there.

    * Some <*list>s to <procedure> (and <listitem>s to <step>)

    * ASCII emphasis converted to <emphasis>

        e.g., do it like *this* -> do it like <emphasis>this</emphasis>

    * <symbol> -> <replaceable>

There are very few whitespace changes, although a few have probably
cropped up. The vast majority of the whitespace changes will happen in
one megacommit, hopefully some time next week.

38. As above, to line 11490.

39. . . . to line 15126 . . .

40. . . . to line 20370 . . .

41. . . . to line 24997 . . .

42. Brief interruption, small changes to keep it validating.

43. . . . to line 30118 . . .

44. . . . to line 35973 . . .

45. . . . to end of file!

46. <emphasis remap=..> -> <emphasis>
    <literal remap=..>  -> <literal>
    <command remap=..>  -> <command>

Or deleted <emphasis ..> altogether in some cases.

More redundant <para>..</para>'s removed.

47. Removed white space after <title> and before </title>. Used two
search-and-replace operations:

    \s-+</title>  -> </title>
    <title>\s-+   -> <title>

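The two title replacements compose into one hypothetical Python helper. It assumes Python's `\s`, which includes newlines, behaves like Emacs's `\s-` for this text. That matters because the stray whitespace often spans a line break.

```python
import re


def trim_title_whitespace(text: str) -> str:
    # Whitespace before </title>, then whitespace after <title>.
    text = re.sub(r"\s+</title>", "</title>", text)
    return re.sub(r"<title>\s+", "<title>", text)
```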
48. Use the correct ISO entities for dashes. According to a TeX manual I have
kicking around here,

    daughter-in-law, X-rated = hyphen     = -
    pages 13--67             = en-dash    = &ndash;
    yes---or no?             = em-dash    = &mdash;
    0, 1, and -1             = minus sign = &minus;

49. Step 1. Find <ulink url="mailto:...">...</ulink> and change the <ulink>
to <email>.

Can't do this globally. Some of the links are odd (i.e., the link text
is not the e-mail address but the person's name), e.g.,

    <ulink url="mailto:nik@freebsd.org">Nik Clayton</ulink>

which would turn into

    <email>Nik Clayton</email>

which isn't very useful. Ignore these ones, and do the others, i.e.,
the ones that look like

    <ulink url="mailto:nik@freebsd.org">nik@freebsd.org</ulink>

This Emacs regexp does the job.

    Search for:   <ulink\s-+url="mailto[^>]+>\([^<]+\)</ulink>
    Replace with: <email>\1</email>

Step 2. A lot of the <email>...</email> sets will have '<' and '>'
embedded in them (as entities). These can be removed, since the stylesheet
will add them:

    Search for:   <email>&lt;\([^&]+\)&gt;</email>
    Replace with: <email>\1</email>

Step 3. The trick now is to turn

    <ulink url="mailto:nik@freebsd.org">Nik Clayton</ulink>

into

    Nik Clayton <email>nik@freebsd.org</email>

This step could (possibly) have been done first, and then steps
1 and 2 could be done globally. I haven't done this because of
concerns about the ordering of names within languages. This
transformation is fairly simple in English; I've no idea what
it's like in Japanese.

    Search for:   <ulink\s-+url="mailto:\([^"]+\)">\([^<]+\)</ulink>
    Replace with: \2 <email>\1</email>

Step 4. Remove leading and trailing spaces that may have slipped in.

    Search for:   <email>\s-+
    Replace with: <email>

    Search for:   \s-+</email>
    Replace with: </email>

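The four steps can be combined into one pass, as in this Python sketch. It makes one loud assumption: the manual judgement in step 1 is replaced by a test for "@" in the link text. Link text containing "@" is treated as an address (steps 1, 2 and 4), and anything else as a name (steps 3 and 4). It also bakes in the English "name, then address" ordering that the notes above are wary of for Japanese. The names are hypothetical.

```python
import re

MAILTO = re.compile(r'<ulink\s+url="mailto:([^"]+)">([^<]+)</ulink>')


def mailto_to_email(text: str) -> str:
    def repl(m: re.Match) -> str:
        address = m.group(1).strip()
        label = m.group(2).strip()  # step 4: no stray leading/trailing spaces
        # Step 2: drop &lt; ... &gt; wrappers; the stylesheet adds them back.
        label = re.sub(r"^&lt;([^&]+)&gt;$", r"\1", label)
        if "@" in label:
            # Step 1: the link text is itself the address.
            return f"<email>{label}</email>"
        # Step 3: the link text is a name; keep it in front of the address.
        return f"{label} <email>{address}</email>"

    return MAILTO.sub(repl, text)
```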
50. <abbrev> -> <acronym> in some cases.

51. Fix up erroneous or extraneous <para>...</para> elements.

52. <foo
      id="bar">
    ...

changed to

    <foo id="bar">
    ...

Before people complain that "Hang on, now you can't find out what the
allocated ID values are with a simple 'grep'", I'll say that's not a
problem. I plan to introduce a target in the Makefile (probably
something like 'handbook.id') which will automatically generate this
list by doing a proper SGML parse.

53. Where kernel options ("options INET" for example) are listed, wrap them
in <literal>...</literal>.

54. <quote> -> ”
    </quote> -> “

55. * \s-+</programlisting> replaced with </programlisting> globally.

    * Fix up use of <symbol> with a more appropriate element.

    * Fix up wrong occurrence of 'dollar'Id'dollar'.

    * Fix up references to 'make' variables, and strip off the surrounding
      ${...}; it can be added back by the stylesheet at presentation
      time.

    * More insertions or deletions of <para>...</para> as appropriate.

56. Add values for the 'id' attribute for those <chapter>s that don't
have them.

57. Split the Handbook into individual files, called chapter.sgml, stored
in directories named after the value of the id attribute on the
chapter.

Added chapters.ent, which lists the entities used to refer to the
chapters. Updated handbook.sgml to refer to this file and use entity
references to pull everything in.

58. Added chapter.decl, which contains a declaration for a DocBook
chapter.

Added

    <!--
         Local Variables:
         mode: sgml
         sgml-declaration: "../chapter.decl"
         sgml-indent-data: t
         sgml-omittag: nil
         sgml-shorttag: nil
         sgml-always-quote-attributes: t
         sgml-minimize-attributes: max
         sgml-parent-document: ("../handbook.sgml" "part" "chapter")
         End:
    -->

to the bottom of each chapter.sgml file, so Emacs can do something
useful with it. This uses the new chapter.decl file.

59. Add similar local variables (but not sgml-declaration or
sgml-parent-document) to handbook.sgml.

60. Added authors.ent. This is v1.93 of doc/handbook/authors.sgml, with
these changes:

    1. Remove '<tt>' and '</tt>'.

    2. Search/replace

           \s-+<htmlurl url='mailto:\([^']+\)'\s-+name='[^']+'>

       with

            <email>\1</email>

       (there's a leading space before <email>)

Added an ENTITY line to handbook.sgml to use the new entities.

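The two authors.ent changes can be sketched as a per-line Python function. This assumes each author line in the old source carries an optional `<tt>` wrapper around a single `<htmlurl url='mailto:...' name='...'>` tag. That line layout is an assumption, not checked against authors.sgml v1.93, and the function name is hypothetical.

```python
import re

HTMLURL = re.compile(r"\s+<htmlurl url='mailto:([^']+)'\s+name='[^']+'>")


def author_line_to_docbook(line: str) -> str:
    # Change 1: drop the <tt> wrappers.
    line = line.replace("<tt>", "").replace("</tt>", "")
    # Change 2: the htmlurl becomes <email>, keeping one leading space.
    return HTMLURL.sub(r" <email>\1</email>", line)
```

Per step 62, the same function would apply to lists.sgml when building mailing-lists.ent, if those lines follow the same pattern.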
61. Removed the prompt.* entities from handbook.sgml and added them to
freebsd.dtd.

62. Do step 60, but for version 1.8 of doc/handbook/lists.sgml. Call
the transformed file mailing-lists.ent, and add an ENTITY line for
it to handbook.sgml.

63. Add <!ENTITY rel.current CDATA "2.2.6"> to handbook.sgml, from
r1.83 of doc/handbook/handbook.sgml.

64. Fix line 125 of kerneldebug/chapter.sgml, & -> &amp;