This will be the location of the DocBook version of the FreeBSD Handbook,
which will eventually obsolete the version currently in doc/handbook/.

Interested parties should examine

    <URL:http://www.nothing-going-on.demon.co.uk/FreeBSD/docbook-migration.html>

and get in touch with Nik Clayton (either at nik@FreeBSD.ORG or via the
FreeBSD-doc mailing list) if they have specific questions.

All the scripts mentioned here can also be downloaded by going to

    <URL:http://www.freebsd.org/~nik/script_name>

for example,

    <URL:http://www.freebsd.org/~nik/entity-cdata.pl>

------------------------------------------------------------------------
The Handbook is midway through the conversion process. It will almost
certainly not convert to other formats cleanly.
------------------------------------------------------------------------

Actions

This list explains what's been done so far, so the Japanese team can
track my changes. All actions took place on freefall.

1. Initial conversion to DocBook

Checked out a copy of the doc repository to ~/cvs/. Then used 2 scripts
to convert the handbook to its initial DocBook format. The 2 scripts are
2docbook.sh and entity-cdata.pl, both of which can be found in ~nik/bin/.

2docbook.sh calls entity-cdata.pl as necessary.

    % cd ~/cvs/doc/handbook
    % 2docbook.sh

This created handbook-db.sgml in ~/cvs/doc/handbook. This file contains
syntactically valid (but quite ugly) SGML. This file was then moved to
the doc/en/handbook directory and renamed to handbook.sgml. The
conversion process left a few spurious changes in the old handbook files
which I didn't want to commit, so I removed the modified files and ran
"cvs update" to restore clean copies.

The new file was then committed.

    % mv handbook-db.sgml ~/cvs/doc/en/handbook/handbook.sgml
    % rm *.sgml
    % cvs update
    % cd ~/cvs/doc/en/handbook
    % cvs add handbook.sgml
    % cvs commit

2. handbook.sgml was loaded into XEmacs 20.30 (straight from the ports
collection) and sgml-mode was turned on. My .emacs file contains the
following hook:

    ;; PSGML settings applied whenever sgml-mode starts
    (add-hook 'sgml-mode-hook
              (function
               (lambda ()
                 (setq sgml-omittag nil)
                 (setq sgml-indent-data t))))

This configures PSGML to not omit any tags that the DTD lists as
omittable, and to indent data in the same way that markup is indented.

The following function was pasted into the *scratch* buffer, and then
"M-x eval-current-buffer" was run.

    (defun sgml-indent-buffer ()
      "Indents the current buffer, one line at a time"
      (interactive "*")
      (save-excursion
        (goto-char (point-min))
        ;; forward-line returns 0 for as long as it can still move
        (while (= (forward-line 1) 0)
          (sgml-indent-or-tab))))

In the handbook.sgml buffer, the point was placed on the first character
of the first line, and "M-x sgml-indent-buffer" was run.

The changes were then committed.

3. Refilled the Handbook -- this rewraps the lines as necessary. This was
done by placing the point on the first <book> tag, and running
"M-x sgml-fill-element".

This takes about 10 minutes to run.

It also reformats some sections that should not be reformatted, including
examples of text on the screen, PGP key blocks and so on. They will be
fixed in a later commit.

4. Removed spurious markup. The conversion process has left a lot of

    <para></para>

entries in the handbook, and they need to be removed. There are a
number of places this happens, and the rules are slightly different
each time.

For example,

    ====================================================================
    Original markup                  Changed markup
    --------------------------------------------------------------

    <listitem>                       <listitem>
    <para></para>                    <para>A real paragraph</para>

    <para>A real paragraph</para>

    --------------------------------------------------------------

    <para>A real paragraph</para>    <para>A real paragraph</para>
                                     </listitem>
    <para></para>

    </listitem>

    --------------------------------------------------------------

    <para>A real paragraph</para>    <para>A real paragraph</para>

    <para></para>                    <para>Another paragraph</para>

    <para>Another paragraph</para>
    ====================================================================

Notice the last example. It's not enough to simply put together a
regexp that matches (all whitespace)<para></para>(all whitespace)
and removes it, since that would also swallow the blank line between
the two real paragraphs and leave you with

    <para>A real paragraph</para>
    <para>Another paragraph</para>

In the end I got bored of trying to write this using Emacs regexps,
and knocked together a quick Perl script to do it. It's ~nik/bin/para.pl
on freefall.
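
The three cases in the table can be sketched in a few lines. This is not
para.pl (its source isn't reproduced here); it is a minimal Python sketch
under my own assumptions: the name strip_empty_paras is made up, and it
assumes each empty <para></para> sits on a line of its own. It drops the
empty paragraph and the blank lines after it, and also drops the blank
lines before it when a closing tag comes next.

```python
EMPTY_PARA = "<para></para>"


def strip_empty_paras(text: str) -> str:
    """Remove empty <para></para> lines following the rules in the table.

    Hypothetical helper, not the para.pl script mentioned in the text.
    """
    lines = text.split("\n")
    out = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == EMPTY_PARA:
            i += 1
            # Drop the blank lines that followed the empty paragraph.
            while i < len(lines) and lines[i].strip() == "":
                i += 1
            # If a closing tag comes next, the blank lines before the
            # empty paragraph no longer separate anything: drop them too.
            if i < len(lines) and lines[i].lstrip().startswith("</"):
                while out and out[-1].strip() == "":
                    out.pop()
            continue
        out.append(lines[i])
        i += 1
    return "\n".join(out)
```

In the third case the blank line before the empty paragraph is kept, so
the two real paragraphs stay separated, which is exactly what the naive
regexp gets wrong.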

5. Got halfway through looking for filenames, and marking them up as such.

There are a lot ( :-( ) of filenames in the Handbook. The conversion
process did a pretty good job of marking them as <filename>...</filename>
but it wasn't perfect.

I'm halfway through the Handbook (line 16704), eyeballing each line and
changing things like <emphasis remap="tt">...</emphasis> to
<filename>...</filename> where appropriate.

The remainder will follow tomorrow evening.

6. Finished the first sweep marking up filenames.

If it looked like a filename (but wasn't a command for the user to type
in) it's been marked up with <filename>...</filename>.

If it had already been marked up as <filename>...</filename> but wasn't
a filename, the markup was changed to <emphasis remap="tt">...</emphasis>.

PSGML and XEmacs are very useful for this, using "C-c =" to change
existing markup.

Synchronising with changes 5 and 6 will involve examining the diffs
and making the changes by hand, I'm afraid. It could not be automated.

7. Start replacing `` and '' with <quote> and </quote>. Don't change
things indiscriminately, but look at the context to see if the change is
appropriate. There are still many `` and '' occurrences which should be
changed to some other element.

This was done using a regexp search/replace, looking for the regexp

    ``\([^']+\)''

and replacing with

    <quote>\1</quote>

Not all the `` '' pairs were changed, since in some cases they delimit
filenames, options and so on.
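
For anyone who would rather not do this in Emacs, roughly the same
search and replace can be sketched in Python. This is an illustration
only, not what was actually run: the names quote_candidates and
mark_quotes are made up, and the pattern is my translation of the Emacs
regexp into Python syntax. Listing the candidates first keeps the
"look at the context" step; mark_quotes is the indiscriminate version.

```python
import re

# ``...'' with no apostrophe inside; group 1 is the quoted text.
QUOTED = re.compile(r"``([^']+)''")


def quote_candidates(text: str):
    """Yield (line number, quoted text) for every ``...'' pair, for review."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in QUOTED.finditer(line):
            yield lineno, match.group(1)


def mark_quotes(text: str) -> str:
    """Replace every ``...'' pair with <quote>...</quote>, without review."""
    return QUOTED.sub(r"<quote>\1</quote>", text)
```

A pair whose text contains an apostrophe is left alone by this pattern
and would have to be handled by hand.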

8. As with change 7, but replace with <command> ... </command> as
necessary.

9. Remove the `` and '' from options.

    ``<option>...</option>'' becomes <option>...</option>
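
As a rough illustration, this change is a one-line substitution. The
sketch below is in Python rather than the Emacs search/replace used for
the other changes, and the name unquote_options is made up. It assumes
the quotes wrap exactly one <option> element with no nested markup;
anything else is left untouched.

```python
import re

# `` and '' wrapped directly around a single <option> element.
OPTION_IN_QUOTES = re.compile(r"``(<option>[^<]*</option>)''")


def unquote_options(text: str) -> str:
    """Drop the `` and '' around a quoted <option>...</option> element."""
    return OPTION_IN_QUOTES.sub(r"\1", text)
```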

10. Converted appropriate occurrences of

    <emphasis remap="tt">...</emphasis>

to

    <filename>...</filename>
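
Which occurrences are "appropriate" is a judgment call, so this can't be
fully automated. A first pass over the obvious cases could look
something like the Python sketch below. The heuristic (contains a "/"
and no whitespace) and the names looks_like_filename and
mark_filenames are my own assumptions, not something the conversion
actually used; everything the heuristic rejects is left as it was for
checking by hand.

```python
import re

# Matches both remap=tt and remap="tt"; group 1 is the element content.
EMPHASIS_TT = re.compile(r'<emphasis remap="?tt"?>([^<]*)</emphasis>')


def looks_like_filename(content: str) -> bool:
    """Crude, assumed heuristic: a path has a slash and no whitespace."""
    return "/" in content and not any(ch.isspace() for ch in content)


def mark_filenames(text: str) -> str:
    """Turn path-like <emphasis remap="tt"> elements into <filename>."""
    def convert(match):
        content = match.group(1)
        if looks_like_filename(content):
            return "<filename>" + content + "</filename>"
        return match.group(0)  # leave anything doubtful untouched

    return EMPHASIS_TT.sub(convert, text)
```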