Edit for clarity and style. Try to persuade the hippo and the pogo stick
that they are not good for each other.
Warren Block 2013-07-12 03:21:55 +00:00
parent 9ec2aca708
commit 5bbb3c8791
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=42257


@ -34,10 +34,10 @@
<chapter id="xml-primer">
<title>XML Primer</title>
<para>The majority of FDP documentation is written in applications
of XML. This chapter explains exactly what that means, how to
read and understand the source to the documentation, and the sort
of XML tricks you will see used in the documentation.</para>
<para>Most FDP documentation is written with markup languages based
on <acronym>XML</acronym>. This chapter explains what that means, how to
read and understand the documentation source, and the
<acronym>XML</acronym> techniques used.</para>
<para>Portions of this section were inspired by Mark Galassi's
<ulink
@ -47,31 +47,31 @@
<sect1 id="xml-primer-overview">
<title>Overview</title>
<para>Way back when, electronic text was simple to deal with.
Admittedly, you had to know which character set your document
was written in (ASCII, EBCDIC, or one of a number of others) but
<para>In the early days of computing, electronic text was simple.
There were a few character sets like <acronym>ASCII</acronym> or <acronym>EBCDIC</acronym>, but
that was about it. Text was text, and what you saw really was
what you got. No frills, no formatting, no intelligence.</para>
<para>Inevitably, this was not enough. Once you have text in a
machine-usable format, you expect machines to be able to use it
and manipulate it intelligently. You would like to indicate
<para>Inevitably, this was not enough. When text is in a
machine-usable format, machines are expected to be able to use
and manipulate it intelligently. Authors want to indicate
that certain phrases should be emphasized, or added to a
glossary, or be hyperlinks. You might want filenames to be
glossary, or made into hyperlinks. Filenames could be
shown in a <quote>typewriter</quote> style font for viewing on
screen, but as <quote>italics</quote> when printed, or any of a
myriad of other options for presentation.</para>
<para>It was once hoped that Artificial Intelligence (AI) would
make this easy. Your computer would read in the document and
make this easy. The computer would read the document and
automatically identify key phrases, filenames, text that the
reader should type in, examples, and more. Unfortunately, real
life has not happened quite like that, and our computers require
some assistance before they can meaningfully process our
life has not happened quite like that, and computers still require
assistance before they can meaningfully process
text.</para>
<para>More precisely, they need help identifying what is what.
Let's look at this text:</para>
Consider this text:</para>
<blockquote>
<para>To remove <filename>/tmp/foo</filename> use
@ -100,42 +100,40 @@
document must typically be done by a person&mdash;after all, if
computers could recognize the text sufficiently well to add the
markup then there would be no need to add it in the first place.
This <emphasis>increases the cost</emphasis> (i.e., the effort
This <emphasis>increases the cost</emphasis> (the effort
required) to create the document.</para>
<para>The previous example is actually represented in this
document like this:</para>
<programlisting><![CDATA[<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>
<programlisting><sgmltag class="starttag">para</sgmltag>To remove <sgmltag class="starttag">filename</sgmltag>/tmp/foo<sgmltag class="endtag">filename</sgmltag> use &man.rm.1;.<sgmltag class="endtag">para</sgmltag>
<screen>&prompt.user; <userinput>rm /tmp/foo</userinput></screen>]]></programlisting>
<sgmltag class="starttag">screen</sgmltag>&prompt.user; <sgmltag class="starttag">userinput</sgmltag>rm /tmp/foo<sgmltag class="endtag">userinput</sgmltag><sgmltag class="endtag">screen</sgmltag></programlisting>
<para>As you can see, the markup is clearly separate from the
<para>The markup is clearly separate from the
content.</para>
<para>Obviously, if you are going to use markup you need to define
what your markup means, and how it should be interpreted. You
will need a markup language that you can follow when marking up
your documents.</para>
<para>Markup languages define
what the markup means and how it should be interpreted.</para>
<para>Of course, one markup language might not be enough. A
markup language for technical documentation has very different
requirements than a markup language that was to be used for
requirements than a markup language that is intended for
cookery recipes. This, in turn, would be very different from a
markup language used to describe poetry. What you really need
is a first language that you use to write these other markup
markup language used to describe poetry. What is really needed
is a first language used to write these other markup
languages. A <emphasis>meta markup language</emphasis>.</para>
<para>This is exactly what the eXtensible Markup
Language (XML) is. Many markup languages have been written in
XML, including the two most used by the FDP, XHTML and
Language (<acronym>XML</acronym>) is. Many markup languages have been written in
<acronym>XML</acronym>, including the two most used by the <acronym>FDP</acronym>, <acronym>XHTML</acronym> and
DocBook.</para>
<para>Each language definition is more properly called a grammar,
vocabulary, schema or Document Type Definition (DTD). There
are various languages to specify an XML grammar, for example,
DTD (yes, it also means the specification language itself),
XML Schema (XSD) or RELANG NG. The schema specifies the name
vocabulary, schema or Document Type Definition (<acronym>DTD</acronym>). There
are various languages to specify an <acronym>XML</acronym> grammar, for example,
<acronym>DTD</acronym> (yes, it also means the specification language itself),
<acronym>XML</acronym> Schema (<acronym>XSD</acronym>) or <acronym>RELAX NG</acronym>. The schema specifies the name
of the elements that can be used, what order they appear in (and
whether some markup can be used inside other markup) and related
information.</para>
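<para>As a rough illustration of what such a schema contains, the
following <acronym>DTD</acronym> fragment describes a small,
hypothetical recipe vocabulary. The element names are invented for
this sketch and are not part of any schema used by the
<acronym>FDP</acronym>.</para>

<programlisting><![CDATA[<!-- Hypothetical vocabulary, for illustration only -->
<!-- A recipe is a title, then one or more ingredients, then one or more steps -->
<!ELEMENT recipe     (title, ingredient+, step+)>
<!ELEMENT title      (#PCDATA)>
<!ELEMENT ingredient (#PCDATA)>
<!-- A step is text that may contain emphasized phrases -->
<!ELEMENT step       (#PCDATA | emphasis)*>
<!ELEMENT emphasis   (#PCDATA)>]]></programlisting>

<para>Each declaration names an element and lists what may appear
inside it. The commas in the first declaration fix the order of
the child elements, and <literal>+</literal> means <quote>one or
more</quote>.</para>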
@ -144,7 +142,7 @@
<emphasis>complete</emphasis> specification of all the elements
that are allowed to appear, the order in which they should
appear, which elements are mandatory, which are optional, and so
forth. This makes it possible to write an XML
forth. This makes it possible to write an <acronym>XML</acronym>
<emphasis>parser</emphasis> which reads in both the schema and a
document which claims to conform to the schema. The parser can
then confirm whether or not all the elements required by the vocabulary
@ -155,34 +153,34 @@
<note>
<para>This processing simply confirms that the choice of
elements, their ordering, and so on, conforms to that listed
in the grammar. It does <emphasis>not</emphasis> check that you
have used <emphasis>appropriate</emphasis> markup for the
content. If you tried to mark up all the filenames in your
document as function names, the parser would not flag this as
an error (assuming, of course, that your schema defines elements
in the grammar. It does <emphasis>not</emphasis> check whether
<emphasis>appropriate</emphasis> markup has been used for the
content. If all the filenames in a
document were marked up as function names, the parser would not flag this as
an error (assuming, of course, that the schema defines elements
for filenames and functions, and that they are allowed to
appear in the same place).</para>
</note>
<para>It is likely that most of your contributions to the
Documentation Project will consist of content marked up in
either XHTML or DocBook, rather than alterations to the schemas.
For this reason this book will not touch on how to write a
<para>It is likely that most contributions to the
Documentation Project will be content marked up in
either <acronym>XHTML</acronym> or DocBook, rather than alterations to the schemas.
For this reason, this book will not touch on how to write a
vocabulary.</para>
</sect1>
<sect1 id="xml-primer-elements">
<title>Elements, Tags, and Attributes</title>
<para>All the vocabularies written in XML share certain characteristics.
This is hardly surprising, as the philosophy behind XML will
<para>All the vocabularies written in <acronym>XML</acronym> share certain characteristics.
This is hardly surprising, as the philosophy behind <acronym>XML</acronym> will
inevitably show through. One of the most obvious manifestations
of this philosophy is that of <emphasis>content</emphasis> and
<emphasis>elements</emphasis>.</para>
<para>Your documentation (whether it is a single web page, or a
lengthy book) is considered to consist of content. This content
is then divided (and further subdivided) into elements. The
<para>Documentation, whether it is a single web page or a
lengthy book, is considered to consist of content. This content
is then divided and further subdivided into elements. The
purpose of adding markup is to name and identify the boundaries
of these elements for further processing.</para>
@ -195,21 +193,21 @@
that was direct speech, or the name of a character in the
story.</para>
<para>You might like to think of this as <quote>chunking</quote>
content. At the very top level you have one chunk, the book.
Look a little deeper, and you have more chunks, the individual
<para>It may be helpful to think of this as <quote>chunking</quote>
content. At the very top level is one chunk, the book.
Look a little deeper, and there are more chunks, the individual
chapters. These are chunked further into paragraphs, footnotes,
character names, and so on.</para>
<para>Notice how you can make this differentiation between
different elements of the content without resorting to any XML
terms. It really is surprisingly straightforward. You could do
this with a highlighter pen and a printout of the book, using
<para>Notice how this differentiation between
different elements of the content can be made without resorting to any <acronym>XML</acronym>
terms. It really is surprisingly straightforward. This could be done
with a highlighter pen and a printout of the book, using
different colors to indicate different chunks of content.</para>
<para>Of course, we do not have an electronic highlighter pen, so
we need some other way of indicating which element each piece of
content belongs to. In languages written in XML (XHTML,
content belongs to. In languages written in <acronym>XML</acronym> (<acronym>XHTML</acronym>,
DocBook, et al) this is done by means of
<emphasis>tags</emphasis>.</para>
@ -223,59 +221,54 @@
<para>For an element called
<replaceable>element-name</replaceable> the start tag will
normally look like
<sgmltag><replaceable>element-name</replaceable></sgmltag>. The
<sgmltag class="starttag"><replaceable>element-name</replaceable></sgmltag>. The
corresponding closing tag for this element is
<sgmltag>/<replaceable>element-name</replaceable></sgmltag>.</para>
<sgmltag class="endtag"><replaceable>element-name</replaceable></sgmltag>.</para>
<example>
<title>Using an Element (Start and End Tags)</title>
<para>XHTML has an element for indicating that the content
<para><acronym>XHTML</acronym> has an element for indicating that the content
enclosed by the element is a paragraph, called
<sgmltag>p</sgmltag>.</para>
<programlisting><![CDATA[<p>This is a paragraph. It starts with the start tag for
<programlisting><sgmltag class="starttag">p</sgmltag>This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
element.</p>
element.<sgmltag class="endtag">p</sgmltag>
<p>This is another paragraph. But this one is much shorter.</p>]]></programlisting>
<sgmltag class="starttag">p</sgmltag>This is another paragraph. But this one is much shorter.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>Some elements have no
content. For example, in XHTML you can indicate that you want a
horizontal line to appear in the document.</para>
<para>For such elements, that have no content at all, XML introduced
a shorthand form, which is ccompletely equivalent to the above
form:</para>
<programlisting><![CDATA[<hr/>]]></programlisting>
content. For example, in <acronym>XHTML</acronym>, a
horizontal line can be included in the document.
For these <quote>empty</quote> elements, <acronym>XML</acronym> introduced
a shorthand form that is completely equivalent to the two-tag
version:</para>
<example>
<title>Using an Element (Without Content)</title>
<title>Using an Element Without Content</title>
<para>XHTML has an element for indicating a horizontal rule,
<para><acronym>XHTML</acronym> has an element for indicating a horizontal rule,
called <sgmltag>hr</sgmltag>. This element does not wrap
content, so it looks like this.</para>
content, so it looks like this:</para>
<programlisting><![CDATA[<p>One paragraph.</p>
<hr></hr>
<programlisting><sgmltag class="starttag">p</sgmltag>One paragraph.<sgmltag class="endtag">p</sgmltag>
<sgmltag class="starttag">hr</sgmltag><sgmltag class="endtag">hr</sgmltag>
<p>This is another paragraph. A horizontal rule separates this
from the previous paragraph.</p>]]></programlisting>
<sgmltag class="starttag">p</sgmltag>This is another paragraph. A horizontal rule separates this
from the previous paragraph.<sgmltag class="endtag">p</sgmltag></programlisting>
<para>For such elements, that have no content at all, XML introduced
a shorthand form, which is ccompletely equivalent to the above
form:</para>
<para>The shorthand version consists of a single tag:</para>
<programlisting><![CDATA[<p>One paragraph.</p>
<hr/>
<programlisting><sgmltag class="starttag">p</sgmltag>One paragraph.<sgmltag class="endtag">p</sgmltag>
<sgmltag class="emptytag">hr</sgmltag>
<p>This is another paragraph. A horizontal rule separates this
from the previous paragraph.</p>]]></programlisting>
<sgmltag class="starttag">p</sgmltag>This is another paragraph. A horizontal rule separates this
from the previous paragraph.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>If it is not obvious by now, elements can contain other
<para>As shown above, elements can contain other
elements. In the book example earlier, the book element
contained all the chapter elements, which in turn contained all
the paragraph elements, and so on.</para>
@ -283,11 +276,11 @@
<example>
<title>Elements within Elements; <sgmltag>em</sgmltag></title>
<programlisting><![CDATA[<p>This is a simple <em>paragraph</em> where some
of the <em>words</em> have been <em>emphasized</em>.</p>]]></programlisting>
<programlisting><sgmltag class="starttag">p</sgmltag>This is a simple <sgmltag class="starttag">em</sgmltag>paragraph<sgmltag class="endtag">em</sgmltag> where some
of the <sgmltag class="starttag">em</sgmltag>words<sgmltag class="endtag">em</sgmltag> have been <sgmltag class="starttag">em</sgmltag>emphasized<sgmltag class="endtag">em</sgmltag>.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>The grammar will specify the rules detailing which elements can
<para>The grammar consists of rules that describe which elements can
contain other elements, and exactly what they can
contain.</para>
@ -298,10 +291,10 @@
<para>An element is a conceptual part of your document. An
element has a defined start and end. The tags mark where the
element starts and end.</para>
element starts and ends.</para>
<para>When this document (or anyone else knowledgeable about
XML) refers to <quote>the <sgmltag>p</sgmltag> tag</quote>
<acronym>XML</acronym>) refers to <quote>the <sgmltag class="starttag">p</sgmltag> tag</quote>
they mean the literal text consisting of the three characters
<literal>&lt;</literal>, <literal>p</literal>, and
<literal>&gt;</literal>. But the phrase <quote>the
@ -323,13 +316,13 @@
take the form
<literal><replaceable>attribute-name</replaceable>="<replaceable>attribute-value</replaceable>"</literal>.</para>
<para>In XHTML, the
<para>In <acronym>XHTML</acronym>, the
<sgmltag>p</sgmltag> element has an attribute called
<sgmltag>align</sgmltag>, which suggests an alignment
<sgmltag class="attribute">align</sgmltag>, which suggests an alignment
(justification) for the paragraph to the program displaying the
XHTML.</para>
<acronym>XHTML</acronym>.</para>
<para>The <literal>align</literal> attribute can take one of four
<para>The <sgmltag class="attribute">align</sgmltag> attribute can take one of four
defined values, <literal>left</literal>,
<literal>center</literal>, <literal>right</literal> and
<literal>justify</literal>. If the attribute is not specified
@ -338,59 +331,57 @@
<example>
<title>Using An Element with An Attribute</title>
<programlisting><![CDATA[<p align="left">The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.</p>
<programlisting><sgmltag class="starttag">p align="left"</sgmltag>The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.<sgmltag class="endtag">p</sgmltag>
<p align="center">This may appear in the center.</p>]]></programlisting>
<sgmltag class="starttag">p align="center"</sgmltag>This may appear in the center.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>Some attributes will only take specific values, such as
<para>Some attributes only take specific values, such as
<literal>left</literal> or <literal>justify</literal>. Others
will allow you to enter anything you want.</para>
allow any value.</para>
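<para>In a <acronym>DTD</acronym>, this difference is visible in the
attribute declaration. The following is a simplified sketch of how
two attributes of <sgmltag>p</sgmltag> could be declared. It is not
a verbatim excerpt of the <acronym>XHTML</acronym>
<acronym>DTD</acronym>.</para>

<programlisting><![CDATA[<!-- Simplified sketch; the real XHTML DTD declares these through parameter entities -->
<!ATTLIST p
  align (left | center | right | justify) #IMPLIED
  title CDATA                             #IMPLIED>]]></programlisting>

<para>The <sgmltag class="attribute">align</sgmltag> attribute is
limited to the listed values, while
<sgmltag class="attribute">title</sgmltag> is declared as
<literal>CDATA</literal> and accepts any text.
<literal>#IMPLIED</literal> marks both attributes as
optional.</para>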
<example>
<title>Single Quotes Around Attributes</title>
<programlisting><![CDATA[<p align='right'>I am on the right!</p>]]></programlisting>
<programlisting><sgmltag class="starttag">p align='right'</sgmltag>I am on the right!<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>XML requires you to quote each attribute value with either
single or double quotes. It is more habitual to use double quotes
but you may use single quotes, as well. Using single quotes is
practical if you want to include double quotes in the attribute
value.</para>
<para>Attribute values in <acronym>XML</acronym> must be enclosed
in either single or double quotes. Double quotes are
traditional. Single quotes are useful when the attribute
value contains double quotes.</para>
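<para>A short sketch of the two quoting styles, using the
<acronym>XHTML</acronym> <sgmltag class="attribute">title</sgmltag>
attribute with an invented value:</para>

<programlisting><![CDATA[<!-- The attribute value contains double quotes, so it is enclosed in single quotes -->
<p title='The "hippo" paragraph'>Single quotes enclose the attribute value.</p>

<!-- The same value in double quotes, with the inner quotes written as entities -->
<p title="The &quot;hippo&quot; paragraph">Double quotes enclose the attribute value.</p>]]></programlisting>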
<para>The information on attributes, elements, and tags is stored
in XML catalogs. The various Documentation Project tools use
these catalog files to validate your work. The tools in
<filename role="package">textproc/docproj</filename> include a
variety of XML catalog files. The FreeBSD Documentation
Project includes its own set of catalog files. Your tools need
to know about both sorts of catalog files.</para>
<para>Information about attributes, elements, and tags is stored
in catalog files. The Documentation Project uses standard
DocBook catalogs and includes additional catalogs for
&os;-specific features. Paths to the catalog files are defined
in an environment variable so they can be found by the document
build tools.</para>
<sect2>
<title>For You to Do&hellip;</title>
<title>To Do&hellip;</title>
<para>In order to run the examples in this document you will
need to install some software on your system and ensure that
an environment variable is set correctly.</para>
<para>Before running the examples in this document,
application software must be installed and the catalog
environment variable configured.</para>
<procedure>
<step>
<para>Download and install
<para>Install
<filename role="package">textproc/docproj</filename> from
the FreeBSD ports system. This is a
<emphasis>meta-port</emphasis> that should download and
install all of the programs and supporting files that are
used by the Documentation Project.</para>
the &os; Ports Collection. This is a
<emphasis>meta-port</emphasis> that downloads and
installs the standard programs and supporting files needed
by the Documentation Project.</para>
</step>
<step>
<para>Add lines to your shell startup files to set
<envar>SGML_CATALOG_FILES</envar>. (If you are not working
on the English version of the documentation, you will want
to substitute the correct directory for your
language.)</para>
<para>Add lines to the shell startup files to set
<envar>SGML_CATALOG_FILES</envar>. When working on non-English
versions of the documentation, replace
<replaceable>en_US.ISO8859-1</replaceable> with the appropriate directory for the
target language.</para>
<example id="xml-primer-envars">
<title><filename>.profile</filename>, for &man.sh.1; and
@ -402,7 +393,7 @@ SGML_CATALOG_FILES=${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/share/xml/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES</programlisting>
</example>
@ -416,11 +407,11 @@ setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES</programlisting>
setenv SGML_CATALOG_FILES /usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/share/xml/catalog:$SGML_CATALOG_FILES</programlisting>
</example>
<para>Then either log out, and log back in again, or run
those commands from the command line to set the variable
<para>After making these changes, either log out and log back in again, or run
the commands from the command line to set the variable
values.</para>
</step>
</procedure>
@ -428,67 +419,65 @@ setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATAL
<procedure>
<step>
<para>Create <filename>example.xml</filename>, and enter
the following text:</para>
this text:</para>
<programlisting><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<programlisting><sgmltag class="starttag">!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</sgmltag>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An Example XHTML File</title>
</head>
<sgmltag class="starttag">html xmlns="http://www.w3.org/1999/xhtml"</sgmltag>
<sgmltag class="starttag">head</sgmltag>
<sgmltag class="starttag">title</sgmltag>An Example XHTML File<sgmltag class="endtag">title</sgmltag>
<sgmltag class="endtag">head</sgmltag>
<body>
<p>This is a paragraph containing some text.</p>
<sgmltag class="starttag">body</sgmltag>
<sgmltag class="starttag">p</sgmltag>This is a paragraph containing some text.<sgmltag class="endtag">p</sgmltag>
<p>This paragraph contains some more text.</p>
<sgmltag class="starttag">p</sgmltag>This paragraph contains some more text.<sgmltag class="endtag">p</sgmltag>
<p align="right">This paragraph might be right-justified.</p>
</body>
</html>]]></programlisting>
<sgmltag class="starttag">p align="right"</sgmltag>This paragraph might be right-justified.<sgmltag class="endtag">p</sgmltag>
<sgmltag class="endtag">body</sgmltag>
<sgmltag class="endtag">html</sgmltag></programlisting>
</step>
<step>
<para>Try to validate this file using an XML parser.</para>
<para>Try to validate this file using an <acronym>XML</acronym> parser.</para>
<para>Part of
<filename role="package">textproc/docproj</filename> is
<para><filename role="package">textproc/docproj</filename> includes
the <command>xmllint</command>
<link linkend="xml-primer-validating">validating
parser</link>.</para>
<para>Use <command>xmllint</command> in the following way to
check that your document is valid:</para>
<para>Use <command>xmllint</command> to
validate the document:</para>
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput></screen>
<para>As you will see, <command>xmllint</command> returns
without displaying any output. This means that your
<para><command>xmllint</command> returns
without displaying any output, showing that the
document validated successfully.</para>
</step>
<step>
<para>See what happens when required elements are omitted.
Try removing the <sgmltag>title</sgmltag> and
<sgmltag>/title</sgmltag> tags, and re-run the
Delete the line with the <sgmltag class="starttag">title</sgmltag> and
<sgmltag class="endtag">title</sgmltag> tags, and re-run the
validation.</para>
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput>
example.xml:5: element head: validity error : Element head content does not follow the DTD, expecting ((script | style | meta | link | object | isindex)* , ((title , (script | style | meta | link | object | isindex)* , (base , (script | style | meta | link | object | isindex)*)?) | (base , (script | style | meta | link | object | isindex)* , title , (script | style | meta | link | object | isindex)*))), got ()</screen>
<para>This line tells you that the validation error comes from
<para>This shows that the validation error comes from
the <replaceable>fifth</replaceable> line of the
<replaceable>example.xml</replaceable> file and that the
content of the <sgmltag>head</sgmltag> is the part, which
does not follow the rules described by the XHTML grammar.</para>
content of the <sgmltag class="starttag">head</sgmltag> is the part which
does not follow the rules of the <acronym>XHTML</acronym> grammar.</para>
<para>Below this line <command>xmllint</command> will show you
the line where the error has been found and will also mark the
exact character position with a ^ sign.</para>
<para>Then <command>xmllint</command> shows
the line where the error was found and marks the
exact character position with a <literal>^</literal> sign.</para>
</step>
<step>
<para>Put the <sgmltag>title</sgmltag> element back
in.</para>
<para>Restore the <sgmltag>title</sgmltag> element.</para>
</step>
</procedure>
</sect2>
@ -497,17 +486,15 @@ example.xml:5: element head: validity error : Element head content does not foll
<sect1 id="xml-primer-doctype-declaration">
<title>The DOCTYPE Declaration</title>
<para>The beginning of each document that you write may specify
the name of the DTD that the document conforms to in case you use
the DTD specification language. Other specification languages, like
XML Schema and RELAX NG are not referred in the source document.
This DOCTYPE declaration serves the XML parsers so that they can
determine the DTD and ensure that the document does conform to it.</para>
<para>The beginning of each document can specify
the name of the <acronym>DTD</acronym> to which the document conforms.
This DOCTYPE declaration is used by <acronym>XML</acronym> parsers to
identify the <acronym>DTD</acronym> and ensure that the document does conform to it.</para>
<para>A typical declaration for a document written to conform with
version 1.0 of the XHTML DTD looks like this:</para>
version 1.0 of the <acronym>XHTML</acronym> <acronym>DTD</acronym> looks like this:</para>
<programlisting><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">]]></programlisting>
<programlisting><sgmltag class="starttag">!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</sgmltag></programlisting>
<para>That line contains a number of different components.</para>
@ -516,9 +503,8 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>&lt;!</literal></term>
<listitem>
<para>Is the <emphasis>indicator</emphasis> that indicates
that this is an XML declaration. This line is declaring
the document type.</para>
<para>The <emphasis>indicator</emphasis> shows
this is an <acronym>XML</acronym> declaration.</para>
</listitem>
</varlistentry>
@ -526,7 +512,7 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>DOCTYPE</literal></term>
<listitem>
<para>Shows that this is an XML declaration for the
<para>Shows that this is an <acronym>XML</acronym> declaration of the
document type.</para>
</listitem>
</varlistentry>
@ -545,18 +531,18 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
<listitem>
<para>Lists the Formal Public Identifier (FPI)
<para>Lists the Formal Public Identifier (<acronym>FPI</acronym>)
<indexterm>
<primary>Formal Public Identifier</primary>
</indexterm>
for the DTD that this document conforms to. Your XML
parser will use this to find the correct DTD when
for the <acronym>DTD</acronym> to which this document conforms. The <acronym>XML</acronym>
parser uses this to find the correct <acronym>DTD</acronym> when
processing this document.</para>
<para><literal>PUBLIC</literal> is not a part of the FPI,
but indicates to the XML processor how to find the DTD
referenced in the FPI. Other ways of telling the XML
parser how to find the DTD are shown <link
<para><literal>PUBLIC</literal> is not a part of the <acronym>FPI</acronym>,
but indicates to the <acronym>XML</acronym> processor how to find the <acronym>DTD</acronym>
referenced in the <acronym>FPI</acronym>. Other ways of telling the <acronym>XML</acronym>
parser how to find the <acronym>DTD</acronym> are shown <link
linkend="xml-primer-fpi-alternatives">later</link>.</para>
</listitem>
</varlistentry>
@ -565,7 +551,7 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
<listitem>
<para>A local filename or an URL to find the DTD.</para>
<para>A local filename or a <acronym>URL</acronym> to find the <acronym>DTD</acronym>.</para>
</listitem>
</varlistentry>
@ -573,25 +559,24 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>&gt;</literal></term>
<listitem>
<para>Returns to the document.</para>
<para>Ends the declaration and returns to the document.</para>
</listitem>
</varlistentry>
</variablelist>
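<para>The same components appear in declarations for other
vocabularies. As an illustration, a declaration for the stock
DocBook <acronym>XML</acronym> 4.5 <acronym>DTD</acronym> could
look like the following. This is the standard OASIS identifier,
shown only as an example. It is not necessarily the declaration
used in <acronym>FDP</acronym> source files.</para>

<programlisting><![CDATA[<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
	"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">]]></programlisting>

<para>Here <literal>book</literal> names the first element in the
document, the quoted string after <literal>PUBLIC</literal> is the
<acronym>FPI</acronym>, and the final string is a
<acronym>URL</acronym> where the <acronym>DTD</acronym> can be
found.</para>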
<sect2>
<title>Formal Public Identifiers (FPIs)
<title>Formal Public Identifiers (<acronym>FPI</acronym>s)
<indexterm significance="preferred">
<primary>Formal Public Identifier</primary>
</indexterm></title>
<note>
<para>You do not need to know this, but it is useful
background, and might help you debug problems when your XML
processor can not locate the DTD you are using.</para>
<para>It is not necessary to know this, but it is useful
background, and might help debug problems when the <acronym>XML</acronym>
processor cannot locate the <acronym>DTD</acronym>.</para>
</note>
<para>FPIs must follow a specific syntax. This syntax is as
follows:</para>
<para><acronym>FPI</acronym>s must follow a specific syntax:</para>
<programlisting>"<replaceable>Owner</replaceable>//<replaceable>Keyword</replaceable> <replaceable>Description</replaceable>//<replaceable>Language</replaceable>"</programlisting>
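<para>As a worked example, the <acronym>FPI</acronym> from the
<acronym>XHTML</acronym> declaration shown earlier can be split
into these parts:</para>

<programlisting>"-//W3C//DTD XHTML 1.0 Transitional//EN"

Owner        -//W3C
Keyword      DTD
Description  XHTML 1.0 Transitional
Language     EN</programlisting>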
@ -600,16 +585,16 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><replaceable>Owner</replaceable></term>
<listitem>
<para>This indicates the owner of the FPI.</para>
<para>The owner of the <acronym>FPI</acronym>.</para>
<para>If this string starts with <quote>ISO</quote> then
this is an ISO owned FPI. For example, the FPI
<para>The beginning of the string identifies the owner
of the <acronym>FPI</acronym>. For example, the <acronym>FPI</acronym>
<literal>"ISO 8879:1986//ENTITIES Greek
Symbols//EN"</literal> lists
<literal>ISO 8879:1986</literal> as being the owner for
the set of entities for Greek symbols. ISO 8879:1986 is
the ISO number for the SGML standard, the predecessor
(and a superset) of XML.</para>
the set of entities for Greek symbols. <acronym>ISO</acronym> 8879:1986 is
the International Organization for Standardization (<acronym>ISO</acronym>) number for the <acronym>SGML</acronym> standard, the predecessor
(and a superset) of <acronym>XML</acronym>.</para>
<para>Otherwise, this string will either look like
<literal>-//<replaceable>Owner</replaceable></literal>
@ -620,21 +605,21 @@ example.xml:5: element head: validity error : Element head content does not foll
<para>If the string starts with <literal>-</literal> then
the owner information is unregistered, with a
<literal>+</literal> it identifies it as being
<literal>+</literal> identifying it as
registered.</para>
<para>ISO 9070:1991 defines how registered names are
generated; it might be derived from the number of an ISO
publication, an ISBN code, or an organization code
assigned according to ISO 6523. In addition, a
<para><acronym>ISO</acronym> 9070:1991 defines how registered names are
generated. A name might be derived from the number of an <acronym>ISO</acronym>
publication, an <acronym>ISBN</acronym> code, or an organization code
assigned according to <acronym>ISO</acronym> 6523. Additionally, a
registration authority could be created in order to
assign registered names. The ISO council delegated this
assign registered names. The <acronym>ISO</acronym> council delegated this
to the American National Standards Institute
(ANSI).</para>
(<acronym>ANSI</acronym>).</para>
<para>Because the FreeBSD Project has not been registered
the owner string is <literal>-//FreeBSD</literal>. And
as you can see, the W3C are not a registered owner
<para>Because the &os; Project has not been registered,
the owner string is <literal>-//&os;</literal>. As
seen in the example, the <acronym>W3C</acronym> is not a registered owner
either.</para>
</listitem>
</varlistentry>
@ -648,10 +633,10 @@ example.xml:5: element head: validity error : Element head content does not foll
keywords are <literal>DTD</literal>,
<literal>ELEMENT</literal>, <literal>ENTITIES</literal>,
and <literal>TEXT</literal>. <literal>DTD</literal> is
used only for DTD files, <literal>ELEMENT</literal> is
usually used for DTD fragments that contain only entity
used only for <acronym>DTD</acronym> files, <literal>ELEMENT</literal> is
usually used for <acronym>DTD</acronym> fragments that contain only entity
or element declarations. <literal>TEXT</literal> is
used for XML content (text and tags).</para>
used for <acronym>XML</acronym> content (text and tags).</para>
</listitem>
</varlistentry>
@ -659,10 +644,10 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><replaceable>Description</replaceable></term>
<listitem>
<para>Any description you want to supply for the contents
<para>Any description can be given for the contents
of this file. This may include version numbers or any
short text that is meaningful to you and unique for the
XML system.</para>
short text that is meaningful and unique for the
<acronym>XML</acronym> system.</para>
</listitem>
</varlistentry>
@ -670,7 +655,7 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><replaceable>Language</replaceable></term>
<listitem>
<para>This is an ISO two-character code that identifies
<para>An <acronym>ISO</acronym> two-character code that identifies
the native language for the file. <literal>EN</literal>
is used for English.</para>
</listitem>
@ -680,48 +665,45 @@ example.xml:5: element head: validity error : Element head content does not foll
<sect3>
<title><filename>catalog</filename> Files</title>
<para>If you use the syntax above and process this document
using an XML processor, the processor will need to have
some way of turning the FPI into the name of the file on
your computer that contains the DTD.</para>
<para>In order to do this it can use a catalog file. A
<para>With the syntax above,
an <acronym>XML</acronym> processor needs to have
some way of turning the <acronym>FPI</acronym> into the name of the file
containing the <acronym>DTD</acronym>. A
catalog file (typically called <filename>catalog</filename>)
contains lines that map FPIs to filenames. For example, if
contains lines that map <acronym>FPI</acronym>s to filenames. For example, if
the catalog file contained the line:</para>
<!-- XXX: mention XML catalog or maybe replace this totally and only cover XML catalog -->
<programlisting>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "1.0/transitional.dtd"</programlisting>
<para>The XML processor would know to look up the DTD from
<filename>transitional.dtd</filename> in the
<filename>1.0</filename> subdirectory of whichever directory
held the <filename>catalog</filename> file that contained
that line.</para>
<para>The <acronym>XML</acronym> processor would then look for the <acronym>DTD</acronym> in
<filename>transitional.dtd</filename> in the
<filename>1.0</filename> subdirectory of the directory that
holds the <filename>catalog</filename> file.</para>
<para>Look at the contents of
<para>Examine the contents of
<filename>/usr/local/share/xml/dtd/xhtml/catalog.xml</filename>.
This is the catalog file for the XHTML DTDs that will have
been installed as part of the <filename
This is the catalog file for the <acronym>XHTML</acronym> <acronym>DTD</acronym>s that was
installed as part of the <filename
role="package">textproc/docproj</filename> port.</para>
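<para>The same kind of mapping can also be written as an
<acronym>XML</acronym> catalog. The following is only a minimal
sketch, assuming the OASIS <acronym>XML</acronym> Catalogs format
and reusing the <acronym>FPI</acronym> and relative path from the
line shown above. It is not copied from any installed
file:</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Maps the FPI to a path relative to this catalog file -->
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="1.0/transitional.dtd"/>
</catalog>]]></programlisting>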
</sect3>
<sect3>
<title><envar>SGML_CATALOG_FILES</envar></title>
<para>In order to locate a <filename>catalog</filename> file,
your XML processor will need to know where to look. Many
of them feature command line parameters for specifying the
<para>To locate a <filename>catalog</filename> file,
the <acronym>XML</acronym> processor must know where to look. Many
processors feature command-line parameters for specifying the
path to one or more catalogs.</para>
<para>In addition, you can set
<envar>SGML_CATALOG_FILES</envar> to point to the files.
This environment variable should consist of a
<para>In addition,
<envar>SGML_CATALOG_FILES</envar> can be set to point to the files.
This environment variable consists of a
colon-separated list of catalog files (including their full
path).</para>
<para>Typically, you will want to include the following
<para>Typically, the list includes these
files:</para>
<itemizedlist>
@ -742,33 +724,30 @@ example.xml:5: element head: validity error : Element head content does not foll
</listitem>
</itemizedlist>
<para>You should <link linkend="xml-primer-envars">already
have done this</link>.</para>
<para>This was done <link linkend="xml-primer-envars">earlier</link>.</para>
</sect3>
</sect2>
<sect2 id="xml-primer-fpi-alternatives">
<title>Alternatives to FPIs</title>
<title>Alternatives to <acronym>FPI</acronym>s</title>
<para>Instead of using an FPI to indicate the DTD that the
document conforms to (and therefore, which file on the system
contains the DTD) you can explicitly specify the name of the
file.</para>
<para>Instead of using an <acronym>FPI</acronym> to indicate the <acronym>DTD</acronym> to which
the document conforms (and therefore, which file on the system
contains the <acronym>DTD</acronym>), the filename can be explicitly specified.</para>
<para>The syntax for this is slightly different:</para>
<para>The syntax is slightly different:</para>
<programlisting><![CDATA[<!DOCTYPE html SYSTEM "/path/to/file.dtd">]]></programlisting>
<programlisting><sgmltag class="starttag">!DOCTYPE html SYSTEM "/path/to/file.dtd"</sgmltag></programlisting>
<para>The <literal>SYSTEM</literal> keyword indicates that the
XML processor should locate the DTD in a system specific
fashion. This typically (but not always) means the DTD will
<acronym>XML</acronym> processor should locate the <acronym>DTD</acronym> in a system-specific
fashion. This typically (but not always) means the <acronym>DTD</acronym> will
be provided as a filename.</para>
<para>Using FPIs is preferred for reasons of portability. You
do not want to have to ship a copy of the DTD around with your
document, and if you used the <literal>SYSTEM</literal>
identifier then everyone would need to keep their DTDs in the
same place.</para>
<para>Using <acronym>FPI</acronym>s is preferred for reasons of portability.
If the <literal>SYSTEM</literal>
identifier is used, then the <acronym>DTD</acronym> must be provided and kept in the same location
for everyone.</para>
</sect2>
</sect1>