Whitespace-only fixes. Translators, please ignore.

Warren Block 2013-07-12 04:33:43 +00:00
parent 28530c6850
commit 8225d6648b
Notes: svn2git 2020-12-08 03:00:23 +00:00
svn path=/head/; revision=42260


@ -35,8 +35,8 @@
<title>XML Primer</title>
<para>Most FDP documentation is written with markup languages based
on <acronym>XML</acronym>. This chapter explains what that means, how to
read and understand the documentation source, and the
on <acronym>XML</acronym>. This chapter explains what that means,
how to read and understand the documentation source, and the
<acronym>XML</acronym> techniques used.</para>
<para>Portions of this section were inspired by Mark Galassi's
@ -47,27 +47,27 @@
<sect1 id="xml-primer-overview">
<title>Overview</title>
<para>In the original days of computers, electronic text was simple.
There were a few character sets like <acronym>ASCII</acronym> or <acronym>EBCDIC</acronym>, but
that was about it. Text was text, and what you saw really was
what you got. No frills, no formatting, no intelligence.</para>
<para>In the original days of computers, electronic text was
simple. There were a few character sets like
<acronym>ASCII</acronym> or <acronym>EBCDIC</acronym>, but that
was about it. Text was text, and what you saw really was what
you got. No frills, no formatting, no intelligence.</para>
<para>Inevitably, this was not enough. When text is in a
machine-usable format, machines are expected to be able to use
and manipulate it intelligently. Authors want to indicate
that certain phrases should be emphasized, or added to a
glossary, or made into hyperlinks. Filenames could be
shown in a <quote>typewriter</quote> style font for viewing on
screen, but as <quote>italics</quote> when printed, or any of a
myriad of other options for presentation.</para>
and manipulate it intelligently. Authors want to indicate that
certain phrases should be emphasized, or added to a glossary, or
made into hyperlinks. Filenames could be shown in a
<quote>typewriter</quote> style font for viewing on screen, but
as <quote>italics</quote> when printed, or any of a myriad of
other options for presentation.</para>
<para>It was once hoped that Artificial Intelligence (AI) would
make this easy. The computer would read the document and
automatically identify key phrases, filenames, text that the
reader should type in, examples, and more. Unfortunately, real
life has not happened quite like that, and computers still require
assistance before they can meaningfully process
life has not happened quite like that, and computers still
require assistance before they can meaningfully process
text.</para>
<para>More precisely, they need help identifying what is what.
@ -95,13 +95,14 @@
the markup from the user, so the user is not distracted by
it.</para>
<para>The extra information stored in the markup <emphasis>adds
value</emphasis> to the document. Adding the markup to the
document must typically be done by a person&mdash;after all, if
computers could recognize the text sufficiently well to add the
markup then there would be no need to add it in the first place.
This <emphasis>increases the cost</emphasis> (the effort
required) to create the document.</para>
<para>The extra information stored in the markup
<emphasis>adds value</emphasis> to the document. Adding the
markup to the document must typically be done by a
person&mdash;after all, if computers could recognize the text
sufficiently well to add the markup then there would be no need
to add it in the first place. This
<emphasis>increases the cost</emphasis> (the effort required) to
create the document.</para>
<para>The previous example is actually represented in this
document like this:</para>
@ -110,79 +111,83 @@
<sgmltag class="starttag">screen</sgmltag>&prompt.user; <sgmltag class="starttag">userinput</sgmltag>rm /tmp/foo<sgmltag class="endtag">userinput</sgmltag><sgmltag class="endtag">screen</sgmltag></programlisting>
<para>The markup is clearly separate from the
content.</para>
<para>The markup is clearly separate from the content.</para>
<para>Markup languages define
what the markup means and how it should be interpreted.</para>
<para>Markup languages define what the markup means and how it
should be interpreted.</para>
<para>Of course, one markup language might not be enough. A
markup language for technical documentation has very different
requirements than a markup language that is intended for
cookery recipes. This, in turn, would be very different from a
markup language used to describe poetry. What is really needed
is a first language used to write these other markup
languages. A <emphasis>meta markup language</emphasis>.</para>
requirements than a markup language that is intended for cookery
recipes. This, in turn, would be very different from a markup
language used to describe poetry. What is really needed is a
first language used to write these other markup languages. A
<emphasis>meta markup language</emphasis>.</para>
<para>This is exactly what the eXtensible Markup
Language (<acronym>XML</acronym>) is. Many markup languages have been written in
<acronym>XML</acronym>, including the two most used by the <acronym>FDP</acronym>, <acronym>XHTML</acronym> and
DocBook.</para>
Language (<acronym>XML</acronym>) is. Many markup languages
have been written in <acronym>XML</acronym>, including the two
most used by the <acronym>FDP</acronym>,
<acronym>XHTML</acronym> and DocBook.</para>
<para>Each language definition is more properly called a grammar,
vocabulary, schema or Document Type Definition (<acronym>DTD</acronym>). There
are various languages to specify an <acronym>XML</acronym> grammar, for example,
<acronym>DTD</acronym> (yes, it also means the specification language itself),
<acronym>XML</acronym> Schema (<acronym>XSD</acronym>) or <acronym>RELAX NG</acronym>. The schema specifies the name
of the elements that can be used, what order they appear in (and
whether some markup can be used inside other markup) and related
information.</para>
vocabulary, schema or Document Type Definition
(<acronym>DTD</acronym>). There are various languages to
specify an <acronym>XML</acronym> grammar, for example,
<acronym>DTD</acronym> (yes, it also means the specification
language itself), <acronym>XML</acronym> Schema
(<acronym>XSD</acronym>) or <acronym>RELAX NG</acronym>. The
schema specifies the name of the elements that can be used, what
order they appear in (and whether some markup can be used inside
other markup) and related information.</para>
<para id="xml-primer-validating">A schema is a
<emphasis>complete</emphasis> specification of all the elements
that are allowed to appear, the order in which they should
appear, which elements are mandatory, which are optional, and so
forth. This makes it possible to write an <acronym>XML</acronym>
<emphasis>parser</emphasis> which reads in both the schema and a
document which claims to conform to the schema. The parser can
then confirm whether or not all the elements required by the vocabulary
are in the document in the right order, and whether there are
any errors in the markup. This is normally referred to as
forth. This makes it possible to write an
<acronym>XML</acronym> <emphasis>parser</emphasis> which reads
in both the schema and a document which claims to conform to the
schema. The parser can then confirm whether or not all the
elements required by the vocabulary are in the document in the
right order, and whether there are any errors in the markup.
This is normally referred to as
<quote>validating the document</quote>.</para>
<note>
<para>This processing simply confirms that the choice of
elements, their ordering, and so on, conforms to that listed
in the grammar. It does <emphasis>not</emphasis> check whether
<emphasis>appropriate</emphasis> markup has been used for the
content. If all the filenames in a
document were marked up as function names, the parser would not flag this as
an error (assuming, of course, that the schema defines elements
for filenames and functions, and that they are allowed to
appear in the same place).</para>
in the grammar. It does <emphasis>not</emphasis> check
whether <emphasis>appropriate</emphasis> markup has been used
for the content. If all the filenames in a document were
marked up as function names, the parser would not flag this as
an error (assuming, of course, that the schema defines
elements for filenames and functions, and that they are
allowed to appear in the same place).</para>
</note>
<para>It is likely that most contributions to the
Documentation Project will be content marked up in
either <acronym>XHTML</acronym> or DocBook, rather than alterations to the schemas.
For this reason, this book will not touch on how to write a
vocabulary.</para>
<para>It is likely that most contributions to the Documentation
Project will be content marked up in either
<acronym>XHTML</acronym> or DocBook, rather than alterations to
the schemas. For this reason, this book will not touch on how
to write a vocabulary.</para>
</sect1>
<sect1 id="xml-primer-elements">
<title>Elements, Tags, and Attributes</title>
<para>All the vocabularies written in <acronym>XML</acronym> share certain characteristics.
This is hardly surprising, as the philosophy behind <acronym>XML</acronym> will
inevitably show through. One of the most obvious manifestations
of this philosophy is that of <emphasis>content</emphasis> and
<para>All the vocabularies written in <acronym>XML</acronym> share
certain characteristics. This is hardly surprising, as the
philosophy behind <acronym>XML</acronym> will inevitably show
through. One of the most obvious manifestations of this
philosophy is that of <emphasis>content</emphasis> and
<emphasis>elements</emphasis>.</para>
<para>Documentation, whether it is a single web page, or a
lengthy book, is considered to consist of content. This content
is then divided and further subdivided into elements. The
purpose of adding markup is to name and identify the boundaries
of these elements for further processing.</para>
<para>Documentation, whether it is a single web page, or a lengthy
book, is considered to consist of content. This content is then
divided and further subdivided into elements. The purpose of
adding markup is to name and identify the boundaries of these
elements for further processing.</para>
<para>For example, consider a typical book. At the very top
level, the book is itself an element. This <quote>book</quote>
@ -193,44 +198,45 @@
that was direct speech, or the name of a character in the
story.</para>
<para>It may be helpful to think of this as <quote>chunking</quote>
content. At the very top level is one chunk, the book.
Look a little deeper, and there are more chunks, the individual
chapters. These are chunked further into paragraphs, footnotes,
character names, and so on.</para>
<para>It may be helpful to think of this as
<quote>chunking</quote> content. At the very top level is one
chunk, the book. Look a little deeper, and there are more
chunks, the individual chapters. These are chunked further into
paragraphs, footnotes, character names, and so on.</para>
<para>Notice how this differentiation between
different elements of the content can be made without resorting to any <acronym>XML</acronym>
terms. It really is surprisingly straightforward. This could be done
with a highlighter pen and a printout of the book, using
different colors to indicate different chunks of content.</para>
<para>Notice how this differentiation between different elements
of the content can be made without resorting to any
<acronym>XML</acronym> terms. It really is surprisingly
straightforward. This could be done with a highlighter pen and
a printout of the book, using different colors to indicate
different chunks of content.</para>
<para>Of course, we do not have an electronic highlighter pen, so
we need some other way of indicating which element each piece of
content belongs to. In languages written in <acronym>XML</acronym> (<acronym>XHTML</acronym>,
DocBook, et al) this is done by means of
<emphasis>tags</emphasis>.</para>
content belongs to. In languages written in
<acronym>XML</acronym> (<acronym>XHTML</acronym>, DocBook, et
al) this is done by means of <emphasis>tags</emphasis>.</para>
<para>A tag is used to identify where a particular element starts,
and where the element ends. <emphasis>The tag is not part of
the element itself</emphasis>. Because each grammar was normally
written to mark up specific types of information, each one will
recognize different elements, and will therefore have different
names for the tags.</para>
the element itself</emphasis>. Because each grammar was
normally written to mark up specific types of information, each
one will recognize different elements, and will therefore have
different names for the tags.</para>
<para>For an element called
<replaceable>element-name</replaceable> the start tag will
normally look like
<sgmltag class="starttag"><replaceable>element-name</replaceable></sgmltag>. The
corresponding closing tag for this element is
<sgmltag class="endtag"><replaceable>element-name</replaceable></sgmltag>.</para>
normally look like <sgmltag
class="starttag"><replaceable>element-name</replaceable></sgmltag>.
The corresponding closing tag for this element is <sgmltag
class="endtag"><replaceable>element-name</replaceable></sgmltag>.</para>
<example>
<title>Using an Element (Start and End Tags)</title>
<para><acronym>XHTML</acronym> has an element for indicating that the content
enclosed by the element is a paragraph, called
<sgmltag>p</sgmltag>.</para>
<para><acronym>XHTML</acronym> has an element for indicating
that the content enclosed by the element is a paragraph,
called <sgmltag>p</sgmltag>.</para>
<programlisting><sgmltag class="starttag">p</sgmltag>This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
@ -239,19 +245,18 @@
<sgmltag class="starttag">p</sgmltag>This is another paragraph. But this one is much shorter.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>Some elements have no
content. For example, in <acronym>XHTML</acronym>, a
horizontal line can be included in the document.
For these <quote>empty</quote> elements, <acronym>XML</acronym> introduced
a shorthand form that is completely equivalent to the two-tag
version:</para>
<para>Some elements have no content. For example, in
<acronym>XHTML</acronym>, a horizontal line can be included in
the document. For these <quote>empty</quote> elements,
<acronym>XML</acronym> introduced a shorthand form that is
completely equivalent to the two-tag version:</para>
<example>
<title>Using an Element Without Content</title>
<para><acronym>XHTML</acronym> has an element for indicating a horizontal rule,
called <sgmltag>hr</sgmltag>. This element does not wrap
content, so it looks like this:</para>
<para><acronym>XHTML</acronym> has an element for indicating a
horizontal rule, called <sgmltag>hr</sgmltag>. This element
does not wrap content, so it looks like this:</para>
<programlisting><sgmltag class="starttag">p</sgmltag>One paragraph.<sgmltag class="endtag">p</sgmltag>
<sgmltag class="starttag">hr</sgmltag><sgmltag class="endtag">hr</sgmltag>
@ -268,10 +273,10 @@
from the previous paragraph.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
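<para>The equivalence of the two forms of an empty element can be
checked with any <acronym>XML</acronym> parser. The following is a
minimal sketch in Python, which is used here only for illustration
and is not part of the Documentation Project toolchain. The
<sgmltag>div</sgmltag> wrapper and the <literal>shape</literal>
helper are assumptions made for this example.</para>

```python
import xml.etree.ElementTree as ET

# The same fragment written twice: once with a start and end tag for
# the empty element, once with the shorthand form.
two_tag = ET.fromstring("<div><p>One paragraph.</p><hr></hr><p>Another.</p></div>")
shorthand = ET.fromstring("<div><p>One paragraph.</p><hr/><p>Another.</p></div>")


def shape(element):
    """Return (name, text, children) so two parsed trees can be compared."""
    return (element.tag, element.text, [shape(child) for child in element])


# Both spellings produce the same element tree.
same = shape(two_tag) == shape(shorthand)
print(same)
```

<para>The parser reports the same three child elements in both
cases, which is why the shorthand can be used wherever the two-tag
version would be.</para>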
<para>As shown above, elements can contain other
elements. In the book example earlier, the book element
contained all the chapter elements, which in turn contained all
the paragraph elements, and so on.</para>
<para>As shown above, elements can contain other elements. In the
book example earlier, the book element contained all the chapter
elements, which in turn contained all the paragraph elements,
and so on.</para>
<example>
<title>Elements Within Elements; <sgmltag>em</sgmltag></title>
@ -280,8 +285,8 @@
of the <sgmltag class="starttag">em</sgmltag>words<sgmltag class="endtag">em</sgmltag> have been <sgmltag class="starttag">em</sgmltag>emphasized<sgmltag class="endtag">em</sgmltag>.<sgmltag class="endtag">p</sgmltag></programlisting>
</example>
<para>The grammar consists of rules that describe which elements can
contain other elements, and exactly what they can
<para>The grammar consists of rules that describe which elements
can contain other elements, and exactly what they can
contain.</para>
<important>
@ -294,12 +299,13 @@
element starts and ends.</para>
<para>When this document (or anyone else knowledgeable about
<acronym>XML</acronym>) refers to <quote>the <sgmltag class="starttag">p</sgmltag> tag</quote>
<acronym>XML</acronym>) refers to
<quote>the <sgmltag class="starttag">p</sgmltag> tag</quote>
they mean the literal text consisting of the three characters
<literal>&lt;</literal>, <literal>p</literal>, and
<literal>&gt;</literal>. But the phrase <quote>the
<sgmltag>p</sgmltag> element</quote> refers to the whole
element.</para>
<literal>&gt;</literal>. But the phrase
<quote>the <sgmltag>p</sgmltag> element</quote> refers to the
whole element.</para>
<para>This distinction <emphasis>is</emphasis> very subtle. But
keep it in mind.</para>
@ -316,14 +322,14 @@
take the form
<literal><replaceable>attribute-name</replaceable>="<replaceable>attribute-value</replaceable>"</literal>.</para>
<para>In <acronym>XHTML</acronym>, the
<sgmltag>p</sgmltag> element has an attribute called
<sgmltag class="attribute">align</sgmltag>, which suggests an alignment
(justification) for the paragraph to the program displaying the
<acronym>XHTML</acronym>.</para>
<para>In <acronym>XHTML</acronym>, the <sgmltag>p</sgmltag>
element has an attribute called
<sgmltag class="attribute">align</sgmltag>, which suggests an
alignment (justification) for the paragraph to the program
displaying the <acronym>XHTML</acronym>.</para>
<para>The <sgmltag class="attribute">align</sgmltag> attribute can take one of four
defined values, <literal>left</literal>,
<para>The <sgmltag class="attribute">align</sgmltag> attribute can
take one of four defined values, <literal>left</literal>,
<literal>center</literal>, <literal>right</literal> and
<literal>justify</literal>. If the attribute is not specified
then the default is <literal>left</literal>.</para>
@ -349,8 +355,8 @@
<para>Attribute values in <acronym>XML</acronym> must be enclosed
in either single or double quotes. Double quotes are
traditional. Single quotes are useful when the attribute
value contains double quotes.</para>
traditional. Single quotes are useful when the attribute value
contains double quotes.</para>
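<para>As a rough illustration of how a program sees attributes, the
sketch below reads the <sgmltag class="attribute">align</sgmltag>
attribute with Python's standard library parser. That parser does not
read the <acronym>DTD</acronym>, so the default of
<literal>left</literal> is supplied by the hypothetical
<literal>alignment</literal> helper rather than by the grammar. The
<sgmltag class="attribute">title</sgmltag> attribute is used only to
demonstrate quoting.</para>

```python
import xml.etree.ElementTree as ET

aligned = ET.fromstring('<p align="right">Right-aligned text.</p>')
plain = ET.fromstring("<p>No attribute here.</p>")
# Single quotes around the value allow double quotes inside it.
quoted = ET.fromstring("<p title='The \"p\" element'>Single-quoted value.</p>")


def alignment(paragraph):
    """Return the align attribute, falling back to 'left' when it is absent."""
    return paragraph.get("align", "left")


print(alignment(aligned))
print(alignment(plain))
print(quoted.get("title"))
```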
<para>Information about attributes, elements, and tags is stored
in catalog files. The Documentation Project uses standard
@ -371,17 +377,17 @@
<para>Install
<filename role="package">textproc/docproj</filename> from
the &os; Ports Collection. This is a
<emphasis>meta-port</emphasis> that downloads and
installs the standard programs and supporting files needed
by the Documentation Project.</para>
<emphasis>meta-port</emphasis> that downloads and installs
the standard programs and supporting files needed by the
Documentation Project.</para>
</step>
<step>
<para>Add lines to the shell startup files to set
<envar>SGML_CATALOG_FILES</envar>. When working on non-English
versions of the documentation, replace
<replaceable>en_US.ISO8859-1</replaceable> with the appropriate directory for the
target language.</para>
<envar>SGML_CATALOG_FILES</envar>. When working on
non-English versions of the documentation, replace
<replaceable>en_US.ISO8859-1</replaceable> with the
appropriate directory for the target language.</para>
<example id="xml-primer-envars">
<title><filename>.profile</filename>, for &man.sh.1; and
@ -410,9 +416,9 @@ setenv SGML_CATALOG_FILES /usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/share/xml/catalog:$SGML_CATALOG_FILES</programlisting>
</example>
<para>After making these changes, either log out and log back in again, or run
the commands from the command line to set the variable
values.</para>
<para>After making these changes, either log out and log
back in again, or run the commands from the command line
to set the variable values.</para>
</step>
</procedure>
@ -439,41 +445,44 @@ setenv SGML_CATALOG_FILES /usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/sh
</step>
<step>
<para>Try to validate this file using an <acronym>XML</acronym> parser.</para>
<para>Try to validate this file using an
<acronym>XML</acronym> parser.</para>
<para><filename role="package">textproc/docproj</filename> includes
the <command>xmllint</command>
<para><filename role="package">textproc/docproj</filename>
includes the <command>xmllint</command>
<link linkend="xml-primer-validating">validating
parser</link>.</para>
<para>Use <command>xmllint</command> to
validate the document:</para>
<para>Use <command>xmllint</command> to validate the
document:</para>
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput></screen>
<para><command>xmllint</command> returns
without displaying any output, showing that the
document validated successfully.</para>
<para><command>xmllint</command> returns without displaying
any output, showing that the document validated
successfully.</para>
</step>
<step>
<para>See what happens when required elements are omitted.
Delete the line with the <sgmltag class="starttag">title</sgmltag> and
<sgmltag class="endtag">title</sgmltag> tags, and re-run the
validation.</para>
Delete the line with the
<sgmltag class="starttag">title</sgmltag> and
<sgmltag class="endtag">title</sgmltag> tags, and re-run
the validation.</para>
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput>
example.xml:5: element head: validity error : Element head content does not follow the DTD, expecting ((script | style | meta | link | object | isindex)* , ((title , (script | style | meta | link | object | isindex)* , (base , (script | style | meta | link | object | isindex)*)?) | (base , (script | style | meta | link | object | isindex)* , title , (script | style | meta | link | object | isindex)*))), got ()</screen>
<para>This shows that the validation error comes from
the <replaceable>fifth</replaceable> line of the
<para>This shows that the validation error comes from the
<replaceable>fifth</replaceable> line of the
<replaceable>example.xml</replaceable> file and that the
content of the <sgmltag class="starttag">head</sgmltag> is the part which
does not follow the rules of the <acronym>XHTML</acronym> grammar.</para>
content of the <sgmltag class="starttag">head</sgmltag> is
the part which does not follow the rules of the
<acronym>XHTML</acronym> grammar.</para>
<para>Then <command>xmllint</command> shows
the line where the error was found and marks the
exact character position with a <literal>^</literal> sign.</para>
<para>Then <command>xmllint</command> shows the line where
the error was found and marks the exact character position
with a <literal>^</literal> sign.</para>
</step>
<step>
@ -486,13 +495,15 @@ example.xml:5: element head: validity error : Element head content does not foll
<sect1 id="xml-primer-doctype-declaration">
<title>The DOCTYPE Declaration</title>
<para>The beginning of each document can specify
the name of the <acronym>DTD</acronym> to which the document conforms.
This DOCTYPE declaration is used by <acronym>XML</acronym> parsers to
identify the <acronym>DTD</acronym> and ensure that the document does conform to it.</para>
<para>The beginning of each document can specify the name of the
<acronym>DTD</acronym> to which the document conforms. This
DOCTYPE declaration is used by <acronym>XML</acronym> parsers to
identify the <acronym>DTD</acronym> and ensure that the document
does conform to it.</para>
<para>A typical declaration for a document written to conform with
version 1.0 of the <acronym>XHTML</acronym> <acronym>DTD</acronym> looks like this:</para>
version 1.0 of the <acronym>XHTML</acronym>
<acronym>DTD</acronym> looks like this:</para>
<programlisting><sgmltag class="starttag">!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</sgmltag></programlisting>
@ -512,8 +523,8 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>DOCTYPE</literal></term>
<listitem>
<para>Shows that this is an <acronym>XML</acronym> declaration of the
document type.</para>
<para>Shows that this is an <acronym>XML</acronym>
declaration of the document type.</para>
</listitem>
</varlistentry>
@ -528,21 +539,27 @@ example.xml:5: element head: validity error : Element head content does not foll
</varlistentry>
<varlistentry>
<term><literal>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
<term><literal>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
<listitem>
<para>Lists the Formal Public Identifier (<acronym>FPI</acronym>)
<para>Lists the Formal Public Identifier
(<acronym>FPI</acronym>)
<indexterm>
<primary>Formal Public Identifier</primary>
</indexterm>
for the <acronym>DTD</acronym> to which this document conforms. The <acronym>XML</acronym>
parser uses this to find the correct <acronym>DTD</acronym> when
processing this document.</para>
for the <acronym>DTD</acronym> to which this document
conforms. The <acronym>XML</acronym> parser uses this to
find the correct <acronym>DTD</acronym> when processing
this document.</para>
<para><literal>PUBLIC</literal> is not a part of the <acronym>FPI</acronym>,
but indicates to the <acronym>XML</acronym> processor how to find the <acronym>DTD</acronym>
referenced in the <acronym>FPI</acronym>. Other ways of telling the <acronym>XML</acronym>
parser how to find the <acronym>DTD</acronym> are shown <link
<para><literal>PUBLIC</literal> is not a part of the
<acronym>FPI</acronym>, but indicates to the
<acronym>XML</acronym> processor how to find the
<acronym>DTD</acronym> referenced in the
<acronym>FPI</acronym>. Other ways of telling the
<acronym>XML</acronym> parser how to find the
<acronym>DTD</acronym> are shown <link
linkend="xml-primer-fpi-alternatives">later</link>.</para>
</listitem>
</varlistentry>
@ -551,7 +568,8 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
<listitem>
<para>A local filename or a <acronym>URL</acronym> to find the <acronym>DTD</acronym>.</para>
<para>A local filename or a <acronym>URL</acronym> to find
the <acronym>DTD</acronym>.</para>
</listitem>
</varlistentry>
@ -559,24 +577,29 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><literal>&gt;</literal></term>
<listitem>
<para>Ends the declaration and returns to the document.</para>
<para>Ends the declaration and returns to the
document.</para>
</listitem>
</varlistentry>
</variablelist>
<sect2>
<title>Formal Public Identifiers (<acronym>FPI</acronym>s)</title>
<title>Formal Public Identifiers
(<acronym>FPI</acronym>s)</title>
<indexterm significance="preferred">
<primary>Formal Public Identifier</primary>
</indexterm>
<note>
<para>It is not necessary to know this, but it is useful
background, and might help debug problems when the <acronym>XML</acronym>
processor cannot locate the <acronym>DTD</acronym>.</para>
background, and might help debug problems when the
<acronym>XML</acronym> processor cannot locate the
<acronym>DTD</acronym>.</para>
</note>
<para><acronym>FPI</acronym>s must follow a specific syntax:</para>
<para><acronym>FPI</acronym>s must follow a specific
syntax:</para>
<programlisting>"<replaceable>Owner</replaceable>//<replaceable>Keyword</replaceable> <replaceable>Description</replaceable>//<replaceable>Language</replaceable>"</programlisting>
@ -587,14 +610,18 @@ example.xml:5: element head: validity error : Element head content does not foll
<listitem>
<para>The owner of the <acronym>FPI</acronym>.</para>
<para>The beginning of the string identifies the owner
of the <acronym>FPI</acronym>. For example, the <acronym>FPI</acronym>
<para>The beginning of the string identifies the owner of
the <acronym>FPI</acronym>. For example, the
<acronym>FPI</acronym>
<literal>"ISO 8879:1986//ENTITIES Greek
Symbols//EN"</literal> lists
<literal>ISO 8879:1986</literal> as being the owner for
the set of entities for Greek symbols. <acronym>ISO</acronym> 8879:1986 is
the International Organization for Standardization (<acronym>ISO</acronym>) number for the <acronym>SGML</acronym> standard, the predecessor
(and a superset) of <acronym>XML</acronym>.</para>
the set of entities for Greek symbols.
<acronym>ISO</acronym> 8879:1986 is the International
Organization for Standardization
(<acronym>ISO</acronym>) number for the
<acronym>SGML</acronym> standard, the predecessor (and a
superset) of <acronym>XML</acronym>.</para>
<para>Otherwise, this string will either look like
<literal>-//<replaceable>Owner</replaceable></literal>
@ -608,19 +635,21 @@ example.xml:5: element head: validity error : Element head content does not foll
<literal>+</literal> identifying it as
registered.</para>
<para><acronym>ISO</acronym> 9070:1991 defines how registered names are
generated. It might be derived from the number of an <acronym>ISO</acronym>
publication, an <acronym>ISBN</acronym> code, or an organization code
assigned according to <acronym>ISO</acronym> 6523. Additionally, a
<para><acronym>ISO</acronym> 9070:1991 defines how
registered names are generated. It might be derived
from the number of an <acronym>ISO</acronym>
publication, an <acronym>ISBN</acronym> code, or an
organization code assigned according to
<acronym>ISO</acronym> 6523. Additionally, a
registration authority could be created in order to
assign registered names. The <acronym>ISO</acronym> council delegated this
to the American National Standards Institute
(<acronym>ANSI</acronym>).</para>
assign registered names. The <acronym>ISO</acronym>
council delegated this to the American National
Standards Institute (<acronym>ANSI</acronym>).</para>
<para>Because the &os; Project has not been registered,
the owner string is <literal>-//&os;</literal>. As
seen in the example, the <acronym>W3C</acronym> are not a registered owner
either.</para>
the owner string is <literal>-//&os;</literal>. As seen
in the example, the <acronym>W3C</acronym> are not a
registered owner either.</para>
</listitem>
</varlistentry>
@ -632,11 +661,13 @@ example.xml:5: element head: validity error : Element head content does not foll
information in the file. Some of the most common
keywords are <literal>DTD</literal>,
<literal>ELEMENT</literal>, <literal>ENTITIES</literal>,
and <literal>TEXT</literal>. <literal>DTD</literal> is
used only for <acronym>DTD</acronym> files, <literal>ELEMENT</literal> is
usually used for <acronym>DTD</acronym> fragments that contain only entity
or element declarations. <literal>TEXT</literal> is
used for <acronym>XML</acronym> content (text and tags).</para>
and <literal>TEXT</literal>. <literal>DTD</literal> is
used only for <acronym>DTD</acronym> files,
<literal>ELEMENT</literal> is usually used for
<acronym>DTD</acronym> fragments that contain only
entity or element declarations. <literal>TEXT</literal>
is used for <acronym>XML</acronym> content (text and
tags).</para>
</listitem>
</varlistentry>
@ -655,9 +686,9 @@ example.xml:5: element head: validity error : Element head content does not foll
<term><replaceable>Language</replaceable></term>
<listitem>
<para>An <acronym>ISO</acronym> two-character code that identifies
the native language for the file. <literal>EN</literal>
is used for English.</para>
<para>An <acronym>ISO</acronym> two-character code that
identifies the native language for the file.
<literal>EN</literal> is used for English.</para>
</listitem>
</varlistentry>
</variablelist>
@ -665,46 +696,46 @@ example.xml:5: element head: validity error : Element head content does not foll
<sect3>
<title><filename>catalog</filename> Files</title>
<para>With the syntax above,
an <acronym>XML</acronym> processor needs to have
some way of turning the <acronym>FPI</acronym> into the name of the file
containing the <acronym>DTD</acronym>. A
catalog file (typically called <filename>catalog</filename>)
contains lines that map <acronym>FPI</acronym>s to filenames. For example, if
the catalog file contained the line:</para>
<para>With the syntax above, an <acronym>XML</acronym>
processor needs to have some way of turning the
<acronym>FPI</acronym> into the name of the file containing
the <acronym>DTD</acronym>. A catalog file (typically
called <filename>catalog</filename>) contains lines that map
<acronym>FPI</acronym>s to filenames. For example, if the
catalog file contained the line:</para>
<!-- XXX: mention XML catalog or maybe replace this totally and only cover XML catalog -->
<programlisting>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "1.0/transitional.dtd"</programlisting>
<para>The <acronym>XML</acronym> processor knows that the <acronym>DTD</acronym> is
called <filename>transitional.dtd</filename> in the
<para>The <acronym>XML</acronym> processor knows that the
<acronym>DTD</acronym> is called
<filename>transitional.dtd</filename> in the
<filename>1.0</filename> subdirectory of the directory that
held the <filename>catalog</filename> file.</para>
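<para>As a hedged sketch only, and not something this toolchain is
confirmed to use in this form: the same mapping could be written in
the OASIS <acronym>XML</acronym> catalog format, assuming the
processor supports that format. The relative
<literal>uri</literal> value simply reuses the path from the line
above.</para>

<programlisting>&lt;?xml version="1.0"?&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;!-- Assumed example: maps the FPI to a path relative to this catalog file --&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="1.0/transitional.dtd"/&gt;
&lt;/catalog&gt;</programlisting>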
<para>Examine the contents of
<filename>/usr/local/share/xml/dtd/xhtml/catalog.xml</filename>.
This is the catalog file for the <acronym>XHTML</acronym> <acronym>DTD</acronym>s that was
installed as part of the <filename
This is the catalog file for the <acronym>XHTML</acronym>
<acronym>DTD</acronym>s that was installed as part of the
<filename
role="package">textproc/docproj</filename> port.</para>
</sect3>
<sect3>
<title><envar>SGML_CATALOG_FILES</envar></title>
<para>To locate a <filename>catalog</filename> file,
the <acronym>XML</acronym> processor must know where to look. Many
processors feature command-line parameters for specifying the
path to one or more catalogs.</para>
<para>To locate a <filename>catalog</filename> file, the
<acronym>XML</acronym> processor must know where to look.
Many processors feature command-line parameters for
specifying the path to one or more catalogs.</para>
<para>In addition,
<envar>SGML_CATALOG_FILES</envar> can be set to point to the files.
This environment variable consists of a
colon-separated list of catalog files (including their full
path).</para>
<para>In addition, <envar>SGML_CATALOG_FILES</envar> can be
set to point to the files. This environment variable
consists of a colon-separated list of catalog files
(including their full path).</para>
<para>Typically, the list includes these
files:</para>
<para>Typically, the list includes these files:</para>
<itemizedlist>
<listitem>
@ -724,30 +755,34 @@ example.xml:5: element head: validity error : Element head content does not foll
</listitem>
</itemizedlist>
<para>This was done <link linkend="xml-primer-envars">earlier</link>.</para>
<para>This was done
<link linkend="xml-primer-envars">earlier</link>.</para>
</sect3>
</sect2>
<sect2 id="xml-primer-fpi-alternatives">
<title>Alternatives to <acronym>FPI</acronym>s</title>
<para>Instead of using an <acronym>FPI</acronym> to indicate the <acronym>DTD</acronym> to which
the document conforms (and therefore, which file on the system
contains the <acronym>DTD</acronym>), the filename can be explicitly specified.</para>
<para>Instead of using an <acronym>FPI</acronym> to indicate the
<acronym>DTD</acronym> to which the document conforms (and
therefore, which file on the system contains the
<acronym>DTD</acronym>), the filename can be explicitly
specified.</para>
<para>The syntax is slightly different:</para>
<programlisting><sgmltag class="starttag">!DOCTYPE html SYSTEM "/path/to/file.dtd"</sgmltag></programlisting>
<para>The <literal>SYSTEM</literal> keyword indicates that the
<acronym>XML</acronym> processor should locate the <acronym>DTD</acronym> in a system-specific
fashion. This typically (but not always) means the <acronym>DTD</acronym> will
be provided as a filename.</para>
<acronym>XML</acronym> processor should locate the
<acronym>DTD</acronym> in a system-specific fashion. This
typically (but not always) means the <acronym>DTD</acronym>
will be provided as a filename.</para>
<para>Using <acronym>FPI</acronym>s is preferred for reasons of portability.
If the <literal>SYSTEM</literal>
identifier is used, then the <acronym>DTD</acronym> must be provided and kept in the same location
for everyone.</para>
<para>Using <acronym>FPI</acronym>s is preferred for reasons of
portability. If the <literal>SYSTEM</literal> identifier is
used, then the <acronym>DTD</acronym> must be provided and
kept in the same location for everyone.</para>
</sect2>
</sect1>
@ -1031,9 +1066,11 @@ example.xml:5: element head: validity error : Element head content does not foll
the entity reference <literal>&amp;version;</literal>
replaced with the version number. Most web browsers have
very simplistic parsers which do not handle XML DTD
constructs. Furthermore, the closing <literal>]&gt;</literal>
of the XML context is not recognized properly by browsers and
will probably be rendered.</para>
constructs. Furthermore, the closing
<literal>]&gt;</literal> of the XML context is not
recognized properly by browsers and will probably be
rendered.</para>
</step>
<step>
@ -1349,20 +1386,19 @@ example.xml:5: element head: validity error : Element head content does not foll
<para>The content model you will probably find most
useful is <literal>CDATA</literal>.</para>
<para><literal>CDATA</literal> is for <quote>Character
Data</quote>. If the parser is in this content model then
it is expecting to see characters, and characters only. In
this model the <literal>&lt;</literal> and
<literal>&amp;</literal> symbols lose their special status,
and will be treated as ordinary characters.</para>
<para><literal>CDATA</literal> is for
<quote>Character Data</quote>. If the parser is in this
content model then it is expecting to see characters, and
characters only. In this model the <literal>&lt;</literal>
and <literal>&amp;</literal> symbols lose their special
status, and will be treated as ordinary characters.</para>
<note>
<para>When you use <literal>CDATA</literal>
in examples of text marked up in
XML, keep in mind that the content of
<para>When you use <literal>CDATA</literal> in examples of
text marked up in XML, keep in mind that the content of
<literal>CDATA</literal> is not validated. You have to
check the included XML text using other means. You
could, for example, write the example in another document,
check the included XML text using other means. You could,
for example, write the example in another document,
validate the example code, and then paste it into your
<literal>CDATA</literal> content.</para>
</note>
@ -1482,8 +1518,8 @@ example.xml:5: element head: validity error : Element head content does not foll
<procedure>
<step>
<para>Modify the <filename>entities.ent</filename> file to contain
the following:</para>
<para>Modify the <filename>entities.ent</filename> file to
contain the following:</para>
<programlisting>&lt;!ENTITY version "1.1"&gt;
&lt;!ENTITY % conditional.text "IGNORE"&gt;
@ -1499,13 +1535,15 @@ example.xml:5: element head: validity error : Element head content does not foll
</step>
<step>
<para>Normalize the <filename>example.xml</filename> file and notice
that the conditional text is not present in the output document.
Now if you set the parameter entity guard to <literal>INCLUDE</literal>
and regenerate the normalized document, it will appear there again.
Of course, this method makes more sense if you have more conditional
chunks that depend on the same condition, for example, whether you are
generating printed or online text.</para>
<para>Normalize the <filename>example.xml</filename> file
and notice that the conditional text is not present in the
output document. Now if you set the parameter entity
guard to <literal>INCLUDE</literal> and regenerate the
normalized document, it will appear there again. Of
course, this method makes more sense if you have more
conditional chunks that depend on the same condition, for
example, whether you are generating printed or online
text.</para>
</step>
</procedure>
</sect2>