Whitespace-only fixes. Translators, please ignore.
parent
28530c6850
commit
8225d6648b
Notes:
svn2git
2020-12-08 03:00:23 +00:00
svn path=/head/; revision=42260
1 changed file with 287 additions and 249 deletions
|
@ -35,8 +35,8 @@
|
|||
<title>XML Primer</title>
|
||||
|
||||
<para>Most FDP documentation is written with markup languages based
|
||||
on <acronym>XML</acronym>. This chapter explains what that means, how to
|
||||
read and understand the documentation source, and the
|
||||
on <acronym>XML</acronym>. This chapter explains what that means,
|
||||
how to read and understand the documentation source, and the
|
||||
<acronym>XML</acronym> techniques used.</para>
|
||||
|
||||
<para>Portions of this section were inspired by Mark Galassi's
|
||||
|
@ -47,27 +47,27 @@
|
|||
<sect1 id="xml-primer-overview">
|
||||
<title>Overview</title>
|
||||
|
||||
<para>In the early days of computers, electronic text was simple.
|
||||
|
||||
There were a few character sets like <acronym>ASCII</acronym> or <acronym>EBCDIC</acronym>, but
|
||||
that was about it. Text was text, and what you saw really was
|
||||
what you got. No frills, no formatting, no intelligence.</para>
|
||||
<para>In the early days of computers, electronic text was
|
||||
simple. There were a few character sets like
|
||||
<acronym>ASCII</acronym> or <acronym>EBCDIC</acronym>, but that
|
||||
was about it. Text was text, and what you saw really was what
|
||||
you got. No frills, no formatting, no intelligence.</para>
|
||||
|
||||
<para>Inevitably, this was not enough. When text is in a
|
||||
machine-usable format, machines are expected to be able to use
|
||||
and manipulate it intelligently. Authors want to indicate
|
||||
that certain phrases should be emphasized, or added to a
|
||||
glossary, or made into hyperlinks. Filenames could be
|
||||
shown in a <quote>typewriter</quote> style font for viewing on
|
||||
screen, but as <quote>italics</quote> when printed, or any of a
|
||||
myriad of other options for presentation.</para>
|
||||
and manipulate it intelligently. Authors want to indicate that
|
||||
certain phrases should be emphasized, or added to a glossary, or
|
||||
made into hyperlinks. Filenames could be shown in a
|
||||
<quote>typewriter</quote> style font for viewing on screen, but
|
||||
as <quote>italics</quote> when printed, or any of a myriad of
|
||||
other options for presentation.</para>
|
||||
|
||||
<para>It was once hoped that Artificial Intelligence (AI) would
|
||||
make this easy. The computer would read the document and
|
||||
automatically identify key phrases, filenames, text that the
|
||||
reader should type in, examples, and more. Unfortunately, real
|
||||
life has not turned out quite like that, and computers still require
|
||||
assistance before they can meaningfully process
|
||||
life has not turned out quite like that, and computers still
|
||||
require assistance before they can meaningfully process
|
||||
text.</para>
|
||||
|
||||
<para>More precisely, they need help identifying what is what.
|
||||
|
@ -95,13 +95,14 @@
|
|||
the markup from the user, so the user is not distracted by
|
||||
it.</para>
|
||||
|
||||
<para>The extra information stored in the markup <emphasis>adds
|
||||
value</emphasis> to the document. Adding the markup to the
|
||||
document must typically be done by a person—after all, if
|
||||
computers could recognize the text sufficiently well to add the
|
||||
markup then there would be no need to add it in the first place.
|
||||
This <emphasis>increases the cost</emphasis> (the effort
|
||||
required) to create the document.</para>
|
||||
<para>The extra information stored in the markup
|
||||
<emphasis>adds value</emphasis> to the document. Adding the
|
||||
markup to the document must typically be done by a
|
||||
person—after all, if computers could recognize the text
|
||||
sufficiently well to add the markup then there would be no need
|
||||
to add it in the first place. This
|
||||
<emphasis>increases the cost</emphasis> (the effort required) to
|
||||
create the document.</para>
|
||||
|
||||
<para>The previous example is actually represented in this
|
||||
document like this:</para>
|
||||
|
@ -110,79 +111,83 @@
|
|||
|
||||
<sgmltag class="starttag">screen</sgmltag>&amp;prompt.user; <sgmltag class="starttag">userinput</sgmltag>rm /tmp/foo<sgmltag class="endtag">userinput</sgmltag><sgmltag class="endtag">screen</sgmltag></programlisting>
|
||||
|
||||
<para>The markup is clearly separate from the
|
||||
content.</para>
|
||||
<para>The markup is clearly separate from the content.</para>
|
||||
|
||||
<para>Markup languages define
|
||||
what the markup means and how it should be interpreted.</para>
|
||||
<para>Markup languages define what the markup means and how it
|
||||
should be interpreted.</para>
|
||||
|
||||
<para>Of course, one markup language might not be enough. A
|
||||
markup language for technical documentation has very different
|
||||
requirements than a markup language that is intended for
|
||||
cookery recipes. This, in turn, would be very different from a
|
||||
markup language used to describe poetry. What is really needed
|
||||
is a first language used to write these other markup
|
||||
languages. A <emphasis>meta markup language</emphasis>.</para>
|
||||
requirements than a markup language that is intended for cookery
|
||||
recipes. This, in turn, would be very different from a markup
|
||||
language used to describe poetry. What is really needed is a
|
||||
first language used to write these other markup languages. A
|
||||
<emphasis>meta markup language</emphasis>.</para>
|
||||
|
||||
<para>This is exactly what the eXtensible Markup
|
||||
Language (<acronym>XML</acronym>) is. Many markup languages have been written in
|
||||
<acronym>XML</acronym>, including the two most used by the <acronym>FDP</acronym>, <acronym>XHTML</acronym> and
|
||||
DocBook.</para>
|
||||
Language (<acronym>XML</acronym>) is. Many markup languages
|
||||
have been written in <acronym>XML</acronym>, including the two
|
||||
most used by the <acronym>FDP</acronym>,
|
||||
<acronym>XHTML</acronym> and DocBook.</para>
|
||||
|
||||
<para>Each language definition is more properly called a grammar,
|
||||
vocabulary, schema or Document Type Definition (<acronym>DTD</acronym>). There
|
||||
are various languages to specify an <acronym>XML</acronym> grammar, for example,
|
||||
<acronym>DTD</acronym> (yes, it also means the specification language itself),
|
||||
<acronym>XML</acronym> Schema (<acronym>XSD</acronym>) or <acronym>RELAX NG</acronym>. The schema specifies the name
|
||||
of the elements that can be used, what order they appear in (and
|
||||
whether some markup can be used inside other markup) and related
|
||||
information.</para>
|
||||
vocabulary, schema or Document Type Definition
|
||||
(<acronym>DTD</acronym>). There are various languages to
|
||||
specify an <acronym>XML</acronym> grammar, for example,
|
||||
<acronym>DTD</acronym> (yes, it also means the specification
|
||||
language itself), <acronym>XML</acronym> Schema
|
||||
(<acronym>XSD</acronym>) or <acronym>RELAX NG</acronym>. The
|
||||
schema specifies the name of the elements that can be used, what
|
||||
order they appear in (and whether some markup can be used inside
|
||||
other markup) and related information.</para>
|
||||
|
||||
<para id="xml-primer-validating">A schema is a
|
||||
<emphasis>complete</emphasis> specification of all the elements
|
||||
that are allowed to appear, the order in which they should
|
||||
appear, which elements are mandatory, which are optional, and so
|
||||
forth. This makes it possible to write an <acronym>XML</acronym>
|
||||
<emphasis>parser</emphasis> which reads in both the schema and a
|
||||
document which claims to conform to the schema. The parser can
|
||||
then confirm whether or not all the elements required by the vocabulary
|
||||
are in the document in the right order, and whether there are
|
||||
any errors in the markup. This is normally referred to as
|
||||
forth. This makes it possible to write an
|
||||
<acronym>XML</acronym> <emphasis>parser</emphasis> which reads
|
||||
in both the schema and a document which claims to conform to the
|
||||
schema. The parser can then confirm whether or not all the
|
||||
elements required by the vocabulary are in the document in the
|
||||
right order, and whether there are any errors in the markup.
|
||||
This is normally referred to as
|
||||
<quote>validating the document</quote>.</para>
|
||||
|
||||
<note>
|
||||
<para>This processing simply confirms that the choice of
|
||||
elements, their ordering, and so on, conforms to that listed
|
||||
in the grammar. It does <emphasis>not</emphasis> check whether
|
||||
<emphasis>appropriate</emphasis> markup has been used for the
|
||||
content. If all the filenames in a
|
||||
document were marked up as function names, the parser would not flag this as
|
||||
an error (assuming, of course, that the schema defines elements
|
||||
for filenames and functions, and that they are allowed to
|
||||
appear in the same place).</para>
|
||||
in the grammar. It does <emphasis>not</emphasis> check
|
||||
whether <emphasis>appropriate</emphasis> markup has been used
|
||||
for the content. If all the filenames in a document were
|
||||
marked up as function names, the parser would not flag this as
|
||||
an error (assuming, of course, that the schema defines
|
||||
elements for filenames and functions, and that they are
|
||||
allowed to appear in the same place).</para>
|
||||
</note>
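The difference between a document that merely parses and one that follows the rules of a vocabulary can be sketched in a few lines of Python. This is only a toy illustration: the `REQUIRED_CHILDREN` table and the `missing_children` helper are hypothetical stand-ins for a real schema and a real validating parser, and they check far less than a DTD does.

```python
import xml.etree.ElementTree as ET

# Toy "schema": for each element name, the child elements that must be present.
# A hypothetical stand-in for a real DTD; it is not DTD syntax.
REQUIRED_CHILDREN = {
    "html": ["head", "body"],
    "head": ["title"],
}


def missing_children(root):
    """Return (parent, child) pairs for every required child that is absent."""
    problems = []
    for element in root.iter():
        present = {child.tag for child in element}
        for required in REQUIRED_CHILDREN.get(element.tag, []):
            if required not in present:
                problems.append((element.tag, required))
    return problems


# Both documents are well-formed XML, so both parse without error.
good = ET.fromstring(
    "<html><head><title>T</title></head><body><p>x</p></body></html>"
)
bad = ET.fromstring("<html><head></head><body><p>x</p></body></html>")

print(missing_children(good))  # []
print(missing_children(bad))   # [('head', 'title')]
```

As with real validation, a check like this only looks at structure. It cannot tell whether the content inside an element was marked up with the appropriate element.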
|
||||
|
||||
<para>It is likely that most contributions to the
|
||||
Documentation Project will be content marked up in
|
||||
either <acronym>XHTML</acronym> or DocBook, rather than alterations to the schemas.
|
||||
For this reason, this book will not touch on how to write a
|
||||
vocabulary.</para>
|
||||
<para>It is likely that most contributions to the Documentation
|
||||
Project will be content marked up in either
|
||||
<acronym>XHTML</acronym> or DocBook, rather than alterations to
|
||||
the schemas. For this reason, this book will not touch on how
|
||||
to write a vocabulary.</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="xml-primer-elements">
|
||||
<title>Elements, Tags, and Attributes</title>
|
||||
|
||||
<para>All the vocabularies written in <acronym>XML</acronym> share certain characteristics.
|
||||
This is hardly surprising, as the philosophy behind <acronym>XML</acronym> will
|
||||
inevitably show through. One of the most obvious manifestations
|
||||
of this philosophy is that of <emphasis>content</emphasis> and
|
||||
<para>All the vocabularies written in <acronym>XML</acronym> share
|
||||
certain characteristics. This is hardly surprising, as the
|
||||
philosophy behind <acronym>XML</acronym> will inevitably show
|
||||
through. One of the most obvious manifestations of this
|
||||
philosophy is that of <emphasis>content</emphasis> and
|
||||
<emphasis>elements</emphasis>.</para>
|
||||
|
||||
<para>Documentation, whether it is a single web page, or a
|
||||
lengthy book, is considered to consist of content. This content
|
||||
is then divided and further subdivided into elements. The
|
||||
purpose of adding markup is to name and identify the boundaries
|
||||
of these elements for further processing.</para>
|
||||
<para>Documentation, whether it is a single web page, or a lengthy
|
||||
book, is considered to consist of content. This content is then
|
||||
divided and further subdivided into elements. The purpose of
|
||||
adding markup is to name and identify the boundaries of these
|
||||
elements for further processing.</para>
|
||||
|
||||
<para>For example, consider a typical book. At the very top
|
||||
level, the book is itself an element. This <quote>book</quote>
|
||||
|
@ -193,44 +198,45 @@
|
|||
that was direct speech, or the name of a character in the
|
||||
story.</para>
|
||||
|
||||
<para>It may be helpful to think of this as <quote>chunking</quote>
|
||||
content. At the very top level is one chunk, the book.
|
||||
Look a little deeper, and there are more chunks, the individual
|
||||
chapters. These are chunked further into paragraphs, footnotes,
|
||||
character names, and so on.</para>
|
||||
<para>It may be helpful to think of this as
|
||||
<quote>chunking</quote> content. At the very top level is one
|
||||
chunk, the book. Look a little deeper, and there are more
|
||||
chunks, the individual chapters. These are chunked further into
|
||||
paragraphs, footnotes, character names, and so on.</para>
|
||||
|
||||
<para>Notice how this differentiation between
|
||||
different elements of the content can be made without resorting to any <acronym>XML</acronym>
|
||||
terms. It really is surprisingly straightforward. This could be done
|
||||
with a highlighter pen and a printout of the book, using
|
||||
different colors to indicate different chunks of content.</para>
|
||||
<para>Notice how this differentiation between different elements
|
||||
of the content can be made without resorting to any
|
||||
<acronym>XML</acronym> terms. It really is surprisingly
|
||||
straightforward. This could be done with a highlighter pen and
|
||||
a printout of the book, using different colors to indicate
|
||||
different chunks of content.</para>
|
||||
|
||||
<para>Of course, we do not have an electronic highlighter pen, so
|
||||
we need some other way of indicating which element each piece of
|
||||
content belongs to. In languages written in <acronym>XML</acronym> (<acronym>XHTML</acronym>,
|
||||
DocBook, et al.) this is done by means of
|
||||
<emphasis>tags</emphasis>.</para>
|
||||
content belongs to. In languages written in
|
||||
<acronym>XML</acronym> (<acronym>XHTML</acronym>, DocBook, et
|
||||
al.) this is done by means of <emphasis>tags</emphasis>.</para>
|
||||
|
||||
<para>A tag is used to identify where a particular element starts,
|
||||
and where the element ends. <emphasis>The tag is not part of
|
||||
the element itself</emphasis>. Because each grammar was normally
|
||||
written to mark up specific types of information, each one will
|
||||
recognize different elements, and will therefore have different
|
||||
names for the tags.</para>
|
||||
the element itself</emphasis>. Because each grammar was
|
||||
normally written to mark up specific types of information, each
|
||||
one will recognize different elements, and will therefore have
|
||||
different names for the tags.</para>
|
||||
|
||||
<para>For an element called
|
||||
<replaceable>element-name</replaceable> the start tag will
|
||||
normally look like
|
||||
<sgmltag class="starttag"><replaceable>element-name</replaceable></sgmltag>. The
|
||||
corresponding closing tag for this element is
|
||||
<sgmltag class="endtag"><replaceable>element-name</replaceable></sgmltag>.</para>
|
||||
normally look like <sgmltag
|
||||
class="starttag"><replaceable>element-name</replaceable></sgmltag>.
|
||||
The corresponding closing tag for this element is <sgmltag
|
||||
class="endtag"><replaceable>element-name</replaceable></sgmltag>.</para>
|
||||
|
||||
<example>
|
||||
<title>Using an Element (Start and End Tags)</title>
|
||||
|
||||
<para><acronym>XHTML</acronym> has an element for indicating that the content
|
||||
enclosed by the element is a paragraph, called
|
||||
<sgmltag>p</sgmltag>.</para>
|
||||
<para><acronym>XHTML</acronym> has an element for indicating
|
||||
that the content enclosed by the element is a paragraph,
|
||||
called <sgmltag>p</sgmltag>.</para>
|
||||
|
||||
<programlisting><sgmltag class="starttag">p</sgmltag>This is a paragraph. It starts with the start tag for
|
||||
the 'p' element, and it will end with the end tag for the 'p'
|
||||
|
@ -239,19 +245,18 @@
|
|||
<sgmltag class="starttag">p</sgmltag>This is another paragraph. But this one is much shorter.<sgmltag class="endtag">p</sgmltag></programlisting>
|
||||
</example>
|
||||
|
||||
<para>Some elements have no
|
||||
content. For example, in <acronym>XHTML</acronym>, a
|
||||
horizontal line can be included in the document.
|
||||
For these <quote>empty</quote> elements, <acronym>XML</acronym> introduced
|
||||
a shorthand form that is completely equivalent to the two-tag
|
||||
version:</para>
|
||||
<para>Some elements have no content. For example, in
|
||||
<acronym>XHTML</acronym>, a horizontal line can be included in
|
||||
the document. For these <quote>empty</quote> elements,
|
||||
<acronym>XML</acronym> introduced a shorthand form that is
|
||||
completely equivalent to the two-tag version:</para>
|
||||
|
||||
<example>
|
||||
<title>Using an Element Without Content</title>
|
||||
|
||||
<para><acronym>XHTML</acronym> has an element for indicating a horizontal rule,
|
||||
called <sgmltag>hr</sgmltag>. This element does not wrap
|
||||
content, so it looks like this:</para>
|
||||
<para><acronym>XHTML</acronym> has an element for indicating a
|
||||
horizontal rule, called <sgmltag>hr</sgmltag>. This element
|
||||
does not wrap content, so it looks like this:</para>
|
||||
|
||||
<programlisting><sgmltag class="starttag">p</sgmltag>One paragraph.<sgmltag class="endtag">p</sgmltag>
|
||||
<sgmltag class="starttag">hr</sgmltag><sgmltag class="endtag">hr</sgmltag>
|
||||
|
@ -268,10 +273,10 @@
|
|||
from the previous paragraph.<sgmltag class="endtag">p</sgmltag></programlisting>
|
||||
</example>
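The equivalence of the two forms can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module. The wrapping `div` element and the `shape` helper are assumptions made for this illustration and do not come from the example above.

```python
import xml.etree.ElementTree as ET

# The same fragment, once with an explicit start and end tag for 'hr'
# and once with the empty-element shorthand.
two_tag = ET.fromstring(
    "<div><p>One paragraph.</p><hr></hr><p>Another paragraph.</p></div>"
)
shorthand = ET.fromstring(
    "<div><p>One paragraph.</p><hr/><p>Another paragraph.</p></div>"
)


def shape(root):
    """Summarize the direct children as (element name, text content) pairs."""
    return [(child.tag, child.text or "") for child in root]


# Both spellings produce the same parsed structure.
print(shape(two_tag) == shape(shorthand))
```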
|
||||
|
||||
<para>As shown above, elements can contain other
|
||||
elements. In the book example earlier, the book element
|
||||
contained all the chapter elements, which in turn contained all
|
||||
the paragraph elements, and so on.</para>
|
||||
<para>As shown above, elements can contain other elements. In the
|
||||
book example earlier, the book element contained all the chapter
|
||||
elements, which in turn contained all the paragraph elements,
|
||||
and so on.</para>
|
||||
|
||||
<example>
|
||||
<title>Elements Within Elements; <sgmltag>em</sgmltag></title>
|
||||
|
@ -280,8 +285,8 @@
|
|||
of the <sgmltag class="starttag">em</sgmltag>words<sgmltag class="endtag">em</sgmltag> have been <sgmltag class="starttag">em</sgmltag>emphasized<sgmltag class="endtag">em</sgmltag>.<sgmltag class="endtag">p</sgmltag></programlisting>
|
||||
</example>
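Nesting is easy to see once a parser has turned the tags into a tree. The following sketch, again using Python's standard `xml.etree.ElementTree`, parses a paragraph modelled on the example above and lists the elements it contains. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<p>This is a simple <em>paragraph</em> where some of the "
    "<em>words</em> have been <em>emphasized</em>.</p>"
)

# The 'p' element contains three 'em' child elements.
child_names = [child.tag for child in paragraph]

# The content of each nested element, without its tags.
emphasized = [em.text for em in paragraph.iter("em")]

# The content of the whole paragraph with all markup removed.
plain_text = "".join(paragraph.itertext())

print(child_names)
print(emphasized)
print(plain_text)
```

The tags never appear in `emphasized` or `plain_text`: they only mark where each element starts and ends.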
|
||||
|
||||
<para>The grammar consists of rules that describe which elements can
|
||||
contain other elements, and exactly what they can
|
||||
<para>The grammar consists of rules that describe which elements
|
||||
can contain other elements, and exactly what they can
|
||||
contain.</para>
|
||||
|
||||
<important>
|
||||
|
@ -294,12 +299,13 @@
|
|||
element starts and ends.</para>
|
||||
|
||||
<para>When this document (or anyone else knowledgeable about
|
||||
<acronym>XML</acronym>) refers to <quote>the <sgmltag class="starttag">p</sgmltag> tag</quote>
|
||||
<acronym>XML</acronym>) refers to
|
||||
<quote>the <sgmltag class="starttag">p</sgmltag> tag</quote>
|
||||
they mean the literal text consisting of the three characters
|
||||
<literal>&lt;</literal>, <literal>p</literal>, and
|
||||
<literal>&gt;</literal>. But the phrase <quote>the
|
||||
<sgmltag>p</sgmltag> element</quote> refers to the whole
|
||||
element.</para>
|
||||
<literal>&gt;</literal>. But the phrase
|
||||
<quote>the <sgmltag>p</sgmltag> element</quote> refers to the
|
||||
whole element.</para>
|
||||
|
||||
<para>This distinction <emphasis>is</emphasis> very subtle. But
|
||||
keep it in mind.</para>
|
||||
|
@ -316,14 +322,14 @@
|
|||
take the form
|
||||
<literal><replaceable>attribute-name</replaceable>="<replaceable>attribute-value</replaceable>"</literal>.</para>
|
||||
|
||||
<para>In <acronym>XHTML</acronym>, the
|
||||
<sgmltag>p</sgmltag> element has an attribute called
|
||||
<sgmltag class="attribute">align</sgmltag>, which suggests an alignment
|
||||
(justification) for the paragraph to the program displaying the
|
||||
<acronym>XHTML</acronym>.</para>
|
||||
<para>In <acronym>XHTML</acronym>, the <sgmltag>p</sgmltag>
|
||||
element has an attribute called
|
||||
<sgmltag class="attribute">align</sgmltag>, which suggests an
|
||||
alignment (justification) for the paragraph to the program
|
||||
displaying the <acronym>XHTML</acronym>.</para>
|
||||
|
||||
<para>The <sgmltag class="attribute">align</sgmltag> attribute can take one of four
|
||||
defined values, <literal>left</literal>,
|
||||
<para>The <sgmltag class="attribute">align</sgmltag> attribute can
|
||||
take one of four defined values, <literal>left</literal>,
|
||||
<literal>center</literal>, <literal>right</literal> and
|
||||
<literal>justify</literal>. If the attribute is not specified
|
||||
then the default is <literal>left</literal>.</para>
|
||||
|
@ -349,8 +355,8 @@
|
|||
|
||||
<para>Attribute values in <acronym>XML</acronym> must be enclosed
|
||||
in either single or double quotes. Double quotes are
|
||||
traditional. Single quotes are useful when the attribute
|
||||
value contains double quotes.</para>
|
||||
traditional. Single quotes are useful when the attribute value
|
||||
contains double quotes.</para>
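The quoting rules can be tried out with a small Python sketch based on the standard `xml.etree.ElementTree` parser. The `is_well_formed` helper and the `title` attribute are assumptions introduced for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Double and single quotes are interchangeable around an attribute value.
double = ET.fromstring('<p align="right">Right-aligned.</p>')
single = ET.fromstring("<p align='right'>Right-aligned.</p>")
print(double.attrib == single.attrib)

# Single quotes make it easy to put double quotes inside the value.
quoted = ET.fromstring("""<p title='The "p" element'>text</p>""")
print(quoted.get("title"))

# An unquoted value is not XML at all, so the parser rejects it.
print(is_well_formed("<p align=right>text</p>"))
```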
|
||||
|
||||
<para>Information about attributes, elements, and tags is stored
|
||||
in catalog files. The Documentation Project uses standard
|
||||
|
@ -371,17 +377,17 @@
|
|||
<para>Install
|
||||
<filename role="package">textproc/docproj</filename> from
|
||||
the &os; Ports Collection. This is a
|
||||
<emphasis>meta-port</emphasis> that downloads and
|
||||
installs the standard programs and supporting files needed
|
||||
by the Documentation Project.</para>
|
||||
<emphasis>meta-port</emphasis> that downloads and installs
|
||||
the standard programs and supporting files needed by the
|
||||
Documentation Project.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>Add lines to the shell startup files to set
|
||||
<envar>SGML_CATALOG_FILES</envar>. When working on non-English
|
||||
versions of the documentation, replace
|
||||
<replaceable>en_US.ISO8859-1</replaceable> with the appropriate directory for the
|
||||
target language.</para>
|
||||
<envar>SGML_CATALOG_FILES</envar>. When working on
|
||||
non-English versions of the documentation, replace
|
||||
<replaceable>en_US.ISO8859-1</replaceable> with the
|
||||
appropriate directory for the target language.</para>
|
||||
|
||||
<example id="xml-primer-envars">
|
||||
<title><filename>.profile</filename>, for &man.sh.1; and
|
||||
|
@ -410,9 +416,9 @@ setenv SGML_CATALOG_FILES /usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
|
|||
setenv SGML_CATALOG_FILES /usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/share/xml/catalog:$SGML_CATALOG_FILES</programlisting>
|
||||
</example>
|
||||
|
||||
<para>After making these changes, either log out and log back in again, or run
|
||||
the commands from the command line to set the variable
|
||||
values.</para>
|
||||
<para>After making these changes, either log out and log
|
||||
back in again, or run the commands from the command line
|
||||
to set the variable values.</para>
|
||||
</step>
|
||||
</procedure>
|
||||
|
||||
|
@ -439,41 +445,44 @@ setenv SGML_CATALOG_FILES /usr/doc/<replaceable>en_US.ISO8859-1</replaceable>/sh
|
|||
</step>
|
||||
|
||||
<step>
|
||||
<para>Try to validate this file using an <acronym>XML</acronym> parser.</para>
|
||||
<para>Try to validate this file using an
|
||||
<acronym>XML</acronym> parser.</para>
|
||||
|
||||
<para><filename role="package">textproc/docproj</filename> includes
|
||||
the <command>xmllint</command>
|
||||
<para><filename role="package">textproc/docproj</filename>
|
||||
includes the <command>xmllint</command>
|
||||
<link linkend="xml-primer-validating">validating
|
||||
parser</link>.</para>
|
||||
|
||||
<para>Use <command>xmllint</command> to
|
||||
validate the document:</para>
|
||||
<para>Use <command>xmllint</command> to validate the
|
||||
document:</para>
|
||||
|
||||
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput></screen>
|
||||
|
||||
<para><command>xmllint</command> returns
|
||||
without displaying any output, showing that the
|
||||
document validated successfully.</para>
|
||||
<para><command>xmllint</command> returns without displaying
|
||||
any output, showing that the document validated
|
||||
successfully.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>See what happens when required elements are omitted.
|
||||
Delete the line with the <sgmltag class="starttag">title</sgmltag> and
|
||||
<sgmltag class="endtag">title</sgmltag> tags, and re-run the
|
||||
validation.</para>
|
||||
Delete the line with the
|
||||
<sgmltag class="starttag">title</sgmltag> and
|
||||
<sgmltag class="endtag">title</sgmltag> tags, and re-run
|
||||
the validation.</para>
|
||||
|
||||
<screen>&prompt.user; <userinput>xmllint --valid --noout example.xml</userinput>
|
||||
example.xml:5: element head: validity error : Element head content does not follow the DTD, expecting ((script | style | meta | link | object | isindex)* , ((title , (script | style | meta | link | object | isindex)* , (base , (script | style | meta | link | object | isindex)*)?) | (base , (script | style | meta | link | object | isindex)* , title , (script | style | meta | link | object | isindex)*))), got ()</screen>
|
||||
|
||||
<para>This shows that the validation error comes from
|
||||
the <replaceable>fifth</replaceable> line of the
|
||||
<para>This shows that the validation error comes from the
|
||||
<replaceable>fifth</replaceable> line of the
|
||||
<replaceable>example.xml</replaceable> file and that the
|
||||
content of the <sgmltag class="starttag">head</sgmltag> is the part which
|
||||
does not follow the rules of the <acronym>XHTML</acronym> grammar.</para>
|
||||
content of the <sgmltag class="starttag">head</sgmltag> is
|
||||
the part which does not follow the rules of the
|
||||
<acronym>XHTML</acronym> grammar.</para>
|
||||
|
||||
<para>Then <command>xmllint</command> shows
|
||||
the line where the error was found and marks the
|
||||
exact character position with a <literal>^</literal> sign.</para>
|
||||
<para>Then <command>xmllint</command> shows the line where
|
||||
the error was found and marks the exact character position
|
||||
with a <literal>^</literal> sign.</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
|
@ -486,13 +495,15 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<sect1 id="xml-primer-doctype-declaration">
|
||||
<title>The DOCTYPE Declaration</title>
|
||||
|
||||
<para>The beginning of each document can specify
|
||||
the name of the <acronym>DTD</acronym> to which the document conforms.
|
||||
This DOCTYPE declaration is used by <acronym>XML</acronym> parsers to
|
||||
identify the <acronym>DTD</acronym> and ensure that the document does conform to it.</para>
|
||||
<para>The beginning of each document can specify the name of the
|
||||
<acronym>DTD</acronym> to which the document conforms. This
|
||||
DOCTYPE declaration is used by <acronym>XML</acronym> parsers to
|
||||
identify the <acronym>DTD</acronym> and ensure that the document
|
||||
does conform to it.</para>
|
||||
|
||||
<para>A typical declaration for a document written to conform with
|
||||
version 1.0 of the <acronym>XHTML</acronym> <acronym>DTD</acronym> looks like this:</para>
|
||||
version 1.0 of the <acronym>XHTML</acronym>
|
||||
<acronym>DTD</acronym> looks like this:</para>
|
||||
|
||||
<programlisting><sgmltag class="starttag">!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</sgmltag></programlisting>
|
||||
|
||||
|
@ -512,8 +523,8 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<term><literal>DOCTYPE</literal></term>
|
||||
|
||||
<listitem>
|
||||
<para>Shows that this is an <acronym>XML</acronym> declaration of the
|
||||
document type.</para>
|
||||
<para>Shows that this is an <acronym>XML</acronym>
|
||||
declaration of the document type.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
|
@ -528,21 +539,27 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><literal>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
|
||||
<term><literal>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
|
||||
|
||||
<listitem>
|
||||
<para>Lists the Formal Public Identifier (<acronym>FPI</acronym>)
|
||||
<para>Lists the Formal Public Identifier
|
||||
(<acronym>FPI</acronym>)
|
||||
<indexterm>
|
||||
<primary>Formal Public Identifier</primary>
|
||||
</indexterm>
|
||||
for the <acronym>DTD</acronym> to which this document conforms. The <acronym>XML</acronym>
|
||||
parser uses this to find the correct <acronym>DTD</acronym> when
|
||||
processing this document.</para>
|
||||
for the <acronym>DTD</acronym> to which this document
|
||||
conforms. The <acronym>XML</acronym> parser uses this to
|
||||
find the correct <acronym>DTD</acronym> when processing
|
||||
this document.</para>
|
||||
|
||||
<para><literal>PUBLIC</literal> is not a part of the <acronym>FPI</acronym>,
|
||||
but indicates to the <acronym>XML</acronym> processor how to find the <acronym>DTD</acronym>
|
||||
referenced in the <acronym>FPI</acronym>. Other ways of telling the <acronym>XML</acronym>
|
||||
parser how to find the <acronym>DTD</acronym> are shown <link
|
||||
<para><literal>PUBLIC</literal> is not a part of the
|
||||
<acronym>FPI</acronym>, but indicates to the
|
||||
<acronym>XML</acronym> processor how to find the
|
||||
<acronym>DTD</acronym> referenced in the
|
||||
<acronym>FPI</acronym>. Other ways of telling the
|
||||
<acronym>XML</acronym> parser how to find the
|
||||
<acronym>DTD</acronym> are shown <link
|
||||
linkend="xml-primer-fpi-alternatives">later</link>.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
@ -551,7 +568,8 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<term><literal>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"</literal></term>
|
||||
|
||||
<listitem>
|
||||
<para>A local filename or a <acronym>URL</acronym> to find the <acronym>DTD</acronym>.</para>
|
||||
<para>A local filename or a <acronym>URL</acronym> to find
|
||||
the <acronym>DTD</acronym>.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
|
@ -559,24 +577,29 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<term><literal>&gt;</literal></term>
|
||||
|
||||
<listitem>
|
||||
<para>Ends the declaration and returns to the document.</para>
|
||||
<para>Ends the declaration and returns to the
|
||||
document.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
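The parts of the declaration listed above can be picked apart mechanically. The sketch below uses a regular expression written for this illustration. The `PATTERN` name and its group names are assumptions, and it only handles the `PUBLIC` form with double-quoted identifiers shown in this section.

```python
import re

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

# root:   name of the first element in the document
# fpi:    the Formal Public Identifier that follows PUBLIC
# system: the local filename or URL where the DTD can be found
PATTERN = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+'
    r'"(?P<fpi>[^"]*)"\s+"(?P<system>[^"]*)"\s*>'
)

match = PATTERN.fullmatch(DOCTYPE)
parts = match.groupdict()

for name, value in parts.items():
    print(f"{name}: {value}")
```

A real XML parser does this work itself. The sketch is only meant to make the structure of the declaration visible.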
|
||||
|
||||
<sect2>
|
||||
<title>Formal Public Identifiers (<acronym>FPI</acronym>s)</title>
|
||||
<title>Formal Public Identifiers
|
||||
(<acronym>FPI</acronym>s)</title>
|
||||
|
||||
<indexterm significance="preferred">
|
||||
<primary>Formal Public Identifier</primary>
|
||||
</indexterm>
|
||||
|
||||
<note>
|
||||
<para>It is not necessary to know this, but it is useful
|
||||
background, and might help debug problems when the <acronym>XML</acronym>
|
||||
processor cannot locate the <acronym>DTD</acronym>.</para>
|
||||
background, and might help debug problems when the
|
||||
<acronym>XML</acronym> processor cannot locate the
|
||||
<acronym>DTD</acronym>.</para>
|
||||
</note>
|
||||
|
||||
<para><acronym>FPI</acronym>s must follow a specific syntax:</para>
|
||||
<para><acronym>FPI</acronym>s must follow a specific
|
||||
syntax:</para>
|
||||
|
||||
<programlisting>"<replaceable>Owner</replaceable>//<replaceable>Keyword</replaceable> <replaceable>Description</replaceable>//<replaceable>Language</replaceable>"</programlisting>
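As a rough illustration of this syntax, an FPI can be split on its `//` separators. The `split_fpi` function below is a hypothetical helper written for this sketch. It assumes the FPI has the three-part shape shown above and does no validation beyond that.

```python
def split_fpi(fpi):
    """Split an FPI into owner, keyword, description, and language.

    The last two '//'-separated fields are the text and the language.
    Everything before them is the owner, which may itself contain '//'
    (for example '-//W3C').
    """
    *owner_parts, text, language = fpi.split("//")
    keyword, _, description = text.partition(" ")
    return {
        "owner": "//".join(owner_parts),
        "keyword": keyword,
        "description": description,
        "language": language,
    }


xhtml = split_fpi("-//W3C//DTD XHTML 1.0 Transitional//EN")
greek = split_fpi("ISO 8879:1986//ENTITIES Greek Symbols//EN")

print(xhtml)
print(greek)
```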
|
||||
|
||||
|
@ -587,14 +610,18 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<listitem>
|
||||
<para>The owner of the <acronym>FPI</acronym>.</para>
|
||||
|
||||
<para>The beginning of the string identifies the owner
|
||||
of the <acronym>FPI</acronym>. For example, the <acronym>FPI</acronym>
|
||||
<para>The beginning of the string identifies the owner of
|
||||
the <acronym>FPI</acronym>. For example, the
|
||||
<acronym>FPI</acronym>
|
||||
<literal>"ISO 8879:1986//ENTITIES Greek
|
||||
Symbols//EN"</literal> lists
|
||||
<literal>ISO 8879:1986</literal> as being the owner for
|
||||
the set of entities for Greek symbols. <acronym>ISO</acronym> 8879:1986 is
|
||||
the International Organization for Standardization (<acronym>ISO</acronym>) number for the <acronym>SGML</acronym> standard, the predecessor
|
||||
(and a superset) of <acronym>XML</acronym>.</para>
|
||||
the set of entities for Greek symbols.
|
||||
<acronym>ISO</acronym> 8879:1986 is the International
|
||||
Organization for Standardization
|
||||
(<acronym>ISO</acronym>) number for the
|
||||
<acronym>SGML</acronym> standard, the predecessor (and a
|
||||
superset) of <acronym>XML</acronym>.</para>
|
||||
|
||||
<para>Otherwise, this string will either look like
|
||||
<literal>-//<replaceable>Owner</replaceable></literal>
|
||||
|
@ -608,19 +635,21 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<literal>+</literal> identifying it as
|
||||
registered.</para>
|
||||
|
||||
<para><acronym>ISO</acronym> 9070:1991 defines how registered names are
|
||||
generated. A name might be derived from the number of an <acronym>ISO</acronym>
|
||||
publication, an <acronym>ISBN</acronym> code, or an organization code
|
||||
assigned according to <acronym>ISO</acronym> 6523. Additionally, a
|
||||
<para><acronym>ISO</acronym> 9070:1991 defines how
|
||||
registered names are generated. A name might be derived
|
||||
from the number of an <acronym>ISO</acronym>
|
||||
publication, an <acronym>ISBN</acronym> code, or an
|
||||
organization code assigned according to
|
||||
<acronym>ISO</acronym> 6523. Additionally, a
|
||||
registration authority could be created in order to
|
||||
assign registered names. The <acronym>ISO</acronym> council delegated this
|
||||
to the American National Standards Institute
|
||||
(<acronym>ANSI</acronym>).</para>
|
||||
assign registered names. The <acronym>ISO</acronym>
|
||||
council delegated this to the American National
|
||||
Standards Institute (<acronym>ANSI</acronym>).</para>
|
||||
|
||||
<para>Because the &os; Project has not been registered,
|
||||
the owner string is <literal>-//&os;</literal>. As
|
||||
seen in the example, the <acronym>W3C</acronym> is not a registered owner
|
||||
either.</para>
|
||||
the owner string is <literal>-//&os;</literal>. As seen
|
||||
in the example, the <acronym>W3C</acronym> is not a
|
||||
registered owner either.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
|
@ -632,11 +661,13 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
information in the file. Some of the most common
|
||||
keywords are <literal>DTD</literal>,
|
||||
<literal>ELEMENT</literal>, <literal>ENTITIES</literal>,
|
||||
and <literal>TEXT</literal>. <literal>DTD</literal> is
|
||||
used only for <acronym>DTD</acronym> files; <literal>ELEMENT</literal> is
|
||||
usually used for <acronym>DTD</acronym> fragments that contain only entity
|
||||
or element declarations. <literal>TEXT</literal> is
|
||||
used for <acronym>XML</acronym> content (text and tags).</para>
|
||||
and <literal>TEXT</literal>. <literal>DTD</literal> is
|
||||
used only for <acronym>DTD</acronym> files;
|
||||
<literal>ELEMENT</literal> is usually used for
|
||||
<acronym>DTD</acronym> fragments that contain only
|
||||
entity or element declarations. <literal>TEXT</literal>
|
||||
is used for <acronym>XML</acronym> content (text and
|
||||
tags).</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
|
@ -655,9 +686,9 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<term><replaceable>Language</replaceable></term>
|
||||
|
||||
<listitem>
|
||||
<para>An <acronym>ISO</acronym> two-character code that identifies
|
||||
the native language for the file. <literal>EN</literal>
|
||||
is used for English.</para>
|
||||
<para>An <acronym>ISO</acronym> two-character code that
|
||||
identifies the native language for the file.
|
||||
<literal>EN</literal> is used for English.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
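As a rough illustration of how the four parts fit together, the XHTML identifier used in this chapter can be read piece by piece. The comment is only an annotation added for this sketch and is not part of the declaration.

```xml
<!-- Owner:       -//W3C  (the leading "-" marks an unregistered owner)
     Keyword:     DTD
     Description: XHTML 1.0 Transitional
     Language:    EN -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```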
|
||||
|
@ -665,46 +696,46 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<sect3>
|
||||
<title><filename>catalog</filename> Files</title>
|
||||
|
||||
<para>With the syntax above,
|
||||
an <acronym>XML</acronym> processor needs to have
|
||||
some way of turning the <acronym>FPI</acronym> into the name of the file
|
||||
containing the <acronym>DTD</acronym>. A
|
||||
catalog file (typically called <filename>catalog</filename>)
|
||||
contains lines that map <acronym>FPI</acronym>s to filenames. For example, if
|
||||
the catalog file contained the line:</para>
|
||||
<para>With the syntax above, an <acronym>XML</acronym>
|
||||
processor needs to have some way of turning the
|
||||
<acronym>FPI</acronym> into the name of the file containing
|
||||
the <acronym>DTD</acronym>. A catalog file (typically
|
||||
called <filename>catalog</filename>) contains lines that map
|
||||
<acronym>FPI</acronym>s to filenames. For example, if the
|
||||
catalog file contained the line:</para>
|
||||
|
||||
<!-- XXX: mention XML catalog or maybe replace this totally and only cover XML catalog -->
|
||||
|
||||
<programlisting>PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "1.0/transitional.dtd"</programlisting>
|
||||
|
||||
<para>The <acronym>XML</acronym> processor knows that the <acronym>DTD</acronym> is
|
||||
called <filename>transitional.dtd</filename> in the
|
||||
<para>The <acronym>XML</acronym> processor knows that the
|
||||
<acronym>DTD</acronym> is called
|
||||
<filename>transitional.dtd</filename> in the
|
||||
<filename>1.0</filename> subdirectory of the directory that
|
||||
held the <filename>catalog</filename> file.</para>
|
||||
|
||||
<para>Examine the contents of
|
||||
<filename>/usr/local/share/xml/dtd/xhtml/catalog.xml</filename>.
|
||||
This is the catalog file for the <acronym>XHTML</acronym> <acronym>DTD</acronym>s that was
|
||||
installed as part of the <filename
|
||||
This is the catalog file for the <acronym>XHTML</acronym>
|
||||
<acronym>DTD</acronym>s that was installed as part of the
|
||||
<filename
|
||||
role="package">textproc/docproj</filename> port.</para>
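The PUBLIC line shown above uses the older plain-text catalog syntax. A file named catalog.xml normally uses the OASIS XML Catalogs format instead. The fragment below is a minimal sketch of the same mapping in that format. It is illustrative only, not the actual contents of the installed file, and the relative path is carried over from the example above.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the FPI to a file, relative to the directory holding this catalog -->
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="1.0/transitional.dtd"/>
</catalog>
```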
|
||||
</sect3>
|
||||
|
||||
<sect3>
|
||||
<title><envar>SGML_CATALOG_FILES</envar></title>
|
||||
|
||||
<para>To locate a <filename>catalog</filename> file,
|
||||
the <acronym>XML</acronym> processor must know where to look. Many
|
||||
processors feature command-line parameters for specifying the
|
||||
path to one or more catalogs.</para>
|
||||
<para>To locate a <filename>catalog</filename> file, the
|
||||
<acronym>XML</acronym> processor must know where to look.
|
||||
Many processors feature command-line parameters for specifying the path
|
||||
to one or more catalogs.</para>
|
||||
|
||||
<para>In addition,
|
||||
<envar>SGML_CATALOG_FILES</envar> can be set to point to the files.
|
||||
This environment variable consists of a
|
||||
colon-separated list of catalog files (including their full
|
||||
path).</para>
|
||||
<para>In addition, <envar>SGML_CATALOG_FILES</envar> can be
|
||||
set to point to the files. This environment variable
|
||||
consists of a colon-separated list of catalog files
|
||||
(including their full path).</para>
|
||||
|
||||
<para>Typically, the list includes these
|
||||
files:</para>
|
||||
<para>Typically, the list includes these files:</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
|
@ -724,30 +755,34 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>This was done <link linkend="xml-primer-envars">earlier</link>.</para>
|
||||
<para>This was done
|
||||
<link linkend="xml-primer-envars">earlier</link>.</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="xml-primer-fpi-alternatives">
|
||||
<title>Alternatives to <acronym>FPI</acronym>s</title>
|
||||
|
||||
<para>Instead of using an <acronym>FPI</acronym> to indicate the <acronym>DTD</acronym> to which
|
||||
the document conforms (and therefore, which file on the system
|
||||
contains the <acronym>DTD</acronym>), the filename can be explicitly specified.</para>
|
||||
<para>Instead of using an <acronym>FPI</acronym> to indicate the
|
||||
<acronym>DTD</acronym> to which the document conforms (and
|
||||
therefore, which file on the system contains the
|
||||
<acronym>DTD</acronym>), the filename can be explicitly
|
||||
specified.</para>
|
||||
|
||||
<para>The syntax is slightly different:</para>
|
||||
|
||||
<programlisting><sgmltag class="starttag">!DOCTYPE html SYSTEM "/path/to/file.dtd"</sgmltag></programlisting>
|
||||
|
||||
<para>The <literal>SYSTEM</literal> keyword indicates that the
|
||||
<acronym>XML</acronym> processor should locate the <acronym>DTD</acronym> in a system-specific
|
||||
fashion. This typically (but not always) means the <acronym>DTD</acronym> will
|
||||
be provided as a filename.</para>
|
||||
<acronym>XML</acronym> processor should locate the
|
||||
|
||||
<acronym>DTD</acronym> in a system-specific fashion. This
|
||||
typically (but not always) means the <acronym>DTD</acronym>
|
||||
will be provided as a filename.</para>
|
||||
|
||||
<para>Using <acronym>FPI</acronym>s is preferred for reasons of portability.
|
||||
If the <literal>SYSTEM</literal>
|
||||
identifier is used, then the <acronym>DTD</acronym> must be provided and kept in the same location
|
||||
for everyone.</para>
|
||||
<para>Using <acronym>FPI</acronym>s is preferred for reasons of
|
||||
portability. If the <literal>SYSTEM</literal> identifier is
|
||||
used, then the <acronym>DTD</acronym> must be provided and
|
||||
kept in the same location for everyone.</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
|
||||
|
@ -1031,9 +1066,11 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
the entity reference <literal>&version;</literal>
|
||||
replaced with the version number. Most web browsers have
|
||||
very simplistic parsers which do not handle XML DTD
|
||||
constructs. Furthermore, the closing <literal>]></literal>
|
||||
of the XML context is not recognized properly by browsers and
|
||||
will probably be rendered.</para>
|
||||
constructs. Furthermore, the closing
|
||||
<literal>]></literal> of the XML context is not
|
||||
recognized properly by browsers and will probably be
|
||||
rendered.</para>
|
||||
|
||||
</step>
|
||||
|
||||
<step>
|
||||
|
@ -1349,20 +1386,19 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
<para>The content model you will probably find most
|
||||
useful is <literal>CDATA</literal>.</para>
|
||||
|
||||
<para><literal>CDATA</literal> is for <quote>Character
|
||||
Data</quote>. If the parser is in this content model then
|
||||
it is expecting to see characters, and characters only. In
|
||||
this model the <literal><</literal> and
|
||||
<literal>&</literal> symbols lose their special status,
|
||||
and will be treated as ordinary characters.</para>
|
||||
<para><literal>CDATA</literal> is for
|
||||
<quote>Character Data</quote>. If the parser is in this
|
||||
content model then it is expecting to see characters, and
|
||||
characters only. In this model the <literal><</literal>
|
||||
and <literal>&</literal> symbols lose their special
|
||||
status, and will be treated as ordinary characters.</para>
|
||||
|
||||
<note>
|
||||
<para>When you use <literal>CDATA</literal>
|
||||
in examples of text marked up in
|
||||
XML, keep in mind that the content of
|
||||
<para>When you use <literal>CDATA</literal> in examples of
|
||||
text marked up in XML, keep in mind that the content of
|
||||
<literal>CDATA</literal> is not validated. You have to
|
||||
check the included XML text using other means. You
|
||||
could, for example, write the example in another document,
|
||||
check the included XML text using other means. You could,
|
||||
for example, write the example in another document,
|
||||
validate the example code, and then paste it to your
|
||||
<literal>CDATA</literal> content.</para>
|
||||
</note>
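A minimal sketch of a CDATA marked section inside a DocBook programlisting follows. The sample paragraph is made up for illustration. Everything between the opening `<![CDATA[` and the closing `]]>` is passed through as literal characters, so neither the inner tags nor the `&lt;` are interpreted.

```xml
<programlisting><![CDATA[<para>Use &lt; to write a literal less-than sign.</para>]]></programlisting>
```

The only sequence that cannot appear inside the section is `]]>`, because it ends the section.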
|
||||
|
@ -1482,8 +1518,8 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
|
||||
<procedure>
|
||||
<step>
|
||||
<para>Modify the <filename>entities.ent</filename> file to contain
|
||||
the following:</para>
|
||||
<para>Modify the <filename>entities.ent</filename> file to
|
||||
contain the following:</para>
|
||||
|
||||
<programlisting><!ENTITY version "1.1">
|
||||
<!ENTITY % conditional.text "IGNORE">
|
||||
|
@ -1499,13 +1535,15 @@ example.xml:5: element head: validity error : Element head content does not foll
|
|||
</step>
|
||||
|
||||
<step>
|
||||
<para>Normalize the <filename>example.xml</filename> file and notice
|
||||
that the conditional text is not present in the output document.
|
||||
Now if you set the parameter entity guard to <literal>INCLUDE</literal>
|
||||
and regenerate the normalized document, it will appear there again.
|
||||
Of course, this method makes more sense if you have more conditional
|
||||
chunks that depend on the same condition, for example, whether you are
|
||||
generating printed or online text.</para>
|
||||
<para>Normalize the <filename>example.xml</filename> file
|
||||
and notice that the conditional text is not present in the
|
||||
output document. Now if you set the parameter entity
|
||||
guard to <literal>INCLUDE</literal> and regenerate the
|
||||
normalized document, it will appear there again. Of
|
||||
course, this method makes more sense if you have more
|
||||
conditional chunks that depend on the same condition, for
|
||||
example, whether you are generating printed or online
|
||||
text.</para>
|
||||
</step>
|
||||
</procedure>
|
||||
</sect2>
|
||||
|
|