<!-- Copyright (c) 1998, 1999 Nik Clayton, All rights reserved.

Redistribution and use in source (SGML DocBook) and 'compiled' forms
(SGML, HTML, PDF, PostScript, RTF and so forth) with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code (SGML DocBook) must retain the above
copyright notice, this list of conditions and the following
disclaimer as the first lines of this file unmodified.

2. Redistributions in compiled form (transformed to other DTDs,
converted to PDF, PostScript, RTF and other formats) must reproduce
the above copyright notice, this list of conditions and the
following disclaimer in the documentation and/or other materials
provided with the distribution.

THIS DOCUMENTATION IS PROVIDED BY NIK CLAYTON "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NIK CLAYTON BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-->
|
|
|
|
<chapter id="sgml-primer">
|
|
<title>SGML Primer</title>
|
|
|
|
<para>The majority of FDP documentation is written in applications of
|
|
SGML. This chapter explains exactly what that means, how to read
|
|
and understand the source to the documentation, and the sort of SGML
|
|
tricks you will see used in the documentation.</para>
|
|
|
|
<para>Portions of this section were inspired by Mark Galassi's <ulink
|
|
url="http://nis-www.lanl.gov/~rosalia/mydocs/docbook-intro/docbook-intro.html">Get Going With DocBook</ulink>.</para>
|
|
|
|
<sect1>
|
|
<title>Overview</title>
|
|
|
|
<para>Way back when, electronic text was simple to deal with. Admittedly,
|
|
you had to know which character set your document was written in (ASCII,
|
|
EBCDIC, or one of a number of others) but that was about it. Text was
|
|
text, and what you saw really was what you got. No frills, no
|
|
formatting, no intelligence.</para>
|
|
|
|
<para>Inevitably, this was not enough. Once you have text in a
|
|
machine-usable format, you expect machines to be able to use it and
|
|
manipulate it intelligently. You would like to indicate that certain
|
|
phrases should be emphasised, or added to a glossary, or be hyperlinks.
|
|
You might want filenames to be shown in a “typewriter” style
|
|
font for viewing on screen, but as “italics” when printed,
|
|
or any of a myriad of other options for presentation.</para>
|
|
|
|
<para>It was once hoped that Artificial Intelligence (AI) would make this
|
|
easy. Your computer would read in the document and automatically
|
|
identify key phrases, filenames, text that the reader should type in,
|
|
examples, and more. Unfortunately, real life has not happened quite
|
|
like that, and our computers require some assistance before they can
|
|
meaningfully process our text.</para>
|
|
|
|
<para>More precisely, they need help identifying what is what. You or I
|
|
can look at
|
|
|
|
<blockquote>
|
|
<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>
|
|
|
|
<screen>&prompt.user; <command>rm /tmp/foo</command></screen>
|
|
</blockquote>
|
|
|
|
and easily see which parts are filenames, which are commands to be typed
in, which parts are references to manual pages, and so on. But the
computer processing the document cannot. For this we need
markup.</para>
|
|
|
|
<para>“Markup” is commonly used to describe “adding
|
|
value” or “increasing cost”. The term takes on both
|
|
these meanings when applied to text. Markup is additional text included
|
|
in the document, distinguished from the document's content in some way,
|
|
so that programs that process the document can read the markup and use
|
|
it when making decisions about the document. Editors can hide the
|
|
markup from the user, so the user is not distracted by it.</para>
|
|
|
|
<para>The extra information stored in the markup <emphasis>adds
|
|
value</emphasis> to the document. Adding the markup to the document
|
|
must typically be done by a person—after all, if computers could
|
|
recognise the text sufficiently well to add the markup then there would
|
|
be no need to add it in the first place. This <emphasis>increases the
|
|
cost</emphasis> of the document.</para>
|
|
|
|
<para>The previous example is actually represented in this document like
this:</para>

<programlisting><![ CDATA [
<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>

<screen>&prompt.user; <command>rm /tmp/foo</command></screen>]]></programlisting>
|
|
|
|
<para>As you can see, the markup is clearly separate from the
|
|
content.</para>
|
|
|
|
<para>Obviously, if you are going to use markup you need to define what
|
|
your markup means, and how it should be interpreted. You will need a
|
|
markup language that you can follow when marking up your
|
|
documents.</para>
|
|
|
|
<para>Of course, one markup language might not be enough. A markup
|
|
language for technical documentation has very different requirements
|
|
than a markup language that was to be used for cookery recipes. This,
|
|
in turn, would be very different from a markup language used to describe
|
|
poetry. What you really need is a first language that you use to write
|
|
these other markup languages. A <emphasis>meta markup
|
|
language</emphasis>.</para>
|
|
|
|
<para>This is exactly what the Standard Generalised Markup Language (SGML)
|
|
is. Many markup languages have been written in SGML, including the two
|
|
most used by the FDP, HTML and DocBook.</para>
|
|
|
|
<para>Each language definition is more properly called a Document Type
|
|
Definition (DTD). The DTD specifies the name of the elements that can
|
|
be used, what order they appear in (and whether some markup can be used
|
|
inside other markup) and related information. A DTD is sometimes
|
|
referred to as an <emphasis>application</emphasis> of SGML.</para>
|
|
|
|
<para id="sgml-primer-validating">A DTD is a <emphasis>complete</emphasis>
|
|
specification of all the elements that are allowed to appear, the order
|
|
in which they should appear, which elements are mandatory, which are
|
|
optional, and so forth. This makes it possible to write an SGML
|
|
<emphasis>parser</emphasis> which reads in both the DTD and a document
|
|
which claims to conform to the DTD. The parser can then confirm whether
|
|
or not all the elements required by the DTD are in the document in the
|
|
right order, and whether there are any errors in the markup. This is
|
|
normally referred to as <quote>validating the document</quote>.</para>
|
|
|
|
<note>
|
|
<para>This processing simply confirms that the choice of elements, their
|
|
ordering, and so on, conforms to that listed in the DTD. It does
|
|
<emphasis>not</emphasis> check that you have used
|
|
<emphasis>appropriate</emphasis> markup for the content. If you were
|
|
to try and mark up all the filenames in your document as function
|
|
names, the parser would not flag this as an error (assuming, of
|
|
course, that your DTD defines elements for filenames and functions,
|
|
and that they are allowed to appear in the same place).</para>
|
|
</note>
|
|
|
|
<para>It is likely that most of your contributions to the Documentation
|
|
Project will consist of content marked up in either HTML or DocBook,
|
|
rather than alterations to the DTDs. For this reason this book will
|
|
not touch on how to write a DTD.</para>
|
|
</sect1>
|
|
|
|
<sect1 id="elements">
|
|
<title>Elements, tags, and attributes</title>
|
|
|
|
<para>All the DTDs written in SGML share certain characteristics. This is
hardly surprising, as the philosophy behind SGML will inevitably show
through. One of the most obvious manifestations of this philosophy is
that of <emphasis>content</emphasis> and
<emphasis>elements</emphasis>.</para>
|
|
|
|
<para>Your documentation (whether it is a single web page, or a lengthy
|
|
book) is considered to consist of content. This content is then divided
|
|
(and further subdivided) into elements. The purpose of adding markup is
|
|
to name and identify the boundaries of these elements for further
|
|
processing.</para>
|
|
|
|
<para>For example, consider a typical book. At the very top level, the
|
|
book is itself an element. This “book” element obviously
|
|
contains chapters, which can be considered to be elements in their own
|
|
right. Each chapter will contain more elements, such as paragraphs,
|
|
quotations, and footnotes. Each paragraph might contain further
|
|
elements, identifying content that was direct speech, or the name of a
|
|
character in the story.</para>
|
|
|
|
<para>You might like to think of this as “chunking” content.
|
|
At the very top level you have one chunk, the book. Look a little
|
|
deeper, and you have more chunks, the individual chapters. These are
|
|
chunked further into paragraphs, footnotes, character names, and so
|
|
on.</para>
|
|
|
|
<para>Notice how you can make this differentiation between different
elements of the content without resorting to any SGML terms. It really
is surprisingly straightforward. You could do this with a highlighter
pen and a printout of the book, using different colours to indicate
different chunks of content.</para>
|
|
|
|
<para>Of course, we do not have an electronic highlighter pen, so we need
|
|
some other way of indicating which element each piece of content belongs
|
|
to. In languages written in SGML (HTML, DocBook, et al) this is done by
|
|
means of <emphasis>tags</emphasis>.</para>
|
|
|
|
<para>A tag is used to identify where a particular element starts, and
|
|
where the element ends. <emphasis>The tag is not part of the element
|
|
itself</emphasis>. Because each DTD was normally written to mark up
|
|
specific types of information, each one will recognise different
|
|
elements, and will therefore have different names for the tags.</para>
|
|
|
|
<para>For an element called <replaceable>element-name</replaceable> the
start tag will normally look like
<literal>&lt;<replaceable>element-name</replaceable>&gt;</literal>. The
corresponding closing tag for this element is
<literal>&lt;/<replaceable>element-name</replaceable>&gt;</literal>.</para>
|
|
|
|
<example>
|
|
<title>Using an element (start and end tags)</title>
|
|
|
|
<para>HTML has an element for indicating that the content enclosed by
|
|
the element is a paragraph, called <literal>p</literal>. This
|
|
element has both start and end tags.</para>
|
|
|
|
<programlisting>
<![ CDATA [<p>This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
element.</p>

<p>This is another paragraph. But this one is much shorter.</p>]]></programlisting>
|
|
</example>
|
|
|
|
<para>Not all elements require an end tag. Some elements have no content.
|
|
For example, in HTML you can indicate that you want a horizontal line to
|
|
appear in the document. Obviously, this line has no content, so just
|
|
the start tag is required for this element.</para>
|
|
|
|
<example>
|
|
<title>Using an element (start tag only)</title>
|
|
|
|
<para>HTML has an element for indicating a horizontal rule, called
|
|
<literal>hr</literal>. This element does not wrap content, so only has
|
|
a start tag.</para>
|
|
|
|
<programlisting>
<![ CDATA [<p>This is a paragraph.</p>

<hr>

<p>This is another paragraph. A horizontal rule separates this
from the previous paragraph.</p>]]></programlisting>
|
|
</example>
|
|
|
|
<para>If it is not obvious by now, elements can contain other elements.
|
|
In the book example earlier, the book element contained all the chapter
|
|
elements, which in turn contained all the paragraph elements, and so
|
|
on.</para>
|
|
|
|
<example>
|
|
<title>Elements within elements; <sgmltag>em</sgmltag></title>
|
|
|
|
<programlisting>
<![ CDATA [<p>This is a simple <em>paragraph</em> where some
of the <em>words</em> have been <em>emphasised</em>.</p>]]></programlisting>
|
|
</example>
|
|
|
|
<para>The DTD will specify the rules detailing which elements can contain
|
|
other elements, and exactly what they can contain.</para>
|
|
|
|
<important>
|
|
<para>People often confuse the terms tags and elements, and use the terms
|
|
as if they were interchangeable. They are not.</para>
|
|
|
|
<para>An element is a conceptual part of your document. An element has
a defined start and end. The tags mark where the element starts and
ends.</para>

<para>When this document (or anyone else knowledgeable about SGML) refers
to “the &lt;p&gt; tag” they mean the literal text
consisting of the three characters <literal>&lt;</literal>,
<literal>p</literal>, and <literal>&gt;</literal>. But the phrase
“the &lt;p&gt; element” refers to the whole element.</para>
|
|
|
|
<para>This distinction <emphasis>is</emphasis> very subtle. But keep it
|
|
in mind.</para>
|
|
</important>
|
|
|
|
<para>Elements can have attributes. An attribute has a name and a value,
and is used for adding extra information to the element. This might be
information that indicates how the content should be rendered, or might
be something that uniquely identifies that occurrence of the element, or
it might be something else.</para>
|
|
|
|
<para>An element's attributes are written <emphasis>inside</emphasis> the
|
|
start tag for that element, and take the form
|
|
<literal><replaceable>attribute-name</replaceable>="<replaceable>attribute-value</replaceable>"</literal>.</para>
|
|
|
|
<para>In sufficiently recent versions of HTML, the <sgmltag>p</sgmltag>
|
|
element has an attribute called <literal>align</literal>, which suggests
|
|
an alignment (justification) for the paragraph to the program displaying
|
|
the HTML.</para>
|
|
|
|
<para>The <literal>align</literal> attribute can take one of four defined
|
|
values, <literal>left</literal>, <literal>center</literal>,
|
|
<literal>right</literal> and <literal>justify</literal>. If the
|
|
attribute is not specified then the default is
|
|
<literal>left</literal>.</para>
|
|
|
|
<example>
|
|
<title>Using an element with an attribute</title>
|
|
|
|
<programlisting>
<![ CDATA [<p align="left">The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.</p>

<p align="center">This may appear in the center.</p>]]></programlisting>
|
|
</example>
|
|
|
|
<para>Some attributes will only take specific values, such as
|
|
<literal>left</literal> or <literal>justify</literal>. Others will
|
|
allow you to enter anything you want. If you need to include quotes
|
|
(<literal>"</literal>) within an attribute then use single quotes around
|
|
the attribute value.</para>
|
|
|
|
<example>
|
|
<title>Single quotes around attributes</title>
|
|
|
|
<programlisting>
<![ CDATA [<p align='right'>I'm on the right!</p>]]></programlisting>
|
|
</example>
|
|
|
|
<para>Sometimes you do not need to use quotes around attribute values at
|
|
all. However, the rules for doing this are subtle, and it is far simpler
|
|
just to <emphasis>always</emphasis> quote your attribute values.</para>
|
|
|
|
<sect2>
|
|
<title>For you to do…</title>
|
|
|
|
<para>In order to run the examples in this document you will need to
|
|
install some software on your system and ensure that an environment
|
|
variable is set correctly.</para>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Download and install <filename>textproc/docproj</filename>
|
|
from the FreeBSD ports system. This is a
|
|
<emphasis>meta-port</emphasis> that should download and install
|
|
all of the programs and supporting files that are used by the
|
|
Documentation Project.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Add lines to your shell startup files to set
|
|
<envar>SGML_CATALOG_FILES</envar>.</para>
|
|
|
|
<example id="sgml-primer-envars">
|
|
<title><filename>.profile</filename>, for &man.sh.1; and
|
|
&man.bash.1; users</title>
|
|
|
|
<programlisting>
SGML_ROOT=/usr/local/share/sgml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES</programlisting>
|
|
</example>
|
|
|
|
<example>
|
|
<title><filename>.login</filename>, for &man.csh.1; and
|
|
&man.tcsh.1; users</title>
|
|
|
|
<programlisting>
setenv SGML_ROOT /usr/local/share/sgml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES</programlisting>
|
|
</example>
|
|
|
|
<para>Then either log out, and log back in again, or run those
|
|
commands from the command line to set the variable values.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Create <filename>example.sgml</filename>, and enter the
|
|
following text;</para>
|
|
|
|
<programlisting>
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>
  </body>
</html>]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Try and validate this file using an SGML parser.</para>
|
|
|
|
<para>Part of <filename>textproc/docproj</filename> is the
|
|
&man.nsgmls.1; <link linkend="sgml-primer-validating">validating
|
|
parser</link>. Normally, &man.nsgmls.1; reads in a document
|
|
marked up according to an SGML DTD and returns a copy of the
|
|
document's Element Structure Information Set (ESIS, but that is
|
|
not important right now).</para>
|
|
|
|
<para>However, when <option>-s</option> is passed as a parameter to
|
|
it, &man.nsgmls.1; will suppress its normal output, and just print
|
|
error messages. This makes it a useful way to check to see if your
|
|
document is valid or not.</para>
|
|
|
|
<para>Use &man.nsgmls.1; to check that your document is
|
|
valid;</para>
|
|
|
|
<screen>&prompt.user; <userinput>nsgmls -s example.sgml</userinput></screen>
|
|
|
|
<para>As you will see, &man.nsgmls.1; returns without displaying any
|
|
output. This means that your document validated
|
|
successfully.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>See what happens when required elements are omitted. Try
|
|
removing the <sgmltag>title</sgmltag> and <sgmltag>/title</sgmltag>
|
|
tags, and re-run the validation.</para>
|
|
|
|
<screen>&prompt.user; <userinput>nsgmls -s example.sgml</userinput>
nsgmls:example.sgml:5:4:E: character data is not allowed here
nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished</screen>
|
|
|
|
<para>The error output from &man.nsgmls.1; is organised into
|
|
colon-separated groups, or columns.</para>
|
|
|
|
<informaltable frame="none">
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Column</entry>
|
|
<entry>Meaning</entry>
|
|
</row>
|
|
</thead>
|
|
|
|
<tbody>
|
|
<row>
|
|
<entry>1</entry>
|
|
<entry>The name of the program generating the error. This
|
|
will always be <literal>nsgmls</literal>.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>2</entry>
|
|
<entry>The name of the file that contains the error.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>3</entry>
|
|
<entry>Line number where the error appears.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>4</entry>
|
|
<entry>Column number where the error appears.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>5</entry>
|
|
<entry>A one letter code indicating the nature of the
|
|
message. <literal>I</literal> indicates an informational
|
|
message, <literal>W</literal> is for warnings, and
|
|
<literal>E</literal> is for errors<footnote>
|
|
<para>It is not always the fifth column either.
|
|
<command>nsgmls -sv</command> displays
|
|
<literal>nsgmls:I: SP version "1.3"</literal>
|
|
(depending on the installed version). As you can see,
|
|
this is an informational message.</para>
|
|
</footnote>, and <literal>X</literal> is for
|
|
cross-references. As you can see, these messages are
|
|
errors.</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>6</entry>
|
|
<entry>The text of the error message.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
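
<para>As a rough illustration of the column layout above, the
following &man.sh.1; sketch splits one diagnostic line into its
fields. The function name <literal>parse_nsgmls_line</literal> is
made up for this example and is not part of &man.nsgmls.1;. It
assumes the six-column form shown in the table, so it will not cope
with shorter messages such as the one in the footnote, or with
filenames that contain colons.</para>

<programlisting><![ CDATA [
# Hypothetical helper (not shipped with nsgmls): split one diagnostic
# line of the form  program:file:line:column:type: message
parse_nsgmls_line() {
    printf '%s\n' "$1" | {
        # read hands the remainder of the line to the last variable,
        # so colons inside the message text are preserved.
        IFS=: read -r prog file line col code msg
        # Drop the single space that follows the final colon.
        msg=${msg# }
        printf 'file=%s line=%s column=%s type=%s message=%s\n' \
            "$file" "$line" "$col" "$code" "$msg"
    }
}

parse_nsgmls_line 'nsgmls:example.sgml:5:4:E: character data is not allowed here']]></programlisting>

<para>For the first error shown earlier this reports
<filename>example.sgml</filename>, line 5, column 4, type
<literal>E</literal>, followed by the message text.</para>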
|
|
|
|
<para>Simply omitting the <sgmltag>title</sgmltag> tags has generated
two different errors.</para>

<para>The first error indicates that content (in this case,
characters, rather than the start tag for an element) has occurred
where the SGML parser was expecting something else. In this case,
the parser was expecting to see one of the start tags for elements
that are valid inside <sgmltag>head</sgmltag> (such as
<sgmltag>title</sgmltag>).</para>

<para>The second error is because <sgmltag>head</sgmltag> elements
<emphasis>must</emphasis> contain a <sgmltag>title</sgmltag>
element. Because it does not, &man.nsgmls.1; considers that the
element has not been properly finished. However, the closing tag
indicates that the element has been closed before it has been
finished.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Put the <literal>title</literal> element back in.</para>
|
|
</step>
|
|
</procedure>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="doctype-declaration">
|
|
<title>The DOCTYPE declaration</title>
|
|
|
|
<para>The beginning of each document that you write must specify the name
|
|
of the DTD that the document conforms to. This is so that SGML parsers
|
|
can determine the DTD and ensure that the document does conform to
|
|
it.</para>
|
|
|
|
<para>This information is generally expressed on one line, in the DOCTYPE
|
|
declaration.</para>
|
|
|
|
<para>A typical declaration for a document written to conform with version
|
|
4.0 of the HTML DTD looks like this;</para>
|
|
|
|
<programlisting>
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">]]></programlisting>
|
|
|
|
<para>That line contains a number of different components.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><literal>&lt;!</literal></term>

<listitem>
<para>The <emphasis>indicator</emphasis> that marks this as
an SGML declaration. This line is declaring the document type.
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>DOCTYPE</literal></term>
|
|
|
|
<listitem>
|
|
<para>Shows that this is an SGML declaration for the document
|
|
type.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>html</literal></term>
|
|
|
|
<listitem>
|
|
<para>Names the first <link linkend="elements">element</link> that
|
|
will appear in the document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>PUBLIC "-//W3C//DTD HTML 4.0//EN"</literal></term>
|
|
|
|
<listitem>
|
|
<para>Lists the Formal Public Identifier (FPI) for the DTD that this
|
|
document conforms to. Your SGML parser will use this to find the
|
|
correct DTD when processing this document.</para>
|
|
|
|
<para><literal>PUBLIC</literal> is not a part of the FPI, but
|
|
indicates to the SGML processor how to find the DTD referenced in
|
|
the FPI. Other ways of telling the SGML parser how to find the DTD
|
|
are shown <link linkend="fpi-alternatives">later</link>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><literal>&gt;</literal></term>
|
|
|
|
<listitem>
|
|
<para>Returns to the document.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<sect2>
|
|
<title>Formal Public Identifiers (FPIs)</title>
|
|
|
|
<note>
|
|
<para>You don't need to know this, but it's useful background, and
|
|
might help you debug problems when your SGML processor can't locate
|
|
the DTD you are using.</para>
|
|
</note>
|
|
|
|
<para>FPIs must follow a specific syntax. This syntax is as
|
|
follows;</para>
|
|
|
|
<programlisting>
"<replaceable>Owner</replaceable>//<replaceable>Keyword</replaceable> <replaceable>Description</replaceable>//<replaceable>Language</replaceable>"</programlisting>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><replaceable>Owner</replaceable></term>
|
|
|
|
<listitem>
|
|
<para>This indicates the owner of the FPI.</para>
|
|
|
|
<para>If this string starts with “ISO” then this is an
ISO owned FPI. For example, the FPI <literal>"ISO
8879:1986//ENTITIES Greek Symbols//EN"</literal> lists
<literal>ISO 8879:1986</literal> as being the owner for the set
of entities for Greek symbols. ISO 8879:1986 is the ISO number
for the SGML standard.</para>
|
|
|
|
<para>Otherwise, this string will either look like
|
|
<literal>-//<replaceable>Owner</replaceable></literal> or
|
|
<literal>+//<replaceable>Owner</replaceable></literal> (notice
|
|
the only difference is the leading <literal>+</literal> or
|
|
<literal>-</literal>).</para>
|
|
|
|
<para>If the string starts with <literal>-</literal> then the
owner information is unregistered; a leading <literal>+</literal>
identifies it as being registered.</para>
|
|
|
|
<para>ISO 9070:1991 defines how registered names are generated; it
|
|
might be derived from the number of an ISO publication, an ISBN
|
|
code, or an organisation code assigned according to ISO 6523. In
|
|
addition, a registration authority could be created in order to
|
|
assign registered names. The ISO council delegated this to the
|
|
American National Standards Institute (ANSI).</para>
|
|
|
|
<para>Because the FreeBSD Project hasn't been registered the
|
|
owner string is <literal>-//FreeBSD</literal>. And as you can
|
|
see, the W3C are not a registered owner either.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><replaceable>Keyword</replaceable></term>
|
|
|
|
<listitem>
|
|
<para>There are several keywords that indicate the type of
information in the file. Some of the most common keywords are
<literal>DTD</literal>, <literal>ELEMENTS</literal>,
<literal>ENTITIES</literal>, and <literal>TEXT</literal>.
<literal>DTD</literal> is used only for DTD files.
<literal>ELEMENTS</literal> is usually used for DTD fragments
that contain only entity or element declarations.
<literal>TEXT</literal> is used for SGML content (text and
tags).</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><replaceable>Description</replaceable></term>
|
|
|
|
<listitem>
|
|
<para>Any description you want to supply for the contents of this
|
|
file. This may include version numbers or any short text that is
|
|
meaningful to you and unique for the SGML system.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><replaceable>Language</replaceable></term>
|
|
|
|
<listitem>
|
|
<para>This is an ISO two-character code that identifies the native
|
|
language for the file. <literal>EN</literal> is used for
|
|
English.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
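
<para>The four components can be picked apart mechanically. The
following &man.sh.1; sketch does so with parameter expansion alone.
The function name <literal>split_fpi</literal> is an invention for
this example, and the sketch assumes the FPI follows the syntax above
exactly, with a single space between the keyword and the
description.</para>

<programlisting><![ CDATA [
# Hypothetical helper: print the components of a Formal Public Identifier.
split_fpi() {
    fpi=$1
    language=${fpi##*//}     # text after the last "//"
    rest=${fpi%//*}          # everything before the last "//"
    text=${rest##*//}        # "Keyword Description"
    owner=${rest%//*}        # may itself contain "//", as in "-//W3C"
    keyword=${text%% *}      # first word of the text part
    description=${text#* }   # the remaining words
    printf 'owner=%s\nkeyword=%s\ndescription=%s\nlanguage=%s\n' \
        "$owner" "$keyword" "$description" "$language"
}

split_fpi '-//W3C//DTD HTML 4.0//EN']]></programlisting>

<para>For the HTML 4.0 FPI this reports the owner as
<literal>-//W3C</literal>, the keyword as <literal>DTD</literal>, the
description as <literal>HTML 4.0</literal>, and the language as
<literal>EN</literal>.</para>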
|
|
|
|
<sect3>
|
|
<title><filename>catalog</filename> files</title>
|
|
|
|
<para>If you use the syntax above and try and process this document
|
|
using an SGML processor, the processor will need to have some way of
|
|
turning the FPI into the name of the file on your computer that
|
|
contains the DTD.</para>
|
|
|
|
<para>In order to do this it can use a catalog file. A catalog file
|
|
(typically called <filename>catalog</filename>) contains lines that
|
|
map FPIs to filenames. For example, if the catalog file contained the
|
|
line;</para>
|
|
|
|
<programlisting>
PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"</programlisting>
|
|
|
|
<para>The SGML processor would know to look up the DTD from
|
|
<filename>strict.dtd</filename> in the <filename>4.0</filename>
|
|
subdirectory of whichever directory held the
|
|
<filename>catalog</filename> file that contained that line.</para>
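
<para>The lookup just described can be imitated with a few lines of
&man.sh.1; and &man.awk.1;. This is only a sketch of the idea, not
how any real SGML processor is implemented. The function name
<literal>lookup_fpi</literal> is hypothetical, and the sketch only
understands single-line <literal>PUBLIC</literal> entries in which
both the FPI and the filename are enclosed in double quotes.</para>

<programlisting><![ CDATA [
# Hypothetical helper: print the file that a catalog maps an FPI to,
# resolved relative to the directory that holds the catalog.
lookup_fpi() {
    catalog=$1
    fpi=$2
    dir=$(dirname "$catalog")
    # Splitting on double quotes makes field 2 the FPI and field 4
    # the filename of an entry such as:  PUBLIC "fpi" "file"
    awk -F'"' -v fpi="$fpi" -v dir="$dir" '
        $1 ~ /^[ \t]*PUBLIC[ \t]*$/ && $2 == fpi { print dir "/" $4; exit }
    ' "$catalog"
}]]></programlisting>

<para>Given a catalog containing the line above, calling
<literal>lookup_fpi</literal> with the path to that catalog and the
FPI <literal>-//W3C//DTD HTML 4.0//EN</literal> prints the path to
<filename>4.0/strict.dtd</filename> beneath the catalog's
directory.</para>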
|
|
|
|
<para>Look at the contents of
|
|
<filename>/usr/local/share/sgml/html/catalog</filename>. This is the
|
|
catalog file for the HTML DTDs that will have been installed as part
|
|
of the <filename>textproc/docproj</filename> port.</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title><envar>SGML_CATALOG_FILES</envar></title>
|
|
|
|
<para>In order to locate a <filename>catalog</filename> file, your
|
|
SGML processor will need to know where to look. Many of them feature
|
|
command line parameters for specifying the path to one or more
|
|
catalogs.</para>
|
|
|
|
<para>In addition, you can set <envar>SGML_CATALOG_FILES</envar> to
|
|
point to the files. This environment variable should consist of a
|
|
colon-separated list of catalog files (including their full
|
|
path).</para>
|
|
|
|
<para>Typically, you will want to include the following files;</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para><filename>/usr/local/share/sgml/docbook/3.0/catalog</filename></para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>/usr/local/share/sgml/html/catalog</filename></para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>/usr/local/share/sgml/iso8879/catalog</filename></para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para><filename>/usr/local/share/sgml/jade/catalog</filename></para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>You should <link linkend="sgml-primer-envars">already have done
|
|
this</link>.</para>
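
<para>If your SGML processor cannot find a DTD, a quick check that
every entry in the variable names a readable file can help. The
following &man.sh.1; sketch is one way to do that. The function name
<literal>check_catalogs</literal> is made up for this example, and the
body runs in a subshell so that the changes to <envar>IFS</envar> and
globbing do not leak into your session.</para>

<programlisting><![ CDATA [
# Hypothetical helper: report whether each catalog in a
# colon-separated list is a readable file.
check_catalogs() (
    IFS=:        # split the argument on colons
    set -f       # do not expand wildcards while splitting
    status=0
    for catalog in $1; do
        if [ -r "$catalog" ]; then
            printf 'ok:      %s\n' "$catalog"
        else
            printf 'missing: %s\n' "$catalog"
            status=1
        fi
    done
    exit "$status"
)

check_catalogs "$SGML_CATALOG_FILES"]]></programlisting>

<para>The function exits with a non-zero status if any entry is
missing, so it can also be used in a shell conditional.</para>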
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2 id="fpi-alternatives">
|
|
<title>Alternatives to FPIs</title>
|
|
|
|
<para>Instead of using an FPI to indicate the DTD that the document
|
|
conforms to (and therefore, which file on the system contains the DTD)
|
|
you can explicitly specify the name of the file.</para>
|
|
|
|
<para>The syntax for this is slightly different;</para>
|
|
|
|
<programlisting>
<![ CDATA [<!DOCTYPE html SYSTEM "/path/to/file.dtd">]]></programlisting>
|
|
|
|
<para>The <literal>SYSTEM</literal> keyword indicates that the SGML
|
|
processor should locate the DTD in a system specific fashion. This
|
|
typically (but not always) means the DTD will be provided as a
|
|
filename.</para>
|
|
|
|
<para>Using FPIs is preferred for reasons of portability. You don't want
|
|
to have to ship a copy of the DTD around with your document, and if
|
|
you used the <literal>SYSTEM</literal> identifier then everyone would
|
|
need to keep their DTDs in the same place.</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="sgml-escape">
|
|
<title>Escaping back to SGML</title>
|
|
|
|
<para>Earlier in this primer I said that SGML is only used when writing a
|
|
DTD. This is not strictly true. There is certain SGML syntax that you
|
|
will want to be able to use within your documents. For example,
|
|
comments can be included in your document, and will be ignored by the
|
|
parser. Comments are entered using SGML syntax. Other uses for SGML
|
|
syntax in your document will be shown later too.</para>
|
|
|
|
<para>Obviously, you need some way of indicating to the SGML processor
|
|
that the following content is not elements within the document, but is
|
|
SGML that the parser should act upon.</para>
|
|
|
|
<para>These sections are marked by <literal><! ... ></literal> in
|
|
your document. Everything between these delimiters is SGML syntax as you
|
|
might find within a DTD.</para>
|
|
|
|
<para>As you may just have realised, the <link
|
|
linkend="doctype-declaration">DOCTYPE declaration</link> is an example
|
|
of SGML syntax that you need to include in your document…</para>
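
<para>A minimal sketch of how the two contexts sit together in an HTML
document.  Only the first line is SGML syntax; everything after it is
ordinary document content made up of elements:</para>

<programlisting>
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>
  </body>
</html>]]></programlisting>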
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Comments</title>
|
|
|
|
<para>Comments are an SGML construction, and are normally only valid
|
|
inside a DTD. However, as <xref linkend="sgml-escape"> shows, it is
|
|
possible to use SGML syntax within your document.</para>
|
|
|
|
<para>The delimiter for SGML comments is the string
“<literal>--</literal>”.  The first occurrence of this string
opens a comment, and the second closes it.</para>
|
|
|
|
<example>
|
|
<title>SGML generic comment</title>
|
|
|
|
<programlisting>
|
|
<!-- test comment --></programlisting>
|
|
|
|
<programlisting><![ CDATA [
|
|
<!-- This is inside the comment -->
|
|
|
|
<!-- This is another comment -->
|
|
|
|
<!-- This is one way
|
|
of doing multiline comments -->
|
|
|
|
<!-- This is another way of --
|
|
-- doing multiline comments -->]]></programlisting>
|
|
</example>
|
|
|
|
<![ %output.print; [
|
|
<important>
|
|
<title>Use 2 dashes</title>
|
|
|
|
<para>There is a problem with producing the PostScript and PDF versions
|
|
of this document. The above example probably shows just one hyphen
|
|
symbol, <literal>-</literal> after the <literal><!</literal> and
|
|
before the <literal>></literal>.</para>
|
|
|
|
<para>You <emphasis>must</emphasis> use two <literal>-</literal>,
|
|
<emphasis>not</emphasis> one. The PostScript and PDF versions have
|
|
translated the two <literal>-</literal> in the original to a longer,
|
|
more professional <emphasis>em-dash</emphasis>, and broken this
|
|
example in the process.</para>
|
|
|
|
<para>The HTML, plain text, and RTF versions of this document are not
|
|
affected.</para>
|
|
</important>
|
|
]]>
|
|
|
|
<para>If you have used HTML before you may have been shown different rules
|
|
for comments. In particular, you may think that the string
|
|
<literal><!--</literal> opens a comment, and it is only closed by
|
|
<literal>--></literal>.</para>
|
|
|
|
<para>This is <emphasis>not</emphasis> the case. A lot of web browsers
|
|
have broken HTML parsers, and will accept that as valid. However, the
|
|
SGML parsers used by the Documentation Project are much stricter, and
|
|
will reject documents that make that error.</para>
|
|
|
|
<example>
|
|
<title>Erroneous SGML comments</title>
|
|
|
|
<programlisting><![ CDATA [
|
|
<!-- This is in the comment --
|
|
|
|
THIS IS OUTSIDE THE COMMENT!
|
|
|
|
-- back inside the comment -->]]></programlisting>
|
|
|
|
<para>The SGML parser will treat this as though it were actually;</para>
|
|
|
|
<programlisting>
|
|
<!THIS IS OUTSIDE THE COMMENT></programlisting>
|
|
|
|
<para>This is not valid SGML, and may give confusing error
|
|
messages.</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!--------------- This is a very bad idea --------------->]]></programlisting>
|
|
|
|
<para>As the example suggests, <emphasis>do not</emphasis> write
|
|
comments like that.</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!--===================================================-->]]></programlisting>
|
|
|
|
<para>That is a (slightly) better approach, but it is still potentially
confusing to people new to SGML.</para>
|
|
</example>
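
<para>To see why a run of dashes goes wrong, it can help to pair the
dashes off by hand.  The fragment below is an illustrative sketch, not
taken from any FDP document:</para>

<programlisting>
<![ CDATA [<!---- Banner ---->]]></programlisting>

<para>Reading from left to right, the first <literal>--</literal> opens
a comment and the second <literal>--</literal> closes it straight away,
so the word <literal>Banner</literal> sits outside any comment.  The
last two pairs then open and close a second, empty comment.  The parser
is left with stray text inside the declaration, much as in the
erroneous example above, and should report an error.</para>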
|
|
|
|
<sect2>
|
|
<title>For you to do…</title>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Add some comments to <filename>example.sgml</filename>, and
|
|
check that the file still validates using &man.nsgmls.1;.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Add some invalid comments to
|
|
<filename>example.sgml</filename>, and see the error messages that
|
|
&man.nsgmls.1; gives when it encounters an invalid comment.</para>
|
|
</step>
|
|
</procedure>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Entities</title>
|
|
|
|
<para>Entities are a mechanism for assigning names to chunks of
|
|
content. As an SGML parser processes your document, any entities
|
|
it finds are replaced by the content of the entity.</para>
|
|
|
|
<para>This is a good way to have re-usable, easily changeable chunks
|
|
of content in your SGML documents. It is also the only way to
|
|
include one marked up file inside another using SGML.</para>
|
|
|
|
<para>There are two types of entities which can be used in two
|
|
different situations; <emphasis>general entities</emphasis> and
|
|
<emphasis>parameter entities</emphasis>.</para>
|
|
|
|
<sect2 id="general-entities">
|
|
<title>General Entities</title>
|
|
|
|
<para>You cannot use general entities in an SGML context (although you
|
|
define them in one). They can only be used in your document. Contrast
|
|
this with <link linkend="parameter-entities">parameter
|
|
entities</link>.</para>
|
|
|
|
<para>Each general entity has a name. When you want to reference a
|
|
general entity (and therefore include whatever text it represents in
|
|
your document), you write
|
|
<literal>&<replaceable>entity-name</replaceable>;</literal>. For
|
|
example, suppose you had an entity called
|
|
<literal>current.version</literal> which expanded to the current
|
|
version number of your product. You could write;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<para>The current version of our product is
|
|
&current.version;.</para>]]></programlisting>
|
|
|
|
<para>When the version number changes you can simply change the
|
|
definition of the value of the general entity and reprocess your
|
|
document.</para>
|
|
|
|
<para>You can also use general entities to enter characters that you
|
|
could not otherwise include in an SGML document. For example, <
and & cannot normally appear in an SGML document. When the SGML
|
|
parser sees the < symbol it assumes that a tag (either a start tag
|
|
or an end tag) is about to appear, and when it sees the & symbol it
|
|
assumes the next text will be the name of an entity.</para>
|
|
|
|
<para>Fortunately, you can use the two general entities &lt; and
&amp; whenever you need to include one or other of these
characters.</para>
|
|
|
|
<para>A general entity can only be defined within an SGML context.
|
|
Typically, this is done immediately after the DOCTYPE
|
|
declaration.</para>
|
|
|
|
<example>
|
|
<title>Defining general entities</title>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY current.version "3.0-RELEASE">
|
|
<!ENTITY last.version "2.2.7-RELEASE">
|
|
]>]]></programlisting>
|
|
|
|
<para>Notice how the DOCTYPE declaration has been extended by adding a
|
|
square bracket at the end of the first line. The two entities are
|
|
then defined over the next two lines, before the square bracket is
|
|
closed, and then the DOCTYPE declaration is closed.</para>
|
|
|
|
<para>The square brackets are necessary to indicate that we are
|
|
extending the DTD indicated by the DOCTYPE declaration.</para>
|
|
</example>
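
<para>As a sketch of the expansion, assuming the two entity definitions
from the example above are in effect, a paragraph written as:</para>

<programlisting>
<![ CDATA [<p>Version &current.version; replaces version &last.version;.</p>]]></programlisting>

<para>is handled by the parser as though you had typed:</para>

<programlisting>
<![ CDATA [<p>Version 3.0-RELEASE replaces version 2.2.7-RELEASE.</p>]]></programlisting>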
|
|
</sect2>
|
|
|
|
<sect2 id="parameter-entities">
|
|
<title>Parameter entities</title>
|
|
|
|
<para>Like <link linkend="general-entities">general entities</link>,
|
|
parameter entities are used to assign names to reusable chunks of
|
|
text. However, whereas general entities can only be used within your
|
|
document, parameter entities can only be used within an <link
|
|
linkend="sgml-escape">SGML context</link>.</para>
|
|
|
|
<para>Parameter entities are defined in a similar way to general
|
|
entities. However, instead of using
|
|
<literal>&<replaceable>entity-name</replaceable>;</literal> to
|
|
refer to them, use
|
|
<literal>%<replaceable>entity-name</replaceable>;</literal><footnote>
|
|
<para><emphasis>P</emphasis>arameter entities use the
|
|
<emphasis>P</emphasis>ercent symbol.</para>
|
|
</footnote>. The definition also includes the <literal>%</literal>
|
|
between the <literal>ENTITY</literal> keyword and the name of the
|
|
entity.</para>
|
|
|
|
<example>
|
|
<title>Defining parameter entities</title>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY % param.some "some">
|
|
<!ENTITY % param.text "text">
|
|
<!ENTITY % param.new "%param.some more %param.text">
|
|
|
|
<!-- %param.new now contains "some more text" -->
|
|
]>]]></programlisting>
|
|
</example>
|
|
|
|
<para>This may not seem particularly useful. It will be.</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>For you to do…</title>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Add a general entity to
|
|
<filename>example.sgml</filename>.</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
|
|
<!ENTITY version "1.1">
|
|
]>
|
|
|
|
<html>
|
|
<head>
|
|
<title>An example HTML file</title>
|
|
</head>
|
|
|
|
<!-- You might well have some comments in here as well -->
|
|
|
|
<body>
|
|
<p>This is a paragraph containing some text.</p>
|
|
|
|
<p>This paragraph contains some more text.</p>
|
|
|
|
<p align="right">This paragraph might be right-justified.</p>
|
|
|
|
<p>The current version of this document is: &version;</p>
|
|
</body>
|
|
</html>]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Validate the document using &man.nsgmls.1;.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Load <filename>example.sgml</filename> into your web browser
|
|
(you may need to copy it to <filename>example.html</filename>
|
|
before your browser recognises it as an HTML document).</para>
|
|
|
|
<para>Unless your browser is very advanced, you won't see the entity
|
|
reference <literal>&version;</literal> replaced with the
|
|
version number. Most web browsers have very simplistic parsers
|
|
which do not handle proper SGML<footnote>
|
|
<para>This is a shame. Imagine all the problems and hacks (such
|
|
as Server Side Includes) that could be avoided if they
|
|
did.</para>
|
|
</footnote>.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>The solution is to <emphasis>normalise</emphasis> your
|
|
document using an SGML normaliser. The normaliser reads in valid
|
|
SGML and outputs equally valid SGML which has been transformed in
|
|
some way. One of the ways in which the normaliser transforms the
|
|
SGML is to expand all the entity references in the document,
|
|
replacing the entities with the text that they represent.</para>
|
|
|
|
<para>You can use &man.sgmlnorm.1; to do this.</para>
|
|
|
|
<screen>&prompt.user; <userinput>sgmlnorm example.sgml > example.html</userinput></screen>
|
|
|
|
<para>You should find a normalised (i.e., entity references
|
|
expanded) copy of your document in
|
|
<filename>example.html</filename>, ready to load into your web
|
|
browser.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>If you look at the output from &man.sgmlnorm.1; you will see
|
|
that it does not include a DOCTYPE declaration at the start. To
|
|
include this you need to use the <option>-d</option>
|
|
option;</para>
|
|
|
|
<screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
|
|
</step>
|
|
</procedure>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Using entities to include files</title>
|
|
|
|
<para>Entities (both <link linkend="general-entities">general</link> and
|
|
<link linkend="parameter-entities">parameter</link>) are particularly
|
|
useful when used to include one file inside another.</para>
|
|
|
|
<sect2 id="include-using-gen-entities">
|
|
<title>Using general entities to include files</title>
|
|
|
|
<para>Suppose you have some content for an SGML book organised into
|
|
files, one file per chapter, called
|
|
<filename>chapter1.sgml</filename>,
|
|
<filename>chapter2.sgml</filename>, and so forth, with a
|
|
<filename>book.sgml</filename> file that will contain these
|
|
chapters.</para>
|
|
|
|
<para>In order to use the contents of these files as the values for your
|
|
entities, you declare them with the <literal>SYSTEM</literal> keyword.
|
|
This directs the SGML parser to use the contents of the named file as
|
|
the value of the entity.</para>
|
|
|
|
<example>
|
|
<title>Using general entities to include files</title>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
|
|
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
|
|
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
|
|
<!-- And so forth -->
|
|
]>
|
|
|
|
<html>
|
|
<!-- Use the entities to load in the chapters -->
|
|
|
|
&chapter.1;
|
|
&chapter.2;
|
|
&chapter.3;
|
|
</html>]]></programlisting>
|
|
</example>
|
|
|
|
<warning>
|
|
<para>When using general entities to include other files within a
|
|
document, the files being included
|
|
(<filename>chapter1.sgml</filename>,
|
|
<filename>chapter2.sgml</filename>, and so on) <emphasis>must
|
|
not</emphasis> start with a DOCTYPE declaration. This is a syntax
|
|
error.</para>
|
|
</warning>
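
<para>For illustration, an included file might look like the sketch
below.  The element content is hypothetical; the point is that the file
holds only a fragment of the document, with no DOCTYPE declaration at
the top:</para>

<programlisting>
<![ CDATA [<!-- chapter1.sgml -->
<h1>Chapter 1</h1>

<p>This is the text of the first chapter.</p>]]></programlisting>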
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Using parameter entities to include files</title>
|
|
|
|
<para>Recall that parameter entities can only be used inside an SGML
|
|
context. Why then would you want to include a file within an SGML
|
|
context?</para>
|
|
|
|
<para>You can use this to ensure that you can reuse your general
|
|
entities.</para>
|
|
|
|
<para>Suppose that you had many chapters in your document, and you
|
|
reused these chapters in two different books, each book organising the
|
|
chapters in a different fashion.</para>
|
|
|
|
<para>You could list the entities at the top of each book, but this
|
|
quickly becomes cumbersome to manage.</para>
|
|
|
|
<para>Instead, place the general entity definitions inside one file,
|
|
and use a parameter entity to include that file within your
|
|
document.</para>
|
|
|
|
<example>
|
|
<title>Using parameter entities to include files</title>
|
|
|
|
<para>First, place your entity definitions in a separate file, called
|
|
<filename>chapters.ent</filename>. This file contains the
|
|
following;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
|
|
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
|
|
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">]]></programlisting>
|
|
|
|
<para>Now create a parameter entity to refer to the contents of the
|
|
file. Then use the parameter entity to load the file into the
|
|
document, which will then make all the general entities available
|
|
for use. Then use the general entities as before;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!-- Define a parameter entity to load in the chapter general entities -->
|
|
<!ENTITY % chapters SYSTEM "chapters.ent">
|
|
|
|
<!-- Now use the parameter entity to load in this file -->
|
|
%chapters;
|
|
]>
|
|
|
|
<html>
|
|
&chapter.1;
|
|
&chapter.2;
|
|
&chapter.3;
|
|
</html>]]></programlisting>
|
|
</example>
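
<para>A second book can then reuse the same definitions.  In this
sketch (the second book and its chapter order are hypothetical) only
the order of the entity references changes, while
<filename>chapters.ent</filename> stays untouched:</para>

<programlisting>
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % chapters SYSTEM "chapters.ent">
%chapters;
]>

<html>
  &chapter.3;
  &chapter.1;
  &chapter.2;
</html>]]></programlisting>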
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>For you to do…</title>
|
|
|
|
<sect3>
|
|
<title>Use general entities to include files</title>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Create three files, <filename>para1.sgml</filename>,
|
|
<filename>para2.sgml</filename>, and
|
|
<filename>para3.sgml</filename>.</para>
|
|
|
|
<para>Put content similar to the following in each file;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<p>This is the first paragraph.</p>]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Edit <filename>example.sgml</filename> so that it looks like
|
|
this;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY version "1.1">
|
|
<!ENTITY para1 SYSTEM "para1.sgml">
|
|
<!ENTITY para2 SYSTEM "para2.sgml">
|
|
<!ENTITY para3 SYSTEM "para3.sgml">
|
|
]>
|
|
|
|
<html>
|
|
<head>
|
|
<title>An example HTML file</title>
|
|
</head>
|
|
|
|
<body>
|
|
<p>The current version of this document is: &version;</p>
|
|
|
|
&para1;
&para2;
&para3;
|
|
</body>
|
|
</html>]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Produce <filename>example.html</filename> by normalising
|
|
<filename>example.sgml</filename>.</para>
|
|
|
|
<screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Load <filename>example.html</filename> into your web
|
|
browser, and confirm that the
|
|
<filename>para<replaceable>n</replaceable>.sgml</filename> files
|
|
have been included in <filename>example.html</filename>.</para>
|
|
</step>
|
|
</procedure>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Use parameter entities to include files</title>
|
|
|
|
<note>
|
|
<para>You must have taken the previous steps first.</para>
|
|
</note>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Edit <filename>example.sgml</filename> so that it looks like
|
|
this;</para>
|
|
<programlisting>
|
|
<![ CDATA [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY % entities SYSTEM "entities.sgml"> %entities;
|
|
]>
|
|
|
|
<html>
|
|
<head>
|
|
<title>An example HTML file</title>
|
|
</head>
|
|
|
|
<body>
|
|
<p>The current version of this document is: &version;</p>
|
|
|
|
&para1;
&para2;
&para3;
|
|
</body>
|
|
</html>]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Create a new file, <filename>entities.sgml</filename>, with
|
|
this content;</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [<!ENTITY version "1.1">
|
|
<!ENTITY para1 SYSTEM "para1.sgml">
|
|
<!ENTITY para2 SYSTEM "para2.sgml">
|
|
<!ENTITY para3 SYSTEM "para3.sgml">]]></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Produce <filename>example.html</filename> by normalising
|
|
<filename>example.sgml</filename>.</para>
|
|
|
|
<screen>&prompt.user; <userinput>sgmlnorm -d example.sgml > example.html</userinput></screen>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Load <filename>example.html</filename> into your web
|
|
browser, and confirm that the
|
|
<filename>para<replaceable>n</replaceable>.sgml</filename> files
|
|
have been included in <filename>example.html</filename>.</para>
|
|
</step>
|
|
</procedure>
|
|
</sect3>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Marked sections</title>
|
|
|
|
<para>SGML provides a mechanism to indicate that particular pieces of the
|
|
document should be processed in a special way. These are termed
|
|
“marked sections”.</para>
|
|
|
|
<example>
|
|
<title>Structure of a marked section</title>
|
|
|
|
<programlisting>
|
|
<![ <replaceable>KEYWORD</replaceable> [
|
|
Contents of marked section
|
|
]]></programlisting>
|
|
</example>
|
|
|
|
<para>As you would expect, being an SGML construct, a marked section
|
|
starts with <literal><!</literal>.</para>
|
|
|
|
<para>The first square bracket begins to delimit the marked
|
|
section.</para>
|
|
|
|
<para><replaceable>KEYWORD</replaceable> describes how this marked
|
|
section should be processed by the parser.</para>
|
|
|
|
<para>The second square bracket indicates that the content of the marked
|
|
section starts here.</para>
|
|
|
|
<para>The marked section is finished by closing the two square brackets,
|
|
and then returning to the document context from the SGML context with
|
|
<literal>></literal>.</para>
|
|
|
|
<sect2>
|
|
<title>Marked section keywords</title>
|
|
|
|
<sect3>
|
|
<title><literal>CDATA</literal>, <literal>RCDATA</literal></title>
|
|
|
|
<para>These keywords denote the marked section's <emphasis>content
|
|
model</emphasis>, and allow you to change it from the
|
|
default.</para>
|
|
|
|
<para>When an SGML parser is processing a document, it keeps track
|
|
of what is called the “content model”.</para>
|
|
|
|
<para>Briefly, the content model describes what sort of content the
|
|
parser is expecting to see, and what it will do with it when it
|
|
finds it.</para>
|
|
|
|
<para>The two content models you will probably find most useful are
|
|
<literal>CDATA</literal> and <literal>RCDATA</literal>.</para>
|
|
|
|
<para><literal>CDATA</literal> is for “Character Data”. If
|
|
the parser is in this content model then it is expecting to see
|
|
characters, and characters only. In this model the < and &
|
|
symbols lose their special status, and will be treated as ordinary
|
|
characters.</para>
|
|
|
|
<para><literal>RCDATA</literal> is for “Entity references and
character data”. If the parser is in this content model then it
is expecting to see characters <emphasis>and</emphasis> entities.
< loses its special status, but & will still be treated as
the start of a general entity reference.</para>
|
|
|
|
<para>This is particularly useful if you are including some verbatim
|
|
text that contains lots of < and & characters. While you
|
|
could go through the text ensuring that every < is converted to a
|
|
&lt; and every & is converted to a &amp;, it can be
|
|
easier to mark the section as only containing CDATA. When the SGML
|
|
parser encounters this it will treat the < and & symbols
embedded in the content as ordinary characters.</para>
|
|
|
|
<!-- The nesting of CDATA within the next example is disgusting -->
|
|
|
|
<example>
|
|
<title>Using a CDATA marked section</title>
|
|
|
|
<programlisting>
|
|
<para>Here is an example of how you would include some text
|
|
that contained many &lt; and &amp; symbols. The sample
|
|
text is a fragment of HTML. The surrounding text (<para> and
|
|
<programlisting>) are from DocBook.</para>
|
|
|
|
<programlisting>
|
|
<![ CDATA [ <![ CDATA [
|
|
<p>This is a sample that shows you some of the elements within
|
|
HTML. Since the angle brackets are used so many times, it's
|
|
simpler to say the whole example is a CDATA marked section
|
|
than to use the entity names for the left and right angle
|
|
brackets throughout.</p>
|
|
|
|
<ul>
|
|
<li>This is a listitem</li>
|
|
<li>This is a second listitem</li>
|
|
<li>This is a third listitem</li>
|
|
</ul>
|
|
|
|
<p>This is the end of the example.</p>]]>
|
|
]]>
|
|
</programlisting></programlisting>
|
|
|
|
<para>If you look at the source for this document you will see this
|
|
technique used throughout.</para>
|
|
</example>
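
<para>An <literal>RCDATA</literal> marked section is written the same
way.  In this sketch, which assumes a general entity named
<literal>current.version</literal> has been defined as in the earlier
examples, the angle brackets around <literal>p</literal> are treated as
ordinary characters but the entity reference is still expanded:</para>

<programlisting>
<![ RCDATA [
  <p>The current version is &current.version;.</p>
]]></programlisting>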
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title><literal>INCLUDE</literal> and
|
|
<literal>IGNORE</literal></title>
|
|
|
|
<para>If the keyword is <literal>INCLUDE</literal> then the contents
|
|
of the marked section will be processed. If the keyword is
|
|
<literal>IGNORE</literal> then the marked section is ignored and
|
|
will not be processed. It will not appear in the output.</para>
|
|
|
|
<example>
|
|
<title>Using <literal>INCLUDE</literal> and
|
|
<literal>IGNORE</literal> in marked sections</title>
|
|
|
|
<programlisting>
|
|
<![ INCLUDE [
|
|
This text will be processed and included.
|
|
]]>
|
|
|
|
<![ IGNORE [
|
|
This text will not be processed or included.
|
|
]]></programlisting>
|
|
</example>
|
|
|
|
<para>By itself, this isn't too useful. If you wanted to remove text
|
|
from your document you could cut it out, or wrap it in
|
|
comments.</para>
|
|
|
|
<para>It becomes more useful when you realise you can use <link
|
|
linkend="parameter-entities">parameter entities</link> to control
|
|
this. Remember that parameter entities can only be used in SGML
|
|
contexts, and the keyword of a marked section
|
|
<emphasis>is</emphasis> an SGML context.</para>
|
|
|
|
<para>For example, suppose that you produced a hard-copy version of
|
|
some documentation and an electronic version. In the electronic
|
|
version you wanted to include some extra content that wasn't to
|
|
appear in the hard-copy.</para>
|
|
|
|
<para>Create a parameter entity, and set its value to
|
|
<literal>INCLUDE</literal>. Write your document, using marked
|
|
sections to delimit content that should only appear in the
|
|
electronic version. In these marked sections use the parameter
|
|
entity in place of the keyword.</para>
|
|
|
|
<para>When you want to produce the hard-copy version of the document,
|
|
change the parameter entity's value to <literal>IGNORE</literal> and
|
|
reprocess the document.</para>
|
|
|
|
<example>
|
|
<title>Using a parameter entity to control a marked
|
|
section</title>
|
|
|
|
<programlisting>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY % electronic.copy "INCLUDE">
|
|
]>
|
|
|
|
...
|
|
|
|
<![ %electronic.copy [
|
|
This content should only appear in the electronic
|
|
version of the document.
|
|
]]></programlisting>
|
|
|
|
<para>When producing the hard-copy version, change the entity's
|
|
definition to;</para>
|
|
|
|
<programlisting>
|
|
<!ENTITY % electronic.copy "IGNORE"></programlisting>
|
|
|
|
<para>On reprocessing the document, the marked sections that use
|
|
<literal>%electronic.copy</literal> as their keyword will be
|
|
ignored.</para>
|
|
</example>
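
<para>The same idea extends to more than one output format.  As a
sketch, in which <literal>print.copy</literal> is a hypothetical entity
name added for illustration, a pair of parameter entities can select
between electronic-only and print-only content:</para>

<programlisting>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
<!ENTITY % print.copy      "IGNORE">
]>

...

<![ %electronic.copy; [
  This content appears only in the electronic version.
]]>

<![ %print.copy; [
  This content appears only in the hard-copy version.
]]></programlisting>

<para>Swapping the two values switches every marked section in the
document at once, without touching the sections themselves.</para>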
|
|
</sect3>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>For you to do…</title>
|
|
|
|
<procedure>
|
|
<step>
|
|
<para>Create a new file, <filename>section.sgml</filename>, that
|
|
contains the following;</para>
|
|
|
|
<programlisting>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
|
|
<!ENTITY % text.output "INCLUDE">
|
|
]>
|
|
|
|
<html>
|
|
<head>
|
|
<title>An example using marked sections</title>
|
|
</head>
|
|
|
|
<body>
|
|
<p>This paragraph <![ CDATA [contains many <
|
|
characters (< < < < <) so it is easier
|
|
to wrap it in a CDATA marked section ]]></p>
|
|
|
|
<![ IGNORE [
|
|
<p>This paragraph will definitely not be included in the
|
|
output.</p>
|
|
]]>
|
|
|
|
<![ <![ CDATA [%text.output]]> [
|
|
<p>This paragraph might appear in the output, or it
|
|
might not.</p>
|
|
|
|
<p>Its appearance is controlled by the <![CDATA[%text.output]]>
|
|
parameter entity.</p>
|
|
]]>
|
|
</body>
|
|
</html></programlisting>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Normalise this file using &man.sgmlnorm.1; and examine the
|
|
output. Notice which paragraphs have appeared, which have
|
|
disappeared, and what has happened to the content of the CDATA
|
|
marked section.</para>
|
|
</step>
|
|
|
|
<step>
|
|
<para>Change the definition of the <literal>text.output</literal>
|
|
entity from <literal>INCLUDE</literal> to
|
|
<literal>IGNORE</literal>. Re-normalise the file, and examine the
|
|
output to see what has changed.</para>
|
|
</step>
|
|
</procedure>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1>
|
|
<title>Conclusion</title>
|
|
|
|
<para>That is the conclusion of this SGML primer. For reasons of space
|
|
and complexity several things have not been covered in depth (or at
|
|
all). However, the previous sections cover enough SGML for you to be
|
|
able to follow the organisation of the FDP documentation.</para>
|
|
</sect1>
|
|
</chapter>
|
|
|
|
<!--
|
|
Local Variables:
|
|
mode: sgml
|
|
sgml-declaration: "../chapter.decl"
|
|
sgml-indent-data: t
|
|
sgml-omittag: nil
|
|
sgml-always-quote-attributes: t
|
|
sgml-parent-document: ("../book.sgml" "part" "chapter")
|
|
End:
|
|
-->
|