This is a user's guide to the linuxdoc-sgml document processing system,
for use with Linux documentation. linuxdoc-sgml is an SGML DTD (Document
Type Definition) and a set of ``replacement files'' which convert the SGML
to groff, LaTeX, and HTML source. In the future, linuxdoc-sgml will support
texinfo, as well as other formats.
linuxdoc-sgml is based heavily on the QWERTZ DTD by Tom Gordon,
thomas.gordon@gmd.de. I have only made revisions to his DTD and
replacement files for use by Linux documentation.
linuxdoc-sgml is not meant to be a general document-processing system.
Although it can be used for documents of many types, I have tailored it for
use by the Linux documentors in producing HOWTOs, FAQs, and (later) the
Linux Documentation Project manuals. Therefore, I have tweaked features
into and out of the system for this purpose. If you see a lack of generality
in the system, that is the reason. There's nothing binding linuxdoc-sgml
to Linux documentation, but all documents produced by the system will look
a certain way. If you want things to look different, I suggest that you use
a more generalized system such as the plain QWERTZ DTD.
One of the goals of this system is to make documents easy to produce in
numerous formats. Until now, most Linux documentation has been produced
in plain ASCII through manual editing. A system like groff can take care
of the plain-text formatting, but that still doesn't give you HTML (for
use on the World Wide Web), LaTeX (for nicely printed documents), or
texinfo. Therefore, if there are features missing from this system
that you would like, please let me know! The idea is that we shouldn't
have to use a lot of hackery to produce good-looking docs in multiple formats.
The author should have to do as little as possible.
This document is written using the linuxdoc-sgml DTD. It contains
more or less everything you need to know to write SGML docs with this
DTD. See example.sgml for an example of an SGML document that you
can use as a model for your own docs.
I chose SGML for this system because SGML is made specifically for translation
to other formats. SGML, which stands for Standard Generalized Markup Language,
allows you to specify the structure of a document---that is, what kinds
of things make up the document. You specify the structure of a document with
a DTD (Document Type Definition). linuxdoc-sgml is one DTD that specifies
the structure for Linux HOWTOs and other docs. QWERTZ is another DTD; the
SGML standard provides DTDs for books, articles, and other generic document
types.
The DTD specifies the names of ``elements'' within the document. An element
is just a bit of structure---like a section, a subsection, a paragraph,
or even something smaller like emphasised text. Unlike LaTeX, however,
these elements are not in any way intrinsic to SGML itself. The
linuxdoc-sgml DTD happens to define elements that look a lot like
their LaTeX counterparts---you have sections, subsections, verbatim
``environments'', and so forth. However, using SGML you can define any kind
of structure for the document that you like. In a way, SGML is like
low-level TeX, while the linuxdoc-sgml DTD is like LaTeX.
Don't be confused by this analogy. SGML is not a text-formatting system. There is no ``SGML formatter'' per se. SGML source is only converted to other formats for processing. Furthermore, SGML itself is used only to specify the document structure. There are no text-formatting facilities or ``macros'' intrinsic to SGML itself. All of those things are defined within the DTD. You can't use SGML without a DTD---a DTD defines what SGML does.
Here's how processing a document with SGML and the linuxdoc-sgml DTD
works. First, you need a DTD. I'm using the QWERTZ DTD which was produced,
originally, by a group of people who needed a LaTeX-like DTD. I've modified
the QWERTZ DTD to produce the linuxdoc-sgml DTD for our purposes.
The DTD simply sets up the structure of the document. A small portion of
it looks like this:
<!element article - -
        (titlepag, header?,
         toc?, lof?, lot?, p*, sect*,
         (appendix, sect+)?, biblio?) +(footnote)>
This part sets up the overall structure for an ``article'', which is like
a ``documentstyle'' within LaTeX. The article consists of a title page
(titlepag), an optional header (header), an optional table of contents
(toc), optional lists of figures (lof) and tables (lot), any number of
paragraphs (p), any number of top-level sections (sect), optional
appendices (appendix), an optional bibliography (biblio) and footnotes
(footnote).
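
The ordering rules in that fragment can be tried out with a small Python
sketch. This is only an illustration and not part of linuxdoc-sgml: the
regular expression is a hand-made approximation of the content model, and
the function name is_valid_article is invented here. In the DTD, a comma
means "followed by", ? means optional, * means zero or more, + means one or
more, and +(footnote) is an inclusion that allows footnotes anywhere, so the
sketch drops footnotes before matching.

```python
import re

# Rough regex rendering of the content model quoted above (an approximation
# made for this sketch). Each child element name is followed by one space so
# that names cannot run together.
ARTICLE_MODEL = re.compile(
    r"titlepag (header )?"
    r"(toc )?(lof )?(lot )?(p )*(sect )*"
    r"(appendix (sect )+)?(biblio )?"
)


def is_valid_article(children):
    """Return True if the list of child element names fits the model."""
    # +(footnote) is an inclusion: footnotes may appear anywhere, so this
    # sketch ignores them before matching.
    names = [name for name in children if name != "footnote"]
    sequence = "".join(name + " " for name in names)
    return ARTICLE_MODEL.fullmatch(sequence) is not None


if __name__ == "__main__":
    print(is_valid_article(["titlepag", "toc", "sect", "sect"]))
    print(is_valid_article(["sect"]))  # no titlepag, so this is rejected
```

A real SGML parser does much more than this (it checks every nested element,
not just the top level), but the idea of accepting or rejecting a sequence of
elements is the same.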
As you can see, the DTD doesn't say anything about how the document should
be formatted or what it should look like. It just defines what parts make
up the document. Elsewhere in the DTD the structure of the titlepag,
header, sect, and other elements is defined.
You don't need to know anything about the syntax of
the DTD in order to write documents. I'm just presenting it so you know
what it looks like and what it does. You do need to be familiar with
the document structure that the DTD defines. If not, you might
violate the structure when attempting to write a document, and be very
confused about the resulting error messages. We'll describe the
structure of linuxdoc-sgml documents in detail later.
The next step is to write a document using the structure defined by the
DTD. Again, the linuxdoc-sgml DTD makes documents look a lot like
LaTeX---it's very easy to follow. In SGML jargon a single document written
using a particular DTD is known as an ``instance'' of that DTD.
In order to translate the SGML source into another format (such as LaTeX
or nroff) for processing, the SGML source (the document that you wrote)
is parsed along with the DTD by (you guessed it) the SGML parser.
I'm using the sgmls parser by James Clark, jjc@jclark.com, who
also happens to be the author of groff. We're in good hands.
The parser (the executable sgmls) simply picks through your document and
verifies that it follows the structure set forth by the DTD. It also spits out
a more explicit form of your document, with all ``macros'' and elements
expanded, which is understood by sgmlsasp, the next part of the process.
sgmlsasp is responsible for converting the output of sgmls to
another format (such as LaTeX). It does this using replacement files,
which describe how to convert elements in the original SGML document into
corresponding source in the ``target'' format (such as LaTeX or nroff).
For example, part of the replacement file for LaTeX looks like:
<itemize> + "\\begin{itemize}" +
</itemize> + "\\end{itemize}" +
This says that whenever you begin an itemize element in the SGML source,
it should be replaced with
\begin{itemize}
in the LaTeX source. (As I said, elements in the linuxdoc-sgml DTD are very
similar to their LaTeX counterparts.)
So, to convert the SGML to another format, all you have to do is write a new replacement file for that format that gives the appropriate analogues to the SGML elements in that new format. In practice, it's not that simple---for example, if you're trying to convert to a format that isn't structured at all like your DTD, you're going to have trouble. In any case, it's much easier to do than writing individual parsers and translators for many kinds of output formats; SGML provides a generalized system for converting one source to many formats.
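
To make the replacement idea concrete, here is a toy Python sketch of what
such a translation amounts to. It is not how sgmlsasp is implemented. The
event tuples, the function name convert, and the entries for item are all
invented for this illustration; only the two itemize replacements come from
the excerpt above. The True/False flag is my reading of the + markers (the
replacement sits on its own line), which is an assumption.

```python
# Hypothetical stand-in for a replacement file. Each start or end tag maps to
# (replacement text, whether it must sit on its own line). Only the itemize
# strings come from the excerpt above; the item entries are made up.
REPLACEMENTS = {
    ("start", "itemize"): ("\\begin{itemize}", True),
    ("end", "itemize"): ("\\end{itemize}", True),
    ("start", "item"): ("\\item ", False),
    ("end", "item"): ("", True),
}


def convert(events):
    """Turn (kind, value) events into target-format text.

    kind is "start" or "end" (value is an element name) or "data"
    (value is literal text). Unknown elements are replaced by nothing.
    """
    out = []      # finished output lines
    current = ""  # line currently being assembled
    for kind, value in events:
        if kind == "data":
            current += value
            continue
        text, own_line = REPLACEMENTS.get((kind, value), ("", False))
        if own_line:
            # Flush whatever is pending, then put the replacement on its
            # own line (if there is any replacement text at all).
            if current:
                out.append(current)
            if text:
                out.append(text)
            current = ""
        else:
            current += text
    if current:
        out.append(current)
    return "\n".join(out)


if __name__ == "__main__":
    events = [
        ("start", "itemize"),
        ("start", "item"), ("data", "first"), ("end", "item"),
        ("start", "item"), ("data", "second"), ("end", "item"),
        ("end", "itemize"),
    ]
    print(convert(events))
```

Supporting a new output format in this toy means swapping in a different
REPLACEMENTS table, which is the same role a new replacement file plays in
the real system.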
Once sgmlsasp has completed its work, you have LaTeX source which
corresponds to your original SGML document, which you can format using
LaTeX as you normally would. Later in this document I'll give examples
and show the commands used to do the translation and formatting. You can
do this all on one command line.
But first, I should describe how to install and configure the software.