3 Writing Documents with `linuxdoc-sgml`

For the most part, writing documents using the linuxdoc DTD is very simple, and somewhat like LaTeX. However, there are some caveats to watch out for. In this section I'll give an introduction to writing SGML docs. See the file example.sgml for an SGML example document (and tutorial) which you can use as a model when writing your own docs. Here I'm just going to discuss the various features of SGML; the source of this guide is not very readable as an example. Instead, print out the source (as well as the formatted output) for example.sgml so you have a real live case to refer to.

3.1 Basic concepts

Looking at the source of the example document, you'll notice right off that there are a number of ``tags'' marked within angle brackets (< and >). A tag simply specifies the beginning or end of an element, where an element is something like a section, a paragraph, a phrase of italicized text, an item in a list, and so on. Using a tag is like using a LaTeX command such as \item or \section{...}.

As a simple example, to produce this boldfaced text, I typed


As a simple example, to produce <bf>this boldfaced text</bf>, ...

in the source. <bf> begins the region of bold text, and </bf> ends it. Alternatively, you can use the abbreviated form


As a simple example, to produce <bf/this boldfaced text/, ...

which encloses the bold text within slashes. (Of course, you'll need to use the long form if the enclosed text contains slashes, as is the case with UNIX filenames.)
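
To make the choice between the two forms concrete, here is a small sketch. The sentences and the filename are hypothetical and are not taken from example.sgml; the tt tag (typewriter font) is used the same way as bf.

```sgml
<!-- Long form: needed here because the enclosed text contains slashes -->
The mount table lives in <tt>/etc/fstab</tt>.

<!-- Short form: fine here because the enclosed text has no slashes -->
You must edit it as <tt/root/.
```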

There are other things to watch out for with respect to special characters (that's why you'll notice all of these bizarre-looking ampersand expressions if you look at the source; I'll talk about those shortly).

In some cases, the end-tag for a particular element is optional. For example, to begin a section, you use the <sect> tag; the end-tag for the section (which would appear at the end of the section body itself, not just after the name of the section!) is optional and implied when you start another section of the same depth. In general you needn't worry about these details; just follow the model used in the tutorial (example.sgml), and feel free to ask me if you have any questions about the particulars.

3.2 Special characters

Obviously, the angle brackets are themselves special characters in the SGML source. There are others to watch out for. For example, let's say that you wanted to type an expression with angle brackets around it, as so: <foo>. In order to get the left angle bracket, you must use the &lt; entity, which is a ``macro'' that expands to the actual left-bracket character. Therefore, in the source, I typed


angle brackets around it, as so: <tt>&lt;foo></tt>.

Generally, something beginning with an ampersand is a special macro. For example, there's &percnt; to produce %, &verbar; to produce |, and so on. For all ``special characters'' there exist these ampersand entities to represent them.

Usually, you don't need to use the ampersand macro to get a special character; however, in some cases it is necessary. The most commonly used are:

Use &amp; for the ampersand (&),
Use &lt; for a left bracket (<),
Use &gt; for a right bracket (>),
Use &etago; for a left bracket with a slash (</),
Use &dollar; for a dollar sign ($),
Use &num; for a hash (#),
Use &percnt; for a percent (%),
Use `` and '' for quotes, or use &dquot; for ".
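
As a quick illustration of the entities listed above, here is a hypothetical source sentence (not from example.sgml). Assuming the entities expand as described, the formatted output should read: The R&D budget grew 5% to $300 for project #7.

```sgml
<!-- Hypothetical sentence using four of the entities listed above -->
The R&amp;D budget grew 5&percnt; to &dollar;300 for project &num;7.
```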

3.3  Verbatim and code environments 


While we're on the subject of special characters, I might as well mention
the verbatim ``environment'' used for including literal text in the output
(with spaces and indentation preserved, and so on). The 
verb element is used for this; it looks like the following:

<verb>
  Some literal text to include as example output.
</verb>



The verb environment doesn't allow you to use everything
within it literally. Specifically, you must do the following within
verb environments.

Use &ero; to get an ampersand, 
Use &etago; to get </,

Don't use \end{verbatim} within a verb
environment, as this is what LaTeX uses to end the verbatim 
environment. (In the future, it should be possible to hide the underlying
text formatter entirely, but the parser doesn't support this feature yet.) 


The code environment is much like the verb environment,
except that horizontal rules are added to the surrounding text, as so:

Here is an example code environment.
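
The source for the example above is not shown here. Assuming a code environment is written exactly like a verb environment with only the tag name changed, it would look something like this sketch:

```sgml
<code>
Here is an example code environment.
</code>
```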



You should use the tscreen environment around any verb environments,
as so:

<tscreen><verb>
Here is some example text. 
</verb></tscreen>



tscreen is an environment that simply indents the text and sets the
default font to tt. This makes examples look much nicer, both
in the LaTeX and plain ASCII versions. You can use tscreen
without verb; however, if you use any special characters in your
example you'll need to use both of them. tscreen does nothing to
special characters. See example.sgml for examples.
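
Here is a sketch that combines tscreen and verb with the two entities that are still required inside verb. The shell commands are hypothetical. Assuming the entities behave as described earlier, the output should show a doubled ampersand in the first line and a literal </verb> inside the quotes in the second.

```sgml
<tscreen><verb>
make &ero;&ero; make install
echo "&etago;verb>"
</verb></tscreen>
```
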
The quote environment is like tscreen, except that it does
not set the default font to tt. So, you can use quote for
non-computer-interaction quotes, as in:

<quote>
Here is some text to be indented, as in a quote.
</quote>



which will generate:

Here is some text to be indented, as in a quote.




3.4  Overall document structure 


Before we get too in-depth with details, I'm going to describe the
overall structure of a document as defined by the linuxdoc DTD.
Look at example.sgml for a good example of how a document is set up.

 The preamble 

In the document ``preamble'' you set up things such as the title
information and document style. For a Linux HOWTO document this should
look like:

<!doctype linuxdoc system>

<article>

<title>The Linux Food-Processing HOWTO
<author>Norbert Ebersol, <tt/norbert@foo.com/
<date>v1.0, 9 March 1994
<abstract>
This document describes how to connect your Linux machine to a food-processor
for dicing vegetables.
</abstract>

<toc>



The elements should go more or less in this order. The first line tells
the SGML parser to use the linuxdoc DTD. The <article>
tag forces the document to use the ``article'' document style. (The
original QWERTZ DTD defines ``report'' and ``book'' as well; I haven't
tweaked these for use with linuxdoc-sgml. Just use article for
your SGML docs, for now.)

The title, author, and date tags should be obvious; in the
date tag include the version number and last modification time of
the document.

The abstract tag sets up the text to be printed at the top of the
document, before the table of contents. If you're not going to
include a table of contents (the toc tag), you probably don't
need an abstract. I suggest that all Linux HOWTOs use this same format
for the preamble, so that the title, abstract, and table of contents are
all there and look the same.

 Sectioning and paragraphs 

After the preamble, you're ready to dive into the document. The following
sectioning commands are available:

sect: For top-level sections (i.e. 1, 2, and so on.) 
sect1: For second-level subsections (i.e. 1.1, 1.2, and so on.)
sect2: For third-level subsubsections.
sect3: For fourth-level subsubsubsections.
sect4: For fifth-level subsubsubsubsections.


These are roughly equivalent to their LaTeX counterparts section,
subsection, and so on.

After the sect (or sect1, sect2, etc.) tag comes the
name of the section. For example, at the top of this document, after
the preamble, comes the tag:

<sect>Introduction



And at the beginning of this section (Sectioning and paragraphs), there
is the tag:

<sect2>Sectioning and paragraphs



After the section tag, you begin the body of the section. However, you
must start the body with a <p> tag, as so:

<sect>Introduction

<p>
This is a user's guide to the <tt/linuxdoc-sgml/ document processing...



This is to tell the parser that you're done with the section title
and are ready to begin the body. Thereafter, new paragraphs are started
with a blank line (just as you would do in TeX). For example,

Here is the end of the first paragraph.

And we start a new paragraph here.



There is no reason to use <p> tags at the beginning of
every paragraph; only at the beginning of the first paragraph after
a sectioning command.

 Ending the document 

At the end of the document, you must use the tag:

</article>



to tell the parser that you're done with the article element (which
embodies the entire document). 
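
Putting the pieces of this section together, a complete minimal document might look like the following sketch. The preamble is the one shown earlier; the section names and body text are hypothetical placeholders.

```sgml
<!doctype linuxdoc system>

<article>

<title>The Linux Food-Processing HOWTO
<author>Norbert Ebersol, <tt/norbert@foo.com/
<date>v1.0, 9 March 1994
<abstract>
This document describes how to connect your Linux machine to a food-processor
for dicing vegetables.
</abstract>

<toc>

<!-- Section names and body text below are hypothetical placeholders -->
<sect>Introduction

<p>
This is the first paragraph of the first top-level section.

A blank line starts the second paragraph; no further p tag is needed.

<sect1>Hardware requirements

<p>
This is the body of a second-level subsection.

</article>
```

Note that each sectioning tag is followed by its own p tag before the body text, and that the closing article tag comes last.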


 3.5  Cross-references 


Now we're going to move on to other features of the system.
Cross-references are easy. For example, if you want to make a
cross-reference to a certain section, you need to label that section
as so:

<sect1><heading><label id="sec-intro">Introduction</>



You can then refer to that section somewhere in the text using the
expression:

See section <ref id="sec-intro" name="Introduction"> for an introduction.



This will replace the ref tag with the section number labelled
as sec-intro. The name argument to ref is necessary for
nroff and HTML translations (at the moment). The nroff
macro set used by Linuxdoc-SGML does not currently support cross-references,
and it's often nice to refer to a section by name instead of number.
For example, this section is Cross-references.

There is also a url element for Universal Resource Locators, or
URLs, used on the World Wide Web. This element should be used to refer
to other documents, files available for FTP, and so forth. For
example,

You can get the Linux HOWTO documents from 
<url url="http://sunsite.unc.edu/mdw/linux.html" 
     name="the Linux Documentation Project home page">.



The url argument specifies the actual URL itself. A link to the
URL in question will be automatically added to the HTML document.
The optional name argument specifies the text that should be anchored to
the URL (for HTML conversion) or named as the description of the
URL (for LaTeX and nroff). If no name argument is given, the
URL itself will be used.

For example, you can get the Linuxdoc-SGML package from
ftp://ftp.cs.cornell.edu/mdw/linuxdoc-sgml-1.1.tar.gz.
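
The sentence above shows what a url element looks like in the output when no name argument is given. Its source is not reproduced here, but it would presumably look something like this sketch:

```sgml
For example, you can get the Linuxdoc-SGML package from
<url url="ftp://ftp.cs.cornell.edu/mdw/linuxdoc-sgml-1.1.tar.gz">.
```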


3.6  Fonts 


Essentially, the same fonts supported by LaTeX are supported
by linuxdoc-sgml. Note, however, that the conversion to
plain ASCII (through groff) does away with the font
information. (I might hack up plain-ASCII representations of the
various fonts if the need arises.) So, you should use fonts
as much as possible, for the benefit of the conversion to LaTeX,
but don't depend on the fonts to get a point across in the plain
ASCII version.

In particular, the tt tag described above can be used to
get a constant-width ``typewriter'' font, which should be used for
all e-mail addresses, machine names, filenames, and so on.
Example:

Here is some <tt>typewriter text</tt> to be included in the document.



Equivalently:

Here is some <tt/typewriter text/ to be included in the document.



Remember that you can only use this abbreviated form if the enclosed
text doesn't contain slashes.

Other fonts can be achieved with bf for boldface and em
for italics. Several other fonts are supported as well, but
I don't suggest you use them, because we'll be converting these
documents to other formats such as HTML which may not support them.
Boldface, typewriter, and italics should be all that you need.
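
For completeness, here is a sketch of bf and em in use, in both the long and the abbreviated form. The sentences are hypothetical.

```sgml
<!-- Hypothetical sentences illustrating boldface and italics -->
Do <bf>not</bf> run this command as root.
The <em/linuxdoc/ DTD is a modified version of the QWERTZ DTD.
```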


3.7  Lists 


There are various kinds of supported lists. They are:

itemize for bulleted lists such as this one.
enum for numbered lists.
descrip for ``descriptive'' lists. 


Each item in an itemize or enum list must be marked
with an item tag. Items in a descrip are marked with tag.
For example,

<itemize>
<item>Here is an item.
<item>Here is a second item.
</itemize>



looks like this:

Here is an item.
Here is a second item.


Or, for an enum,

<enum>
<item>Here is the first item.
<item>Here is the second item.
</enum>



You get the idea. Lists can be nested as well; see the example document
for details.

A descrip list is slightly different, and slightly ugly, but
you might want to use it for some situations:

<descrip>
<tag/Gnats./ Annoying little bugs that fly into your cooling fan.
<tag/Gnus./ Annoying little bugs that run on your CPU.
</descrip>



ends up looking like:

Gnats.
Annoying little bugs that fly into your cooling fan.
Gnus.
Annoying little bugs that run on your CPU.
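
As mentioned above, lists can be nested. The example document is the authoritative reference; the following is only a sketch with hypothetical items, and it assumes that the inner list simply goes inside the item it belongs to.

```sgml
<itemize>
<item>Things to buy:
  <enum>
  <item>A food-processor.
  <item>A serial cable.
  </enum>
<item>Things to read.
</itemize>
```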




3.8  Miscellany 


There are various other esoteric features in the system as well, most
of which you probably won't use. If you're curious, read the QWERTZ
User's Guide (from ftp.cs.cornell.edu in pub/mdw/SGML).

QWERTZ (and hence, linuxdoc) supports many features such as
mathematical formulae, tables, figures, and so forth. I don't recommend
using most of these features in the Linux HOWTOs because they won't render
well in plain ASCII. If you'd like to write general documentation in
SGML, I suggest using the original QWERTZ DTD instead of the hacked-up
linuxdoc DTD, which I've modified for use particularly by the Linux
HOWTOs and other documentation.

The bottom line is, linuxdoc-sgml supports many other features found
in the QWERTZ DTD, but I haven't necessarily tweaked them to work well
with linuxdoc-sgml. If you encounter problems with any of them,
please let me know.

