Bob Stayton

$Id: publishing.xml,v 1.4 2002/06/03 19:26:58 xmldoc Exp $

Using XSL tools to publish DocBook documents
A brief introduction to XSL
XSL processing model
Customizing DocBook XSL stylesheets

There is a growing list of tools to process DocBook documents using XSL stylesheets. Each tool implements part or all of the XSL standard, which actually has several components:

Extensible Stylesheet Language (XSL)

A language for expressing stylesheets written in XML. It includes the formatting object language, but refers to separate documents for the transformation language and the path language.

XSL Transformation (XSLT)

The part of XSL for transforming XML documents into other XML documents, HTML, or text. It can be used to rearrange the content and generate new content.

XML Path Language (XPath)

A language for addressing parts of an XML document. It is used to find the parts of your document to apply different styles to. All XSL processors use this component.

To publish HTML from your XML documents, you just need an XSLT engine. To get to print, you need an XSLT engine to produce formatting objects (FO), which then must be processed with an FO engine to produce PostScript or PDF output.

This section discusses which XSLT engines you might want to use to generate HTML and FO output from your DocBook XML documents, and gives a few short examples of how to use specific XSLT engines to generate that output. Before using any particular XSLT engine, you should consult its reference documentation for more detailed information.

Currently, the only XSLT engines that are recommended and known to work well with the DocBook XSL stylesheets are Daniel Veillard's C-based implementation, xsltproc (the command line processor packaged with libxslt, the XSLT C library for Gnome), and Michael Kay's Java-based implementation, Saxon.

The following engines are not currently recommended for use with the DocBook XSL stylesheets:

James Clark's XT

XT is an incomplete implementation of the XSLT 1.0 specification. One of the important things that's missing from it is support for XSLT "keys", which the DocBook XSLT stylesheets rely on for generating indexes, among other things. So you can't use XT reliably with current versions of the stylesheets.

Xalan (both Java and C++ implementations)

Bugs in current versions of Xalan prevent it from being used reliably with the stylesheets.

Your choice of an XSLT engine may depend a lot on the environment you'll be running the engine in. Many DocBook users who need or want a non-Java application are using xsltproc. It's very fast, and it's also a good choice because Veillard monitors the DocBook mailing lists to field usage and troubleshooting questions and responds very quickly to bug reports. The libxslt site also features a DocBook page that, among other things, includes a shell script you can use to automatically generate XML catalogs for DocBook. One current limitation of xsltproc is that it doesn't yet support Norm Walsh's DocBook-specific XSLT extension functions.

If you can use a Java-based implementation, choose Michael Kay's Saxon. It supports Norm Walsh's DocBook-specific XSLT extension functions.

A variety of XSLT engines are available. Not all of them are used much in the DocBook community, but here's a list of some free/open-source ones you might consider (though xsltproc and Saxon are currently the only recommended XSLT engines for use with DocBook).

For generating print/PDF output from FO files, there are two free/open-source FO engines that, while they aren't complete bug-free implementations of the FO part of the XSL specification, are still very useful:

Of those, PassiveTeX currently seems to be the more mature, less buggy implementation.

And there are two proprietary commercial products that both seem to be fairly mature, complete implementations of the FO part of the XSL specification:

  • current versions of Arbortext Epic Editor include integrated support for processing formatting object files

  • RenderX XEP (written in Java) is a standalone tool for processing formatting object files

Before using any XSLT engine, you should consult the reference documentation that comes with it for details about its command syntax and so on. But there are some common steps to follow when using the Java-based engines, so here's an example of using Saxon from the UNIX command line that might help give you a general idea of how to use them.

You'll need to alter your CLASSPATH environment variable to include the path to where you put the saxon.jar file from the Saxon distribution. And you'll need to specify the correct path to the docbook.xsl HTML stylesheet file in your local environment.

CLASSPATH=saxon.jar:$CLASSPATH
export CLASSPATH
java com.icl.saxon.StyleSheet filename.xml docbook/html/docbook.xsl > output.html
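
If saxon.jar isn't in the current directory, you'll want to put its full path on the CLASSPATH. Here's a minimal sketch of one way to do that. The SAXON_HOME variable, its default value, and the prepend_classpath helper are hypothetical names chosen for illustration; they aren't part of the Saxon distribution. The helper also avoids leaving a trailing colon when CLASSPATH starts out empty.

```shell
# Prepend a jar to CLASSPATH without leaving a trailing colon
# when CLASSPATH is initially empty or unset.
prepend_classpath() {
  jar=$1
  if [ -n "${CLASSPATH:-}" ]; then
    CLASSPATH="$jar:$CLASSPATH"
  else
    CLASSPATH="$jar"
  fi
  export CLASSPATH
}

# SAXON_HOME is a hypothetical variable naming the directory where
# you unpacked the Saxon distribution; adjust the default as needed.
SAXON_HOME=${SAXON_HOME:-/usr/local/share/saxon}
prepend_classpath "$SAXON_HOME/saxon.jar"
```

You can call the same helper again for any other jar files you need, such as the one for your FO engine.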

If you replace the path to the HTML stylesheet with the path to the FO stylesheet, Saxon will produce a formatting object file. Then you can convert that to PDF using an FO engine such as FOP, the free/open-source FO engine available from the Apache XML Project (http://xml.apache.org/fop/). Here is an example of that two-stage process.

CLASSPATH=saxon.jar:fop.jar:$CLASSPATH
export CLASSPATH
java com.icl.saxon.StyleSheet filename.xml docbook/fo/docbook.xsl > output.fo
java org.apache.fop.apps.CommandLine output.fo output.pdf
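
If you run the two stages often, you might wrap them in a small shell function. The following is only a sketch: the function name xml_to_pdf is made up for this example, and it assumes the same class names and stylesheet path as the commands above, with CLASSPATH already set. It derives the output names from the source file name and skips the FOP stage if the Saxon stage fails.

```shell
# Two-stage XML -> FO -> PDF conversion for one source file.
# Usage: xml_to_pdf filename.xml
xml_to_pdf() {
  src=$1
  base=${src%.xml}   # strip the .xml suffix to name the outputs

  # Stage 1: Saxon applies the FO stylesheet; stop here if it fails.
  java com.icl.saxon.StyleSheet "$src" docbook/fo/docbook.xsl > "$base.fo" || return 1

  # Stage 2: FOP renders the formatting object file as PDF.
  java org.apache.fop.apps.CommandLine "$base.fo" "$base.pdf"
}
```

Calling xml_to_pdf filename.xml would then leave filename.fo and filename.pdf next to the source file.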

Using a C-based XSLT engine such as xsltproc is a little easier, since it doesn't require setting any environment variables or remembering Java package names. Here's an example of using xsltproc to generate HTML output.

xsltproc docbook/html/docbook.xsl filename.xml > output.html

Note that when using xsltproc, the pathname to the stylesheet file precedes the name of your XML source file on the command line (it's the other way around with Saxon and with most other Java-based XSLT engines).
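
Because the argument order differs between engines, it's easy to mix up when you switch between them. Here's a small sketch that prints the right command line for either engine and either output format. The function name xslt_command is hypothetical, and the stylesheet paths are the ones used in the examples above, so adjust them for your local environment.

```shell
# Print the command line for a given engine and output format.
# Usage: xslt_command ENGINE FORMAT SOURCE
#   ENGINE: xsltproc | saxon     FORMAT: html | fo
xslt_command() {
  engine=$1
  format=$2
  src=$3
  stylesheet="docbook/$format/docbook.xsl"
  case $engine in
    # xsltproc takes the stylesheet first, then the source file.
    xsltproc) echo "xsltproc $stylesheet $src" ;;
    # Saxon takes the source file first, then the stylesheet.
    saxon)    echo "java com.icl.saxon.StyleSheet $src $stylesheet" ;;
    *)        echo "unknown engine: $engine" >&2; return 1 ;;
  esac
}

xslt_command xsltproc html filename.xml
xslt_command saxon fo filename.xml
```

The two calls at the end print the xsltproc command for HTML output and the Saxon command for FO output. In both cases you'd still redirect the result to a file, as in the earlier examples.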