Profiling DocBook documents

An easy way to personalize your content for several target audiences

Jirka Kosek

$Id: profiling.xml,v 1.3 2002/03/14 14:19:38 nwalsh Exp $


Table of Contents

Introduction
$0 solution
Usage
Single pass profiling
Conclusion

Introduction

There are many situations in which we need to generate several versions of a document, with slightly different content, from a single source. A user guide for a program with both Windows and Linux ports will differ only in a few topics related to installation and configuration. It would be wasteful to create two separate documents and keep them in sync. Another real-world example: in addition to the standard documentation, we may have a guide enriched with problem solutions from the help desk. It is better to store this information in one document as well, so that the versions stay synchronized.

Several high-end editing tools have built-in support for profiling. The user can easily assign target audiences to any part of a document in a simple dialog box, and can select the desired target audience before printing or generating other output formats. The software automatically filters out the excess parts of the document and passes the rest to the rendering engine. However, if your budget is limited, you cannot use commercial solutions. In the following text I will show you a simple but flexible profiling solution based on freely available technologies.

$0 solution

In the document, we mark the parts targeted at a particular platform or user group. When generating the final output, we must do profiling, i.e. personalization for a particular target audience, so that only some parts of the document are processed. DocBook has built-in support for marking document parts: on almost every element you can use attributes such as os, userlevel and arch, which hold the identifier of an operating system, user group or hardware architecture. You can also store profiling information in a general-purpose attribute such as role. Example 1, “Sample DocBook document with profiling information” shows what a document with profiling information might look like.

Example 1. Sample DocBook document with profiling information

<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE chapter PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN'
	                 'http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd'>
<chapter>
<title>How to setup SGML catalogs</title>

<para>Many existing SGML tools are able to map public identifiers to
files on your local file system. Mapping is specified in so called
catalog file. List of catalog files to use is stored in environment
variable <envar>SGML_CATALOG_FILES</envar>.</para>

<para os="unix">On Unix systems you can set this variable by invoking
command <command>export SGML_CATALOG_FILES=/usr/lib/catalog</command>
on command line. If you want maintain value of the variable between
sessions, place this command into startup file,
e.g. <filename>.profile</filename>.</para>

<para os="win">In Windows NT/2000 you can set environment variable by
issuing command <menuchoice><guimenu>Start</guimenu>
<guisubmenu>Settings</guisubmenu> <guisubmenu>Control
Panel</guisubmenu>
<guimenuitem>System</guimenuitem></menuchoice>. Then select
<guilabel>Advanced</guilabel> card in the dialog box and click on the
<guibutton>Environment Variables...</guibutton> button. Using the
<guibutton>New</guibutton> button you can add new environment variable
into your system.</para>

</chapter>

DocBook documents are often processed with the freely available DSSSL and XSL stylesheets. Most DocBook users who want profiling start by creating a customization layer that filters out some parts of the document. This approach has several serious disadvantages. First, you must create a profiling customization for every output format, because each format uses different stylesheets. This means that you must maintain the same code in several places, or resort to dirty tricks such as importing several stylesheets into one.

The second drawback is more serious. If you override templates to filter out content, you get almost correct output in a single run of the stylesheet. If you look closely at the generated output, however, you will see that the table of contents contains entries for items that profiling should have removed completely. Similar problems appear in several other places, e.g. gaps in the automatically generated numbers for tables, figures and so on. To correct this, one would have to change all the stylesheet code that generates the ToC, cross-references and so on to ignore filtered content. This is a very complicated task, and it would prevent you from easily upgrading to new versions of the stylesheets.
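The numbering problem can be illustrated with a small Python sketch. It is not stylesheet code: the list of tables and the helper names are made up for illustration, and the sketch only mimics the two possible orders of operation, numbering first and hiding afterwards versus filtering first and numbering afterwards.

```python
# Hypothetical document content: (title, os attribute or None for general content).
tables = [
    ("Catalog entry types", None),
    ("Windows registry keys", "win"),
    ("Environment variables", None),
]


def keep(os_value, target):
    """General content (no os attribute) and content for the target are kept."""
    return os_value is None or os_value == target


def number_then_hide(items, target):
    """Mimics filtering inside the rendering templates: numbers are assigned
    over the whole document, then unwanted items are suppressed."""
    numbered = [
        (n, title, os_value)
        for n, (title, os_value) in enumerate(items, start=1)
    ]
    return [(n, title) for n, title, os_value in numbered if keep(os_value, target)]


def filter_then_number(items, target):
    """Mimics profiling as a separate first step: numbering only ever sees
    the already filtered document."""
    kept = [title for title, os_value in items if keep(os_value, target)]
    return list(enumerate(kept, start=1))


print(number_then_hide(tables, "unix"))
print(filter_then_number(tables, "unix"))
```

In the first result the sequence jumps from 1 to 3, because the hidden Windows table still consumed a number. In the second the numbers are contiguous.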

Thus we must use a different approach. Profiling should be a completely separate step that filters out some parts of the original document and creates a new, correct DocBook document. Processing this new standalone document with any DocBook tool or stylesheet will then always give correct output. The big advantage of this method is compatibility with all DocBook tools: the filtered document is a normal DocBook document and does not require any special processing. Of course, there is also one disadvantage: formatting is now a two-stage process. First you must filter the source document, and in the second step you apply the normal stylesheets to the result of filtering. This may be a little inconvenient for many users, but the whole task can very easily be automated with a set of shell scripts, batch files or similar. Starting with version 1.50 of the XSL stylesheets, you can do profiling in one step together with normal stylesheet processing.
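As one possible way to automate the two stages, here is a minimal Python sketch. It assumes that a saxon command accepting the form saxon -o output source stylesheet "name=value" is available on the path. The function names and the intermediate file name are invented for this example, and the stylesheet locations are left to the caller.

```python
import subprocess


def profile_command(source, profiled, profile_xsl, **targets):
    """Stage 1: filter the source document with the profiling stylesheet.

    Keyword arguments become profile.* parameters, e.g. os="unix"
    becomes "profile.os=unix".
    """
    params = [f"profile.{name}={value}" for name, value in targets.items()]
    return ["saxon", "-o", profiled, source, profile_xsl, *params]


def render_command(profiled, output, render_xsl):
    """Stage 2: apply a normal stylesheet to the filtered document."""
    return ["saxon", "-o", output, profiled, render_xsl]


def build(source, output, profile_xsl, render_xsl, **targets):
    """Run both stages in order, stopping if either one fails."""
    profiled = source + ".profiled.xml"  # hypothetical intermediate file name
    for command in (
        profile_command(source, profiled, profile_xsl, **targets),
        render_command(profiled, output, render_xsl),
    ):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the first-stage command; nothing is executed here.
    print(profile_command("sample.xml", "unixsample.xml", "profile.xsl", os="unix"))
```

Building the commands as lists rather than as one shell string avoids quoting problems when file names contain spaces.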

Figure 1. Profiling stream

You can use many different approaches and tools to implement the filter. I decided to use an XSLT stylesheet: writing the necessary filter is very easy in XSLT, and many users already have an XSLT processor installed. The profiling stylesheet is part of the standard XSL stylesheets distribution and can be found in the file profiling/profile.xsl.
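To make the filtering idea concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. This is a different tool from the XSLT stylesheet described in this article and is not how profile.xsl is written. The shortened sample document and the function name prune are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the sample chapter.
SAMPLE = """<chapter>
<title>How to setup SGML catalogs</title>
<para>General paragraph.</para>
<para os="unix">Unix paragraph.</para>
<para os="win">Windows paragraph.</para>
</chapter>"""


def prune(element, attribute, target):
    """Remove every descendant whose `attribute` is present but differs
    from `target`; elements without the attribute are general content."""
    for child in list(element):  # iterate over a copy: the child list is modified
        value = child.get(attribute)
        if value is not None and value != target:
            element.remove(child)  # drops the whole subtree, including its tail text
        else:
            prune(child, attribute, target)


root = ET.fromstring(SAMPLE)
prune(root, "os", "unix")
print(ET.tostring(root, encoding="unicode"))
```

Note that ElementTree does not carry the DOCTYPE declaration over to the output, so this sketch only shows the filtering logic and is not a replacement for the profiling stylesheet.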

Usage

If you want to generate a Unix-specific guide from our sample document (Example 1, “Sample DocBook document with profiling information”), you can do it in the following way. (We assume that the command saxon runs an XSLT processor on your machine. You can use your preferred XSLT processor instead.)

saxon -o unixsample.xml sample.xml profile.xsl "profile.os=unix"

We are processing the source document sample.xml with the profiling stylesheet profile.xsl. The result of the transformation is stored in the file unixsample.xml. By setting the parameter profile.os to the value unix, we say that only the general and Unix-specific parts of the document should be copied to the result document. If you look at the generated result, you will notice that it is a correct DocBook document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE chapter
  PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
         "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<chapter>
<title>How to setup SGML catalogs</title>

<para>Many existing SGML tools are able to map public identifiers to
files on your local file system. Mapping is specified in so called
catalog file. List of catalog files to use is stored in environment
variable <envar>SGML_CATALOG_FILES</envar>.</para>

<para os="unix">On Unix systems you can set this variable by invoking
command <command moreinfo="none">export SGML_CATALOG_FILES=/usr/lib/catalog</command>
on command line. If you want maintain value of the variable between
sessions, place this command into startup file,
e.g. <filename moreinfo="none">.profile</filename>.</para>

</chapter>

It is the same as the input document, except that the Windows-specific paragraph is missing (the moreinfo="none" attributes are default values that the parser adds from the DTD). The same procedure can be used to get a Windows-specific version of the document. The result generated by the profiling stylesheet has a correct document type declaration (DOCTYPE); without it, some tools would not be able to process the document further. You can run common tools, for example the DSSSL or XSL stylesheets, on the result of filtering.

The stylesheet supports several parameters for specifying profiling values. They are summarized in the following list.

profile.os

This parameter specifies the operating system (os attribute) for which you want to get a profiled version of the document.

profile.userlevel

This parameter specifies the user level (userlevel attribute) for which you want to get a profiled version of the document.

profile.arch

This parameter specifies the hardware architecture (arch attribute) for which you want to get a profiled version of the document.

profile.condition, profile.conformance, profile.revision, profile.revisionflag, profile.security, profile.vendor, profile.role, profile.lang

These parameters can be used to specify the target profile for the corresponding attributes.

profile.attribute

Name of the attribute on which profiling should be based. It can be used if profiling information is stored in attributes other than os, userlevel and arch.

profile.value

This parameter specifies the value for the attribute selected by the profile.attribute parameter.

E.g. setting profile.attribute=os and profile.value=unix is the same as setting profile.os=unix.

profile.separator

Separator for multiple target audience identifiers. Default is ;.
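The relationship between the named parameters and the generic profile.attribute/profile.value pair can be sketched in Python as follows. The function resolve_targets is hypothetical and only models the equivalence described in the list above; it says nothing about how the stylesheet handles its parameters internally.

```python
def resolve_targets(params):
    """Map stylesheet-style parameters to {attribute name: target value}."""
    generic = ("profile.attribute", "profile.value", "profile.separator")
    targets = {}
    for name, value in params.items():
        if name in generic:
            continue  # not an attribute-specific parameter
        if name.startswith("profile.") and value:
            # "profile.os" -> attribute "os"
            targets[name[len("profile."):]] = value
    attribute = params.get("profile.attribute")
    value = params.get("profile.value")
    if attribute and value:
        # The generic pair names the attribute explicitly.
        targets[attribute] = value
    return targets


print(resolve_targets({"profile.os": "unix"}))
print(resolve_targets({"profile.attribute": "os", "profile.value": "unix"}))
```

Both calls describe the same profile, so they resolve to the same mapping.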

The current implementation can handle multiple profiling targets in one attribute. In that case you must separate the identifiers with semicolons:

<para os="unix;mac;win">...</para>

You can use a separator other than the semicolon by setting the profile.separator parameter. There must be no spaces between the separator and the target names.
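A minimal Python sketch of how such a multi-valued attribute could be matched is shown below. The function name matches is made up for this example, and the exact comparison done by the stylesheet may differ. The sketch compares whole identifiers, which also shows why stray spaces around the separator are a problem.

```python
def matches(attribute_value, target, separator=";"):
    """Return True if `target` is one of the identifiers listed in
    `attribute_value`, which uses `separator` between identifiers."""
    identifiers = attribute_value.split(separator)
    return target in identifiers  # whole identifiers only, no substring matching


print(matches("unix;mac;win", "mac"))
print(matches("unix; mac; win", "mac"))
print(matches("unix|mac|win", "mac", separator="|"))
```

In this sketch the second call does not match, because the identifier becomes " mac" with a leading space. Comparing whole identifiers also keeps a target such as win from matching an identifier such as darwin.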

You can also profile on several attributes in a single step, as the stylesheet handles all parameters simultaneously. For example, to get a hypothetical guide for Windows beginners, you can run profiling like this:

saxon -o xsample.xml sample.xml profile.xsl "profile.os=win" "profile.userlevel=beginner"
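The combined decision can be sketched in Python as follows. The sketch assumes that an element is kept only when every profiled attribute it carries matches its target, which is what a guide for Windows beginners suggests. The function keep_element is hypothetical and is not taken from the stylesheet.

```python
def keep_element(attributes, targets, separator=";"):
    """Decide whether an element belongs in the profiled document.

    attributes: the element's attributes, e.g. {"os": "win;unix"}
    targets:    wanted value per profiled attribute, e.g. {"os": "win"}
    """
    for name, wanted in targets.items():
        value = attributes.get(name)
        if value is None:
            continue  # not marked on this attribute: treated as general content
        if wanted not in value.split(separator):
            return False  # marked for other targets only
    return True


targets = {"os": "win", "userlevel": "beginner"}
print(keep_element({"os": "win", "userlevel": "beginner"}, targets))
print(keep_element({"os": "win", "userlevel": "expert"}, targets))
print(keep_element({}, targets))
```

Under this assumption, a paragraph marked for Windows experts is dropped from the Windows beginners guide, while unmarked content is always kept.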

As you can see, the profiling process described above can be used as a substitute for the SGML marked sections mechanism, which is missing in XML.

Single pass profiling

If you are using version 1.50 or later of the XSL stylesheets with an EXSLT-enabled XSLT processor (Saxon, xsltproc, Xalan), you can do profiling and transformation to HTML or FO in a single step. To do this, use the stylesheet with the profile- prefix instead of the normal one (e.g. profile-docbook.xsl, profile-chunk.xsl or profile-htmlhelp.xsl). For example, to get an HTML version of the profiled document, use:

saxon -o sample.html sample.xml .../html/profile-docbook.xsl "profile.os=win" "profile.userlevel=beginner"

No additional processing is necessary. If you want to use profiling with your customized stylesheets, import the profiling-enabled stylesheet instead of the normal one.

Conclusion

Profiling is necessary in many larger DocBook applications. It can be implemented quite easily with the simple XSLT stylesheet presented here. This mechanism can also be used to simulate the behavior of marked sections known from SGML.