SPFE Architecture

SPFE is essentially a set of ideas about how to design a structured authoring system that is affordable, highly customizable, and supports a high degree of automation. These goals have often been at odds with each other. Even if people accepted the premise that creating document types that were highly specific to their own products and business processes could increase productivity, reduce errors, and increase automation, these potential benefits were often offset by the cost of creating such a highly specific system. SPFE sets out to address this problem by:

  • Offering the highest degree of reusable structure and processing code without sacrificing either the simplicity or the task-specific nature of a customized document type.
  • Reducing or eliminating the need for expensive content management systems, or at least allowing organizations to delay acquiring such systems until they have sufficient experience creating and processing structured content to frame their requirements based on their genuine business needs.
  • Being as resistant as possible to the hidden killer of content management systems — the creeping disorder that can infect many systems that are not adequately structured and maintained, and which can result in a slow strangulation of productivity as the system ages.

Database publishing architecture

The key feature of the SPFE architecture is that it is designed to support the database publishing approach to structured writing and publishing. This means treating the content as a database and building and organizing information sets based on queries rather than on maps or pick lists. Supporting the database publishing approach is key to SPFE’s goal of reducing dependence on expensive content management systems and their ongoing overhead. SPFE does not require (though it can work with) a database management system. It can use XML documents alone as a database.
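
As a rough illustration of building an information set by query rather than by map, the following Python sketch treats a handful of XML topics as the database and selects those whose index covers a given subject. The topic, head, identity, index, and body elements follow the meta-structure described later on this page; the uri and subject elements, the sample topics, and the select_topics function are assumptions made for the example, not part of any SPFE schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics; in practice these would be XML files on disk.
TOPICS = [
    "<topic><head><identity><uri>topics/install</uri></identity>"
    "<index><subject>installation</subject></index></head>"
    "<body><p>How to install.</p></body></topic>",
    "<topic><head><identity><uri>topics/upgrade</uri></identity>"
    "<index><subject>installation</subject><subject>upgrading</subject></index></head>"
    "<body><p>How to upgrade.</p></body></topic>",
    "<topic><head><identity><uri>topics/licence</uri></identity>"
    "<index><subject>licensing</subject></index></head>"
    "<body><p>Licence terms.</p></body></topic>",
]


def select_topics(sources, subject):
    """Return the parsed topics whose index lists the given subject."""
    selected = []
    for source in sources:
        topic = ET.fromstring(source)
        subjects = {s.text for s in topic.findall("./head/index/subject")}
        if subject in subjects:
            selected.append(topic)
    return selected


info_set = select_topics(TOPICS, "installation")
print([t.findtext("./head/identity/uri") for t in info_set])
# prints ['topics/install', 'topics/upgrade']
```

No map file names the two selected topics. They are in the information set only because their own metadata matches the query.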

Publishing tool chain

The main component of the SPFE architecture is the publishing tool chain. The publishing tool chain is a loosely-coupled message-based pipeline with four major stages:

  • Synthesis
  • Presentation
  • Formatting
  • Encoding

The acronym SPFE is taken from the initials of these four stages. Each stage has a defined set of responsibilities:

[Figure: The SPFE Architecture]

Synthesis

The synthesis stage is responsible for pulling together a set of topics that make up an information set. This may involve:

  • Querying a repository to select a set of topics to include in the information set.
  • Fully resolving every topic, which includes importing fragments and graphics, resolving conditions, and ensuring that all names and references are fully resolved and all objects required for the build are available.
  • Extracting content from external sources, combining it with authored content, if required, and creating fully resolved topics.

The output of the synthesis stage is a collection of fully resolved XML topics, with their associated metadata. Each topic is still in its original topic type schema, or very close to it.

Presentation

The presentation stage is responsible for organizing the set of topics provided by the synthesis stage and determining how they will be presented, organized, and linked. The presentation stage deals with the specific topic type schemas of all the topics and does any processing required to convert that topic type into a publishable format. Linking is generated dynamically from in-line metadata using queries.

The presentation step is also responsible for making sure that the topics are presented in a way that is appropriate to a specific medium. Thus if you are publishing to two different media, you will probably have two different presentation processes, one for each medium.

The output of the presentation stage is a collection of topics in a common presentation format appropriate to the type of media in which they are to be presented. At this point, the presentation is not as specific as a particular file format like HTML or CHM. Rather, it is specific to the type of presentation, such as paper, help system, or web. Thus you might have a help presentation process which outputs into a generalized help presentation schema which could then be transformed into Eclipse Help or CHM, or you could have a paper presentation schema which could then be transformed into PDF or TeX.

Formatting

The formatting stage is responsible for the appearance of the content in a specific medium. If the medium is XML-based, such as XHTML, then the formatting stage outputs in that format and the encoding stage is not required.

In some cases, the final format may require additional files to be generated besides the topic files themselves. For instance, if you are generating Eclipse help, you have to generate plugin, context, and index files to integrate your content with the Eclipse help system.

Encoding

The output of the formatting stage is an XML encoding of the final output format. In the case where the final output format is not XML-based, this output must then be encoded in its final form by the encoding stage. For example, if the final format of the content is PDF, the formatting stage would create an XML representation in the form of an XSL-FO file. The encoding would then be performed by an XSL-FO engine.

In most cases, the encoding stage will be performed by an existing third-party application such as a help compiler or an XSL-FO engine.
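
The division of labour among the four stages can be sketched as a chain of functions, each consuming the output of the one before. This is only an illustration in Python: the function names, the dictionary-based topic records, and the placeholder output strings are all assumptions, and real SPFE stages would more likely be XSLT scripts operating on XML files.

```python
def synthesize(repository, subject):
    """Synthesis: query the repository for the topics in this information set."""
    return [topic for topic in repository if subject in topic["subjects"]]


def present(topics, media_type):
    """Presentation: organize the topics for a type of media (web, help, paper)."""
    ordered = sorted(topics, key=lambda topic: topic["title"])
    return {"media_type": media_type, "pages": ordered}


def format_pages(presentation, output_format):
    """Formatting: produce an XML encoding of the final output format."""
    if output_format == "xhtml":
        template = "<html><body><h1>{title}</h1></body></html>"
    elif output_format == "pdf":
        # For PDF the formatting stage emits XSL-FO, which is still XML.
        # This one-element string is a placeholder, not a complete FO document.
        template = "<fo:block>{title}</fo:block>"
    else:
        raise ValueError(f"no formatting routine for {output_format!r}")
    return [template.format(title=page["title"]) for page in presentation["pages"]]


def encode(formatted, output_format):
    """Encoding: only needed when the final format is not XML-based."""
    if output_format == "xhtml":
        return formatted
    # Stand-in for handing the XSL-FO to a third-party engine.
    return [f"[encoded by external engine] {document}" for document in formatted]


def build(repository, subject, media_type, output_format):
    """Run the four stages in order and return the final output."""
    topics = synthesize(repository, subject)
    presentation = present(topics, media_type)
    formatted = format_pages(presentation, output_format)
    return encode(formatted, output_format)


# Hypothetical repository of three topics.
REPOSITORY = [
    {"title": "Upgrading", "subjects": {"installation", "upgrading"}},
    {"title": "Installing", "subjects": {"installation"}},
    {"title": "Licensing", "subjects": {"licensing"}},
]

print(build(REPOSITORY, "installation", "web", "xhtml"))
```

In this sketch, supporting another output format means adding a branch to format_pages (and, if the format is not XML-based, to encode), while synthesize and present stay untouched.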

Advantages of the modular tool chain

This modular tool chain is key to the database publishing support in the SPFE architecture. The synthesis step executes the initial query that selects the content to be included in the content set. This query also performs any reuse by processing conditions and inclusions. The presentation step then queries the result of the synthesis stage to organize and link the content set. This separation of the two query stages ensures that links are only formed within the content present in the content set, eliminating the need to manage links at the repository level.

A second advantage of this separation of the processing pipeline into these distinct stages is that it allows you to add new content types without having to create new formatting or encoding routines. For the most part, adding a new topic type, for instance, only requires creating a new presentation routine. As long as standard metadata structures are used, the synthesis process tends to be generic for authored content. New reference types, on the other hand, or new content pulled in from external sources, usually require a new synthesis routine, but can use standard presentation, formatting, and encoding.

Adding a new output format, on the other hand, generally involves only a new formatting routine, while changes in the way content is organized can be accomplished with a new presentation routine, leaving all the other layers intact.

The work of creating a new synthesis, presentation, formatting, or encoding routine is further reduced by the use of modular schemas and scripts.

Schema meta-structure

In SPFE, schemas and scripts are constructed in modular fashion to allow for the highest degree of reuse, thus reducing the time and cost involved in introducing a new topic or reference type into your system. The key to making the modularity most effective is the definition of a consistent meta-structure for all schemas. Within the meta-structure, you can plug in structural elements from any source that adheres to the meta-structure.

Note that the principle that there should be a meta-structure is what is fundamental to SPFE. The SPFE Open Toolkit will implement a specific meta-structure, and all schemas and scripts in the SPFE Open Toolkit will be written to conform to it. There will be considerable benefit in adopting the meta-structure used in the SPFE Open Toolkit, as it will allow you to take a great deal of advantage of the schemas and scripts that the toolkit provides. However, you could develop your own meta-structure and you would still be doing SPFE.

The meta-structure of a SPFE schema, per the SPFE Open Toolkit, is as follows:

topic
     head
          identity
          tracking
          index
     body
          topic-structures
               text-structures
                     paragraphs
                          mentions
                          decorations

A few notes on each of these:

topic: The topic is simply the container for all the other parts of the meta-structure.

head: The head is a container for topic metadata. Whether topic metadata resides inside the topic or is attached to it externally is generally an implementation detail at the repository level. However, much of what the synthesis and presentation layers do is dependent on the topic metadata, so it is normally useful to have the metadata internal to the topic once it is extracted from the repository for publishing purposes. It is therefore part of the default meta-structure. It should be noted that in SPFE metadata is not an afterthought or a label applied to a published artefact after the fact. In SPFE, metadata comes first, before content. This is fundamental to treating content as a database rather than simply storing it in a database.

identity: The identity is a container for the metadata that establishes the identity of the topic. This can be as simple as a URI, but it may also include specifications of the scope of a topic in various dimensions.

tracking: The history and current status of the topic. This, particularly, may be moved to the repository metadata if using a repository that handles this kind of metadata. But SPFE is designed to be useful using a file system, with or without a version control system, so it is part of the default meta-structure.

index: In order to support query-based linking, a SPFE topic must be indexed. The index specifies the subjects that the topic covers. It may be authored or, in the case of some types of reference content, created by the synthesis layer.

body: The body contains the actual content of the topic.

topic-structures: The topic-structures are the specific set of fields and structures that define a topic of a particular type. Basically, these set the data structure rules that make this topic type distinct from all others. Generally when you create a new topic type, the only thing that will actually be new is the set of topic-structures. Everything else will be implemented using existing modules, either your own or those from the SPFE Open Toolkit.

text-structures: The text-structures are the basic text things like paragraphs, lists, sections, tables, etc. text-structures are inserted into topic-structures at any point where general text is required. (For example, in a reference, you might have a number of topic-structure elements that just take plain text content, but then a description field that takes complex text.) You may use the text-structures from the SPFE Open Toolkit, or create your own to suit your organization’s style.

paragraphs: Paragraphs have a special status in the SPFE schema meta-structure. While they are a text structure, they are the only text structure that is allowed to have mixed content. That means that if you want to have any in-text markup (markup that occurs in the middle of a string of normal text, like the <strong> tag in HTML, for instance), it has to be inside a paragraph. This means that you can’t do things like putting mixed content directly inside a list item as you would in HTML. In SPFE, if you want to put text into a list item, you have to put it inside a paragraph (p) element. This rule makes it much easier to manage all in-text markup in a SPFE schema. It also makes it easier for writers to learn SPFE schemas. Of course, there is nothing to prevent you from adding other mixed-content elements in your own schemas, but following the SPFE convention will reduce the amount of work you have to do.
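
The paragraph rule is simple enough to check mechanically. The following Python sketch walks a topic body and reports any element other than p that mixes text with child elements. In a real system the schema itself would enforce this; the list and item element names and the function name are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def mixed_content_violations(element, path=""):
    """Return the paths of elements, other than p, that hold mixed content."""
    here = f"{path}/{element.tag}"
    if element.tag == "p":
        # Paragraphs are the one place mixed content is allowed, so stop here.
        return []
    children = list(element)
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in children
    )
    found = [here] if children and has_text else []
    for child in children:
        found.extend(mixed_content_violations(child, here))
    return found


# First item breaks the rule (HTML style); second item follows it.
SAMPLE = (
    "<body><list>"
    "<item>See <strong>this</strong> now</item>"
    "<item><p>See <strong>this</strong> now</p></item>"
    "</list></body>"
)

print(mixed_content_violations(ET.fromstring(SAMPLE)))
# prints ['/body/list/item']
```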

mentions: Mentions are something a bit special in SPFE. SPFE follows the rule that there is to be no application semantics in the content. Among other things, this means that you can’t define links directly, since a link is an application semantic. What you can do is mention things. Things you can mention include:

  • Named objects in the real world, for instance, an API routine.
  • Things in the real world that don’t have a single canonical name, like a user task.
  • Resources in your repository (which must be mentioned by their URI, to comply with the principle of location independence).
  • Objects within the current topic, for instance, a code sample.
  • Terms — words that have particular significance or meaning in your subject area.

Each of these types of mentions can be resolved to a link or an inclusion when the topic is included in a topic set (a process called soft linking).

The use of mentions greatly simplifies authoring, because the author does not have to search for resources to link to. It also simplifies reuse, because links and inclusions are only processed after the set of topics in a particular information set is selected, meaning that there is no possibility of broken links. Both these things greatly reduce the need for a CMS in a SPFE system.
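
Soft linking can be sketched in a few lines of Python. The idea is to build a lookup from the index entries of the topics that are actually in the information set, then resolve each mention against that lookup. The mention and entry elements, their type attribute, the href attribute, and both function names are assumptions for the example; actual SPFE schemas may mark up mentions quite differently.

```python
import xml.etree.ElementTree as ET

# Two hypothetical topics that make up the whole information set.
REFERENCE = ET.fromstring(
    "<topic><head><identity><uri>topics/open-file-ref</uri></identity>"
    "<index><entry type='api-routine'>open_file</entry></index></head>"
    "<body><p>Reference for open_file.</p></body></topic>"
)
TASK = ET.fromstring(
    "<topic><head><identity><uri>topics/reading-files</uri></identity>"
    "<index><entry type='task'>reading files</entry></index></head>"
    "<body><p>Call <mention type='api-routine'>open_file</mention>, then "
    "<mention type='api-routine'>close_file</mention>.</p></body></topic>"
)


def build_subject_index(topics):
    """Map (type, name) pairs from each topic's index to that topic's URI."""
    subject_index = {}
    for topic in topics:
        uri = topic.findtext("./head/identity/uri")
        for entry in topic.findall("./head/index/entry"):
            subject_index[(entry.get("type"), entry.text)] = uri
    return subject_index


def resolve_mentions(topic, subject_index):
    """Return (mention text, target URI or None) for every mention in a topic."""
    resolved = []
    for mention in topic.iter("mention"):
        target = subject_index.get((mention.get("type"), mention.text))
        if target is not None:
            # A later stage could render this attribute as a link.
            mention.set("href", target)
        resolved.append((mention.text, target))
    return resolved


subject_index = build_subject_index([REFERENCE, TASK])
print(resolve_mentions(TASK, subject_index))
# prints [('open_file', 'topics/open-file-ref'), ('close_file', None)]
```

The mention of close_file has no matching topic in this set, so it simply stays as plain text rather than becoming a broken link.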

decorations: Finally, there are decorations, which are basic text decorations like bold, italic, and code. Mentions can also be decorated in output, depending on their type (the mention of a book title would be decorated in italics, for instance). Decorations are for things that do not fall into any of the categories of mentions that are defined in the system.

Why allow bold and italic to be specified in a system that otherwise is so strict about structure? Quite simply, there are always going to be things that a structure does not take account of, or that authors do not know how to mark up using the current set of mentions. If you don’t provide a safety valve, either the authors are going to interrupt their work to try to figure out what to do, or they are going to cheat and use an existing mention element because they know it will be decorated the way they want in the output.

Having writers use markup in a way that it was not intended to be used is a sure way to create the slow growth of disorder in the content set that can choke a structured system over the long run. It is much better to give authors pure decoration elements that they can use when they don’t know what else to use. This way, none of the semantic elements get misused, and it is easy to do an audit of the decoration elements from time to time to determine if a semantic mention element should have been used, or if you need to add a new kind of mention element to your schemas.
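
An audit of this kind could be as small as the following Python sketch, which counts how often each decorated string appears across a set of topics. The bold, italic, and code element names are assumed from the decorations listed above, and the sample topics are invented for the example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Assumed element names for the pure decoration elements.
DECORATIONS = {"bold", "italic", "code"}


def audit_decorations(topics):
    """Count each (decoration, text) pair used across the given topics."""
    counts = Counter()
    for topic in topics:
        for element in topic.iter():
            if element.tag in DECORATIONS:
                text = "".join(element.itertext()).strip()
                counts[(element.tag, text)] += 1
    return counts


TOPICS = [
    ET.fromstring(
        "<topic><body><p>See <italic>Moby Dick</italic> and press "
        "<bold>Enter</bold>.</p></body></topic>"
    ),
    ET.fromstring(
        "<topic><body><p><italic>Moby Dick</italic> is long.</p></body></topic>"
    ),
]

for (tag, text), count in audit_decorations(TOPICS).most_common():
    print(count, tag, text)
```

A string that keeps turning up under the same decoration, such as the italicized book title here, is a candidate for an existing or new mention element.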

No nested topics

An important thing to notice about the meta-structure of a SPFE topic is that it does not allow for nested topics. You cannot make topics up out of other topics. Each topic is complete in itself. This rule removes a huge amount of complexity from the SPFE system, contributing significantly to its goals of simplicity and reliability.

While nested topics are not supported, you can use fragments. Fragments are a text-structure-level entity that you can include in your content by reference. Fragments can be used in the following ways:

  • to import common content into multiple topics
  • to import different content into a topic when it is used in different contexts
  • to conditionalize content, either by applying conditions to the fragment or by remapping the URI of the fragment to be included

Fragments are designed with simplicity and reliability in mind. They avoid any ambiguity about whether a topic is complete in itself or not, and both fragments and the topics that use them can be validated independently (meaning that the fragment reference does not have to be resolved in order to validate a topic file).
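
The following Python sketch shows one way fragment references might be resolved during synthesis, including the URI remapping mentioned above. The fragment-ref and fragment elements, the uri attribute, and the in-memory dictionary standing in for a repository are all assumptions. The sketch also does not resolve references nested inside fragments.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment store keyed by URI.
FRAGMENTS = {
    "frag/warning": "<fragment><p>Unplug the unit first.</p></fragment>",
    "frag/warning-eu": "<fragment><p>Unplug the unit first (EU model).</p></fragment>",
}

TOPIC_SOURCE = (
    "<topic><body><p>Intro.</p>"
    "<fragment-ref uri='frag/warning'/>"
    "<p>End.</p></body></topic>"
)


def resolve_fragments(topic, fragments, remap=None):
    """Replace each fragment-ref with the content of the fragment it names."""
    remap = remap or {}
    # Snapshot the elements first, because the tree is modified while walking.
    for parent in list(topic.iter()):
        for child in list(parent):
            if child.tag != "fragment-ref":
                continue
            uri = child.get("uri")
            uri = remap.get(uri, uri)  # optional context-specific remapping
            fragment = ET.fromstring(fragments[uri])
            position = list(parent).index(child)
            parent.remove(child)
            for offset, node in enumerate(list(fragment)):
                parent.insert(position + offset, node)
    return topic


default = resolve_fragments(ET.fromstring(TOPIC_SOURCE), FRAGMENTS)
print([p.text for p in default.iter("p")])
# prints ['Intro.', 'Unplug the unit first.', 'End.']

european = resolve_fragments(
    ET.fromstring(TOPIC_SOURCE), FRAGMENTS, remap={"frag/warning": "frag/warning-eu"}
)
print([p.text for p in european.iter("p")])
# prints ['Intro.', 'Unplug the unit first (EU model).', 'End.']
```

Both the topic source and each fragment parse on their own, which mirrors the point that neither needs the other in order to be validated.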

Independent validity of all content objects

In this respect, fragments obey another architectural constraint, which is that all content objects should be valid in themselves. Some systems use mechanisms to break content up into multiple files, but require all the files to be read in order to validate the master file, and frequently provide no mechanism for validating the sub-files independently. This kind of arrangement complicates the system and can create a requirement for sophisticated channels to exist between the editor and the repository, potentially creating additional costs or vendor lock-in. By requiring that all objects be independently valid, SPFE simplifies content management requirements and keeps tool options open.

(It should be noted that the principle of independent validity does not apply to all objects in the system, just to content objects. Applying it to schema files, for instance, would make much of the modularity of the architecture impossible. Many schema components cannot be fully validated without reference to other schema components.)

Modularity of schema and script components

The schema meta-structure provides a framework for modularizing schema and script components. Schemas are modularized along the lines of the meta-structure. Within a schema, the content of a segment of the meta-structure is represented by a named group. This allows you to substitute a different schema file defining each group in the meta-structure, and thus mix and match components from the SPFE Open Toolkit, third parties, or your own collection to create new topic types which reuse as much existing structure as possible.

Scripts are modularized on the same lines, allowing you to substitute script modules in the same way that you substitute schema modules, thus allowing you to reuse processing the way you reuse structure. It is worth noting that this reuse of structure and processing is done entirely using mechanisms native to XML schemas and XSLT scripts (and generally available in most schema and programming languages). SPFE does not involve a special or proprietary reuse mechanism, only a set of constraints that discipline the use of already existing mechanisms. This helps keep the system simple and easy to integrate.
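
As an analogy only, here is how the same kind of module substitution might look in Python, using nothing more than dictionaries of handler functions. The handler tables, the api-reference style element names (routine-name, description), and the render function are all hypothetical; in SPFE itself the equivalent would be done with the native mechanisms of XSLT. The sketch does no escaping of special characters, which a real renderer would need.

```python
import xml.etree.ElementTree as ET


def render_children(element, handlers):
    """Render an element's text and children in document order."""
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, handlers))
        parts.append(child.tail or "")
    return "".join(parts)


def render(element, handlers):
    """Dispatch to the handler for this element, or just render its content."""
    handler = handlers.get(element.tag, render_children)
    return handler(element, handlers)


# Shared module: handlers for text-structures and decorations.
TEXT_HANDLERS = {
    "p": lambda el, h: f"<p>{render_children(el, h)}</p>",
    "bold": lambda el, h: f"<b>{render_children(el, h)}</b>",
}

# Topic-type module: handlers for one hypothetical topic type's topic-structures.
API_HANDLERS = {
    "routine-name": lambda el, h: f"<h1>{render_children(el, h)}</h1>",
    "description": lambda el, h: f'<div class="description">{render_children(el, h)}</div>',
}

# A new topic type reuses the shared module and adds only its own handlers.
handlers = {**TEXT_HANDLERS, **API_HANDLERS}

topic = ET.fromstring(
    "<body><routine-name>open_file</routine-name>"
    "<description><p>Opens a <bold>file</bold>.</p></description></body>"
)
print(render(topic, handlers))
# prints <h1>open_file</h1><div class="description"><p>Opens a <b>file</b>.</p></div>
```

Swapping in a different text-structure module means replacing TEXT_HANDLERS in the merge, with no change to the topic-type handlers.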

 
