SPFE Principles

The SPFE architecture is informed by a set of principles:

Low cost and reliable data

A custom structured authoring system must contain costs, but most of the costs associated with structured authoring are not in buying tools. The biggest costs are customization, ongoing system maintenance, and content management. There are a number of ways to control these costs, but the most fundamental is to create reliable data. Reliable data is necessary to support database publishing features such as automated link generation, which in turn simplify editorial and content management tasks, and thus reduce content management costs.

Reliable data, in this context, means something very specific. It means that you can perform operations on the data programmatically without the need for human beings to configure the details and inspect the result of every operation. To put it another way, it means you can automate decision making about the presentation of the content. The more operations your data is reliable for, the lower your running costs will be.
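As a concrete illustration of data that is "reliable for" an operation, consider automated link generation: if every topic reliably declares what it documents, a script can generate links without a human inspecting each result. This is a minimal sketch only; the topic records, field names, and link markup are hypothetical, not part of SPFE itself.

```python
# Minimal sketch: if every topic reliably declares what it documents,
# links can be generated by an algorithm instead of authored by hand.
# The topic records and field names here are hypothetical.

topics = [
    {"id": "t1", "documents": "install the widget"},
    {"id": "t2", "documents": "configure the widget"},
]

# Build an index from declared subjects to topic ids.
index = {t["documents"]: t["id"] for t in topics}

def link_mentions(text):
    """Replace any mention of a documented task with a link to its topic."""
    for subject, topic_id in index.items():
        text = text.replace(subject, f'<link target="{topic_id}">{subject}</link>')
    return text

print(link_mentions("Before you configure the widget, install the widget."))
```

The point is not the trivial string matching, but that no person had to decide where the links go: the decision was automated because the data was reliable enough to automate it.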

Making data reliable, and containing costs, requires living with certain constraints. The SPFE architecture and the SPFE approach to structuring data incorporate these constraints. By focusing on constraints, rather than specifying a particular way to do things, SPFE allows you to create the reliable data structures that are appropriate to your business.

While SPFE does not dictate any specific schema, the SPFE architecture is designed to allow for the modular construction of schemas and transformation scripts. The SPFE Open Toolkit will provide a set of basic building blocks for schema and script construction, allowing you to reuse structures and scripts as much as possible in creating your schemas, reducing your costs and allowing you to focus on just those aspects of the schema that are specific to your business.

Embrace algorithms

We live in the age of the algorithm, an age in which value is created by creating and deploying algorithms. Algorithms can be used not only to automate tasks, but to automate decision making. Algorithms depend on data. So many of the wonderful things that smart phones can do today, for example, depend on the ability of algorithms to combine location data generated on the phone with geographic information downloaded from the net. There is nothing remarkable or new about the data involved: it is the algorithm that adds the value by how it manipulates the data.

Content is data, and we should create our content as a database so that it becomes accessible to algorithms — not merely our own, necessarily, but other people's as well. Thus it is a principle of SPFE that content should be created as a database that is accessible to algorithms, and that we should embrace algorithms as part of our content strategy.

Focus on data collection before publishing

Influenced by the desktop publishing model, which assumes that authors are responsible for the organization and published appearance of content, most structured writing systems are built from the publishing end backwards. The result is a set of content structures that are, to a greater or lesser extent, abstractions of publishing artifacts.

This creates two problems for the capture of reliable data. First, because the schemas are abstractions of publishing artifacts, they are generally only reliable for publishing operations. The data is often not reliable for operations such as the collection and organization of content for a particular purpose, linking content, extracting and merging content, and validating content. This means that these time-consuming tasks continue to fall on writers.

Secondly, it encourages authors to continue to work in a WYSIWYG view, as in the desktop publishing systems they are used to. The problem this creates is that when someone works in a WYSIWYG view, they rely on the visual appearance of the text as their feedback to know they have created the content correctly — if it looks right, it is right. Reliance on visual feedback is why, after twenty years of word processing and desktop publishing, most people still do not use styles consistently. Text formatted by hand or by the application of styles looks the same, and if it looks right, it is right. In this kind of authoring environment it is very difficult to get data that is reliable for anything other than publishing, and even then, it is not always reliable for publishing in every target media.

The SPFE architecture, therefore, is not based on the desktop publishing model, but on the database model. This does not necessarily mean the relational database model. There have always been other database models. By itself, an XML document is a hierarchical database, which is a database model that predates the relational model.

Both models are useful when dealing with content, but when we say that SPFE is based on the database model, we are not talking about a particular model, but about the general principles of database design, in particular:

  • The starting point for a robust system design is the data model itself. You start by designing a data model that is reliable for all the functions you want to perform on the data.
  • The next consideration is to ensure that you get reliable data entry. A reliable data format is useless if the data itself is not reliable, meaning that it has to follow the rules of the data model. The only way you get reliable data is to design your data entry system to ensure reliable data entry. This means that the feedback to the author must be based on data integrity, not visual appearance. (This does not mean you have to work in raw XML; it means that if you work in a visual editing view, that view will look more like a form than a published page.)
  • Finally, if you have reliable data, you can publish it reliably. You don’t worry about the publishing system (the reporting system, in database terms) while you are constructing the data model and the data entry interface, because you know that reliable data can always be published reliably to any media. (This does not mean, of course, that you can get the kind of hand-painted pages you can achieve with a desktop publishing tool. It simply means that you can get effective published output from well-structured data.)

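The second point above, feedback based on data integrity rather than visual appearance, can be pictured as a simple entry-time check: the author is told what data is missing or invalid, regardless of how the draft looks on screen. The field names and topic types in this sketch are hypothetical.

```python
# Sketch of entry-time feedback based on data integrity rather than
# appearance. Field names and topic types are hypothetical; a real SPFE
# system would check against its own schemas.

REQUIRED_FIELDS = {"id", "type", "title", "body"}
KNOWN_TYPES = {"task", "concept", "reference"}

def check_topic(topic: dict) -> list:
    """Return data-integrity problems, the way a structured editor would flag them."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - topic.keys()]
    if topic.get("type") not in KNOWN_TYPES:
        problems.append(f"unknown type: {topic.get('type')!r}")
    return problems

draft = {"id": "t1", "type": "task", "title": "Install the widget"}
print(check_topic(draft))  # → ['missing field: body']
```

A draft that "looks right" in a WYSIWYG view would sail past a human reviewer; the integrity check flags it immediately, before it enters the content set.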
You should design your system, therefore, to make authoring as easy as possible, not to make publishing as easy as possible. Interestingly, however, it will turn out that if you make the content reliable, constructing an effective publishing system will be straightforward.

Metadata first, content afterwards

What makes content into reliable data is its structure and its metadata. Reliable content has consistent, high-quality metadata, and the content itself conforms to that metadata. This means that the content conforms to its metadata both in what it does and in what it does not do. That is, the content fully does everything its metadata says it does, and it does not do anything that its metadata does not say it does. Of the two, the second is, if anything, the more important. Making sure that each piece of content does only what it is supposed to do is the key to making it reliable.

This strict conformance of content to its metadata is not achieved by creating the content first and then applying metadata afterwards. The metadata must come first, and must guide the author as they create the content.

Validate early, validate often

The key to creating and maintaining reliable content is to validate early and validate often. Validation is also key to assisting authors to create properly structured data.
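A minimal sketch of what early validation might look like, using Python's standard library and a hypothetical topic vocabulary; a real SPFE system would validate against the project's own schemas, but the principle is the same: catch problems at authoring time, not at publishing time.

```python
# Sketch: validating content at authoring time, before it reaches
# publishing. The <topic>/<title> vocabulary is hypothetical.
import xml.etree.ElementTree as ET

def validate(source: str) -> list:
    """Return a list of errors; an empty list means the content is valid."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as e:
        # Not even well-formed: fail fast, before any structural checks.
        return [f"not well-formed: {e}"]
    errors = []
    if root.find("title") is None:
        errors.append("topic has no <title>")
    return errors

print(validate("<topic><title>Install</title><body/></topic>"))  # → []
print(validate("<topic><body/></topic>"))  # structural error caught early
```

Run at every save and every check-in, a check like this keeps unreliable content from ever entering the content set, which is far cheaper than finding it in the output.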

No application semantics in the content

One of the most important SPFE principles is that there must be no application semantics in the content. This means that there must be nothing in the content itself that dictates how it is going to be organized, linked, formatted, or published. (The publishing system will certainly use content metadata to drive the publishing process, but it will be using content metadata — metadata that describes the content as content — not embedded publishing directives.)
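One way to picture the distinction: the content names what things are, and the publishing layer, outside the content, decides how they appear. The element names and the style mapping in this sketch are hypothetical; the point is only that the mapping lives in the publishing script, not in the content.

```python
# Sketch: the content carries only content semantics (<command> names what
# the thing is); the publishing layer decides how a command is rendered.
# The vocabulary and the element-to-HTML mapping are hypothetical.
import xml.etree.ElementTree as ET

semantic = '<p>Run <command>spfe-build</command> to publish.</p>'

# Presentation decisions live here, outside the content. Change this
# table and every command everywhere is restyled; the content is untouched.
STYLE = {"command": "code", "p": "p"}

def publish(source: str) -> str:
    root = ET.fromstring(source)
    for el in root.iter():
        el.tag = STYLE.get(el.tag, "span")
    return ET.tostring(root, encoding="unicode")

print(publish(semantic))  # → <p>Run <code>spfe-build</code> to publish.</p>
```

If the author had written `<code>` (or worse, a font name) directly, the rendering decision would be frozen into the content; because the content says only `<command>`, the decision can be changed, or made differently per output medium, without touching the content.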

Avoiding application semantics is essential to maintaining the reliability of your content. If authors are entering application semantics in the content, they are performing a content processing function, which is time consuming and inevitably unreliable. Because authors cannot be expected to enter processing directives reliably, you will also have to devote time to checking the output.

Also, embedding application semantics in your content locks your content into a particular system. That data cannot be moved to a different system, because the new system will not understand the embedded application semantics. But if your data is free of application semantics, it can be transferred to any system. (You may have to transform its structure to insert application semantics if that is what the new system expects, but at least you will be able to do that reliably.) It is worth noting that if you use the SPFE architecture, and abide by this principle, your content will not actually be SPFE content in any specific sense. It will be your content.

Embedding application semantics in the content means you can’t change or add application functionality without updating the content.

Embedding application semantics means that authors have to understand how the publishing system works in order to create content — you can’t embed correct application semantics without understanding how the application works. This means that authors have to be trained in the system in order to write content (or that their contributions have to be restricted to areas that don’t involve specifying application semantics). This limits the authoring pool and your ability to get reliable data from a diverse authoring community.

Finally, avoiding application semantics in your content is essential to really enabling content reuse in your organization. Most systems that make the reuse of content their main focus only support reuse of content within their own system and their own data format. They have no capacity to reuse content from other sources because their reuse strategy involves embedding application semantics in the content.

A SPFE system can reuse content from any source that does not embed application semantics in the content (and, with a little more effort, from those that do). Similarly, the content in your SPFE system is directly reusable by any system that does not rely on specific application semantics in the content it uses. It can also be reused by a system that does rely on specific application semantics by exporting the data to that format.

Location independence

Content artifacts should be location independent. This means that no artifact should refer to another by a file system path (absolute or relative). Every artifact should have a URI, and if it must refer to other artifacts (which should be minimized), it should do so by URI.
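A URI-based reference can be resolved through a catalog at build time, so artifacts can move without breaking any references: only the catalog changes. The URIs, file paths, and catalog structure in this sketch are hypothetical.

```python
# Sketch: artifacts name each other by URI; a resolver maps URIs to
# current locations at build time. URIs and paths are hypothetical.

catalog = {
    "urn:example:topic:install-widget": "content/tasks/install.xml",
    "urn:example:topic:configure-widget": "content/tasks/configure.xml",
}

def resolve(uri: str) -> str:
    """Look up where a URI-named artifact currently lives."""
    try:
        return catalog[uri]
    except KeyError:
        raise LookupError(f"no artifact registered for {uri}") from None

# Content can be reorganized freely: references in content never change,
# because they name the artifact, not its location.
print(resolve("urn:example:topic:install-widget"))
```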

Never author in an interchange format

Many organizations are concerned about their ability to exchange content with partners, customers, and other organizations. They often feel that this requires them to author in the interchange format. Interchange formats, by their very nature, represent the lowest common denominator between all the parties to the exchange. The more general the interchange format, therefore, the less specific structure it will have, and therefore the less reliable the content will be for the kinds of operations we are interested in. Authoring in an interchange format, therefore, means creating less reliable content.

The other problem with interchange formats is that there are actually multiple interchange formats that you may have to contend with. (The most common interchange format in business today — by a very large margin — is Microsoft Word.) Choosing any one interchange format means cutting yourself off from the possibility of exchanging data with organizations that use a different standard.

Rather than giving up the reliability of your data and trying to guess the winner of the next round of the standards wars, the SPFE approach is to make sure that your data is reliable enough and structured enough that your content can be transformed into any interchange format that you may be asked for. Transforming your content to an interchange format is no different than transforming it to HTML or PDF. As long as you have enough structure in your content to match the semantics of the interchange format, you will have no problems producing data in the format requested of you.
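To illustrate that producing an interchange format is just another transform over semantic content, here is a hypothetical semantic vocabulary rendered as Markdown; the element names and the mapping are illustrative only, and any interchange format with matching semantics could be targeted the same way.

```python
# Sketch: emitting an interchange format (here, Markdown) is just another
# transform over semantic content. The <task>/<step>/<action> vocabulary
# is hypothetical.
import xml.etree.ElementTree as ET

def to_markdown(xml: str) -> str:
    """Render a task's steps as a Markdown ordered list."""
    root = ET.fromstring(xml)
    return "\n".join(f"1. {step.findtext('action')}" for step in root.findall("step"))

task = ("<task>"
        "<step><action>Open the lid</action></step>"
        "<step><action>Press start</action></step>"
        "</task>")
print(to_markdown(task))
```

Because the source content has more structure than the target format needs, the transform only ever discards specificity; going the other way, from the interchange format back to rich structure, is the transformation that cannot be automated.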

If you are receiving interchange data, chances are it will come in different formats, and with different degrees of reliability. There is not a lot you can do to make it more reliable — it is hard to make pigs out of sausages — but you can still pull it into your SPFE system and integrate it with your own content, to whatever extent its degree of reliability supports — which means you are no worse off than if you were creating your own content in the interchange format.

The SPFE principle, therefore, is never to write in an interchange format, because doing so limits your capacity to exchange data.

Semantic richness is more important than any specific format

For purposes of reuse and exchange, it does not matter what format the content is stored in. All that matters is what format it can be transformed into. A local format that can be transformed into many common formats is more reusable than content stored in a common format that cannot be transformed into other common formats.

Be modular and loosely-coupled

No structured writing system is ever finished. New needs develop all the time, and must be accommodated quickly. This responsiveness to new needs is particularly important in a system that puts a premium on reliable data. If the right structures don’t exist to support new content types, that content will be created in a generic fashion. Worse, it may be created by violating the rules of an existing content type. Both these things reduce the reliability not only of the new content, but of the content collection as a whole. It is imperative, therefore, that it be both inexpensive and safe to add new information types to the system as required. The best way to do this is with a modular, loosely coupled architecture.

Modularity is imperative in order to allow people to quickly and inexpensively create a new content type. By making schemas and processing applications modular, you can focus on creating and processing just those structures that are specific to the new content type that you are creating, and can plug in the existing modules to do everything else.

Support constraint-based collaboration

There are two forms of collaboration: total integration, in which everybody sees everything, and needs to know everything that is happening in the system, and constraint-based, in which each party works to an agreed set of constraints and does not need to know everything that is happening or see all the content in the system. The problem with total integration is that it imposes an overhead proportional to the size of the system. As the system grows, the amount of time people must devote to collaborative activities, as opposed to productive activities, increases proportionally. Beyond a certain scale, total-integration collaboration has to be abandoned in favor of constraint-based collaboration. SPFE is designed specifically to support constraint-based collaboration. This is a major part of why it has a greatly reduced need for sophisticated content management support.

Don’t try to rule the world

This last principle is in some ways the most important of all. SPFE is not about ruling the world. It is not about encompassing all content or all content processing in a single model or a single architecture. Every architecture is a compromise, and every architecture optimizes for some properties at the expense of others. This is just as true when an architecture strives for universality as when it strives for anything else. Universality comes at the price of complexity and the loss of data reliability. To cite one example, XML has been promoted as the lingua franca of web application development, but for many web applications it is being displaced by JSON, a lighter, simpler format that is not nearly as broad in scope as XML, but which solves a significant set of problems more effectively.

SPFE is about building highly structured content sets that can be processed reliably. It will never be an architecture suitable for doing highly specific hand-massaged page layout. It will never be an architecture suitable for doing ad hoc reuse of arbitrary pieces of loosely structured content. It will never try to rule the world.
