Why Your Content Management System Will Fail

Your content management system is going to fail. Don’t take my word for it, though. Ask Robert Metcalfe. He formulated Metcalfe’s Law, which states that “the value of a telecommunications network is proportional to the square of the number of connected users of the system”.

What has a telecommunications network got to do with content management failure? A content management system manages pieces of content, and increases their value by managing the connections between them. Metcalfe’s Law provides a formula for determining how adding nodes to a network increases the number of possible connections between the nodes, and thus the value of the network. The number of unique connections is n(n-1)/2, where n is the number of nodes in the network.

Thus, if there are 4 nodes in a network, the number of connections is 6:

4(4-1)/2 = 4*3/2 = 6

But when the number of nodes grows, the number of connections grows faster, so with 5 nodes we get 10 connections:

5(5-1)/2 = 5*4/2 = 10

For 20 nodes:

20(20-1)/2 = 20*19/2 = 190

For 200 nodes:

200(200-1)/2 = 200*199/2 = 19 900

And for 2000 nodes:

2000(2000-1)/2 = 2000*1999/2 = 1 999 000
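The arithmetic above is easy to check mechanically. The short Python sketch below computes n(n-1)/2 for the same network sizes; the function name unique_connections is just an illustrative choice. Integer division is exact here because n(n-1) is always even.

```python
def unique_connections(n: int) -> int:
    """Number of unique (unordered) connections among n nodes: n(n-1)/2."""
    # n * (n - 1) is always even, so integer division loses nothing.
    return n * (n - 1) // 2


for n in (4, 5, 20, 200, 2000):
    # Thousands separators make the growth easier to see.
    print(f"{n:>5} nodes: {unique_connections(n):>9,} connections")
```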

Yes, a network with 2000 nodes has almost 2 million unique connections. Cool, huh?

But again, so what?

This is what: content management is not only about managing individual topics, it is about managing the relationships between topics. So if you have 2000 topics in your repository, the content management problem is not only to manage those 2000 topics, but to manage the 1 999 000 unique potential relationships between those topics.

Now, you may object that in a CMS not every node is directly connected to every other node, and that is true. But when you add a new node, it could potentially have a direct connection to any other node in the network. To fully integrate a new node into the network, therefore, every one of those potential connections ought to be evaluated.
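One way to see why the totals grow so quickly: if each new topic is evaluated against every topic already in the repository, adding a topic to a repository of k topics costs k evaluations, and those marginal costs sum to exactly n(n-1)/2. This is a minimal sketch, assuming one evaluation per potential pair and nothing else; the function names are hypothetical.

```python
def evaluations_when_adding(existing: int) -> int:
    """A new topic must be checked against every topic already present."""
    return existing


def total_evaluations(n: int) -> int:
    """Evaluations accumulated by adding n topics one at a time, starting empty."""
    return sum(evaluations_when_adding(k) for k in range(n))


# Adding topics one by one reproduces the closed-form count n(n-1)/2.
for n in (4, 20, 2000):
    assert total_evaluations(n) == n * (n - 1) // 2

print(evaluations_when_adding(1999))  # cost of adding the 2000th topic
print(total_evaluations(2000))        # cumulative cost of all 2000 topics
```

The point of the sketch is that no single step looks expensive; the 2000th topic needs 1999 checks, but the running total is what the repository has to absorb.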

The flip side of Metcalfe’s Law is that while every new node in the network increases the value of the network as a function of n(n − 1)/2, it also increases the cost of maintaining the connections between the nodes of the network as a function of n(n − 1)/2. (There is some controversy about the value part of Metcalfe’s Law: some contend that value does not actually increase so rapidly as the network grows. But when it comes to cost, things are far more black and white. Each new connection clearly adds to the cost of creating and managing connections. Metcalfe’s Law simply points out just how fast the number of connections grows.)

Actually, the cost to integrate a new node into the network in a CMS is worse than this. Metcalfe’s formula assumes that there is only one type of connection between nodes in a network. But in a CMS, there are multiple types of relationships between topics: link relationships, reuse (transclusion) relationships, temporal (version) relationships, and potentially others. When a new node is integrated, each of these possible relationships ought to be evaluated, meaning that the cost is multiplied by the number of relationship types to be evaluated.
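Folding relationship types into the count is a single multiplication. The sketch below assumes just the three types named above (link, reuse, version) and treats each type as an independent check for every pair of topics; the tuple name RELATIONSHIP_TYPES and the function typed_evaluations are hypothetical.

```python
# The three relationship types named in the text; a real CMS may track more.
RELATIONSHIP_TYPES = ("link", "reuse", "version")


def typed_evaluations(n: int, types: tuple = RELATIONSHIP_TYPES) -> int:
    """Unique topic pairs, n(n-1)/2, times one check per relationship type."""
    return len(types) * (n * (n - 1) // 2)


print(f"{typed_evaluations(2000):,}")  # 2000 topics, three relationship types
```

With only three types, the 1 999 000 pairs for 2000 topics become close to six million individual checks.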

But wait! It gets worse! So far we have been talking about this as if the relationship between two topics were a single relationship. But in fact, it is two distinct relationships: how A relates to B and how B relates to A are different. That means that when you create topic A, it is not enough to consider how it relates to every other topic in the network. You also have to ask, how does every other topic in the network relate to the topic I just created?
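The difference between one connection and two relationships is the difference between unordered and ordered pairs. Python’s itertools.combinations and itertools.permutations make the distinction concrete for four hypothetical topics; for n topics the directed count is n(n-1), twice the undirected count.

```python
from itertools import combinations, permutations

topics = ["A", "B", "C", "D"]  # four hypothetical topics

# Undirected: {A, B} is one connection, as in Metcalfe's formula.
connections = list(combinations(topics, 2))
# Directed: (A, B) "how A relates to B" and (B, A) "how B relates to A" are distinct.
relationships = list(permutations(topics, 2))

print(len(connections), len(relationships))


def directed_relationships(n: int) -> int:
    """Ordered pairs of distinct topics: n(n-1)."""
    return n * (n - 1)


print(f"{directed_relationships(2000):,}")
```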

But wait! It gets worse still! Adding a topic to a CMS is not like installing a fax machine. A fax machine will always have the exact same relationship with all other fax machines, and so replacing your fax machine with a new one does not require you to reexamine its relationships with other fax machines. But every time you make a substantive edit to a topic, you potentially alter its relationships to every other topic, meaning that one complete edit cycle on the content set creates the same cost as originally adding the topics to the set (supposing you actually do the evaluations).

The upshot of all this is that if you have more than a trivial number of topics in your collection, and you are managing the relationships between those topics by hand (even with machine assistance), then you are simply not evaluating most potential relationships, nor re-evaluating and maintaining most of the actual relationships you have established, and that is going to lead, sooner or later, to the failure of your content management system.

A failed content management system can experience either a hard landing or a soft landing.

When a hard landing occurs, you lose the ability to create any correct output from the CMS. Generating correct content relies on relationships between topics. When those relationships are broken, missing, or so convoluted that nothing works the way you expect it to, the value of the network goes to zero.

A soft landing generally involves abandoning the attempt to manage the content set as a whole, and developing small clusters of topics, each of a more manageable size. (Ten unrelated networks of 200 nodes have 199,000 relationships, compared to 1,999,000 for a single network of 2000 nodes, a tenfold reduction in cost.) In this circumstance the full value of the network is not realized, but at least production can continue. Productivity after a soft landing may actually still be better than it was with the old desktop publishing system, though the gain may well not be enough to offset the total cost of the CMS deployment. The value of the network does not go to zero in this scenario, but its ROI will often be negative.
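The soft-landing arithmetic can be checked the same way. This sketch assumes the clusters are equal in size and entirely unconnected to one another, which is the scenario the parenthetical describes; the helper clustered_connections is hypothetical.

```python
def unique_connections(n: int) -> int:
    """Unique connections among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def clustered_connections(total: int, clusters: int) -> int:
    """Connections when `total` topics are split into equal, unconnected clusters."""
    size, remainder = divmod(total, clusters)
    if remainder:
        raise ValueError("this sketch assumes equal-sized clusters")
    return clusters * unique_connections(size)


single = clustered_connections(2000, 1)   # one network of 2000 topics
split = clustered_connections(2000, 10)   # ten networks of 200 topics
print(f"{single:,} vs {split:,} (ratio {single / split:.2f})")
```

The ratio comes out just over ten, which is where the tenfold figure comes from; the saving is bought by giving up every relationship that would cross a cluster boundary.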

Solutions? Two avenues offer promise:

1. Avoid the massive deployment cost of a CMS and move to a wiki instead. Here there are two routes you can go. One is the true wiki route: open the wiki so that every reader can be a contributor. This addresses the relationship management problem through the many-hands-make-light-work principle; it is how Wikipedia gets it done. The other is to confine authorship to a small group. Since a wiki does nothing to reduce the cost of relationship management, this approach will lead to the same clustering as the soft landing of a broken CMS (most corporate wikis display this clustering). But because the massive CMS deployment cost does not have to be factored in, you can probably improve your productivity over DTP and show a positive ROI overall.

2. Do what people who successfully manage huge data sets do, and move to a full database model — content as a database, not merely content in a database — and use database methods — essentially the automatic discovery of relationships based on metadata — to reduce the cost of evaluating and managing each potential relationship to near zero. This allows you to manage the relationships in a cost-effective manner, and thus allows you to realize the full value of the network.
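As a rough illustration of what automatic discovery of relationships based on metadata might look like, here is a minimal Python sketch. It is hypothetical, not a description of how SPFE is implemented: the topic names, the subject annotations, and the function discover_relationships are all invented for the example. It assumes each topic carries a set of subjects, builds an inverted index from subject to topics, and proposes a relationship for every pair of topics that share a subject.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical topics, each annotated with the subjects it covers.
TOPICS = {
    "install-java": {"java", "installation"},
    "install-ant": {"ant", "installation"},
    "configure-path": {"path", "java", "ant"},
    "build-docs": {"ant", "build"},
}


def discover_relationships(topics):
    """Return {(topic_a, topic_b): shared_subjects} for topics sharing metadata."""
    index = defaultdict(set)  # subject -> ids of topics annotated with it
    for topic_id, subjects in topics.items():
        for subject in subjects:
            index[subject].add(topic_id)

    related = defaultdict(set)
    for subject, ids in index.items():
        # Every pair of topics under the same subject is a candidate relationship.
        for a, b in combinations(sorted(ids), 2):
            related[(a, b)].add(subject)
    return dict(related)


for pair, shared in sorted(discover_relationships(TOPICS).items()):
    print(pair, sorted(shared))
```

Because the relationships are derived rather than stored by hand, editing a topic’s metadata and re-running the function re-evaluates all of that topic’s relationships without any author having to inspect each pair.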

This second approach requires a more radical rethink of the way you create content, but it is actually technically much simpler and less expensive to implement than the CMS approach. This approach is what SPFE is designed to support.

SPFE Open Toolkit Alpha 1

The first alpha release of the SPFE Open Toolkit is now available. This is a very preliminary release, more likely to be of interest to XML geeks than to authors. A lot of stuff is not here or is not working, but the basic configuration and build system is working, and demonstrates the modularity of the system. Soft linking is also working. There is minimal documentation, and what there is has to be built by running the SPFE build on the source files, per the instructions below.

Download for Windows: spfe.alpha1.20120422.zip

Download for Linux: spfe.alpha1.20120422.tar.gz

To make it run:

  1. Unpack the archive to a suitable location.
  2. Create an environment variable SPFEOT_HOME and set it to the location of the spfe-ot directory. For example: SPFEOT_HOME=/home/yourname/spfe/spfe-ot
  3. Add the spfe-ot directory to your path.
  4. If not already installed, install Java.
  5. If not already installed, install Apache Ant and add it to your path.
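Before running a build, it may help to confirm that the steps above took effect. The following Python sketch is a hypothetical helper, not part of the SPFE Open Toolkit. It checks that SPFEOT_HOME is set to an existing directory, that this directory appears on PATH, and that java and ant can be found on PATH (an assumption for Java; the instructions only say to install it). It inspects only the environment mapping it is given, so it can also be run against a test environment.

```python
import os
import shutil


def check_spfe_setup(env=None):
    """Return a list of setup problems; an empty list means the checks passed.

    Hypothetical helper: it only inspects environment variables and PATH.
    """
    env = os.environ if env is None else env
    problems = []
    path = env.get("PATH", "")
    path_dirs = [os.path.normcase(os.path.abspath(p))
                 for p in path.split(os.pathsep) if p]

    # Steps 2 and 3: SPFEOT_HOME points at spfe-ot, and spfe-ot is on PATH.
    home = env.get("SPFEOT_HOME")
    if not home:
        problems.append("SPFEOT_HOME is not set")
    elif not os.path.isdir(home):
        problems.append(f"SPFEOT_HOME is not a directory: {home}")
    elif os.path.normcase(os.path.abspath(home)) not in path_dirs:
        problems.append("spfe-ot directory is not on PATH")

    # Steps 4 and 5: Java and Ant are installed and findable on PATH.
    for tool in ("java", "ant"):
        if shutil.which(tool, path=path) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_spfe_setup():
        print("problem:", problem)
```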

To build the SPFE docs:

  1. Go to the directory spfe/spfe-docs/build.
  2. Enter: spfe spfe-docs-config.xml draft. The build will create a directory spfebuild in your home directory. The SPFE docs will be in spfebuild/spfe-docs/output under your home directory.

If it does not work for you, use the comments to ask for help. Ditto if you can’t figure out what it does. Ditto if you would like to contribute to the development.

More documentation will be coming soon.

In Praise of Silos

I’m not sure who first started to use the word “silo” as a term of abuse in the computing world, but it seems to be the go-to word for any system that limits access to its data in a way the author deems inappropriate. I’m not sure whether whoever coined the term was thinking of grain silos or missile silos; perhaps the latter, which would fit the usage better.

Grain elevator: silos store and organize grain for distribution.

Anyway, silos are getting a bad rap. A grain silo is a container designed specifically to collect grain produced in a local market, sort it, classify it, and store it for shipment to global markets. Putting data in a silo is portrayed as a means of keeping other people from getting to the data. But putting grain in a silo is the very opposite of preventing other people from getting to the grain; the silo exists for the sole purpose of shipping grain efficiently to other people.

Imagine if we tore down all the silos and forced every farm to truck its grain directly to the shipping terminal, bypassing the silos and the railways. Costs would skyrocket and chaos and delays would be introduced all along the line. Is this really how we want to manage our content?

The Difference Between Content in a Database and Content as a Database

A great deal of content lives in databases these days, but there is a world of difference between storing content in a database and treating content as a database. The essence of the difference is this: content in a database is an object that can be retrieved from its indexed location, like a dining room chair in an Ikea warehouse. Content as a database is a record that can be examined and presented from different angles based on its properties, selected on any of those properties, and related to other records through the properties they share.
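To make the distinction concrete, here is a small Python sketch using hypothetical topic records; the fields product, task, and audience are invented for illustration. The dictionary keyed by ID models content in a database: there is one indexed way in. The select and related functions model content as a database: any property can drive selection, and shared property values relate records to one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    topic_id: str
    product: str
    task: str
    audience: str


# Hypothetical records for illustration only.
TOPICS = [
    Topic("t1", "editor", "install", "admin"),
    Topic("t2", "editor", "configure", "admin"),
    Topic("t3", "viewer", "install", "end-user"),
]

# Content in a database: each item is fetched by its one indexed key.
BY_ID = {t.topic_id: t for t in TOPICS}


def select(records, **criteria):
    """Content as a database: select records on any combination of properties."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]


def related(record, records, prop):
    """Relate a record to the other records that share its value for `prop`."""
    return [r for r in records
            if r.topic_id != record.topic_id
            and getattr(r, prop) == getattr(record, prop)]


print([t.topic_id for t in select(TOPICS, task="install")])
print([t.topic_id for t in related(BY_ID["t1"], TOPICS, "product")])
```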

A physical metaphor is more difficult here, because this is precisely the kind of thing that is really hard to do in the physical world. In the physical world you can organize your bookshelf by title or by author or by publisher or by size or by color or by height, but whatever you choose, you are preferring one property over all the others. In a database, all properties are equal and you can create a report on your bookshelf that is organized by title or by author or by publisher or by size or by color or by height or by any other property that is separately addressable in the database.
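The bookshelf example translates directly into code. In this sketch the shelf holds three real novels, but the colors and heights are invented for illustration, and the function name report is hypothetical. The function orders the same unchanged records by whichever property it is given, which is exactly what a physical shelf cannot do.

```python
from typing import NamedTuple


class Book(NamedTuple):
    title: str
    author: str      # stored "Surname, Given name" so it sorts by surname
    color: str       # invented for illustration
    height_cm: int   # invented for illustration


SHELF = [
    Book("Moby-Dick", "Melville, Herman", "blue", 24),
    Book("Emma", "Austen, Jane", "green", 20),
    Book("Dracula", "Stoker, Bram", "red", 22),
]


def report(books, prop):
    """List titles ordered by any property, leaving the records themselves untouched."""
    return [b.title for b in sorted(books, key=lambda b: getattr(b, prop))]


for prop in ("title", "author", "color", "height_cm"):
    print(f"{prop:>9}: {report(SHELF, prop)}")
```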