Why Your Content Management System Will Fail

Your content management system is going to fail. Don’t take my word for it, though. Ask Robert Metcalfe. He formulated Metcalfe’s Law, which states that “the value of a telecommunications network is proportional to the square of the number of connected users of the system”.

What has a telecommunications network got to do with content management failure? A content management system manages pieces of content, and increases their value by managing the connections between them. Metcalfe’s Law rests on a formula for determining how adding nodes to a network increases the number of possible connections between the nodes, and thus the value of the network. The number of unique connections is n(n-1)/2, where n is the number of nodes in the network.

Thus, if there are 4 nodes in a network, the number of connections is 6:

4(4-1)/2 = 4*3/2 = 6

But when the number of nodes grows, the number of connections grows faster, so with 5 nodes we get 10 connections:

5(5-1)/2 = 5*4/2 = 10

For 20 nodes:

20(20-1)/2 = 20*19/2 = 190

For 200 nodes:

200(200-1)/2 = 200*199/2 = 19,900

And for 2000 nodes:

2000(2000-1)/2 = 2000*1999/2 = 1,999,000

Yes, a network with 2000 nodes has almost 2 million unique connections. Cool, huh?
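The arithmetic above is easy to check with a few lines of Python. This is a minimal sketch; the function name `unique_connections` is illustrative only.

```python
def unique_connections(n: int) -> int:
    """Number of unique undirected connections among n nodes: n(n-1)/2."""
    if n < 0:
        raise ValueError("node count cannot be negative")
    # n * (n - 1) is always even, so integer division is exact.
    return n * (n - 1) // 2


for nodes in (4, 5, 20, 200, 2000):
    print(f"{nodes} nodes -> {unique_connections(nodes):,} connections")
```

The last line printed is "2000 nodes -> 1,999,000 connections", matching the hand calculation.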

But again, so what?

This is what: content management is not only about managing individual topics, it is about managing the relationships between topics. So if you have 2000 topics in your repository, the content management problem is not only to manage those 2000 topics, but to manage the 1,999,000 unique potential relationships between those topics.

Now, you may object that in a CMS, not every node is directly connected to every other node, and that is true. But when you add a new node, it could potentially have a direct connection to any other node in the network. To fully integrate a new node into the network, therefore, every one of those potential connections ought to be evaluated.
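Integrating nodes one at a time adds up to the same n(n-1)/2 total: the k-th node added must be evaluated against the k-1 nodes already present. The sketch below (the function name is hypothetical) sums those per-addition evaluations and checks the result against the closed-form count.

```python
def evaluations_to_build(n: int) -> int:
    """Total pairwise evaluations needed to integrate n nodes one at a time,
    where the k-th node is checked against the k-1 nodes already present."""
    return sum(k - 1 for k in range(1, n + 1))


for n in (4, 20, 2000):
    # The running total matches the closed form n(n-1)/2.
    assert evaluations_to_build(n) == n * (n - 1) // 2
    print(n, evaluations_to_build(n))
```

In other words, the pair count is not a one-time figure; it is the accumulated cost of evaluating each new node properly as it arrives.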

The flip side of Metcalfe’s Law is that while every new node in the network increases the value of the network as a function of n(n − 1)/2, it also increases the cost of maintaining the connections between the nodes of the network as a function of n(n − 1)/2. (There is some controversy about the value part of Metcalfe’s Law: some contend that value does not actually increase so rapidly as the network grows. But when it comes to cost, things are far more black and white. Each new connection clearly increases the cost of adding and managing connections. Metcalfe’s Law simply points out just how fast the number of connections grows.)

Actually, the cost to integrate a new node into the network in a CMS is worse than this. Metcalfe’s formula assumes that there is only one type of connection between nodes in a network. But in a CMS, there are multiple types of relationships between topics. There are link relationships, reuse (transclusion) relationships, temporal (version) relationships, and potentially others. When a new node is integrated, each of these possible relationships ought to be evaluated, meaning that the cost is multiplied by the number of different types of relationships to be evaluated.
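Under the simplifying assumption that each relationship type is evaluated independently for every pair of topics, the cost scales linearly with the number of types. The sketch below uses the three types named above (link, reuse, version) purely as an illustration; a real CMS may track more or different types.

```python
# Illustrative list only; a real system may track more relationship types.
RELATIONSHIP_TYPES = ("link", "reuse", "version")


def typed_evaluations(n: int, relationship_types=RELATIONSHIP_TYPES) -> int:
    """Pairwise evaluations when every unordered pair of n topics must be
    checked once for each relationship type."""
    pairs = n * (n - 1) // 2
    return pairs * len(relationship_types)


print(typed_evaluations(2000))  # 1,999,000 pairs x 3 types = 5997000
```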

But wait! It gets worse! So far we have been talking about this as if the relationship between two topics were a single relationship. But in fact, it is two distinct relationships. How A relates to B and how B relates to A are different. That means that when you create topic A, it is not enough to consider how it relates to every other topic in the network. You also have to ask, how does every other topic in the network relate to the topic I just created?
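Treating A→B and B→A as separate relationships turns unordered pairs into ordered pairs, which doubles the count to n(n-1). This minimal sketch enumerates both kinds of pair for a small placeholder set of topics using the standard `itertools` module, then applies the closed form to 2000 topics.

```python
from itertools import combinations, permutations


def ordered_relationships(n: int) -> int:
    """Directed relationships among n topics: each of the n(n-1)/2 pairs
    contributes two, one in each direction."""
    return n * (n - 1)


topics = ["A", "B", "C", "D"]  # placeholder topic names
undirected = list(combinations(topics, 2))  # 6 unordered pairs
directed = list(permutations(topics, 2))    # 12 ordered pairs; (A, B) and (B, A) differ
assert len(directed) == 2 * len(undirected) == ordered_relationships(len(topics))

print(ordered_relationships(2000))  # 3998000 directed relationships
```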

But wait! It gets worse still! Adding a topic to a CMS is not like installing a fax machine. A fax machine will always have the exact same relationship with all other fax machines, and so replacing your fax machine with a new one does not require you to reexamine its relationships with other fax machines. But every time you make a substantive edit to a topic, you potentially alter its relationships to every other topic, meaning that one complete edit cycle on the content set creates the same cost as originally adding the topics to the set (supposing you actually do the evaluations).

The upshot of all this is that if you have more than a trivial number of topics in your collection, and you are managing the relationships between those topics by hand (even with machine assistance), then you are simply not evaluating most potential relationships, and you are not re-evaluating and maintaining most of the actual relationships that you have established. That is going to lead, sooner or later, to the failure of your content management system.

A failed content management system can experience either a hard landing or a soft landing.

When a hard landing occurs, you lose the ability to create any correct output from the CMS. Generating correct content relies on relationships between topics. When those relationships are broken, missing, or so convoluted that nothing works the way you expect it to, the value of the network goes to zero.

A soft landing generally involves abandoning the attempt to manage the content set as a whole, and developing small clusters of topics, each of a more manageable size. (Ten unrelated networks of 200 nodes have a total of 199,000 relationships, compared to 1,999,000 for a single network of 2000 nodes, a tenfold reduction in cost.) In this circumstance the full value of the network is not realized, but at least production can continue. Production from a CMS after a soft landing may actually still be better than with the old desktop publishing system, though the productivity gain may well not be enough to offset the total cost of the CMS deployment. The value of the network does not go to zero in this scenario, but its ROI will often be negative.
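The soft-landing arithmetic can be checked the same way. Splitting one 2000-node network into ten isolated clusters of 200 nodes each cuts the potential connections by roughly a factor of ten. A minimal sketch, with hypothetical function names:

```python
def unique_connections(n: int) -> int:
    """Unique undirected connections among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def clustered_connections(cluster_sizes: list[int]) -> int:
    """Total potential connections when the network is split into isolated
    clusters that have no connections between them."""
    return sum(unique_connections(size) for size in cluster_sizes)


single = unique_connections(2000)               # one network of 2000 nodes
clustered = clustered_connections([200] * 10)   # ten networks of 200 nodes
print(single, clustered, round(single / clustered, 2))  # 1999000 199000 10.05
```

The saving comes entirely from giving up every connection that would cross a cluster boundary, which is exactly the value a soft landing leaves unrealized.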

Solutions? Two avenues offer promise:

1. Avoid the massive deployment cost of a CMS and move to a wiki instead. There are two routes you can take here. One is to go the true wiki route and open the wiki so that every reader can be a contributor. This addresses the relationship management problem through the many-hands-make-light-work principle. This is how Wikipedia gets it done. The other is to confine authorship to a small group. Since a wiki does nothing to reduce the cost of relationship management, this approach will lead to the same clustering as the soft landing of a broken CMS (most corporate wikis display this clustering). But because the massive CMS deployment cost does not have to be factored in, you can probably improve your productivity over DTP and show a positive ROI overall.

2. Do what people who successfully manage huge data sets do, and move to a full database model — content as a database, not merely content in a database — and use database methods — essentially the automatic discovery of relationships based on metadata — to reduce the cost of evaluating and managing each potential relationship to near zero. This allows you to manage the relationships in a cost-effective manner, and thus allows you to realize the full value of the network.
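As a rough illustration of relationships discovered from metadata rather than maintained by hand, the sketch below assumes each topic declares the subjects it covers and the subjects it mentions, and generates a link from topic A to topic B whenever A mentions a subject that B covers. The field names (`covers`, `mentions`) and the linking rule are hypothetical, chosen for illustration; this is not a description of how SPFE or any particular system works. The point is that adding or editing a topic requires only correct metadata on that topic, while the machine recomputes the relationships.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical metadata fields, for illustration only.
    name: str
    covers: set[str] = field(default_factory=set)    # subjects this topic documents
    mentions: set[str] = field(default_factory=set)  # subjects this topic refers to


def discover_links(topics: list[Topic]) -> set[tuple[str, str]]:
    """Derive directed links (source, target) from metadata: a topic links to
    every other topic that covers a subject it mentions."""
    # Index topics by the subjects they cover so each lookup is a dict access.
    covered_by: dict[str, list[str]] = defaultdict(list)
    for topic in topics:
        for subject in topic.covers:
            covered_by[subject].append(topic.name)

    links = set()
    for topic in topics:
        for subject in topic.mentions:
            for target in covered_by.get(subject, []):
                if target != topic.name:  # no self-links
                    links.add((topic.name, target))
    return links


topics = [
    Topic("install-guide", covers={"installation"}, mentions={"licensing"}),
    Topic("license-ref", covers={"licensing"}),
    Topic("upgrade-guide", covers={"upgrades"}, mentions={"installation", "licensing"}),
]
print(sorted(discover_links(topics)))
```

Under these assumptions, adding a fourth topic that mentions `upgrades` would gain a link to `upgrade-guide` the next time `discover_links` runs, without anyone reviewing the existing topics by hand.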

This second approach requires a more radical rethink of the way you create content, but it is actually technically much simpler and less expensive to implement than the CMS approach. This approach is what SPFE is designed to support.

Comments

  1. Patrick G Gribben

    Hi Mark,
    Thank you for a stimulating and thoughtful item.

    You wrote “The upshot of all this is that if you have more than a trivial number of topics in your collection, and you are managing the relationships between those topics by hand (even with machine assistance) then you are simply not evaluating most potential relationships, and not re-evaluating and maintaining most of the actual relationships that you have established, and that is going to lead, sooner or later, to the failure of your content management system.” and that is the nub of your case. It is an inevitabilist position.

    Well, when’s it going to start happening? DITA has been up and running for a while. Shouldn’t there be some evidence of these hard and soft landings?

    Another point which goes against your inevitabilism is that there may be cases where the number of items in the system falls and so is kept within manageable bounds.

    Patrick Gribben

    • Hi Patrick, thanks for the comment.

      Certainly there is an element of hyperbole in the “inevitabilism” of the post. The fact remains that the incidence of content management failure is high. Though such things are impossible to quantify exactly (how do you define failure, and where are failures reported?), 80% is often quoted as the failure rate.

      The comment is not aimed at DITA in particular, though DITA is obviously a case of particular interest in the tech pubs community. I would argue that there is lots of evidence of soft landings in DITA systems. Most of the publicly identifiable DITA documentation sets that I have seen show evidence of soft landings — frankenbook structures with complex topic hierarchies and little or no internal linking.

      But here the issue of how you define failure is important. Many DITA implementations seem to be designed from the beginning to operate in what I would consider to be a soft landing mode. That is, there is no intention or attempt to manage all of the potential relationships. Content is still managed as a set of “books” with little attempt at linking.

      To me, this constitutes a failure to manage all the interesting relationships in your content, but if it is consistent with project goals, then it will be considered a success by the project designers. The failure to manage all the interesting relationships will only become apparent when managing them becomes a project goal.

      This supports your final point, that failure is only inevitable if the scope of the project, either in terms of the number of nodes, or the number of relationships managed, becomes too great to be managed by hand.

      The question then is, will people be content indefinitely not to manage these relationships?
