Content Here

Where content meets technology

Nov 17, 2008

Guardian Hack Days

Both the New York Times and the Telegraph have been innovating on the news media business model by introducing APIs that expose their content and services for outside developers to leverage. Recently, The Guardian held their "Hack Day at The Guardian".

To quote:

The concept is simple: take an idea to prototype in a day and then present your work to your colleagues and a panel of judges.

Guardian staff and friends participated in this 24-hour code-fest that ended in 90-second presentations in front of a panel. You can track activity on Twitter and see pictures on Flickr. I don't know who won yet, but I am sure that everyone is a winner :)

Nov 12, 2008

CRX and Tar PM

Thomas Müller has a blog post that nicely describes how the Tar PM works. Tar PM is the fastest of Day CRX's pluggable persistence managers. The speed of Tar PM is a major reason why some companies go with the CRX rather than the free JCR reference implementation Apache JackRabbit.

The key to Tar PM's speed is that write operations simply append the new data to the end of one big file (a TAR file, actually). Which content objects are stored where is recorded in an index, which is also append-only. To prevent limitless growth of the data file, you need to periodically run maintenance programs that compress the file by removing deleted records.
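
To make the mechanics concrete, here is a toy sketch (in Python, purely illustrative and not Day's actual Tar PM code) of an append-only data file with an index that maps object IDs to file offsets, plus the compaction step that the maintenance tools perform:

    import os

    class AppendOnlyStore:
        """Toy append-only store illustrating the general idea, not Day's implementation."""

        def __init__(self, path):
            self.path = path
            self.index = {}  # object id -> (offset, length); entries are only added

        def put(self, obj_id, data):
            # Writes never touch existing bytes; they only append to the end of the file.
            with open(self.path, "ab") as f:
                f.seek(0, os.SEEK_END)
                offset = f.tell()
                f.write(data)
            self.index[obj_id] = (offset, len(data))

        def get(self, obj_id):
            offset, length = self.index[obj_id]
            with open(self.path, "rb") as f:
                f.seek(offset)
                return f.read(length)

        def compact(self, new_path):
            # The "maintenance program": copy only the live records to a new file,
            # leaving deleted and superseded records behind.
            new_index = {}
            with open(new_path, "wb") as out:
                for obj_id in self.index:
                    data = self.get(obj_id)
                    new_index[obj_id] = (out.tell(), len(data))
                    out.write(data)
            self.path, self.index = new_path, new_index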

This may seem very familiar to those of you who have managed systems built on Zope (like Plone) and have had to "pack the database" - an operation that does essentially the same thing as the CRX tools, plus removing unneeded intermediate versions that were defensively saved during transactions. From my experience with Zope, I know that having a huge, single-file database can be scary but not necessarily dangerous. What do you do if you have a corrupt record in the middle of the file that causes the maintenance tools to crash? Usually there is some way to fix it, but you need access to the experts. The only issue is that Zope and CRX experts are not as easily found as Oracle, MySQL, or MSSQL experts. Tar PM seems to improve on the ZODB by switching to a new TAR file after a certain point. Longtime readers of this blog may remember my ZOracle Series (part I, II, and III) that described a project to point the Zope Object Database (ZODB) to an Oracle database.

Although both Zope and Day have customers with huge repositories, the general rule of thumb is to keep things small when you can. In the ZODB world, there are extensions that store large binary files outside of the database. In the JCR world, the strategy is to segment content into smaller repositories. For example, if you have lots of publications, put each one into its own JCR instance rather than combining them all into one. Companies that pursue this type of segmentation need some component in the architecture that can look across repositories and maintain collections of references to show an aggregated view. At the simplest level, this can be a search engine. At a more advanced level, there could be a hierarchical taxonomy system with references to items in different repositories.
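
As a rough sketch of that aggregation component (hypothetical names throughout, with a generic search() call standing in for whatever query mechanism each repository actually exposes), the simplest version just fans a query out to each segmented repository and tags the results with their origin:

    class FederatedSearch:
        """Hypothetical aggregation layer over several segmented repositories."""

        def __init__(self, repositories):
            # e.g. {"magazine-a": repo_a, "magazine-b": repo_b}, one per publication
            self.repositories = repositories

        def search(self, query, limit=20):
            results = []
            for name, repo in self.repositories.items():
                for hit in repo.search(query):  # repo.search() is a stand-in, not a real API
                    # Keep a reference to the owning repository so the aggregated
                    # view can link back to the right place.
                    results.append({"repository": name, "item": hit})
            # Cross-repository ranking or sorting would go here.
            return results[:limit]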

This strategy runs against Oracle's vision of all your company's content neatly organized in one big database. I would argue that putting everything in one place does not necessarily mean that it is well managed or easy to find. More important than how the content is physically stored is that it is cohesively organized (that is, content that belongs together is stored together) and that there are uniform ways to access it. This is the strategy of the JCR, and it plays well with service-oriented architecture, where different applications (services) that manage their own data can be combined to support cross-application business processes. When you have everything in the same database, the tendency is to do your integration at the data level (which can be brittle and proprietary) rather than at the application (or service) level. I won't deny that it is handy to have a database that can scale infinitely in size, and there are applications that need very large storage (like archival systems). But trying to keep things small and segmented has its virtues as well. I am reminded of the frequently made point that storage is cheap but finding and managing the information can be very expensive.

Nov 11, 2008

CMS Selection Workshop

Last week I was a panelist in a jboye08 session called "Running a Web CMS procurement." Jarrod Gingras from CMS Watch moderated the panel, which also included Graham Oakes, Piero Tintori, and Søren Sigfusson. It ran for 90 minutes and, quite frankly, we barely scratched the surface. I must admit, I share the blame because I used my 10 minutes just to ask the audience a bunch of questions (see slides). Fortunately, I will have an opportunity to do justice to the topic when I present my "How to Select a Web Content Management System" workshop at the Gilbane Conference in Boston next month on Tuesday, December 2nd.

Over the years of doing CMS selections, I have refined my approach to address the specific challenges that are unique to web content management. In particular:

  • The huge number of products to choose from and the lack of a clear market leader

  • The different sources of content technologies: commercial, open source, SaaS, and custom

  • The flexibility of these platforms

  • The importance of usability

  • The spanning of technical and organizational concerns

  • The different uses of web content management software

  • Web 2.0 and now Web 3.0

  • The fact that the platform winds up being a component of a larger solution that includes implementation, process, management, and support

  • And the wide range of processes that organizations use to manage their content.

I have written extensively about selection on this blog here, here, here, here, here, here, and here. Here is a chance to put it all together in one (hopefully) coherent session. If you are already registered for this workshop, please feel free to email me (seth "à" contenthere.net) with any specific things you want me to cover. If you are not already registered, you can do so here. You can even get a free iPod Touch if you go for the "all in" Conference Plus package.

If you are going to be in town for the Gilbane Conference but will not be attending any of the workshops, you should consider going to the CM Professionals Fall Summit, which is also on Tuesday. I will be on an expert panel talking about the content lifecycle but I will be sure to save something for the workshop :)

Nov 10, 2008

Blogs, Wikis, etc.

A couple of months ago, a WCMS sales guy told me that when he hears the words "we are looking for blogs, wikis, etc." from a customer, it is a clear indication that the customer really doesn't know what he is talking about or (at least) doesn't have a clear vision of his goals for Web 2.0.

I too am suspicious (and a little surprised) when I hear these terms together because, other than the fact that they are relatively new to the "enterprise," blogs and wikis have little to do with each other. Bob Doyle wrote a very good article differentiating these technologies way back in 2006 (When to Wiki, When to Blog - read the article).

A blog is a publishing system and a wiki is a collaboration tool. A blog author writes articles (posts) that reflect an idea or an observation at a point in time. You don't typically update a blog entry unless you see a typo that annoys you too much to ignore (like a misspelled name, as in one of my recent posts). Comments provide a forum for a dialog around the topic. These comments may appear within the context of the blog site or somewhere else (as in the case of FriendFeed), but they are a conversation around the article, not the article itself. Occasionally the blog author will highlight a comment by updating the post with a reference, but this is the exception, not the rule. If the author changes his mind, he will write another post rather than update the original. To learn from blogs, you read lots of posts and piece together a consistent understanding that works for you.

A wiki is a tool to collaboratively build a comprehensive informational resource. Rather than blog posts that a single author publishes to an audience, a wiki page allows a group of people to jointly define a topic, establish a policy, or create some other information resource that needs to be updated over time. Companies that use a wiki (rather than a WCMS) as their intranet have come to the conclusion that potentially anyone in the company could correct or otherwise improve the information there. If a contribution is wrong, it can be corrected or rolled back.

A WCMS can serve both of these publishing and information management purposes. For example, a typical "corporate brochure" implementation of a CMS will publish "point-in-time" articles (e.g. press releases) and manage fixed pages (e.g. "about us"). If you need to do both with one tool (and want the option to strictly control contribution), you probably need a WCMS. If you need to do one of these things but not the other, you might be in the market for a blog or a wiki but not "blogs, wikis, etc."

Nov 06, 2008

World Plone Day

Tomorrow (Friday, November 7th) is World Plone Day (WPD). This event is held in cities all over the world to make Plone experts available to introduce and explain the platform. If you are considering Plone, this is a very good opportunity to understand the technology and meet the people behind it. There are 54 cities holding WPD events. You may be able to find your city (or one nearby) on the list.

Nov 04, 2008

Novelty + Urgency = Chaos

Around a year and a half ago, I coined "Gottlieb's law," which I suppose should officially be a theorem because it has not been conclusively proven. It certainly hasn't been disproved though. Because I will probably never be able to prove that "a company's success in content management is inversely proportional to the amount of information that is exchanged over email," I figured I would move on with a new theorem. Here goes:

Gottlieb's 2nd Law (or theorem)

Novelty + Urgency = Chaos

This comes from the observation that when companies put a development team (or any other kind of team) in a pressurized environment and give them new technologies and tools to work with, chaos ensues. Developers don't have time to learn the new platform, so they hastily apply legacy practices and ideas that are not necessarily appropriate for the new technology. To make matters worse, this chaos is difficult to work out of the system even after the developers have learned better. Time simply hasn't been budgeted to go back and correct the mistakes that were made. Sometimes known bad habits are even repeated just for consistency's sake.

Here is how it usually plays out in the world of web content management. A company buys a new CMS and intends to migrate 50-100 sites onto the new platform. The team takes a deliberately cautious approach of choosing the simplest example as a starting point/test case. After an easy experience building out the pilot site, the team writes out a migration plan that begins with the biggest site (whose owners have been clamoring to go early because they are feeling real pain on the old platform). Before long, this project blows up. The additional complexity of the larger site was not adequately accounted for in the project plan and the team is falling behind. Their unfamiliarity with the new API causes them to write hacks and work-arounds for features that are well supported by the platform - if only they knew how to use it. The longer the delay, the greater the pressure that squeezes out good practices like refactoring, code review, unit testing, and even communication. When a developer makes a new discovery, he may apply it to new code but is rarely able to go back and fix sloppy, inexperienced code.

The first big site is delayed but that doesn't mean that the original migration plan is abandoned. A new development team is spun up to work on the next site. Of course, these guys are new to the platform. The seasoned developers are too busy with their death march to release the first site to share what they learned. The new team either uses the first site as a model or tries to do everything the opposite way. The same thing happens for the next couple of sites, so every site is built in a different way and it is impossible to maintain them. Three years later, when the CMS is replaced, poor manageability is cited as a primary reason for moving off the platform. The chaotic condition of the code base and content makes the next migration harder. And the cycle continues.

How could this be avoided? It is not so easy. Usually a CMS purchase is supported by a business case that needs to be optimistic about both cost and benefit in order to get the ROI to look right. This innocent little exercise of telling a story that management wants to hear is where the problems begin. Maybe it would have been better to explore the option of upgrading and refactoring the current platform and compare the ROIs of the two approaches. Both migration and upgrading are complicated processes that are full of surprises.

The next error is the assumption that building out sites will be like turning a crank. It may get that way eventually, but the first few are as hard as building custom software. There are lots of choices and lots of places to make mistakes. Ideally, the sequence would be: build a (tiny) pilot site, build medium site 1, rebuild medium site 1, build re-usable components and patterns, and build a reference implementation based on the rebuilt site 1. After that, the rest of the new sites can be built on the reference implementation, which gets gradually improved as new ideas are tried out.

Whoever is holding the budget is cringing right now. Not only is this making more migration work (two+ additional sites), it is lengthening the time to benefit for the most important sites that may be funding this whole migration in the first place. It also means paying developers to do things other than coding: documenting, teaching, learning, and recoding. As expensive as this is, the costs are trivial when compared to the price of doing it wrong and feeling the need to replace the system years ahead of schedule.

Another strategy is to work with a systems integrator who has a long track record of working on the platform you selected. Of course, this only works if a) the consultants you get on the team really know the platform and b) you didn't beat them down on price so much that they need to cut corners and work sloppily. You really need to trust the systems integrator and partner with them to achieve joint success. These trustworthy partners don't come cheap either because they tend to be honest and conservative in their pricing. At the end of the day you may save money, but the initial price tag will be discouraging when compared with others who want to tell you what you want to hear.

Too often, people tend to focus on migrating the content itself during a CMS implementation. Migrating the content can be messy if it needs to be cleaned up first, but there are at least some possibilities for automation (or at least hiring cheap temps to manually copy/paste). Rendering and integration code, conversely, is not at all portable (except for CSS - write as much presentation code as you can in CSS) and can be a real challenge to translate to a new platform. Porting the code is most efficient when developers can take the time to learn the new platform and logically break down the site to leverage the technology's native strengths and best practices. When under time pressure, the tendency is to revert to comfortable old ways even if they are counter-productive. And the downward spiral continues...

Oct 23, 2008

Variable control lists

Your typical editor is conflicted. On the one hand, he likes to have direct control over what articles are promoted on every page of the site. On the other hand, he realizes that he may not have the time to exert this control. When editors have to manually place articles in every promotional list, they risk letting the important pages (home pages and section front pages) get stale as they tend to other things. When they allow the templating logic to automatically select articles to highlight, they get frustrated by the judgement of automated filtering rules ("why is that article promoted more than this more important article?"). They want to be able to override the automated logic when they think they can do better.

Here is a classic trick to establish a compromise between the extremes of manual control and automation. I have implemented this on many different web content management platforms and it should work on yours. I call it "variable control lists." The basic idea is that you allow an editor to select 0 to n articles to include in a promotional list that displays n items. The template logic shows the articles that the editor selected and then fills in the rest of the positions with the results of a query. So, if an editor picks 0 articles to spotlight, the template logic selects n. If the editor selects 3, the template logic selects n-3 articles and lists them after the editor's picks.

A couple of nuances should be considered. First, you want to make sure that the supplemental list does not include any of the articles that the editor already selected. Second, you might want to apply some additional rules, like having the template disregard the editor's picks if they are past a certain age. That really depends on the type of site you are running.
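
Here is a minimal sketch of the fill-in logic for the template layer (Python for readability; the article attributes like id and age_days, and the query_latest_articles callable, are hypothetical stand-ins for whatever your CMS provides):

    def build_promo_list(editor_picks, query_latest_articles, n, max_pick_age_days=None):
        """Fill an n-slot promotional list: editor picks first, query results after."""
        promo = []
        for article in editor_picks:
            # Optional rule: disregard editor picks that are past a certain age.
            if max_pick_age_days is not None and article.age_days > max_pick_age_days:
                continue
            promo.append(article)
            if len(promo) == n:
                return promo

        # Fill the remaining slots from a query, skipping anything already picked.
        picked_ids = {a.id for a in promo}
        for article in query_latest_articles():
            if article.id in picked_ids:
                continue
            promo.append(article)
            if len(promo) == n:
                break
        return promo

So, with n = 5 and two fresh editor picks, the function returns those two picks followed by the top three query results that are not already in the list.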

Try it out and see how it works for you. I am especially interested in learning of platforms that this would not work on.

Oct 21, 2008

Kevin Cochrane Joins Day

I was just on the Day Software site and noticed that Kevin Cochrane (who had left Interwoven to join Alfresco) has joined Day as their Chief Marketing Officer. This seems like a great fit. Kevin can contribute to Day's already very strong commercial open source strategy (Day is a primary contributor to Apache JackRabbit) and will probably help Day build a U.S. presence. Day's focus on web content management is much greater than Alfresco's, so Kevin will probably enjoy more influence in the Day culture. The big question is whether Kevin will bring with him some of his former Interwoven/Alfresco colleagues. Day already has a star-studded technology team and a very strong, standards-based content repository.

I am looking forward to seeing Kevin's contributions at Day.
