Thursday, December 15, 2005

Alfresco and Plone

[Note (2/27/2008): Alfresco has improved its web content management capabilities quite a bit since I wrote this post. For a more up-to-date assessment, check out my more recent review of Alfresco that covers version 2.2.]

Alfresco Software frequently describes Alfresco as the first open source Enterprise Content Management (ECM) system. To the extent that they are the first well-funded open source project to aggressively invade the territory of industry incumbents like Documentum and FileNet, that is true. However, they are not the first open source project with document management capabilities. While there are a couple of other open source CMSs designed to do document management (Contineo and Xinco), that part of the open source CMS landscape has been dominated by Plone. What follows is a short analysis of the relative positioning of the two projects.

I recently went to training for Alfresco and really like the software. I was amazed by what the team has done in such a short period of time. The Alfresco team had the benefit of being able to rapidly "assemble" their application using best-of-breed open source components. Equally important, the lessons learned at Documentum seem to have been applied to the design of Alfresco. (The development team is composed largely of Documentum alumni. In fact, a lead Alfresco developer was employee of the year in his last year at Documentum.)

A particularly compelling aspect of Alfresco is the openness of the architecture. Alfresco supports Microsoft's CIFS (Common Internet File System) protocol, which allows you to mount the repository, or a sub-folder of the repository, as a Microsoft Windows network file share. Doing so makes it possible for users to interact with the CMS without thinking about it, simply by working in their natural ways. Content rules, which are triggered when files are moved in and out of folders, can execute functions such as adding metadata, starting a workflow, or sending emails behind the scenes. Making the CMS "invisible" like this is a very good way to help ensure adoption. Alfresco also supports WebDAV, is JSR 170 level 1 compliant (level 1 is the watered-down, read-only version of the spec, which is still useful for integration with other applications), and has a Web Services interface. Support for these standards makes Alfresco very attractive in distributed, heterogeneous architectures, which is where I think content management is going. It is a nice departure from the mainstream ECM vision of centralization.
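To make the content rules idea concrete, here is a minimal sketch in Python. It is not Alfresco's API (Alfresco is a Java application and its rules are configured, not written this way); the names `Document`, `Folder`, `tag_as_contract`, and `notify_legal` are all made up for illustration. It only shows the pattern: dropping a file into a folder fires rules that add metadata and queue a notification without the user doing anything extra.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A rule is any callable that receives the document entering a folder.
Rule = Callable[[Document], None]


@dataclass
class Folder:
    name: str
    inbound_rules: List[Rule] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)

    def add(self, doc: Document) -> None:
        """Store the document, then fire every inbound rule in order."""
        self.documents.append(doc)
        for rule in self.inbound_rules:
            rule(doc)


def tag_as_contract(doc: Document) -> None:
    # Hypothetical rule: add metadata behind the scenes.
    doc.metadata["category"] = "contract"


outbox: List[str] = []


def notify_legal(doc: Document) -> None:
    # Stand-in for sending an email; here we only record the message.
    outbox.append(f"New document for review: {doc.name}")


contracts = Folder("Contracts", inbound_rules=[tag_as_contract, notify_legal])
doc = Document("acme-msa.doc")
contracts.add(doc)  # the user just "saved a file into a folder"
```

From the user's point of view the only action was saving a file into the Contracts folder; the tagging and the notification happened as side effects.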

Inside, the application is highly configurable. One interesting feature is the application of the concept of "aspects." If you are a Java programmer, you have probably heard the buzz about Aspect Oriented Programming (AOP). The general idea is that an "aspect" is a general set of attributes or capabilities that can be assigned to an object without relying on inheritance through the class hierarchy. For some reason, it was easier for me to get my head around applying aspects to content assets than it was for me to figure out AOP. In Alfresco, there are aspects like "versionable" or "categorized." These concepts allow content types to be very simple and, if they desire, users can add attributes to a single instance of a content asset. Defining content types and aspects, along with almost all other configuration, is done by editing XML configuration files. I think it was smart not to build a sophisticated configuration user interface at this stage of the project. The people that you want to make these customizations should be comfortable editing XML files.
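The way aspects attach to a single instance rather than to a type can be sketched roughly as follows. This is an illustration in Python, not Alfresco's actual content model: the `ContentNode` class, the `ASPECTS` table, and the property names are assumptions of mine. The point is that two nodes of the same content type can carry different sets of attributes.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Set

# Hypothetical aspect definitions: aspect name -> default properties.
ASPECTS: Dict[str, Dict[str, Any]] = {
    "versionable": {"version_label": "1.0"},
    "categorized": {"categories": []},
}


@dataclass
class ContentNode:
    content_type: str
    properties: Dict[str, Any] = field(default_factory=dict)
    aspects: Set[str] = field(default_factory=set)

    def add_aspect(self, name: str) -> None:
        """Attach an aspect to this one node and add its properties."""
        defaults = ASPECTS[name]  # raises KeyError for an unknown aspect
        self.aspects.add(name)
        for key, default in defaults.items():
            # Copy so mutable defaults are never shared between nodes.
            self.properties.setdefault(key, copy.deepcopy(default))

    def has_aspect(self, name: str) -> bool:
        return name in self.aspects


# Two nodes of the same type; only one of them becomes versionable.
report = ContentNode("document")
memo = ContentNode("document")
report.add_aspect("versionable")
report.add_aspect("categorized")
report.properties["categories"].append("finance")
```

Neither node needed a new subclass, which is why the content types themselves can stay so simple.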

Based on their experience of most clients wanting very basic workflow, Alfresco's workflow model is very simplistic. Workflows are designed using folders to represent states and then using rules to add simple approve/reject choices that can trigger other events. This system would be a little awkward for implementing complex workflows that involve splits, merges, and synchronization with other approvals. Alfresco plans to add BPEL support, which should address this, in later releases.
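A folder-per-state workflow boils down to a small transition table. The sketch below is my own illustration in Python, with invented folder names ("Drafts", "In Review", "Published"); it is not how Alfresco stores or evaluates its rules.

```python
from typing import Dict, Tuple

# (current folder, action) -> destination folder. Folder names are invented.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Drafts", "approve"): "In Review",
    ("In Review", "approve"): "Published",
    ("In Review", "reject"): "Drafts",
}


def apply_action(folder: str, action: str) -> str:
    """Return the folder a document moves to after approve/reject."""
    try:
        return TRANSITIONS[(folder, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed in '{folder}'") from None


state = "Drafts"
state = apply_action(state, "approve")  # moves to "In Review"
state = apply_action(state, "reject")   # moves back to "Drafts"
```

The sketch also shows where the model runs out of road: a document sits in exactly one folder at a time, so there is no natural way to represent two approvals happening in parallel and then merging.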

As mentioned earlier, versioning has been implemented as an aspect so that any content type may be versioned. One small quirk with versioning is that when you use the CIFS interface and edit the file directly, every save creates a new version, which is a full copy of the file on the file system. This would consume a lot of disk space if you were editing a large video file and wanted to save frequently to prevent data loss.
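A back-of-the-envelope calculation shows how quickly this adds up. The figures below are hypothetical (a 500 MB video saved 20 times), and the sketch assumes the file size stays constant and every save stores one full copy.

```python
def versioned_storage_mb(file_size_mb: float, saves: int) -> float:
    """Disk space used when every save stores a full copy of the file."""
    if saves < 0:
        raise ValueError("saves must be non-negative")
    return file_size_mb * saves


# Hypothetical editing session: a 500 MB video saved 20 times.
total_mb = versioned_storage_mb(500, 20)
print(f"{total_mb:,.0f} MB")  # 10,000 MB, roughly 10 GB for one file
```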

Alfresco handles all file types, but support for Microsoft Office and PDF formats is the strongest. Using OpenOffice components, Alfresco is able to extract text for the full text search index (powered by Lucene) and transform documents into PDF format. The system is architected to be extended with new "transformers" that can handle other conversions. I have already talked to clients that would want to extend Alfresco in this way.

So, is Alfresco the perfect open source ECM? Not quite. At least not yet. First of all, Alfresco is not all open source. Features like group-based access control and clustering are actually "Shared Source" and require monthly subscription fees to use. Without these features, it would be difficult to roll Alfresco out to a large group of users. So you could say that the "E" part of "ECM" is not open source. The second issue is that, at this point, Alfresco does not handle web content, another critical part of the classic ECM definition. This was intentional. The Alfresco team wanted to start with a solid foundation and then grow into other aspects of content management. The demand for affordable document management solutions, and the scarcity of open source projects that provide them, make this a wise choice. I actually don't mind the absence of WCM functionality. CMSs that try to do too much are often difficult to use, and it would be better to get it right than to throw it in sloppily. Alfresco says it is going to add WCM later and that the team it has assembled to do it understands WCM. I hope that, from their experience at Documentum, they learned the lesson that WCM is not just a matter of managing another type of file.

So how does Alfresco stack up to Plone? There is a large degree of functional overlap between Plone and Alfresco. They both have the functionality necessary for groups of users to manage and share documents: access control, search, metadata, etc. Plone also supports WebDAV and has a mechanism where files are automatically updated on the server when edited with a client application such as Microsoft Word. But it does not support CIFS. Alfresco has the advantage of a content rules framework, which Plone is missing because of its lack of an event model. Alfresco has a better content versioning system. The many companies who have standardized on Java will feel more comfortable working with a Java solution (although their standard application servers may not run Java 1.5, which is required by Alfresco - WebSphere does not). Also, Alfresco, with its open architecture, has more options for integration than Zope-based applications.

There are several areas where Plone has a significant edge. The most notable of these is handling web content. Plone is an effective and elegant hybrid of a document management system and a web content management system. Plone's workflow model is more robust than Alfresco's. The other significant advantage that Plone has is its maturity, which has led to a broad install base, excellent documentation (including several professionally published books), and an extensive library of add-on extensions which provide capabilities ranging from a blog to eCommerce.

Based on all this, I think both applications have their uses. I would use Alfresco for a targeted document management solution that would fit into a larger enterprise content management architecture - perhaps as a node, or collection of nodes, in a deployment like the one described in this presentation which Travis Wissinks gave at the KMWorld & Intranets conference last month. I would use Plone to build an all-in-one intranet or extranet where I wanted to mix article, page, and file content and opportunistically deploy new features to improve collaboration and retention. I would also use Plone as a department-level knowledge management system because of features like threaded discussions around content assets, event calendar, and native RSS support.