Jan 06, 2006
Alan Pelz-Sharpe has a nice article on CMS Watch about trends in the ECM market. In it, Alan (I think rightly) points out that the large infrastructure players (Microsoft, EMC, IBM) are ultimately going to gobble up the space, "managing unstructured data just the same as they currently manage structured data." The article goes on to say that acquisition may not be a bad thing because it may lead to more investment in the platform and better support. That is a likely scenario unless one of these giants acquires multiple platforms and lets all but its favorite wither away (for a while, this is what was happening at divine when it had both Content Server and Participant Server).
One of the most insightful observations in the article is that the impending roll-up may lead the large ECM players, in anticipation of being acquired one day, to ignore small and medium enterprises because these clients will not mean anything to the potential acquirers. Rather than risk being underserved by the large ECM vendors, Pelz-Sharpe suggests that small and medium enterprises consider best-of-breed products, including open source, that can "solve pressing problems in a simpler fashion."
Good point. I hadn't thought of that.
Jan 04, 2006
Last night BostonPHP hosted an evening with Mitch Pirtle of Joomla! fame at our Boston office. Mitch is a great speaker, and his passion for Joomla! and open source really came through in his presentation. While most of the talk turned into an introduction to Mambo, Mitch did weave in some good background about Joomla!'s break from Mambo and where Joomla! is going. He promises to come back to get more into the details of the technology behind Joomla!
First, some details about the split. Mitch related his experience of sitting in a conference booth representing Mambo and then seeing an open letter announcing Miro's formation of the Mambo foundation without involving the core development team. Just at that moment, he saw Eben Moglen from the Software Freedom Law Center coming up the escalator. Eben and the SFLC were critical in guiding the Joomla! team through a process littered with potential land mines. Joomla! also received support from VA Software, which donated hardware, software (SourceForge Enterprise Edition), and hosting services for the new Joomla Forge. Rochen donated hosting for www.joomla.org and, before long, Joomla! and Open Source Matters, the holding organization for Joomla!, were born.
According to Mitch, nearly all of the core development team and much of the community, as well as many third party component developers, have shifted over to the Joomla! side. Today the Joomla! project is thriving. The forums already have over 160,000 posts and are growing at a pace of 1,100 per day. There are already 11,000 registered developers and 700 projects on Joomla Forge (just like the big SourceForge, many have not been started yet). One interesting project underway is putting a Joomla! front end on SourceForge using the new Web Services API; VA is helping with the initiative.
Packt Publishing, which sells Building Websites With Mambo, is planning to publish a similar Joomla! book. I have not recently talked to the Mambo team, which has reloaded its core team with new developers, but it does seem like Joomla! has the momentum of the two projects.
Mitch talked a little about the new 1.1 release (due out soon). The key advancement of 1.1 will be full UTF-8 support. This feature trumped a bunch of other items on the roadmap because of the urgent need to support an extended character set. While the team was working in the core, they couldn't resist doing some deep refactoring and modernization of the code. Thanks to better code design, the new version of Joomla! is expected to be faster. Joomla! 1.1 also introduces the first steps of a database abstraction layer, which will make it easier to run Joomla! on databases other than MySQL (Postgres support is coming soon, with commercial databases like Oracle and SQL Server to follow). Templates will use the patTemplate templating engine rather than plain PHP. Great, another tagging syntax to learn! 1.1 also brings a more sophisticated error handler.
People hoping for user facing features like fine grained security, workflow, and versioning will have to wait for version 1.2 which looks like it is going to be huge. Interestingly, a lot of this code has already been written but Joomla! is being conservative about how much new stuff to release at a time to reduce the pain of upgrades. Support for PHP 5 will have to wait for Joomla! 2.0, a total rewrite which will probably be based on the Zend Framework to which Mitch's company JamboWorks is contributing a security module.
I asked if separating from Mambo gave the project more freedom to extend and modernize the application, and Mitch did say that backwards compatibility with earlier versions of Mambo was a constraint that they are happy to be relieved of. I don't know if the team would have done the level of refactoring that they did if they were worried about a migration path. Also, I am sure that the feeling created by starting something new energized the team.
During the talk, there were some references to some useful Joomla! resources:
-
The API site was unveiled. This site uses the PHP documentation tool phpDocumentor to automatically generate documentation from comments (like Javadoc). The comments are a little thin right now, but now that the API site is up, there will be more incentive to write good comments.
-
The JamboWorks Template Club is a membership based service which gives access to a collection of pre-made templates. There are currently 12 templates on the site and a new template will be added every month. There were some great examples in the demo. $75 per year gives you access to the whole collection.
-
BostonPHP has just published a release candidate of josCommerce, which ports the popular mosCommerce eCommerce component to Joomla!
Dec 28, 2005
Database Adaptors and SQL Methods
If you have ever evaluated Zope, you have probably heard that, in addition to using the included object database (the ZODB), Zope can access most SQL-compliant relational databases. The typical framework for doing this is Database Adaptors and ZSQL Methods. Database Adaptors, commonly called DAs, are plug-in products that give you a connection to a particular database. For example, there is a popular DA for MySQL called ZMySQLDA. Once you have a DA, you can use the Zope Management Interface (ZMI) to create a ZSQL Connection to connect to the database and ZSQL Methods to execute queries against that database. The whole process is pretty well documented here. What you get when you go this route is a published Zope object that gives you a result set of data based on a query, or a handle on a query that can update the database (using SQL update, delete, or insert commands). ZSQL Methods are most useful if you want to create a page in Zope (written in either ZPT or DTML) that displays information stored in a relational database.
Unfortunately, there are not a lot of DA options for Oracle. The main open source one is DCOracle2, which is no longer active although there are still many people using it. My experience with compiling DCOracle2 for Oracle 10g looked like this. Even after that, it was still flaky. The best instructions I found are here, but before you start with DCOracle2, look at SQL Relay's DA. If you have money to spend, you can also look at eGenix mxODBC, which talks to Oracle over ODBC and is not too bad at $120 per server... unless, however, you are running on *NIX. In that case you need to buy an Oracle ODBC driver, which may cost $1,599.00 per server (Windows Oracle ODBC drivers are free). The mxODBC DA is the most configurable DA that I have seen.
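To make the ZMI route concrete: the body of a ZSQL Method is just a SQL template, with dtml-sqlvar tags handling quoting and escaping of the arguments. A minimal sketch (the table and argument names here are made up for illustration):

```sql
SELECT TITLE, BODY, MODIFIED_DATE_TIME
FROM ARTICLE_PROPS
WHERE AUTHOR_ID = <dtml-sqlvar author_id type=int>
  AND STATUS = <dtml-sqlvar status type=string>
```

Calling the published method with author_id and status renders the template and returns the result set.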
However, if you want to do more heavy lifting with a relational database, this framework is a little weak because you probably don't want to manage all your SQL logic in the ZMI, especially if you want to access the data from Python-based classes sitting on a file system (deployment gets difficult here). So the next level in working with relational data is to use a DA and a regular SQL method in your Python code. That might look a little like this (note: all code samples are meant to be illustrative; I have left out important bits that are required for the code to run):
def __init__(self, context, map):
    self._context = context
    # .....

def selectProperties(self, pid):
    setattr(self, '_selectProperties',
            SQL('_selectProperties', '', CONNECTION_NAME, 'propertyId',
                'SELECT %s FROM %s WHERE SCHEMA_ID = <dtml-sqlvar propertyId type=int>' %
                (self._fieldList(map), RDBDAO.TABLE_NAME[self._context.id])))
    method = self._selectProperties.__of__(self._context)
    return method(propertyId=pid)

def setLocals(self):
    results = self.selectProperties(self._context.propertyId)
    columns = results.names()
    if len(results) < 1:
        raise Exception, "Can't find my database row."
    for record in results:
        for column in columns:
            if column.lower() in self._propertyTypes:
                if record[column] is not None:
                    self._rdata[column.lower()] = self._conversion.toZope(
                        record[column], self._propertyTypes[column.lower()])
                else:
                    zLOG.LOG('RDBDAO', DEBUG, "no value for " + column.lower())
In the selectProperties method we create a new method called _selectProperties, which is a SQL Method. Don't get too caught up in the syntax. The only thing I would call out is the use of "<dtml-sqlvar>", which does things like apply proper quoting and escape bad characters when the type is string. The CONNECTION_NAME variable is actually just a string that matches the name of the ZSQL Connection that you set up to point to the database you want to talk to. If you put the ZSQL Connection right at the root folder, your object will have a reference to it through Acquisition. However, this doesn't happen if the object does not exist within the acquisition hierarchy or is so brand new that the acquisition context has not yet been set. So we wrap the _selectProperties method in another method called selectProperties, which just calls the private _selectProperties method using the context of the calling class that was passed in via the __init__ method - hence the syntax __of__(self._context). Then the setLocals method runs the query and puts the results in a local dictionary. The _conversion object contains methods to do data conversion, like handling dates.
Notice how there is no syntax for opening and closing a connection. That is all handled in the background by the ZSQL Connection. This example does not do a database update. If it did, you might see a SQL method that issued a query of "commit." Depending on the implementation of the DA, connection pooling and other configuration tends to be extremely simple. There are very few parameters to adjust, which gives you little control over how your application manages connections. Still, this method of data access goes pretty far - as long as the DA behaves reliably. This framework starts to fall down when you start working with really long strings such as CLOBs (Character Large Objects). The problem arises because SQL methods only accept simple SQL statements. The update code would look like this:
def updateProperties(self, propertyId, args):
    setattr(self, '_updateProperties',
            SQL('_updateProperties', '', CONNECTION_NAME,
                'propertyId ' + ' '.join([k for k in args]),
                self.sqlUpdateString(args)))
    method = self._updateProperties.__of__(self._context)
    return method(propertyId=propertyId, **args)

def sqlUpdateString(self, map):
    """This method creates an update statement based on the property map.
    There may be more properties than we will be setting when we execute this
    statement, but that is taken care of by the optional argument on the
    dtml-sqlvar tag.
    """
    sqlString = "UPDATE " + RDBDAO.TABLE_NAME[self._context.id]
    sqlString += " SET " + ", ".join(
        ["%s = %s" % (k.upper(),
                      self._conversion.setValue(k, DBTYPE, self._propertyTypes[k]))
         for k in map])
    sqlString += ", MODIFIED_DATE_TIME=%s" % (
        DBTYPE == 'MySQL' and 'CURRENT_TIMESTAMP()' or 'SYSTIMESTAMP',)
    sqlString += ", MODIFY_INSTANCE='%s:%s'" % (os.getenv('HOSTNAME'), INSTANCE_HOME)
    sqlString += " WHERE PROPERTY_ID = <dtml-sqlvar propertyId type=int>"
    return sqlString
Here the _conversion class puts in the appropriate <dtml-sqlvar> syntax as well as does some additional data conversions. In Oracle, a SQL statement can only be up to a certain number of characters - I can't remember how many - but if you try to issue a query like "UPDATE tableA SET fieldA = '[some multiple-thousand character string such as the body of an article]'", you will get an error. MySQL tends to be a little more forgiving, but there are limits. In order to set large string values, you need to use "bound variables" so that the update query gets assembled on the database side. DAs and SQL Methods don't do this very well (actually, they don't do it at all).
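The difference between interpolating a value into the statement and binding it can be sketched with Python's DB-API (sqlite3 here purely for illustration - the table name is made up; Oracle's limit applies to the literal SQL text, not to bound values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")

body = "x" * 100000  # a CLOB-sized string

# Interpolating the value makes the SQL statement itself 100,000+ characters
# long. On Oracle this style fails once the literal exceeds the statement
# size limit; sqlite happens to tolerate it.
conn.execute("INSERT INTO articles (id, body) VALUES (1, '%s')" % body)

# Binding keeps the statement short; the value travels separately and the
# query is assembled on the database side.
conn.execute("INSERT INTO articles (id, body) VALUES (?, ?)", (2, body))
conn.commit()

print(conn.execute("SELECT length(body) FROM articles WHERE id = 2").fetchone()[0])
# prints 100000
```
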
SQL Relay
To get around this limitation, we used SQL Relay which allowed our Python DAO to talk directly to the database without going through the Zope DA pathway. Although SQL Relay also has a DA, we didn't use it because it does not support bound variables. But the Python client libraries, which we used in our code, do support binding. SQL Relay consists of several components:
-
A set of connection daemons which hold the connection to the database open.
-
A listener, which is a daemon that listens on a specified port and forwards requests to a connection daemon
-
A client that can talk to the listener
-
A cache manager daemon that maintains the query cache and removes stale result sets

You can put the listener on the database server or on the application server. Our design had everything but the clients sitting on the relational database server. The system is also extremely configurable by editing various XML files.
So now, with SQL Relay, our update code looks a little like this:
def __init__(self, context, map, connection):
    """The connection object is passed in from DBTransactionManager. The
    syntax looks like this:

    self._connection = PySQLRClient.sqlrconnection(
        _CON_INFO['host'], _CON_INFO['port'], '',
        _CON_INFO['user'], _CON_INFO['password'], 0, 1)

    The username and password used here are not the database username and
    password. They are usernames and passwords that are set up in SQL Relay
    for clients to use. See the SQL Relay configuration documentation for
    more information.
    """
    self._connection = connection
    # .....

def sqlUpdateString(self, datamap):
    """This method creates an update statement based on the property map.
    There may be more properties than we will be setting when we execute this
    statement, but that is taken care of by only binding the values in the map.
    """
    sqlString = "UPDATE " + RDBDAO.TABLE_NAME[self._context.id] + " SET "
    params = []
    for k in datamap.keys():
        params.append(" %s=%s" % (
            k.upper(),
            self._conversion.toQueryTemplate(k, self._propertyTypes[k])))
    sqlString += ",".join(params)
    sqlString += " WHERE PROPERTY_ID = %s" % self._context.propertyId
    return sqlString

def persist(self):
    query = self.sqlUpdateString(datamap)
    cur = PySQLRClient.sqlrcursor(self._connection)
    # .....
    try:
        cur.prepareQuery(query)
        for k in datamap:
            if k.upper() == 'BODY':
                body = self._conversion.setValue(
                    DBTYPE, self._rdata[k], self._propertyTypes[k])
                cur.inputBindClob(k.upper(), body, len(body))
            else:
                cur.inputBind(
                    k.upper(),
                    self._conversion.setValue(
                        DBTYPE, self._rdata[k], self._propertyTypes[k]))
        cur.executeQuery()
        if cur.affectedRows() > 0:
            zLOG.LOG('SQLRDAO', DEBUG,
                     "Updated %d rows " % cur.affectedRows())
        else:
            zLOG.LOG('SQLRDAO', ERROR,
                     "Database returned error: %s " % cur.errorMessage())
            raise DBError, cur.errorMessage()
    # .......
With this setup, we were able to build a robust DAO that talks to an Oracle database and can handle all sorts of data types, including CLOBs and BLOBs. We also have a configurable database connection framework that can be used to interface with several different databases. If I have an opportunity to work on another Zope project that needs relational database connectivity, SQL Relay (either the DA or the client libraries) will be the first option I try.
Dec 28, 2005
In my last post, ZOracle Part I, I described the requirements and some background on a recent project to rewire an existing Zope CMF-based CMS to use an Oracle based relational repository.
As I hinted earlier, the key to the solution was an aspect of the existing architecture that placed all of a content asset's attributes in a structure based on Zope's PropertySheet. Each object had multiple property sheets that represented groups of attributes, in much the same way Alfresco uses Aspects to allow simple objects to be extended with additional data or capabilities. PropertySheets are normally backed by a Python dictionary and inherit from Persistent, so they get stored in the ZODB when associated with other ZODB-persisted objects. What we did was create a new kind of property sheet, called RDBProperties, that was backed by a Data Access Object rather than a ZODB-persisted dictionary. The DAO encapsulated all the code to read and write from the database. This enabled us to experiment with various database connectivity strategies and do comparative testing (see Part III: Connecting to an Oracle database).
We decided to keep things simple by creating a new database table for every PropertySheet definition, with a corresponding column for each property. This normalized design was desirable because it made things easier for external applications hoping to make sense of the data. So, if there are 5 asset classes and 3 possible property sheets, there would be three backing tables. We decided to have Oracle manage the primary keys for property sheets with sequences. We stored the unique property sheet ID on the Zope side for retrieval and also captured the Zope-derived object ID in the database so we could reconstruct objects outside of Zope. Other than making sure that we got our data types right and finding a reliable way to talk to the database (see next article), this part of the prototype was pretty straightforward.
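As a sketch of the table-per-PropertySheet idea, a backing table for a hypothetical "publication" property sheet might look like this (names and columns are invented for illustration, not our actual schema):

```sql
CREATE SEQUENCE PUBLICATION_PROPS_SEQ;

CREATE TABLE PUBLICATION_PROPS (
    PROPERTY_ID         NUMBER PRIMARY KEY,   -- populated from PUBLICATION_PROPS_SEQ
    ZOPE_OBJECT_ID      VARCHAR2(255),        -- lets external apps reconstruct the object
    TITLE               VARCHAR2(512),
    BODY                CLOB,
    MODIFIED_DATE_TIME  TIMESTAMP
);
```

Every asset carrying the "publication" property sheet gets one row here, keyed by the Oracle-managed PROPERTY_ID.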
The only problem that remained was deciding when to write back to the database. Zope is somewhat elusive in this regard. By design, the programmer doesn't really know when persistent objects are being written to the database; it just kind of happens in the background. We needed to make sure that we kept the relational database up to date, but we also didn't want to write to the database too often and create a new performance problem. To do this, we extended Zope's transaction manager, TM (Shared.DC.ZRDB.TM.TM), with a new derived class called RDBTransactionManager. This gave us hooks to execute logic at the beginning and end of the transaction (in Zope, a transaction is defined as what happens from the beginning of the HTTP request to when the response is sent - this is different from a database transaction), and also when a transaction is aborted. A new RDBTransactionManager is created the first time a DAO is requested and then re-used in each subsequent DAO instantiation within the Zope transaction. Our RDBTransactionManager also kept a collection (dictionary) of the property DAOs used in the current Zope transaction, so within a transaction we could cache values and wait until the end of the transaction to write back to the database (or not, in the case of an abort). At the end of the transaction, the RDBTransactionManager iterated through its list of DAOs and called their persist methods, and then, at the end of it all, called a global commit method which committed the database transaction.
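Stripped of the Zope specifics, the pattern our transaction manager followed can be sketched in plain Python (class and method names here are illustrative, not the actual implementation, which derived from Shared.DC.ZRDB.TM.TM):

```python
class RDBTransactionManagerSketch:
    """Collects the DAOs touched during one request and flushes them at the end."""

    def __init__(self, connection):
        self._connection = connection
        self._daos = {}  # one DAO per property sheet touched in this transaction

    def register(self, key, dao):
        # Re-use the existing DAO if this sheet was already touched,
        # so values stay cached within the transaction.
        return self._daos.setdefault(key, dao)

    def finish(self):
        # End of the Zope transaction: write everything back, commit once.
        for dao in self._daos.values():
            dao.persist()
        self._connection.commit()
        self._daos.clear()

    def abort(self):
        # Aborted transaction: discard cached values, write nothing.
        self._daos.clear()
        self._connection.rollback()
```

The real class wired finish and abort into Zope's transaction hooks, so persistence happened exactly once per HTTP request.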

In this design, Zope (actually ZEO) still manages concurrency because, as far as it knows, it still owns the objects. In our tests, Zope still complained when two transactions tried to update the same object at the same time. The design also worked with multiple ZEO clients talking to the same ZEO database and Oracle database. Search and general object maintenance were also still managed within Zope: whenever a content asset is stored in the repository, it is indexed with portal_catalog. All of the other functionality of the CMS operated normally - essentially unaware that anything unconventional was happening underneath. Only rigorous testing can answer the question of whether this improves the performance of the application. It will certainly reduce the size of the ZODB and also reduce the frequency with which the ZODB thinks it needs to write data (Zope considers the data contained in the DAO "volatile" and therefore unworthy of persistence). But we did meet the requirement of the data being accessible to any technology capable of accessing an Oracle database.
Next: Connecting Zope to an Oracle database
Dec 27, 2005
Optaros recently finished a project to build a prototype that adapted an elaborate Zope CMF-based custom CMS to persist content to an Oracle database rather than the ZODB. The reason for doing this was that the ZODB was not performing adequately under the heavy load the CMS was subjected to, and was not open to the non-Zope technologies with which our client wanted to share data at the database layer. The next set of blog posts will talk about the problem, various solutions, and what we did. These posts are slightly more technical than other posts on this blog and I won't be insulted if some of the more management types just skim through them ;)
The problem
The system that we were working with has a very large repository (45 GB of text - images and other binary files are stored outside of the ZODB) that is continually being written to (tens of thousands of new objects a day). They use FileStorage, rather than DirectoryStorage, because there are so many objects in the ZODB that the operating system would run out of inodes. Because the database is so big and gets bombarded by so many write requests (the ZODB is effectively single threaded and is optimized for reading rather than writing), the system's performance is just barely acceptable. There is also a risk of data corruption, which would lead to extensive downtime - disastrous for this mission-critical application.
In a Zope CMF-based application, everything is stored in the ZODB (except, in this case, binary files, which are stored directly on the file system). This includes the objects themselves, version information, history, the search indexes (called portal_catalog), and, to some extent, code. While maintaining the search index represents a significant amount of overhead in this application, the primary target for removal from the ZODB was the content objects themselves, because there was a desire to expose the content within the repository (read only) to non-Python applications. Oracle as a repository was particularly desirable because the client owns a site license for Oracle and wants to leverage Oracle's administration, tuning, and backup and recovery capabilities.
The system already uses ZEO, but technologies that would relieve pressure on the storage tier, such as Zope Replication Services, were tried and failed because of the write intensity of the application. The right solution would improve performance and store content as fielded data (as in relational tables) rather than in the ZODB. Also critical, the solution needed to go in smoothly, with as little disruption as possible to the sophisticated and complex application sitting on top. None of the existing solutions seemed to have much promise.
-
OracleStorage was ruled out because, in addition to being somewhat stagnant over the past few years, it fails on the requirement of being open to non-Zope technologies. OracleStorage stores Zope objects as Python pickles, which are serialized Python objects (equivalent to Serializable objects in Java). Non-Python applications would have a difficult time reading pickles.
-
The newer project APE, an object-relational mapping layer for Zope (like Hibernate in the Java world), looked like a viable option, but earlier prototypes using APE suffered from performance problems. There was also concern about how the underlying caching mechanism would behave under load. The ultimate deal-breaker was that the documentation on configuring APE was pretty thin.
-
Another solution, which may still be used as a fall-back, was a nightly script that iterates through the ZODB and writes to an Oracle schema. This would solve the problem of having the content available in a relational database, but it would not solve any performance issues and would not safeguard against corruption. This option could be selected in conjunction with OracleStorage if OracleStorage were more actively maintained.
The solution that we wound up going with took advantage of a particular design characteristic of the system: that all of a content asset's attributes were actually stored outside of the asset in a class derived from Zope’s PropertySheet.
Next: an overview of the solution we went with.
Dec 19, 2005
Optaros just released a new white paper that looks at how companies are using open source software. The study is based on a survey of 512 U.S. companies and government organizations. The report observes that companies are starting to use open source for more than infrastructure and browsers. 42% of respondents said they were using open source portals and/or content management systems somewhere within their organization. 16% were using open source in customer relationship management. Read the press release here or the whole report here.
Dec 19, 2005
Plone4Artists and New Zealand's "Government Web Guidelines compliant Content Management System" are two examples of a group with specific needs putting together a "stack" of Plone technologies, with some customizations, and making it available to a wider community. This is similar to the concept of "distros" in Linux, which have been very effective in spreading the use of Linux. If this turns into a powerful trend, success will depend on the degree to which the Plone distros keep in sync with the core Plone community. Failure to do so may lead to fragmentation and compatibility issues.
Dec 19, 2005
iECM is a new standard being developed through AIIM - the organization that developed the first definition of ECM. The goal is to define a standard, or set of standards, to allow different Enterprise Content Management software to work together in a heterogeneous environment. As you would expect if you have been reading this blog, I am very hopeful for this standard. Such a standard would move ECM from a "one CMS to rule them all" vision to a more practical distributed environment where different systems are connected.
Based on this, you can understand the concern that my experience with the registration process raised. After looking at the iECM Blog and seeing nothing, I figured I should register to the mailing list to see what is going on. And that is where the trouble began.
It turns out that, in order to register, you need to fill out a PDF form. I run Fedora Core 4, and the default PDF viewer is Evince. In Evince, it looked like you were supposed to print out the form, fill it out in ink, and then send it in. Not seeing a mailing address or a place to sign (why else would they require a mailed form?), I knew something was up. I figured it was an active PDF form, having worked with that technology on a project for a big insurance company that needed it for some sort of compliance.
So I downloaded and installed the Adobe Acrobat Viewer. Using the Adobe Viewer, I saw a submit button! But my hassle was not over. After I filled out the form and hit submit, I got a pop-up asking me to select my mail client. Thunderbird was not on the list. I had to save a file locally and then attach it to an email to the address on the pop-up (copying and pasting the address and subject was not enabled).
This is not my idea of interoperability. Interoperability would be to use a standard HTML web form, not a proprietary file format and a proprietary viewer. If it had to be an Adobe technology, they could have used Macromedia's ColdFusion or JRun and Java. This also hints that this group is really oriented to document management and not web content management. I am still hopeful that this standard can bear some fruit and become meaningful in the content management industry, but this experience was discouraging.
Dec 15, 2005
[Note (2/27/2008): Alfresco has improved its web content management capabilities quite a bit since I wrote this post. For a more up to date assessment, check out my more recent review of Alfresco that covers version 2.2]
Alfresco Software frequently describes Alfresco as the first open source Enterprise Content Management (ECM) system. To the extent that they are the first well-funded open source project to aggressively invade the territory of industry incumbents like Documentum and FileNet, that is true. However, they are not the first open source project with document management capabilities. While there are a couple of other open source CMSs designed to do document management (Contineo and Xinco), that part of the open source CMS landscape has been dominated by Plone. What follows is a short analysis of the relative positioning of the two projects.
I recently went to training for Alfresco and really like the software. I was amazed by what the team has done in such a short period of time. The Alfresco team had the benefit of being able to rapidly "assemble" their application using best-of-breed open source components. Equally important, the lessons learned at Documentum (the development team is largely composed of Documentum alumni; in fact, a lead Alfresco developer was employee of the year in his last year at Documentum) seem to have been applied to the design of Alfresco.
A particularly compelling aspect of Alfresco is the openness of the architecture. Alfresco supports Microsoft's CIFS (Common Internet File System) protocol, which allows you to mount the repository, or a sub-folder of the repository, as a Microsoft Windows network file share. Doing so makes it possible for users to unconsciously interact with the CMS by working in their natural ways. Content rules, triggered when files are moved in and out of folders, can execute functions like adding metadata, starting workflows, or sending emails behind the scenes. Making the CMS "invisible" like this is a very good way to help ensure adoption. Alfresco also supports WebDAV, is JSR 170 Level 1 compliant (Level 1 is the watered-down, read-only version of the spec, which is still useful for integration with other applications), and has a Web Services interface. Support for these standards makes Alfresco very attractive in distributed, heterogeneous architectures, which is where I think content management is going. It is a nice departure from the mainstream ECM vision of centralization.
Inside, the application is highly configurable. One interesting feature is the application of the concept of "aspects." If you are a Java programmer, you have probably heard the buzz around Aspect-Oriented Programming (AOP). The general idea is that an "aspect" is a general set of attributes or capabilities that can be assigned to an object without relying on inheritance through the class hierarchy. For some reason, it was easier for me to get my head around applying aspects to content assets than it was to figure out AOP. In Alfresco, there are aspects like "versionable" or "categorized." This concept allows content types to be very simple and, if they desire, users can add attributes to a single instance of a content asset. Defining content types and aspects, along with almost all configuration, is done by editing XML configuration files. I think it was smart not to build a sophisticated configuration user interface at this stage of the project. The people that you want to make these customizations should be comfortable editing XML files.
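The idea of attaching a capability to a single instance, rather than baking it into the class hierarchy, can be sketched in plain Python (a toy illustration of the concept, not Alfresco's implementation):

```python
class ContentAsset:
    """A deliberately simple content type: just an id, properties, and aspects."""

    def __init__(self, asset_id):
        self.id = asset_id
        self.properties = {}
        self.aspects = {}

    def add_aspect(self, name, data):
        # Attach extra attributes/capabilities to THIS instance only;
        # other assets of the same type are unaffected.
        self.aspects[name] = data

    def has_aspect(self, name):
        return name in self.aspects

article = ContentAsset("press-release-42")
article.add_aspect("versionable", {"current_version": 1})
article.add_aspect("categorized", {"categories": ["news"]})

plain = ContentAsset("plain-doc")
print(article.has_aspect("versionable"), plain.has_aspect("versionable"))
# prints: True False
```

The content type stays minimal, and behavior like versioning is granted per asset, which is exactly what makes the aspect model easier to grasp than class-level AOP.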
Based on their experience that most clients want very basic workflow, Alfresco's workflow model is deliberately simple. Workflows are designed by using folders to represent states and then using rules to add simple approve/reject choices that can trigger other events. This system would be awkward for implementing complex workflows that involve splits, merges, and synchronization with other approvals. Alfresco plans to add BPEL support in later releases, which should address this.
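The folders-as-states design can be sketched in a few lines of Java (again a hypothetical illustration, not Alfresco code): each folder is a workflow state, and moving a document into a folder fires any rules attached to that folder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of folder-based workflow: a folder name is a state, and a
// move into a folder is the transition that triggers rules.
class FolderWorkflow {
    private final Map<String, List<Consumer<String>>> rules = new HashMap<>();
    private final Map<String, String> location = new HashMap<>();

    // Attach a rule (e.g. "send email", "add metadata") to a folder.
    void addRule(String folder, Consumer<String> rule) {
        rules.computeIfAbsent(folder, f -> new ArrayList<>()).add(rule);
    }

    // Moving a document into a folder executes that folder's rules.
    void move(String document, String folder) {
        location.put(document, folder);
        for (Consumer<String> rule :
                rules.getOrDefault(folder, Collections.emptyList())) {
            rule.accept(document);
        }
    }

    String stateOf(String document) { return location.get(document); }
}
```

Modeling an approve/reject step is just two target folders with different rules; what this model cannot express cleanly is a transition that waits on several parallel approvals, which is where a real workflow engine comes in.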
As mentioned earlier, versioning is implemented as an aspect, so any content type can be versioned. One small quirk is that when you use the CIFS interface and edit a file directly, every save creates a new version, and each version is stored as a full copy of the file on the file system. This could consume a lot of disk space if, for example, you were editing a large video file and saving frequently to guard against data loss.
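A quick back-of-the-envelope calculation shows why full-copy versioning matters (the 500 MB file size and save count below are made-up numbers for illustration): total storage grows linearly with the number of saves.

```java
// With full-copy versioning, total storage = file size x number of saves.
class VersioningCost {
    static long totalBytes(long fileSizeBytes, int saves) {
        return fileSizeBytes * saves; // one full copy kept per save
    }

    public static void main(String[] args) {
        long video = 500L * 1024 * 1024; // hypothetical 500 MB video file
        // 20 saves of a 500 MB file -> roughly 10 GB of repository storage
        System.out.println(VersioningCost.totalBytes(video, 20)
                / (1024 * 1024) + " MB");
    }
}
```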
Alfresco handles all file types, but support for Microsoft Office and PDF formats is the strongest. Using OpenOffice components, Alfresco can extract text for the full-text search index (powered by Lucene) and transform documents into PDF. The system is architected to be extended with new "transformers" that handle other conversions. I have already talked to clients who would want to extend Alfresco in this way.
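A pluggable transformer architecture like this typically boils down to an interface plus a registry keyed by source and target format. The sketch below is my own simplified rendering of the pattern, not Alfresco's real extension API; the interface and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Each transformer declares which conversion it can perform.
interface ContentTransformer {
    boolean canTransform(String sourceMime, String targetMime);
    byte[] transform(byte[] content);
}

// The registry picks a transformer by source/target MIME type, so new
// conversions can be added without changing the core system.
class TransformerRegistry {
    private final List<ContentTransformer> transformers = new ArrayList<>();

    void register(ContentTransformer t) { transformers.add(t); }

    // Returns the first transformer able to handle this conversion,
    // or null if none is registered.
    ContentTransformer lookup(String sourceMime, String targetMime) {
        for (ContentTransformer t : transformers) {
            if (t.canTransform(sourceMime, targetMime)) return t;
        }
        return null;
    }
}
```

Extending the system would then mean implementing `ContentTransformer` for, say, an image format and registering it, with no changes to the indexing or preview code that calls `lookup`.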
So, is Alfresco the perfect open source ECM? Not quite. At least not yet. First of all, Alfresco is not entirely open source. Features like group-based access control and clustering are "shared source" and require a monthly subscription fee. Without these features, it would be difficult to roll Alfresco out to a large group of users, so you could say that the "E" part of "ECM" is not open source. The second issue is that, at this point, Alfresco does not handle web content, another critical part of the classic ECM definition. This was intentional: the Alfresco team wanted to start with a solid foundation and then grow into other areas of content management. The demand for affordable document management solutions, and the scarcity of open source projects that provide them, make this a wise choice. I actually don't mind the absence of WCM functionality. A CMS that tries to do too much is often difficult to use, and it is better to get it right than to throw it in sloppily. Alfresco says it will add WCM later, and the team it has assembled to do it understands WCM. I hope that, from their experience at Documentum, they learned the lesson that WCM is not just a matter of managing another type of file.
So how does Alfresco stack up against Plone? There is a large degree of functional overlap between the two. Both have the functionality needed for groups of users to manage and share documents: access control, search, metadata, and so on. Plone also supports WebDAV and has a mechanism whereby files are automatically updated on the server when edited in a client application such as Microsoft Word, but it does not support CIFS. Alfresco has the advantage of a content rules framework, which Plone is missing because it lacks an event model, and Alfresco has a better content versioning system. The many companies that have standardized on Java will feel more comfortable working with a Java solution (although their standard application servers may not run Java 1.5, which Alfresco requires - WebSphere does not). Also, Alfresco, with its open architecture, has more options for integration than Zope-based applications.
There are several areas where Plone has a significant edge, the most notable of which is handling web content. Plone is an effective and elegant hybrid of a document management system and a web content management system, and its workflow model is more robust than Alfresco's. Plone's other significant advantage is its maturity, which has led to a broad install base, excellent documentation (including several professionally published books), and an extensive library of add-on extensions providing capabilities ranging from blogs to eCommerce.
Based on all this, I think both applications have their uses. I would use Alfresco for a targeted document management solution that fits into a larger enterprise content management architecture - perhaps as a node, or collection of nodes, in a deployment like the one described in the presentation Travis Wissinks gave at the KMWorld & Intranets conference last month. I would use Plone to build an all-in-one intranet or extranet where I wanted to mix article, page, and file content and opportunistically deploy new features to improve collaboration and retention. I would also use Plone as a department-level knowledge management system because of features like threaded discussions around content assets, an event calendar, and native RSS support.
Dec 15, 2005
I have changed my settings so that the full text is now available through RSS. Enjoy!