Jan 22, 2006
CM Professionals recently held elections to replace two outgoing members of the Board of Directors (Ann Rockley and Frank Gilbane). The two new board members are Scott Abel and Mary Laplante. Both Scott and Mary have a long history of contributing to CM Professionals, and I am looking forward to working with them as board members. I also want to recognize Ann and Frank for their great leadership and dedication to the organization.
The CM Professionals' management committee also got some new blood in this year's election. Janus Boye is our new Director of Member Relations and Mollye Barrett is the new Director of Communications. Welcome Janus and Mollye!
Jan 20, 2006
The U.S. Department of Homeland Security has recently pledged one million dollars (read in a Dr. Evil voice) to fix security bugs in open source projects like Linux, Apache and Mozilla. Stanford University and Symantec are going to do the work. On the one hand, I think that is a nice (but token) gesture of support for open source as a national (dare I say planetary?) asset. On the other hand, from a security standpoint, I would say that open source already has a distinct advantage over proprietary software because there are more people looking at the code and its flaws ("Given enough eyeballs, all bugs are shallow"). For example, I would not vote on an electronic voting system whose source code was not exposed to public scrutiny. So why single out open source software? I wonder what the government can do to make proprietary software more reliable and secure because, if you look at the security alerts, that is where the majority of problems seem to be. Then again, I am not sure that I want the government to have any influence over code that I cannot see, in light of the recent trends regarding privacy.
Jan 18, 2006
Joe Lamantia recently posted two excellent articles on how "Enterprise Software" is losing touch with real business problems and being displaced by more agile, targeted technologies. You can read the posts here and here. I could not have made these points better myself (although if you have been reading this blog for a while, you know that I have tried).
Jan 18, 2006
Someone recently pointed out to me that my blog layout does not display correctly on MSIE6, and I am embarrassed to say that I didn't check the other browsers when I applied the new style. I should have, because 29% of my last 100 visits were from people using IE6. In Steve Zimmerman's defense, the problem is not really with the template. The non-wrapping code samples in my "ZOracle" posts make the right column shift down to the bottom, where it is safe. The Gecko engine, used by Mozilla and Firefox, allows the code samples to run over the right column. For now, I think I am going to leave it. MSIE users will have to scroll down to the bottom for the navigation until the ZOracle posts fall out of scope. Until then, have I ever told you about a great little browser called Firefox?
Jan 17, 2006
I just read this post by Stefano Mazzocchi that discusses the difficulty of merging metadata.
One thing we figured out a while ago is that merging two (or more) datasets with high quality metadata results in a new dataset with much lower quality metadata. The "measure" of this quality is just subjective and perceptual, but it's a constant thing: everytime we showed this to people that cared about the data more than the software we were writing, they could not understand why we were so excited about such a system, where clearly the data was so much poorer than what they were expecting.
Stefano speaks mainly from a Semantic Web perspective, but his observations are very relevant to content management and to aggregating content from multiple sources. Right now the general business world is far behind the community in which Stefano works (librarians, who you could say are metadata professionals). Our users struggle to invest any time in authoring good metadata. But by the time we finally get them to truly focus on metadata (or automate them out of the process), hopefully library science and the Semantic Web will have worked out the issues and nuances of what to do once you have good metadata and are ready to really use it.
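To make Stefano's point concrete, here is a toy Python sketch (my own illustration, not his code) of merging two records that describe the same article with different subject vocabularies. You can union the subject terms and take on noise, or intersect them and lose precision; either way, the merged record is weaker than its sources.

    # Toy illustration of the metadata merging problem (made-up records).
    record_a = {"title": "Merging Metadata", "subjects": {"semantic web", "RDF"}}
    record_b = {"title": "Merging Metadata",
                "subjects": {"knowledge management", "taxonomy"}}

    # Union keeps everything but dilutes the vocabulary with loosely related terms...
    union_merge = record_a["subjects"] | record_b["subjects"]

    # ...while intersection keeps only what both sources agree on, which here is nothing.
    intersect_merge = record_a["subjects"] & record_b["subjects"]

    print(union_merge)      # four terms drawn from two different vocabularies
    print(intersect_merge)  # set() -- the "high quality" part is gone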
Jan 12, 2006
Karim Lakhani wrote an interesting article about the Mozilla Corporation, a wholly owned, "taxable" subsidiary of the Mozilla Foundation. Karim is serving on the advisory committee to help Mozilla figure it all out. The article describes how the Mozilla Corporation has an opportunity to become a new kind of software company focused on "social responsibility, profitability and community purpose." Of course, corporate buyers will focus on the price and quality of the product, but the social/community angle will help foster goodwill and contributions from the development community. Who knows? Maybe the Mozilla Corporation will become the Ben and Jerry's of software.
Jan 11, 2006
Eric Shea just passed on this article describing how Zope 3 is dumping the old ZServer and going with the Twisted framework for its web server.
For those of you unfamiliar with the Zope platform, ZServer is ancient, slow, and not too secure. The security part is not much of a problem because any serious Zope 2.x installation is going to sit behind an Apache web server. Twisted, on the other hand, is a high-performance framework for building networked Python applications. We did a prototype based on Twisted as part of a proposal for a very high-traffic web service, and our experience with it has been very positive. You may also notice the last name of one of the Twisted project team members (Lefkowitz). Glyph is the son of r0ml, our former VP, Research and Executive Education. I am pretty sure that the Lefkowitz family enforces a strict nickname policy.
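If you have never seen Twisted, here is roughly what a trivial web server looks like with it (a minimal sketch from memory, not code from our prototype):

    # A minimal Twisted web server -- a sketch, not our prototype code.
    from twisted.web import resource, server
    from twisted.internet import reactor

    class Hello(resource.Resource):
        isLeaf = True  # no child resources; this one handles every path

        def render(self, request):
            return b"<html><body>Hello from Twisted</body></html>"

    # Serve the resource on port 8080 and hand control to the reactor (the event loop).
    reactor.listenTCP(8080, server.Site(Hello()))
    reactor.run()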
Jan 11, 2006
I was planning to update the look of this site as a surprise for my 100th post (not too far away). However, I just couldn't bear the look of it anymore. My colleague, Steve Zimmerman, a great designer but hesitant author (I am still working on him), volunteered his template because he couldn't stand my "brand" either.
Let me know what you think.
Jan 09, 2006
A participant on the iECM mailing list recently posted a link to an article about the GSA concluding that metadata is not essential. The study, based largely on information from industry experts, found that search technology is good enough that full text indexing is sufficient and no manual human intervention is necessary.
I am sure that my taxonomist friends are working up a worthy rebuttal. But let's just consider the proposition that, at least in the case of normal textual information like this blog entry, manual keyword assignment is not essential. Of course, as the article states, this does not apply to graphical content or numerical data, which cannot be parsed into words that would match a textual search query. But in the case of text, is it reasonable to assume that the author will, in the course of writing, wind up using the words that a prospective searcher will search for? There are the issues of synonyms, word choice, and word stems, but those can be accounted for in a good search algorithm (when the query contains "blog," also look for "web log" and "journal"; when the query is "running," also look for "run"). Google seems to do a good job.
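In code, that sort of query expansion is not very mysterious. Here is a rough sketch of the idea (the synonym table and suffix stripping are made up for illustration; a real engine uses proper thesauri and stemmers like Porter's):

    # A rough sketch of query-time synonym expansion and stemming.
    # The synonym table and suffix list are illustrative, not real engine data.
    SYNONYMS = {
        "blog": ["web log", "journal"],
    }
    SUFFIXES = ["ning", "ing", "ed", "s"]  # crude; a real stemmer is smarter

    def stem(word):
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def expand(query):
        terms = []
        for word in query.lower().split():
            terms.append(word)
            terms.extend(SYNONYMS.get(word, []))
            if stem(word) != word:
                terms.append(stem(word))
        return terms

    print(expand("running a blog"))
    # ['running', 'run', 'a', 'blog', 'web log', 'journal']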
Interestingly, the commercial search engines all ignore keyword tagging because it is so often abused. I am reminded of the Extreme Programming philosophy about commenting your code (at least at the method level): the code itself should make clear what it does without needing comments to explain it, and the need for comments is a symptom of overly complex and, therefore, hard-to-maintain code. I re-read this post and it contains every keyword that I would have used to classify it (search, query, metadata, GSA).
To be honest, most clients I have worked with have either neglected or abused keywords. Either they don't understand the value of keywords and don't bother, or they try to game the system so that their content gets put in visible places (yes, this even happens on corporate intranets).
So what if we said to our authors what the commercial search engines tell us: "Don't worry about meta-data tagging, just write good content and we will bring you the right readers." Where would we be?
But metadata is not just keywords. Look at the basic Library of Congress search page. See how you can search on different metadata fields to get what you want? Metadata also helps with content reuse. For example, if the title, summary, author, and other attributes of content are stored in a structured way, they can be shown on pages that list many content assets, not just on the detail page. A 50-word summary is more valuable than the first 50 words of a 10,000-word document (unless the author is especially good at getting to the point; I notice that this entry leads in by talking about iECM, which the post really has nothing to do with). Structuring a portion of the content also helps with things like sorting (by date, author, etc.).
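Here is the sort of thing I mean, as a little Python sketch (the content items are made up): once the title, summary, author, and date live in their own fields, a listing page can be built and sorted without ever touching the body text.

    # Structured metadata makes listing pages and sorting trivial.
    # The content items below are made up for illustration.
    articles = [
        {"title": "Metadata Matters", "author": "Librarian",
         "date": "2006-01-09", "summary": "Why keywords are not the whole story."},
        {"title": "Zope 3 and Twisted", "author": "Developer",
         "date": "2006-01-11", "summary": "Zope 3 swaps ZServer for Twisted."},
    ]

    # A listing page: newest first, showing a real summary rather than
    # the first 50 words of each document.
    for article in sorted(articles, key=lambda a: a["date"], reverse=True):
        print("%s (%s, %s)" % (article["title"], article["author"], article["date"]))
        print("    " + article["summary"])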
Metadata is what content management is all about. To quote a recent CMS Watch article by MarkLogic CEO Dave Kellogg:
That is, while ECM tracks and manages a lot of information about the content, it actually does relatively little to help get inside content. Despite its middle name, ECM today isn't really about content. It's about metadata.
Without metadata, an ECM is just a file system with versioning.
So, it looks like authors are not off the hook. Interestingly, in the library world, the people who write the metadata are different from the people who write the content. Unfortunately, that is too costly for most corporate environments, which create and use content casually and don't have the budget for a full-time librarian staff.
Jan 06, 2006
Apoorv Durga on his PCM Blog has a nice post about a migration from Vignette to OpenCMS. The overall project went well and the client was pleased with the results. Apoorv also points out that some features are missing from OpenCMS, most notably large-site features such as replication and backup. For example, in Vignette you have a staging server where you manage content and a production server where you display content. OpenCMS does not have that, so you have one server (or cluster) doing both content management and delivery. This may have security and (in extreme cases) performance implications, although with caching turned on and clustering it is likely not an issue.
If content syndication is important, you might try Magnolia, which has a subscriber model that allows an authoring server to publish content to display servers. Of course, Magnolia is missing many of OpenCMS's advanced features, such as versioning and workflow. But if those are not important (frequently people think they need these things more than they actually do), you should give Magnolia a look. It uses a JCR repository (implemented by the Apache Jackrabbit project), and that is not something you see in many commercial products.