Content Here

Where content meets technology

May 21, 2008

Content is not Data

David Nuescheler, CTO of Day Software and spec lead for the Java Content Repository specifications JSR 170 and 283, likes to say "everything is content." This is a bold statement that is intended to provoke thought, but I think that it is also a reaction to a prevailing view among technologists and database vendors that everything (including content) is just data. While it is true that content, when stored electronically, is just a bunch of 0's and 1's, if you think that content is just data, you need to get out of the server room because that is not how your users see it. There are four main reasons why.

  1. Content has a voice. Put another way, content is trying to communicate something. Like me writing this blog post, a content author tries to express an idea, make a point, or convince someone of something. Communication is hard and requires a creative process, so authoring content takes much more time than recording data. Content is personal. If the author is writing on behalf of a company, there may need to be approvals to ensure that the voice and opinion of the company are being represented. The author may refer to raw data to support his point, but he is interpreting it. For example, even a graph of data may reflect some decisions about what data to include and how to show them. Because content has a voice, content is subjective. We consider the authority and perspective of the author when we decide whether we can trust it.

  2. Content has ownership. Data usually do not have a copyright but content does. The people who produce content, like reports, movies, and music, get understandably annoyed when people copy and redistribute their work. While data can be licensed, it is less common. Often data are distributed widely so that more people can provide insight into what they mean. Interestingly, when content is digitally stored as data on a disk, we think of it less as content. For example, we are OK with data backups of copyrighted material even though creating copies is otherwise forbidden.

  3. Content is intended for a human audience. While content management purists strive for a total separation of content and presentation, content authors care about how content is being presented. They may have a lot of control over presentation and obsess over every line wrap, or they may only get to choose which words are bolded or italicized. They will only semantically tag a phrase in a document if they know that it will make for a richer experience for the audience. Presentation is not just for vanity's sake. Presentation, when done well, helps the audience understand the content by giving cues as to how things are organized and what is important. While the Semantic Web is all about machines understanding web content, at the end of the day, the machines are just agents trying to find useful information for human eyeballs (and eardrums). Content is authored with the audience in mind while data is just recorded.

  4. Content has context. In addition to who wrote the content, where it appears also matters. We care greatly about how content is classified and organized because we want to make it easier to find. A database table doesn't care about the order of its rows (it is up to the application to determine how they should be sorted), but content contributors really care about where their assets fall in lists (everything from index pages to search results).

These distinctions may seem totally academic, but I think they have real implications for the technologies that manage content. Because content is much more than "unstructured data," we can't think about the tools we use to manage and store it just in terms of big text fields in a relational database and forms to update those rows. Content is a personal experience for both the author and the audience, and the technology that intermediates needs to be sensitive to that. Every once in a while there is a meme about "content management" becoming an irrelevant term because it will be subsumed into other more process or industry oriented disciplines. If that does happen, it is critical that certain content technology features and concepts carry over.

  1. Versioning. Content goes through a life cycle of evolution and refinement as groups of contributors work together to find the best way to convey the information and ideas. Some content assets (like policies and procedures) are updated hundreds of times over many years as information changes. Other assets go through many rapid iterations over a shorter period of time (such as an intensely negotiated contract). Often participants in a content life cycle need to know just what has changed. For example, a copyeditor can save time by proofreading only the changes since the previous copy edit. A translator may not need to re-translate an asset if only a minor edit was made. Sometimes the history of changes can give insight into the intent behind the content. Versioning is not just for reverting to older versions. A robust versioning system has features like version comparison and annotations (see the JCR sketch after this list).

  2. Control over the delivery. To effectively communicate, you need to tune your delivery to your audience. WYSIWYG editing and preview both try to give a content contributor the perspective of their audience. WYSIWYG editing gives a non-technical contributor some control over the styling of text. It is important that the WYSIWYG editor gives an accurate representation (as in the same CSS styles) of what a visitor will see. Single page preview puts the content into the context of a page by executing rendering logic. The more complex the rendering logic, the more difficult it is to control what the user sees. For example, if there is some logic to automatically display relevant related content, the preview environment has to have the same content, rendering code, and user session information as the production environment. Oftentimes, this is hard to do. I have had clients really struggle to control dynamic rendering logic. For example, a relevance engine automatically associated inappropriate images with articles or showed the same related content multiple times. Some users also like to see how articles show up on dynamic indices and search results. In these complex delivery tiers, preview is a lot more like software QA than simple visual verification - you need to test all the scenarios and parameters. A good practice is to delineate pages or sections that you want full editorial control over and other (less important) sections that are not worth the manual effort of controlling.

  3. Feedback. You can't communicate in a vacuum. You need feedback. However, most content contributors lob their content over the wall and then forget about it. When you are speaking in front of a group, you can gauge reaction and make adjustments. As the web turns into a conversation, the content contributor needs to be listening as much as they are talking. Most content contributors underuse web analytics. The more accessible this information can be made, the better. Many web content management systems integrate analytics packages and have nice features like analytics overlays on rendered pages. However, these features are not used enough. More commonly, an analytics report will be circulated to people who don't understand how to read it. Comments and voting can also be a powerful medium for gathering feedback and reacting to it, either by direct response or by applying knowledge of the audience in subsequent articles.

  4. Metadata. While metadata storage is trivial, capturing and using this information is a challenge. Metadata such as source and ownership are critical to tell the audience where the asset comes from (its voice and authority) and how it can be legally used. Metadata is also important for classification and context. Content contributors are notoriously bad at metadata entry: they either neglect or abuse it. Automation is part of the solution, but a good process gives humans responsibility for metadata (bring on the librarians!). The best way to leverage and exchange metadata is through standards-based formats. Industry-oriented formats (like NITF) are important because they have a standard set of metadata built in. Microformats are also useful for highlighting specific bits of standard information within rendered web pages. While most WCM platforms can produce these outputs through their templating tier, very few do any validation of the output. Reviewers just visually validate what they see on a preview page.

  5. Usability. Most of all, the system needs to be easy to use. Creating content is hard work no matter how you do it. Any system that distracts a user from the creative process of developing content, or complicates it, is bound to be unpopular and the first excuse for failure. The ideal content management system disappears from the user's consciousness by being familiar and frictionless - you don't need to think about it and it gives you immediate results. For many people, that is Microsoft Word (until Word tries to outsmart you and take over your document), and I have already mentioned the disturbing amount of web content that originates in MS Word. For some, blogging tools are approaching this level of usability. For others, in-context editing achieves it. In many cases users get so familiar with a tool that they forget they are using it, even if the tool is hard to learn at first (I am reminded of this when my fingers just automatically type the right commands in vi). This usually only happens when you have specialists operating the CMS rather than distributed authoring, where all the contributors enter their own content.
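
To make the versioning point concrete, here is a minimal sketch against the JCR API (JSR 170) that this post opens with. It is an illustration, not any product's actual code: the node path and the "body" property name are invented, and the node is assumed to already carry the mix:versionable mixin.

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.version.Version;
import javax.jcr.version.VersionHistory;
import javax.jcr.version.VersionIterator;

public class VersioningSketch {

    // Record a new version of a content node: check out, edit, check in.
    // Assumes the node already has the mix:versionable mixin.
    static void publishRevision(Session session, String path, String text)
            throws RepositoryException {
        Node article = session.getRootNode().getNode(path);
        article.checkout();
        article.setProperty("body", text); // "body" is a made-up property name
        session.save();
        article.checkin(); // freezes this state as a new version
    }

    // Walk the version history, oldest to newest. A robust system would
    // also diff successive versions and attach annotations.
    static void printHistory(Session session, String path)
            throws RepositoryException {
        Node article = session.getRootNode().getNode(path);
        VersionHistory history = article.getVersionHistory();
        for (VersionIterator it = history.getAllVersions(); it.hasNext();) {
            Version version = it.nextVersion();
            System.out.println(version.getName() + " created "
                    + version.getCreated().getTime());
        }
    }
}
```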

If you are building an application that also needs to manage content, don't just think of the content in terms of CRUD for semi-structured data. Luckily, components and frameworks are available to incorporate into your architecture. The Open Source Web Content Management in Java report covers Alfresco, Hippo, and Jahia from this perspective. Recently, I have been playing around with the JCR Cup distribution of Day's CRX that bundles Apache Sling (very cool!). Commercial, back-end focused products like Percussion Rhythmyx and Refresh Software SR2 certainly play in this area. People used to deploy Interwoven Teamsite for this but I think it is too expensive to be used in this way. Bricolage is an open source back-end only WCM product written in Perl. But accurate preview and content staging can be complicated in decoupled architectures. Drupal and Plone are also quite popular as content centric frameworks for building applications but they tend to dominate the overall architecture (unless you use Plone with Enfold Entransit).
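
As a taste of what "more than CRUD" looks like, here is a hedged sketch of storing an article in a JCR repository (such as Day's CRX, mentioned above) as a node in a content tree rather than a row in a table. The path, node name, and property names are all invented for illustration.

```java
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class RepositorySketch {

    // Store an article as a node in a content tree instead of a row in a
    // relational table. The path gives the asset context, the properties
    // hold metadata, and the mixin opts it into version history - no extra
    // schema work required.
    static Node createArticle(Session session, String name, String title,
            String body) throws RepositoryException {
        Node section = session.getRootNode().getNode("content/news"); // made-up path
        Node article = section.addNode(name, "nt:unstructured"); // name must be a legal JCR name
        article.setProperty("title", title);
        article.setProperty("body", body);
        article.setProperty("author", session.getUserID());
        article.addMixin("mix:versionable");
        session.save();
        return article;
    }
}
```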

You have plenty of options that will allow you to avoid brewing your own content management functionality. Consider them!

May 19, 2008

Advice for vendors dealing with independent analysts

Alan Pelz-Sharpe, from CMS Watch, has written some great advice for Analyst Relations professionals in an article called "Advice for vendors dealing with independent analysts." The only thing I would add to his list of dos and don'ts is to use the information from the evaluation to make the product better.

One of the nice things about writing about open source software is that many of the products that I cover do not have analyst relations people. Instead, I talk to developers, committers, and CTOs. The big difference is this... an AR person's job is to make the product look good; a developer's job is to make the product be good. Nearly all of the project teams I reviewed in Open Source Web Content Management in Java were extremely gracious about the criticism. Part of it is that they are grateful for the coverage. But a bigger part of it is that they are used to interacting directly with the community and getting direct feedback. Most open source developers know that software doesn't get better by convincing yourself that it is great. It gets better through continuous improvement that uses criticism as a catalyst for creative solutions.

It doesn't make sense for software vendors to reject (essentially) free feedback that could make their product better while paying "tier one" analyst firms to help them delude themselves and the market with their own hype. In the end, the truth (as experienced by the user) always comes out. In a Web 2.0 world where everyone has a voice, it comes out even faster.

May 16, 2008

Alfresco and E2CM

Alfresco has been tearing up the newswire recently with announcements and interviews related to their evolved vision that incorporates an Enterprise 2.0 style mash-up/social approach to Enterprise Content Management. For lack of a better term, let's call it Enterprise 2.0 Content Management or E2CM (a new term! you heard it here first).

Unlike the old ECM that was all about monolithic applications to support large, structured, and formal business processes (like check processing or FDA approval), E2CM supports the small, informal, ad-hoc interactions that the average knowledge worker engages in every day. With its flexible, open, and extensible architecture, Alfresco is well suited as a foundation for building and integrating with all sorts of simple tools that facilitate sharing, collaboration, and community.

My only concern is that Alfresco is priced high as a framework for building custom applications. To get the Enterprise Edition (required for support and access to certified integrators), you will probably be looking at an annual subscription fee of well over $60K. In the age of free frameworks, that is pretty steep. However, when you look at Documentum and FileNet licensing, it doesn't look bad at all. I guess it all depends on where you are coming from.

All this attention to E2CM (when you come up with a new term, you need to use it a LOT) may be at the expense of the traditional WCM functionality in which Alfresco has lagged. It takes too much customization to build a simple, semi-dynamic website on Alfresco. View the source on most of the certified integration partner websites and you will see that they are running on WCM platforms like Joomla!, Plone, and Drupal. Also, two very senior people from the WCM team (the architect and lead developer who came over from Interwoven) have left. Fortunately for Alfresco, they put in place the core infrastructure like dependency management, deployment, and virtualization. There is also a good start on some UI improvements that will work towards market parity.

It remains to be seen whether Alfresco sees traditional WCM as a market they want to pursue. Given the competitiveness and price pressure in the market, I understand why they would not. Their advantages as a framework for assembling and integrating E2CM applications outweigh their strengths as a turnkey WCM business application, and it makes sense for Alfresco to play to those strengths.

May 15, 2008

CM Pros Spring Summit

I am looking over the program for the CM Professionals Spring 2008 Summit (June 17th in San Francisco) and am really impressed with the lineup. This event seems to get better every year.

The topic is "Dynamic Delivery Across Multiple Media Channels" and there is also a healthy dose of web 2.0 and social media content. The one-day event is broken up into two tracks (business and technical) and attendees will need to make some hard choices. For example, do you see Christine Pierpoint describing governance in a Web 2.0 world or Michael Wechner talking about integrating OpenSocial?

Either way, you can't go wrong. Plus, it only costs $395 to register. $1,795 gets you into the Gilbane Conference (always a fun event) and an iPod Touch as well.

May 14, 2008

Magnolia Publishes Roadmap, Loses Reference Site

The Magnolia International team has been at work charting out their roadmap and increasing their transparency by publishing their plans on their wiki and stepping up their blogging (Boris, Philipp, and Gregory).

According to the roadmap, the upcoming version 3.6 (due out in June) will contain mainly infrastructural improvements, with better integration with JSF and Spring and enhanced caching and clustering. Feature-wise, the biggest change will be an improved import/export system for backups.

From a user's perspective, the biggest changes will come from a special project called Genuine that will revamp the administration and content contribution user interface ("Admin Central"). Genuine started with a critique of the current UI, which led to an initiative to improve Admin Central's usability and extensibility. The project appears to be at a conceptual stage with few commitments on milestones or other details. The ideas driving the project are best summarized in the Concept Presentation slide deck. Magnolia's usability is generally regarded as being quite good. That they are able to look critically at their own work for ways to improve shows their drive.

In other (totally unrelated) Magnolia news, Drupal's Dries Buytaert reports that France24 is now running on Drupal. France24, France's answer to CNN, used to be a high profile Magnolia site. Dries doesn't know the circumstances of the migration. Personally, I think that Drupal is a better functional fit for media and publishing oriented sites than Magnolia because of how Drupal structures and organizes content. Magnolia content is typed at the "paragraph level" rather than at the asset level and is organized in a rigid hierarchical structure. Drupal content is typed at the asset (or node) level and is organized by keywords (called vocabularies). This makes it easier for articles to surface on multiple pages for a richer, faceted browsing experience (with more ad impressions). Drupal is turning into a popular choice for the media and publishing industry. For example, The Onion, MotoGP, Fast Company, and Lifetime all run on Drupal. There is also a nice little video showing all the newspapers running Drupal. Magnolia, on the other hand, is better for corporate internet and intranet sites where site authors like tight control over the organization and display of the content.
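
To illustrate the difference in a few lines of hypothetical Java (Drupal itself is PHP; this models the idea, not its API): when an asset is typed at the node level and tagged with vocabulary terms, one article can surface on every index page that matches one of its terms, whereas a strict hierarchy gives it exactly one home.

```java
import java.util.List;
import java.util.Set;

public class TaxonomySketch {

    // Drupal-style model: the asset is typed as a whole ("article") and
    // carries terms from keyword vocabularies instead of a single folder.
    record Article(String title, Set<String> terms) {}

    // Any article tagged with an index page's term surfaces on that page,
    // so one story can appear under "politics", "europe", and "economy".
    static List<Article> indexPage(List<Article> all, String term) {
        return all.stream().filter(a -> a.terms().contains(term)).toList();
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("Election results", Set.of("politics", "europe")),
                new Article("Market wrap", Set.of("economy", "europe")));
        System.out.println(indexPage(articles, "europe")); // both articles
    }
}
```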

May 13, 2008

Fireside conversation


[Photo: Entrance, originally uploaded by stevenn]

[Update: Steven posted slides from the facilitated discussions on slideshare]

Thank you, Outerthought, for all the kind hospitality and for hosting the first fireside conversation. The event was a great success (pictures) with a fantastic audience and thoughtful participation during the facilitated discussions. I have a feeling that there will be more events like these, so be sure to register for the next one!

May 07, 2008

OpenCms Days 2008 Report

The OpenCms community just finished its first user/developer conference (OpenCms Days 2008) in Cologne, Germany. Thank you to our hosts Alkacon Software and to the sponsors for putting on such a valuable and fun event.

OpenCms is primarily used in Europe so Cologne (home of Alkacon) was a good choice of venue. If you ever have an opportunity to visit Cologne, you should. It is a beautiful city. Most of the 110 attendees were from Germany, but there was also representation from Italy, Spain, Denmark, and the Czech Republic, plus one attendee from Japan. The only Americans there were speakers, but maybe that was just the weak dollar talking.

Since OpenCms is essentially a commercial open source project (owned and developed by Alkacon) aimed at enterprise buyers, the feel was different than some of the other open source conferences I have been to. It was more corporate, less kumbaya. But there seemed to be a genuine interest in collaboration and community across corporate boundaries. The main question was where to begin. Having the conference was a great start. I saw many first face to face meetings turn into what looked like closer relationships over the two day event.

There were also promises of better inter-company communication - in particular between Alkacon, module developers, and systems integrators. Several attendees had expressed frustration that they had built modules that were quickly broken or made obsolete by a new release of the core. The community wanted Alkacon to be more transparent about their roadmap, but Alkacon was concerned about making promises that they couldn't keep (nonetheless, Alkacon CEO Alexander Kandzior's keynote did a nice job of describing the next few releases of OpenCms at a high level). There were commitments to fix this through sprints (the first is July 21-22, 2008 in Cologne) and better general communication but, of course, the real proof will be when people return to their jobs.

Another observation was that the community seemed much less wrapped up in social media and networking. Most of the conferences that I go to advertise keywords that everyone should use to tag their photos, blog posts, and tweets when they post them on the social networking sites. I got the sense that few within this community use these services. Granted, my expectations for social media use are probably set overly high by the types of projects and people that I follow and the conferences that I attend. I think it is safe to say that Web 2.0 will not be high on the list of OpenCms enhancements. I think that most OpenCms adopters are fine with that prioritization.

The sessions fell into two tracks: business and technical. The technical track covered techniques for integrating and customizing OpenCms. The business track showed OpenCms being used in large companies like Bayer, Qimonda, GARDENA, and OEV Online Dienste. Many of these clients have used OpenCms to replace commercial products and are expanding their use of OpenCms after initial successes. The sites include intranets, extranets, corporate marketing sites, and other forms of traditional web content management. They had requirements that are typical of enterprise buyers: large volumes of content, many users with a wide range of technical abilities, and complex organizational structures that require content sharing and access control. Two very good examples were Qimonda's intranet and OEV, which hosts multi-tiered websites for 15 insurance companies.

Overall, it was a great conference and (hopefully) the first of many events like it. I will be keeping my eye on the mailing list for follow through on the promises of more community collaboration and communication.

Apr 28, 2008

OOXML and Microsoft Office 2007

A few weeks ago, I wrote that I thought ISO adoption of Microsoft's OOXML was a good thing because a practical standard that everyone followed was more valuable than a noble standard that everyone ignored. Well, it turns out that OOXML is actually the standard that no one follows. As Stephe Walli points out, Microsoft Office 2007 does not support OOXML. So what good is a standard that no one supports? No good at all. At least OpenDocument is supported by multiple applications.

But complex layout standards are a tricky business because it is difficult to write a complete and clear specification that covers so much detail. Just look at the HTML standard and browser compatibility. Joel Spolsky writes eloquently on that topic here. And HTML is designed to be much simpler than an office format.

If the ultimate goal is to allow people with different software applications to collaborate on layout intensive documents, I don't know if we are ever going to get there. As an experiment, I took a report written in NeoOffice and opened it and saved it in Apple TextEdit (which claims ODF support). When I re-opened the document in NeoOffice, much of the formatting was stripped out. I am still waiting for Lotus Symphony's promised Mac release. That will be a better test of round-trip collaboration.

My true hope is that less collaborative content development is done in documents and more through server based tools such as wikis. I think the average knowledge worker is moving in that direction. Tools like Zoho Office and Google Docs are helping here a great deal. These tools allow the collaborative process to happen in a storage neutral way and then give options as to what format the content is published in (PDF, ODF, OOXML - or whatever format MS Office really uses).

Apr 24, 2008

Alfresco releases Enterprise Edition 2.2

They are a bit behind schedule and there was very little publicity about it, but Alfresco Software has released version 2.2 of Alfresco Enterprise Edition. While this is just a point release, 2.2 introduces a couple of big improvements over 2.1. Probably the most welcome enhancement from a user perspective is the introduction of search within web projects. While web projects were always indexed for search and the API supported searching them, users can now search for web content from within the user interface. Developers will appreciate that the deployment mechanism that came with 2.1 now has a GUI that allows them to define deployments that push code or content to different environments.

While it still suffers from some core usability issues, with version 2.2, Alfresco has reached a point where it is a useful tool for web content management. Many systems integrators work around Alfresco's usability limitations by using Web Scripts to rapidly develop custom user interfaces. Expect bigger UI improvements in Enterprise 3.0.

For more information on Alfresco 2.2, you can buy the 19-page Alfresco evaluation from Open Source Web Content Management in Java or buy the whole 160-page report to see how Alfresco stacks up against other options.
