Archive for the ‘jcr’ Category

Different Storage Models for Content

Tuesday, April 7th, 2009

Joel Amoussou has a great article explaining the benefits and implications of storing your content in a relational database vs. an XML store. After making the case for when to consider XML over the more common RDBMS/ORM/POJO/Template approach, Joel provides some tips for content modeling and makes some great points about how you need to think a little differently when you work with XML.

I would like to reinforce Joel’s comment that the XML stack is quite different from the technologies that you or your developers may be used to. The learning curve can be quite steep and many developers just give up before they really get it. Transitioning to an XML-based architecture may not pay off for content management applications where your content types consist of a number of structured fields (like title and author) and one or more unstructured elements (like description and body), and where the CMS just outputs what the author typed in – in other words, like this blog.

Russell Toris Wins the JCR Cup

Tuesday, February 3rd, 2009

Via CMSWire, Day Software recently announced the winner of the JCR 2008 Cup. Russell Toris won the MacBook Pro grand prize for his web clipping application “Crux”. While the idea is not all that innovative (it appears to be very similar to delicious.com), the application shows how easy it is to get started developing on the CRX/Apache Sling bundle. Russell is a self-proclaimed novice programmer with skills mainly in HTML and JavaScript. To quote:

“I was surprised how easy it was to get started with JCR, even for a new programmer like me. I know a little bit of Java, but I mainly write Javascript. So I was really, really happy when I saw how much you can do with JCR and Sling just using HTML and Javascript. Basically, if you can write an HTML form, you can create a useful JCR app, because Sling takes care of a lot of server-side stuff and you really don’t need to write Java Server Pages unless you want to. This is a great convenience and makes it possible for programmers of all skill levels to be productive right away with JCR. You don’t have to be a Java expert at all.”

Hippo Launches Hippo CMS 7.0

Monday, January 26th, 2009

It’s official. Version 7, a near-total rewrite of Hippo CMS, is now GA. Hippo CMS 7, formerly called ECM 1.0, is based on the newer technologies Apache Wicket and JackRabbit. This new architecture gets Hippo off of the complicated, difficult-to-learn Cocoon framework and the retired Apache Slide project.

One thing that I particularly like is that they have achieved a compromise between the JCR’s inherent hierarchical organization and a more free form faceted navigation. Hippo CMS 7 is designed for high content volume websites and shows a lot of thinking in this area. The faceted filters can be used at the API level by developers building websites on the platform. Unfortunately, this functionality has not yet been surfaced in the user interface.
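To make the idea of faceted filtering over hierarchically stored content concrete, here is a minimal Python sketch. It is not Hippo's API: the `DOCUMENTS` data, the facet names, and both helper functions are assumptions made up for illustration. Each item lives at one path in the hierarchy, but it can be found through any combination of facet values.

```python
from collections import Counter

# Hypothetical content items: one hierarchical path each, plus free-form facets.
DOCUMENTS = [
    {"path": "/news/2009/launch", "facets": {"type": "news", "year": "2009", "topic": "cms"}},
    {"path": "/news/2008/award", "facets": {"type": "news", "year": "2008", "topic": "jcr"}},
    {"path": "/products/cms7", "facets": {"type": "product", "year": "2009", "topic": "cms"}},
]


def filter_by_facets(documents, **selected):
    """Return the documents that match every selected facet value."""
    return [
        doc for doc in documents
        if all(doc["facets"].get(name) == value for name, value in selected.items())
    ]


def facet_counts(documents, facet):
    """Count how many documents carry each value of one facet."""
    return Counter(doc["facets"][facet] for doc in documents if facet in doc["facets"])


# Narrow by two facets at once, regardless of where the items sit in the tree.
matches = filter_by_facets(DOCUMENTS, year="2009", topic="cms")
print([doc["path"] for doc in matches])
```

The counts from `facet_counts` are what a navigation widget would typically show next to each filter option.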

As with earlier versions of Hippo, version 7’s architecture has a clean separation between the repository, the management application and the front end delivery tier. Hippo CMS 7 gives developers a bit more of a starting point for building a front end website by shipping with a JSP tag library that refers to display components managed in the CMS. Developers are still free to roll their own delivery tier using whatever display technology they choose. The standards-based Java Content Repository, plus frameworks like Sling, will make custom Hippo powered websites easier to build.

Hippo CMS 7 has a plugins framework that facilitates adding new functionality to the platform. The Hippo Forge site will be a place for the community to share their components and tools. These plugins surface on the dashboard and in other areas of the UI and are better encapsulated than Hippo 6.x customizations.

[Screenshot: hippo7-edit-blg]

On the UI side, Hippo CMS 7 shares some basic concepts with earlier versions of the platform. Version 6.x users will recognize the stateful tabs but will appreciate a new three column layout that allows a user to browse the repository and edit multiple content items at once (see screenshot). There are several other AJAX-enabled goodies like type-ahead search and linking and image placement through drag and drop. If you have seen Day’s new CQ5 UI, there are some similarities there. In fact, an alpha of Hippo CMS 7 won second place in the Web Idol demo competition at the jboye08 conference last November. Hippo has plans to create specialized versions of the user interface to optimize the usability for specific user segments. For example, they are working on a user interface view that is optimized for power users on wide-screen displays that will maximize the use of the multi-column layout.

Being a new product, there are only two customers live on Hippo CMS 7. Two more implementations are in progress. The documentation is not going to win a Pulitzer but I have found the mailing lists to be very helpful. If you like what you see, I would recommend setting up some kind of arrangement with the Hippo team where they work closely with your implementation and they can submit fixes/improvements back into the core. Current 6.x customers will be supported by a dedicated V6 team who will maintain the platform with fixes and minor enhancements. No new support contracts will be sold for V6.

This is a big release for Hippo CMS. Usability-wise, there are significant improvements – particularly for power users managing large content repositories. Architecturally, CMS 7 offers a more modern technology stack that flattens the learning curve and enables more efficient development of the product. With a couple of successful implementations on the 7.x series, Hippo CMS may get some deserved attention (particularly in North America where it is not widely known).

CRX and Tar PM

Wednesday, November 12th, 2008

Thomas Müller has a blog post that nicely describes how the Tar PM works. Tar PM is the fastest of Day CRX’s pluggable persistence managers. The speed of Tar PM is a major reason why some companies go with the CRX rather than the free JCR reference implementation Apache JackRabbit.

The key to Tar PM’s speed is that write operations never modify existing data; they just append the new data to the end of one big file (a TAR file actually). What content objects are stored where is recorded in an index which is also append-only. To prevent limitless growth of the data file, you need to periodically run maintenance programs that compress the file by removing deleted records.

This may seem very familiar to those of you who have managed systems built on Zope (like Plone) and have had to “pack the database” – an operation that does essentially the same thing as the CRX tools and also removes unneeded intermediate versions that were defensively saved during transactions. From my experience with Zope, I know that having a huge, single file database can be scary but not necessarily dangerous. What do you do if you have a corrupt record in the middle of the file that causes the maintenance tools to crash? Usually there is some way to fix it but you need access to the experts. The only issue is that Zope and CRX experts are not as easily found as Oracle, MySQL, or MSSQL experts. Tar PM seems to improve on the ZODB by switching to a new TAR file after a certain point. Longtime readers of this blog may remember my ZOracle Series (part I, II, and III) that described a project to point the Zope Object Database (ZODB) to an Oracle database.
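The append-and-compact pattern described above can be sketched in a few lines of Python. This is a toy illustration only, not the Tar PM or ZODB implementation: the `AppendOnlyStore` class, its in-memory index, and the `BytesIO` buffer standing in for the data file are all assumptions made for the example.

```python
import io


class AppendOnlyStore:
    """Toy record store: writes only append, an index tracks the latest copy."""

    def __init__(self):
        self.data = io.BytesIO()  # stands in for the one big data file
        self.index = {}           # key -> (offset, length) of the live record

    def put(self, key, value):
        # Never overwrite existing bytes; always write at the end of the file.
        self.data.seek(0, io.SEEK_END)
        offset = self.data.tell()
        self.data.write(value)
        self.index[key] = (offset, len(value))

    def delete(self, key):
        # Deleting only drops the index entry; the bytes stay until compaction.
        self.index.pop(key, None)

    def get(self, key):
        offset, length = self.index[key]
        self.data.seek(offset)
        return self.data.read(length)

    def size(self):
        return len(self.data.getvalue())

    def compact(self):
        """Copy only the live records into a fresh file, dropping dead bytes."""
        fresh = AppendOnlyStore()
        for key in self.index:
            fresh.put(key, self.get(key))
        self.data, self.index = fresh.data, fresh.index


store = AppendOnlyStore()
store.put("a", b"hello")
store.put("a", b"hello world")  # the old copy of "a" becomes dead space
store.put("b", b"xyz")
store.delete("b")               # "b" becomes dead space too
before = store.size()
store.compact()
print(before, store.size())
```

Reads go through the index, so they are unaffected by how much dead space has piled up; only the file size grows until the maintenance step runs.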

Although both Zope and Day have customers with huge repositories, the general rule of thumb is to keep things small when you can. In the ZODB world there are extensions that store large binary files outside of the database. In the JCR world, the strategy is to segment content into smaller repositories. For example, if you have lots of publications, put each one into its own JCR instance rather than combining them all into one. Companies that pursue this type of segmentation need to have some component in the architecture that can look across repositories and maintain collections of references to show an aggregated view. At the simplest level, this can be a search engine. At a more advanced level there could be a hierarchical taxonomy system with references to items in different repositories.

This strategy runs against Oracle’s vision of all your company’s content neatly organized in one big database. I would argue that putting everything in one place does not necessarily mean that it is well managed or easy to find. More important than how the content is physically stored is that it is cohesively organized (that is, content that belongs together is stored together) and that there are uniform ways to access it. This is the strategy of the JCR and it plays well with service-oriented architecture where different applications (services) that manage their own data can be combined to support cross-application business processes. When you have everything in the same database, the tendency is to do your integration at the data level (which can be brittle and proprietary) rather than the application (or service) level. I won’t deny that it is handy to have a database that can scale infinitely in size and there are applications that need very large storage (like archival systems). But trying to keep things small and segmented has its virtues as well. I am reminded of the frequently made point that storage is cheap but finding and managing the information can be very expensive.
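As a rough illustration of the segmentation approach, the Python sketch below keeps each publication in its own small repository and puts a thin aggregator on top that searches across them and hands back references. The `Repository` and `Aggregator` classes and the sample data are hypothetical; a real setup would sit on separate JCR instances and a proper search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """One small, self-contained repository (for example, one publication)."""
    name: str
    items: dict = field(default_factory=dict)  # path -> text content

    def search(self, term):
        term = term.lower()
        return [path for path, text in self.items.items() if term in text.lower()]


class Aggregator:
    """Looks across repositories and returns (repository, path) references."""

    def __init__(self, repositories):
        self.repositories = {repo.name: repo for repo in repositories}

    def search(self, term):
        hits = []
        for name, repo in self.repositories.items():
            hits.extend((name, path) for path in repo.search(term))
        return sorted(hits)

    def resolve(self, reference):
        # Follow a reference back to the repository that owns the content.
        name, path = reference
        return self.repositories[name].items[path]


magazine = Repository("magazine", {"/2008/11/tar-pm": "How the Tar PM stores JCR content"})
newsletter = Repository("newsletter", {"/issue-12": "JCR Cup winner announced",
                                       "/issue-13": "Conference schedule"})
portal = Aggregator([magazine, newsletter])
print(portal.search("jcr"))
```

The aggregator stores nothing but references, so each repository stays the single owner of its content and can be maintained or replaced on its own.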

Roy Fielding to CMIS: You don’t know REST

Friday, October 3rd, 2008

Roy Fielding, creator of REST, puts a nice smackdown on CMIS’s claim of being RESTful.

When CMIS was first announced, everyone looked at Day Software whose CTO, David Nuescheler, was the main driver behind the Java Content Repository Standard (JSRs 170 and 283). Of course, Day’s response was gracious. David wrote a congratulatory blog post and released this official quote:

“As three of the largest players in the ECM market, IBM, EMC and Microsoft are well qualified to initiate a protocol specification for content management interoperability that is complementary to a programming API like JCR,” said David Nuescheler, CTO of Day Software. “Day Software is happy to actively contribute to the specification, which we view as a validation of our standardization and infrastructure efforts over the last three years. CMIS mirrors JSR 170 in that it is platform-agnostic, appealing to a mixture of languages and technologies. We congratulate the group and look very much forward to participating by contributing our JCR and REST knowledge and experience to future versions of the specification. We welcome CMIS as a high-level content protocol that transcends any one programming language, and see it as a win for the entire industry.”

Roy, whose role at Day is Chief Scientist, was less constrained by politeness. Not only does he take CMIS to task for not living up to its RESTful claims, he also calls out the CMIS team for jumping the gun on calling it a “standard.”

Ouch!

Give Your Repository a REST

Tuesday, July 22nd, 2008

Through my research and my client work I have been running across this recurring pattern of exposing a content repository through a REST interface. In the past, I have written about the JCR and Sling and Alfresco’s Web Scripts architecture. I really like both of those implementations. More recently, I have been working with a client who has built their own REST interface on top of Day’s CRX. They started their project before Sling was a glimmer in Apache’s eye and they took a slightly different approach. Instead of using Sling’s repository-oriented request handling, or Alfresco’s model of registering a Web Script (written in Javascript) to a particular path, my client has built out a full URL-based query syntax through a servlet. Right now, the syntax focuses on searching and retrieving content and is very powerful.

The strategy of using a REST API for your repository solves a central problem with the JCR and other Java-based repositories: remote connectivity. Without a remote connectivity infrastructure like JDBC or ODBC, technologies wishing to talk to a Java repository must resort to mechanisms like RMI (Remote Method Invocation) that are inefficient and do not necessarily play nicely with firewalls. While not particularly efficient (lots of protocol layers and text processing), REST offers a nice foundation for enabling remote connectivity at the appropriate layer of abstraction (that is, how content is logically stored – not how it is physically persisted). There are many reasons why REST is a good strategy but I think that the most important ones are:

  1. There is great infrastructure available for optimizing and controlling HTTP traffic. For example, reverse proxy technologies like Squid can stand in front of the REST interface and serve repeated requests out of cache. Firewalls can be used to filter traffic with rules that evaluate the requested path and requester origin (beware IP Address spoofing).
  2. REST is entirely technology neutral. Everything talks HTTP and XML. You can replace the implementation of either the server or the client with little risk to the overall architecture.

I think the only downside is that developing your own API is tricky business. While you are free to change your underlying data structures, once you publish your API and start writing applications on it, you lock yourself in. Where possible, it is best to support standardized query syntax like XQuery or the JCR query language in addition to your domain-specific methods.
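For a sense of what the pattern looks like at its smallest, here is a Python sketch that maps a URL path onto a node in a content tree and returns its properties as JSON. It is an assumption-laden toy and not Sling's, Alfresco's, or my client's interface: the `CONTENT` tree, the `.json` suffix convention, and the `resolve` and `handle_get` functions are all invented for illustration.

```python
import json
from urllib.parse import urlsplit

# Hypothetical content tree: nested dicts are nodes, other values are properties.
CONTENT = {
    "content": {
        "blog": {
            "title": "Example blog",
            "post1": {"title": "Give Your Repository a REST", "tags": ["jcr", "rest"]},
        }
    }
}


def resolve(tree, path):
    """Walk the tree one path segment at a time; return None if nothing is there."""
    node = tree
    for segment in (s for s in path.split("/") if s):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


def handle_get(tree, url):
    """Answer a GET with (status, JSON body) describing the addressed node."""
    path = urlsplit(url).path
    if path.endswith(".json"):
        path = path[: -len(".json")]
    node = resolve(tree, path)
    if not isinstance(node, dict):  # only nodes are addressable in this sketch
        return 404, json.dumps({"error": "not found"})
    body = {
        "properties": {k: v for k, v in node.items() if not isinstance(v, dict)},
        "children": sorted(k for k, v in node.items() if isinstance(v, dict)),
    }
    return 200, json.dumps(body, sort_keys=True)


print(handle_get(CONTENT, "http://localhost/content/blog.json"))
```

Because the response depends only on the URL, a reverse proxy in front of a handler like this could cache it without knowing anything about the repository behind it.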

I expect this pattern of REST-based repository access to become pretty much the standard as we get into Web 2.0 architectures that support mash-up applications. If they can address the overhead of all the text handling, more and more systems will use REST APIs to de-couple the various components in the application stack. Something to consider the next time you design a content-centric application.

First Official Release of Sling

Monday, June 30th, 2008

The Apache Sling team recently announced the first official release of Sling. Now you can download some nicely packaged Sling bundles to play around with.

I have been experimenting with the Sling/CRX bundle that came with Day Software’s JCR Cup 2008 competition (entries due midnight, September 30) and was really impressed by what I saw.

Sling allows you to write applications on top of the JCR using either server side or client side Javascript. On the server side, you can create JavaScript templates (ESP files) that give you access to the full JCR API. Templates are stored in the repository and called using an elegant MVC request processing framework. Templates can be called directly, or can be associated with content types and executed when an asset of that type is requested. As you might expect from Roy Fielding’s employer, it is all very REST. For client-side scripting, you just import a Javascript file called sling.js and you get methods like “Sling.getContent” (which gives you an array of Javascript objects).
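The idea of associating templates with content types can be sketched as a lookup from request to node to script. The Python below is a deliberately simplified illustration and should not be read as Sling's actual resolution algorithm: the flat `REPOSITORY` dict, the `resourceType` property name, the `/apps/<type>/<extension>.esp` layout, and both functions are assumptions for the example.

```python
def split_request(path):
    """Split '/content/post1.html' into the node path and the extension."""
    last_slash = path.rfind("/")
    first_dot = path.find(".", last_slash + 1)
    if first_dot == -1:
        return path, None
    return path[:first_dot], path[path.rfind(".") + 1:]


# Hypothetical repository: content nodes and template scripts live side by side.
REPOSITORY = {
    "/content/post1": {"resourceType": "blog/post", "title": "First Official Release of Sling"},
    "/apps/blog/post/html.esp": "(template source)",
}


def resolve_script(repository, request_path):
    """Find the template registered for the requested node's content type."""
    node_path, extension = split_request(request_path)
    node = repository.get(node_path)
    if not isinstance(node, dict) or extension is None:
        return None
    resource_type = node.get("resourceType")
    if resource_type is None:
        return None
    script_path = f"/apps/{resource_type}/{extension}.esp"
    return script_path if script_path in repository else None


print(resolve_script(REPOSITORY, "/content/post1.html"))
```

The request names the content, not the script; which template runs is decided by the type of the node that the URL resolves to.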

Despite the fact that Sling is still an incubation project, it is fairly mature and robust. Day’s upcoming release of Communiqué (version 5) uses Sling extensively. I envision Sling being used in a presentation tier where pages are statically rendered (baked) from content in the JCR and Sling is used to power dynamic AJAX overlays using content from replicated JCR workspaces.

I really like the fact that logic is written in an interpreted language like Javascript. Development and deployment are faster when you take out the compilation step. Furthermore, Sling is built as OSGi bundles (using Apache Felix) so it is more modular and flexible than a typical monolithic Java web application.

The CRX (or the free Apache JackRabbit implementation of the JCR) and Sling should be considered alongside Alfresco with its elegant Web Scripts (which also use Javascript as a scripting language). Alfresco has some nice virtualization features but there may be a higher level of lock-in to the Alfresco APIs. Alfresco has an end-user-oriented interface while the CRX only has a JCR browser which is really only intended for administrators. However, in both cases, you will probably want to develop your own user interfaces because Alfresco’s current WCM UI is not optimized for managing web content (improvements are scheduled for mid 2009 – interestingly, the Alfresco team is calling these enhancements “project Slingshot”).

Day’s new developer portal and blog

Friday, March 21st, 2008

Via CMSWire (because I missed the press release in my in-box – sorry Patrick).

Day Software recently launched a new blog and developer portal. The intent of the portal is to be for content management experts, by content management experts and focus on content technology standards. Day definitely has the horsepower to produce good content with on-staff visionaries like David Nuescheler (the driving force behind JSR 170), Roy “the REST man” Fielding, and collaboration and open source expert Lars Trieloff.

Color me subscribed!

BTW, I just noticed from the RSS feed that the title is “(content goes here) blog.” What’s up with that?

JCR Community Gathering in Amsterdam

Saturday, March 15th, 2008

There will be a JCR Community Gathering in Amsterdam on April 8th as part of ApacheCon Europe. There will also be a JCR session and BoF in the ApacheCon conference.

Hat tip to Jukka Zitting for the announcement!

David Nuescheler nominated for award for JSR 283

Friday, May 11th, 2007

David Nuescheler, CTO of Day Software, was just nominated by the JCP for the Most Outstanding Spec Lead Award for his work on JSR 283. JSR 283 takes the Java Content Repository (JSR 170) further by adding enhancements like federation, remoting, more standard node types, and better access control. JSR 283 was first introduced in October 2005 and these things typically take a long time to make their way through. In the first go around with JSR 170, the team kept momentum by building an open source reference implementation: Apache JackRabbit. According to Jukka Zitting, JackRabbit (or at least a branch of it) will be used as a reference implementation for JSR 283.