Wednesday, December 28, 2005

"ZOracle" Part II: The Solution

In my last post, ZOracle Part I, I described the requirements and some background on a recent project to rewire an existing Zope CMF-based CMS to use an Oracle based relational repository.

As I hinted earlier, the key to the solution was an aspect of the existing architecture that placed all of a content asset’s attributes in a structure based on a Zope's PropertySheet. Each object had multiple property sheets that represented groups of attributes in much the same way Alfresco uses Aspects to allow simple objects to be extended with additional data or capabilities. PropertySheets are normally backed by a Python dictionary, and inherit from Persistent so they get stored in the ZODB when associated with other ZODB persisted objects. What we did was create a new kind of property sheet, called RDBProperties, that was backed by a Data Access Object rather than a ZODB persisted dictionary. The DAO encapsulated all the code to read and write from the database. This enabled us to experiment with various database connectivity strategies and do comparative testing (See Part III: Connecting to an Oracle database).

We decided to keep things simple by creating a new database table for every different PropertySheet definition and having a corresponding column for each property. This normalized design was desirable because it made things easier for external applications hoping to make sense of the data. So, if there are 5 asset classes, and 3 possible property sheets to use, there would be three backing tables. We decided to have Oracle manage the primary keys for property sheets with sequences. We stored the unique property sheet ID on the Zope side for retrieval and also captured the Zope-derived object ID in the database so we could reconstruct objects outside of Zope. Other than making sure that we got our data types right and finding a reliable way to talk to the database (see next article), this part of the prototype was pretty straightforward.

The only problem that remained to solve was when write back to the database. Zope is somewhat elusive in this regard. By design, the programmer doesn’t really know when persistent objects are being written to the database. It just kind of happens in the background. We needed to make sure that we kept the relational database up to date, but we also didn’t want to write to the database too often and create a new performance problem. To do this, we extended Zope’s transaction manager, TM (Shared.DC.ZRDB.TM.TM), with a new derived class called RDBTransactionManager. This gave us hooks to execute logic at the beginning and end of the transaction (in Zope, a transaction is defined as what happens from the beginning of the HTTP request to when the response is sent. This is different from a database transaction), and also when a transaction is aborted. A new DBTransactionManager is created the first time a DAO is requested and then re-used in each subsequent DAO instantiation within the Zope transaction. Our DBTransactionManager also had a collection (dictionary) of Property DAO’s that were used in the current Zope transaction, so within a transaction, we could cache values and then wait until the end of the transaction to write back to the database (or not, in the case of an abort). At the end of the transaction, the DBTransactionManager iterated through its list of DAOs and called their persist method, and then at the end of it all, called a global commit method which committed the database transaction.

In this design, Zope (actually ZEO) still manages concurrency because, as far as it knows, it still owns the objects. In our tests, Zope still complained when two transactions were trying to update the same object at the same time. The design also worked with multiple ZEO clients talking to the same ZEO database and Oracle database. Search and general object maintenance was also still managed within Zope. Whenever a content asset is stored in the repository, it is indexed with portal_catalog. Also, all of the other functionality of the CMS operated normally - essentially unaware that anything unconventional was happening underneath. Only some rigorous testing can answer the question of whether this improve the performance of the application. It will certainly reduce the size of the ZODB and also reduce the frequency that the ZODB thinks it needs to write data (Zope considers the data contained in the DAO “volatile" and therefore unworthy of persistence). But we did meet the requiremed of data being accessible to any technology capable of accessing an Oracle database.

Next: Connecting Zope to an Oracle database