Archive for the ‘django’ Category

CMS Architecture: Managing Content Type Configurations

Tuesday, January 19th, 2010

Warning: this post is highly technical. Non-programmers, please avert your eyes.

Deane Barker (from Blend Interactive) and I have a running conversation about CMS architectures. One of the recurring topics is how content models and other configurations are managed. There are two high-level approaches: inside the repository and outside the repository. Both have their advantages and disadvantages.

  • Managing content types outside the repository

    My preferred approach is to manage content type definitions in files that can be maintained in a source code management system. This way you can replicate a content type definition to different environments without moving the content. Developers can keep up to date with changes made by their colleagues. Configuration can be tested on Development and QA before moving to production. There is no user-interface to get in the way. No repetitive configuration tasks. Everything is scriptable and can be automated. I especially like it when content types are actual code classes so you can add helper methods in addition to traditional fields. Of course, when you get into this, it is a slippery slope into a tightly coupled display tier that can execute that logic.

    On the downside, it is often difficult to de-couple the content (which sits in the repository) from the content model (which defines the repository). When you push an updated content type to a site instance, you might need to change how the content is stored in the repository. This is more problematic in repositories that store content attributes as columns in a database. It is less of a problem in repositories that use XML or object databases (or name-value pairs in a relational database) where content from two different versions of the same model can coexist more easily.

    If you do manage content type definitions outside of the repository, a good pattern to follow is migrations, which were made popular by Ruby on Rails. I am using a similar migration framework for Django called South. Basically, each migration is a little program that has two methods, forward and back (“up” and “down” in Rails; “forwards” and “backwards” in South), that can add, remove, and alter columns and also move data around. The forward method updates the database; the backward method reverts it to the earlier version.

  • Managing content types within the repository

    Most CMSs follow the approach of managing the content type definitions inside the repository and provide an administrative interface to create and edit content types. This is really convenient when you have one instance of the application and you want to do something like add a new field. There is no syntax to know and application validation can stop you from doing anything stupid. Some CMSs allow you to version content type definitions so that you can revert an upgrade.

    When you have multiple instances of your site, managing content types can be tedious and error prone if you need to go through the administrative interface of each instance and repeat your work. Of course, you can’t copy the entire repository from one instance unless you want to overwrite your content. If your CMS is designed in this way, you should look for a packaging system that allows you to export a content definition (and other configurations) so that it can be deployed to another instance. Many CMSs allow an instance to push a package directly over to another instance. The packaging system may also do some data manipulation (like setting a default value for a required new field).
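To make the forwards/backwards idea concrete, here is a minimal plain-Python sketch using the standard library's sqlite3 module. It is not South's actual API; the class name, the article table, and the subtitle column are all hypothetical, chosen only for illustration.

```python
import sqlite3


class AddSubtitleToArticle:
    """Hypothetical migration: adds a 'subtitle' column to an 'article' table."""

    def forwards(self, conn):
        # Schema change: add the new column with a default so existing rows stay valid.
        conn.execute("ALTER TABLE article ADD COLUMN subtitle TEXT NOT NULL DEFAULT ''")

    def backwards(self, conn):
        # Revert by rebuilding the table without the column. This also works on
        # SQLite versions that lack ALTER TABLE ... DROP COLUMN.
        conn.execute("CREATE TABLE article_old (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
        conn.execute("INSERT INTO article_old (id, title) SELECT id, title FROM article")
        conn.execute("DROP TABLE article")
        conn.execute("ALTER TABLE article_old RENAME TO article")


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO article (title) VALUES ('Hello')")

migration = AddSubtitleToArticle()
migration.forwards(conn)
after_forwards = column_names(conn, "article")
migration.backwards(conn)
after_backwards = column_names(conn, "article")
```

Because the migration is just code in a file, it can be checked into source control and run against Development, QA, and production in the same way.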

Unless you are building your own custom CMS, this all may seem like an academic question. It really is quite philosophical: is configuration content that is managed inside the application, or does it need to be managed as part of the application? The same thing goes for presentation templates (but that is another blog post). However, if you intend to select a CMS (like most people should), it is important to understand the choice that the CMS developers made and how they work around the limitations of their choice. If you are watching a demo, and you see the sales engineer smartly adding fields through a UI, you should ask if this is the only way to update the content model and if you can push a content type definition from one instance to another. If the sales engineer is working in a code editor, you need to ask how the content is updated when a model update is deployed.

10 Django Master Class action items

Tuesday, November 3rd, 2009

A couple of weeks ago I attended Jacob Kaplan-Moss’s Django Master Class in Springfield, Virginia. It was a great class and I walked out with a bunch of ideas for making better use of Django. What follows is a set of action items that I created for myself. Jacob was not this prescriptive in his presentation. These are just my personal decisions based on how he explained things.

  1. Use South for database migrations (complete). Unlike Rails, Django has no native system for synchronizing the database schema with code changes. Django will create your initial database schema for you but you need to modify the tables with SQL whenever your models change. South gives Django Rails-like migrations, which consist of methods to alter the database and also roll back changes. I ported a new application I am working on over to use South and am very impressed. Jacob gave some great advice to keep your schema migrations separate from your data migrations. For example, if you are renaming a field, you would create one migration to add the field, a second migration to move the data to the new field, and a third migration to delete the old field. Doing this will make your migrations safer and easier to roll back.
  2. Use PostgreSQL rather than MySQL (complete). Jacob didn’t talk disparagingly about MySQL but it was clear to me that PostgreSQL is what the cool kids are using. That is not to say there are not disagreements over what DB is best. I have been using MySQL for years but two things won me over. In the class, I learned that table alterations in MySQL are not transactional so if your South database migration fails, you can’t roll back so easily. The second factor came after the class when I was reading all these blog posts panicking about what will come of MySQL now that Oracle owns it. I agree with most pundits that Oracle doesn’t have a great reason to invest in MySQL. My comfort level working with PostgreSQL is growing but it’s going to take a while to get as comfortable with the commands and syntax as I am with MySQL.
  3. Use VirtualEnv (complete). One thing about Python that always seemed hackey to me was the whole “site-packages” thing. I don’t like how all your Python projects tend to share the same libraries. In Java, you are much more deliberate with your CLASSPATH. The class introduced me to virtualenv and its sister project virtualenvwrapper. This creates a virtual sandbox where you can manage libraries separately from your main Python installation. It is brilliant.
  4. Use PIP (complete). I was pretty haphazard about what tools I used to install Python packages. I admit that I didn’t really know the difference between setuptools and easy_install. The Master Class nicely explained the different options and it seems like PIP is emerging as the Python package manager of choice.
  5. Break up functionality into lots of small re-usable applications (in progress). Much of the advice from the class is summarized in James Bennett’s DjangoCon 2008 talk: Reusable Apps. Watch the video and be convinced.
  6. Use Fabric for deployments (not started). My normal m.o. for deploying code has been to shell over to a server and svn export from my Subversion server. In multiple server environments, I would usually have some kind of rsync setup. However, in one of my client projects (using Java), I started using AntHill Pro (plus Ant) for both continuous integration and deployment. From that experience I saw the light on automated deployments. Fabric is primitive compared to AntHill Pro (it doesn’t have a cool web-based UI) but it does allow you to run scripts remotely on other hosts. It’s like Capistrano for Python. In the next phase of development, I will definitely be using this.
  7. Use Django Fixtures (not started). I am really embarrassed to say that I have avoided using Fixtures for loading lookup and test data. Instead, I have been doing horrid things with SQL and objects.create(). I am looking forward to reforming my errant ways. Fixtures allow you to create a data file that Django will load for you. It offers three format options: JSON, YAML, or XML. Jacob recommends YAML if you can be assured that you have access to PyYAML; otherwise go with JSON, which is nearly as readable.
  8. Look into the Python Fixture module (not started). This straight Python module seems to be an alternative to the Django fixtures system. It is more oriented towards test data and looks a little like using mock objects. I need to dig in a little more before I make up my mind about it.
  9. Use django.test.TestCase more for unit testing (not started). I need to do more with unit tests. I have had some good experiences with writing DocTests but I should use the Django unit test framework more. This will allow me to use fixtures more too! Plus with Django 1.1, startapp even creates a tests.py for you. How can I resist an empty .py file?
  10. Use the highest version of Python that you can get away with (in progress). In the class, Jacob made the good point that every version of Python gets feature and performance improvements. Why not go with the latest stable version like 2.6? Snow Leopard did it for me. I will try to upgrade my server as soon as I can get away with it.
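The advice in item 1 about splitting a field rename into three migrations can be sketched in plain Python with the standard library's sqlite3 module. This is an illustration only, not South's API; the post table and the headline and title columns are hypothetical.

```python
import sqlite3


def add_title_column(conn):
    # Migration 1 (schema): add the new column, nullable so existing rows stay valid.
    conn.execute("ALTER TABLE post ADD COLUMN title TEXT")


def copy_headline_to_title(conn):
    # Migration 2 (data): copy values from the old column into the new one.
    conn.execute("UPDATE post SET title = headline")


def drop_headline_column(conn):
    # Migration 3 (schema): remove the old column by rebuilding the table,
    # which avoids relying on ALTER TABLE ... DROP COLUMN support.
    conn.execute("CREATE TABLE post_new (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO post_new (id, title) SELECT id, title FROM post")
    conn.execute("DROP TABLE post")
    conn.execute("ALTER TABLE post_new RENAME TO post")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, headline TEXT)")
conn.execute("INSERT INTO post (headline) VALUES ('First post')")

# Run each step as its own unit of work, so a failure in the data step
# leaves the schema steps cleanly applied or not applied.
for step in (add_title_column, copy_headline_to_title, drop_headline_column):
    step(conn)
    conn.commit()

rows = conn.execute("SELECT id, title FROM post").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(post)")]
```

Keeping the data move in its own step means the old column still exists if that step fails, so nothing is lost before the copy has been verified.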

If you can make it to the next Django Master Class, I highly recommend you go. Otherwise, you should look into these resources and make your own educated decisions about whether to use them.
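As one small example from the list above (item 7), a JSON fixture is a list of records, each naming a model, a primary key, and its field values. The sketch below builds one with Python's json module; the blog app, the Category model, and its fields are hypothetical.

```python
import json

# Hypothetical lookup data for an imaginary "blog" app with a "Category" model.
# Each record follows the fixture layout: model label, primary key, field values.
fixture = [
    {"model": "blog.category", "pk": 1, "fields": {"name": "django", "slug": "django"}},
    {"model": "blog.category", "pk": 2, "fields": {"name": "cms", "slug": "cms"}},
]

# Serialize with indentation so the file stays readable in source control.
fixture_text = json.dumps(fixture, indent=2)
```

Saved as a file in an application's fixtures directory, text like this can be loaded with Django's loaddata management command.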

You bought a web page factory, not a webmaster android

Thursday, July 16th, 2009

When a company builds a business case for acquiring a web content management system, a key selling point is this vision of business users being capable and willing to build the website of their dreams. In this dream, the webmaster and other technical staff are replaced by a hyper-caffeinated, mind-reading, web-savvy C-3PO. The companies that really buy into this vision are usually thoroughly disappointed with the results of the implementation. As a consultant interested in his clients’ success, it is my responsibility to talk my more optimistic clients down to a more realistic set of expectations. In other words, I have a tendency to rain on parades.

The metaphor that I find to be the most helpful to explain the realistic role of a web content management system is that a WCMS is a web page (and RSS feed) factory. In some implementations, the CMS is designed to create micro-sites in the same factory-like manner but it is still a factory. Here are the reasons why:

  • Like a factory, a web content management system requires up front investment to set up. Even if the underlying software is free, you still need to configure it to produce the kind of pages that you want. That investment only pays off after you have produced a certain number of units (pages) so it doesn’t make sense to implement a CMS to manage a 5 page website that rarely changes.
  • Factories are set up to produce a limited number of different products. As much as he would want to right now, a worker on the Chrysler Aspen production line cannot go to his workstation one morning and start building a better selling model: not the one that he saw driving down the street the day before and not the one he dreamt up as he was sleeping. All the tooling needs to be reconfigured and he doesn’t have the skills to do that. In fact, most auto factories shut down for a period of time each year to set up the machinery for the new model-year. The people that do this work have a very different set of skills and permissions than the line workers. They also have the design specs that have been blessed by the management of the company. In the CMS world this translates into content types and presentation templates that technical people usually have to work on. If you want to build a totally different sub-site with different types of content and different layouts, you need to bring in the propeller heads. That said, the design of the templates and content types could have allowed the content producer to select options or exercise creativity in specific areas. Deane Barker has a nice post describing these boundaries as load bearing walls.
  • Factories need raw materials and labor to produce their output. The input of a content management system is content and it has to come from somewhere. The CMS will not make the content but it can be used to add value to the content by providing functionality that a person can use to organize and present the content to different audiences. Often people look at their webmasters as simple HTML typists but in reality they are usually much more. The good ones proofread the content that they were given. They fill in the gaps by creating or finding content that they were not given. They coordinate content from different providers. They navigate through the site from the perspective of a visitor. The “webmaster@” email address forwards complaints to their email address. The CMS itself won’t do that for you. Only people can do that. The CMS will not eliminate the webmaster but it will make the webmaster more productive by taking some of the mundane mechanics out of the job. Maybe the webmaster can be a little less technical and focus more on the coordination and accountability side of the job.
  • There is often a trade-off between flexibility and simplicity. One summer during high school, I spent an unbearable day in a factory at a machine that cut three foot long cardboard tubes into what you would see in the center of a roll of duct tape. I put the tube on an arm, pressed a button and stood back to watch the arm spin and the blades come down. Then I took the rolls off and repeated. This was a very simple machine to use, but I couldn’t make a paper towel or toilet paper roll with it, not unless I had a machinist come in to re-configure it. They could have made an even simpler to use machine that loaded the tubes on itself; but that would probably require more machinist time to configure. With a CMS you can simplify the tool by taking away options and control. For the non-technical user, you need to be very deliberate about what options to expose. You need to confine their creativity to small areas (like a rich text area in the middle of the page). Most CMSs accommodate this by providing different user interfaces for power users and non-technical users. This would be the manufacturing equivalent of a special factory for real craftsmen to make limited edition products and prototypes.
  • Machines that maximize flexibility and simplicity are achievable but at a cost. Some machines are exquisitely designed to present just the right options to the operator (in a perfectly intuitive way) and automate everything else. Getting to this point can take hundreds of refining iterations. There are diminishing returns on these refinements so it is rare that a company can make the ROI case for this level of investment. I doubt that Starbucks will ever build an espresso machine that is so easy to use that any customer can walk in and make his own grande, half-caff, soy, triple, iced americano with lemon (in a venti cup).
  • Being a factory line worker brings less satisfaction than being a skilled artisan. Like with my short career as a cardboard technician, operating a machine is extremely boring when all the options are taken away. On the upside, this lets the contributor focus more on the content (where they should be applying their creativity anyway); but if the contributors don’t particularly like to write either, you have a problem. Often content contributors use poor usability as an excuse for avoiding the difficult task of creating content. If this is the case, no amount of user-friendliness will compel them to take ownership of their content. Factory workers don’t just burst into the plant to voluntarily produce cars. They need to be motivated by compensation and pride over the quality of their work (what they do have control over). With a CMS, content contributors have less responsibility for layout and branding of the site but they are responsible for the words and pictures and organization of the content. The quality and craftsmanship of those aspects of the site needs to be recognized.

It is hard to imagine a worse buzz-kill than to have your knowledge workers and marketing staff picture themselves as machine operators; but I have yet to talk a client out of implementing a CMS (except in cases when they already have a CMS that is working quite fine but they are struggling for other reasons). The reason why is that once you get past a certain volume of content, you can’t manage it without the help of tools that take away some of the personal craftsmanship in design and functionality of each individual page (you can’t manufacture a million cars without mass production factories either). Mass production of pages is a good thing because the audience wants the information, not each content contributor’s own personal vision of how it should look. We tried that model. It was called GeoCities and it didn’t work out that well. The sites were just awful to look at and the content was out of date.

A web content management system reduces the cost of maintaining lots of uniform pages (and sub-sites). It doesn’t help a company rapidly develop new concept websites. In fact, it often slows down the production of these websites, especially if the group that wants to do the innovating does not have developers who can access the CMS. Many media companies have a heavy duty web content management system for their heavy lifting (the bulk of their content on the main site) but use lighter weight frameworks (or CMSs that are designed to be more like frameworks) and custom code for their experimental sites (for example, The Washington Post and Django). But no matter what, if you want to innovate beyond the options and the text areas that were designed into the CMS implementation, you are talking about a software development lifecycle that includes development and testing and developers to do the work.

Book Review: Django 1.0 Template Development

Thursday, March 26th, 2009

I just finished reading Scott Newman’s book Django 1.0 Template Development. This is the second Django book that I have read (the first was The Definitive Guide to Django) and I am very impressed by the number (and quality) of Django books that have been published. 21% of the respondents to a recent “This Week in Django” poll said that they learned Django from reading a book (65% learned from the online documentation). Considering that until recently there were no Django books, this is significant.

Django 1.0 Template Development lives up to its title by focusing on the template layer of the Django web application framework although it does go through some basics of setting up your project and some of the details of the Django request handling pipeline. There is very little coverage of models – just enough to give the sample project some data to work with.

There is good coverage of how templates are loaded and guidelines on how to develop views [1] with plenty of tips on leveraging Django’s many convenience features (like generic views) and organizing code for better manageability. There are examples for using and writing custom middleware, filters, and tags [2] with special attention paid to best practices in security. A whole chapter is devoted to working with Django’s pagination system. Explanations are well supported by the underlying theory and by examples that demonstrate the details of Django’s behavior.

The area where I was hoping for a little more depth was optimizing performance. Django gives the developer a lot of options for how to design the application. For example, in addition to the typical template “include” syntax, Django also supports template inheritance (where a child template can extend and override blocks of a page from its parent). There is not much information on the performance implications of deep template hierarchies. The caching chapter gives a nice overview of Django’s different caching options and engines and general guidelines but perhaps the art of really tuning a site is the topic for another book.

I would highly recommend Django 1.0 Template Development for anyone who wants to efficiently build a clean and manageable template layer for a Django project, and in particular for a developer who needs to make the display tier flexible and extensible (such as the book’s example of managing a separate site skin for mobile browsers). Although the preface recommends the reader have a working knowledge of Django and Python, I don’t think that is really necessary. There is just enough information to help the developer to understand the overall Django framework but the emphasis is definitely on displaying data.

Notes:

  • [1] In Django, the “view” is the code that gathers and preprocesses the data for the template to render.
  • [2] These are important for a template developer because Django deliberately limits the amount of logic you can put into a template to force developers to keep templates clean and make code more reusable. Logic belongs in filters (that manipulate data), tags (that do more complex logic), and middleware (where you inject additional functionality into the request/response cycle).

Django Contrib

Monday, February 23rd, 2009

Jacob Kaplan-Moss wrote a very good post on the purpose of django.contrib. For those unfamiliar with the Django project, django.contrib is a tightly managed, highly used collection of modules (packages in Python). Contrib is not quite Django core, but much more part of Django than the free-for-all of shared re-usable applications.

The reason why I find this interesting is that many open source projects struggle with the balance between the benefits and risks of maintaining a low bar to contribution. The generally adopted policy is to keep core development within a small trusted group of “committers” (who can commit their own patches and patches by others that they have reviewed) and then open up module development to the rest of the world. What usually happens is that modules can be highly redundant and inconsistent in quality. Add-on modules are typically the greatest cause of system instability and insecurity and the most difficult to upgrade. But they are still great because they are code that you don’t need to write. Drupal, with its explosive community growth, struggles with this balance. Drupal developers often find that it is easier to write a module than to find one that is well written and does what you need it to do. Yes, there are Drupal modules that rise to the top and eventually get included into the core (Views, CCK, etc.), but they are rare. Sometimes there are random reversals (remember when CCK overtook flexinode?). The rest churn in the rough and tumble Contributed Modules library. Other projects have these issues; Drupal’s growth and scale just make them more visible. Typo3 had a rating system (“no cigar” through “Cohiba”) but the initiative fizzled.

Back to Jacob and django.contrib… Contrib is a very important resource for the Django community. The packages there are widely used and considered (by many) as good as core Django code. If a developer finds a module he needs in contrib, he will use it rather than any other comparable module he finds elsewhere. Contrib modules are actively maintained and are tested against new versions of the core. They are “sanctioned” not just for what they do but how they are built. In his post, Jacob recognizes the impact of incorporating a package into contrib and tries to clarify what incorporation means. Jacob proposes three tests to determine whether a package gets into contrib: the functionality should be optional (that is Django won’t break if you delete contrib from your system), it solves a common problem, and it exemplifies generally accepted best practices.

Jacob also writes that contrib packages should conform to the same standards and guidelines as the general module population. For example, they shouldn’t access Django’s “brittle internals” (that is, functionality not wrapped in an API that may change without warning). The rest of the post goes through examples of packages that should or should not be in contrib according to his tests.

I think more projects should have a system like django.contrib or at least have an open discussion around what makes a preferred module. As Typo3 found, it is hard to instate one after the catalog of modules (and community of developers) has grown beyond a certain size. Putting the best modules into core carries the risk of bloating the core application with functionality that not everyone needs. The various Linux distributions that have package managers are able to put preferred modules in their package library (accessible through tools like port and yum). The last resort, which is where most big projects are, is to have a good network of developers and do your research from word of mouth.

Re-platforming www.contenthere.net

Friday, December 26th, 2008

If you have been paying close attention, you might have noticed that www.contenthere.net is now running on WordPress. Prior to the migration, the site was hosted on a combination of Blogger, Yahoo Store, and some hand coded HTML (managed in Subversion of course). That arrangement was fine but I ran into limitations with the integration between Yahoo Store and the rest of the architecture. There were no big show stoppers, just little inconveniences that I was getting tired of working around. Besides, I was itching to tinker – we techies get like that sometimes.

Selecting a new platform was fun because I got to be the client in a process in which I am normally the consultant. I was quite different from a typical Content Here client. First of all, I had no budget. Second, the president of the company (i.e. me) wanted the technology to be fun to program in. Third, I didn’t want to choose a platform that I would recommend to my typical clients because I do not want to appear biased. Incidentally, the last point is a main reason why I have held off implementation of a content management solution for so long.

My first choice was the Django web application framework. I had done some prototyping on the platform and was really impressed with the cleanness of the architecture and how quickly you could build applications. It is a little like Ruby on Rails but in Python. Furthermore, Django has a popular e-commerce application called Satchmo. I installed Satchmo and was able to understand the code and make some quick customizations on it. What really killed Django for me was the lack of a good blogging platform. There are a number of simple django blogging applications out there but nothing seemed to fit the bill. The closest was Banjo but it didn’t seem to be that well supported. There is actually a long standing discussion in the Django community about the framework’s lack of mature blogging applications.

The next two finalists were Drupal and WordPress. I have built sites on Drupal and like the framework a lot. However, the commerce module always seems to be far behind the current release of the core. I also think that Drupal is a little bit more than I need for my simple site (a blog with a shopping cart).

My decision to go with WordPress started as a simple prototype. I was amazed at how quickly I could create a theme to match my old design. The commerce module WP e-Commerce looked pretty solid and I was able to quickly get it working with PayPal as my payment gateway. I also found some useful plugins to provide me the features I was missing in Blogger (like related posts, etc.). The thing that sealed the deal for me was the ease with which WordPress imported all my Blogger posts and comments. I was even able to make the permalinks match the same structure as Blogger’s for easier URL re-mapping (just a simple rewrite rule). WordPress surely has its warts (there are plenty of places where the code gets pretty sketchy) but for a simple, reliable blogging platform with e-commerce capabilities, I am quite pleased.

DjangoCon 2008

Thursday, July 17th, 2008

The Django community recently announced the first official Django conference. DjangoCon 2008 will be held at Google’s Mountain View headquarters on September 6th and 7th to coincide with release 1.0 of the platform. Admission is free (as in beer) but they are capping the attendance at 200.

If you are new to Django, Django is an open source web application development framework written in the Python programming language. Despite its sub-1.0 status, Django is quite mature. It was first developed by the folks over at Lawrence Journal-World for sites like ljworld.com, lawrence.com, and KUSports.com. Later, Rob Curley assembled a team over at the Washington Post to build a bunch of local sites. Now Django is bundled and actively used in Google App Engine. There are also a number of books on Django. I am currently reading the Definitive Guide to Django by Adrian Holovaty and Jacob Kaplan-Moss. So far, so good. I see a lot in common with Rails and the two definitely seem to get along at least at a philosophical level.

I will be covering Django in an upcoming report about web content management in media and publishing because of Django’s widespread use in that industry. There is a small commercial CMS called Ellington that is specifically designed for the newsroom. Do you have any experience with Django or Ellington? I would love to talk to you about it.