Archive for the ‘development’ Category

Deane Barker: Editors Live in the Holes

Thursday, August 5th, 2010

A few days ago I read Deane Barker’s excellent post Editors Live in the Holes (go ahead and read the post and then come back) and I have been thinking about it ever since. I have had the same experience several times and it is a good reminder for developers to pay special attention to configuring and testing the rich text editor. As Deane points out, it is too easy for developers to disregard “the holes” as a contributor problem, not a system problem.

To get it right, the holes need to be jointly owned by the designers, developers, and content contributors. Designers need to design for flexibility. Developers need to do everything they can to make contributors successful. But this raises something of a chicken and egg problem — at least for new CMS implementations (as opposed to migrations). In these projects, content entry typically occurs after the system is considered complete. This means that the designer and developer need to anticipate what rich text capabilities (formatting controls and the styles that control the display of rich text) the contributors will need. This is particularly important in the ever-present “generic page” content type that is typically used for the many one-off (odd ball) pages that exist in any website.

I have found two good techniques to get around this problem. First, it is good to test the rich text editor with a few of the more challenging one-off pages on the site. Take a page with embedded images and objects (like perhaps a Google map) and formatting and try to reproduce it in the rich text editor. Don’t disable the rich text editor and edit the source. That is cheating. If it turns out you can’t do it without pulling your hair out, you need to come up with a workaround. If it is a really important page, you might need to develop a special content type and/or presentation template that does some of the work. If you find that there are too many challenging one-off pages to choose from, you might step back and consider enforcing more uniformity between pages. Otherwise, you will probably not be getting all of the value (content reuse and manageability) out of a CMS.

The second technique is to build a “style guide” page and place it in some discreet area on the site. The style guide page is a generic page that contains examples of all the stylings that are available to the contributor: every heading level, paragraphs, lists (ordered and unordered), tables, embedded images, etc. The content contributor can visit this page to get an idea of what is possible and then open it in edit mode to see how the formatting was executed. The process of building and reviewing the style guide page is a useful forum to get designers, developers, and contributors together to collaborate and align. The fact that it is so tangible grounds everyone in the real capabilities of the platform. The style guide page is also the first place to check updates or enhancements to styles after launch.
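One way to seed a style guide page is to generate a starter fragment and paste it into the generic page, then refine it in the rich text editor. The following is only a rough sketch: the function name `build_style_guide` and the sample image path are made up for illustration, and the element list simply mirrors the stylings named above rather than any particular CMS.

```python
from html import escape


def build_style_guide(image_url="/media/sample.jpg"):
    """Return an HTML fragment that exercises each rich text style once."""
    parts = []
    # One example of every heading level.
    for level in range(1, 7):
        parts.append("<h{0}>Heading level {0}</h{0}>".format(level))
    # A paragraph with the usual inline formatting.
    parts.append('<p>A paragraph with <strong>bold</strong>, '
                 '<em>italic</em>, and a <a href="#">link</a>.</p>')
    # Unordered and ordered lists.
    parts.append("<ul><li>Unordered item</li><li>Another item</li></ul>")
    parts.append("<ol><li>First step</li><li>Second step</li></ol>")
    # A small table and an embedded image (hypothetical path).
    parts.append("<table><tr><th>Column</th><th>Value</th></tr>"
                 "<tr><td>Row</td><td>1</td></tr></table>")
    parts.append('<img src="{0}" alt="Sample embedded image">'.format(
        escape(image_url, quote=True)))
    return "\n".join(parts)
```

Whatever the rich text editor strips or mangles when this fragment is pasted in is a good early list of holes to discuss with the designers and contributors.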

At the end of the day, designers, developers, and contributors all want the site to be a success. They can’t just claim victory on their little piece (“the mockups were approved,” “we got out of QA,” or “I got my page to preview!”). Editors may live in the holes but everyone has to keep the holes clean.

Work Breakdown Structure vs. Deadlines

Wednesday, July 21st, 2010

One of the most common points of friction between project managers and developers is planning work. Most programmers hate creating work breakdown structures (WBS). You can’t blame them: accurately predicting the steps and effort required to build undesigned software is impossible. Yes, you heard that right. Software development planning is impossible, at least for someone who likes precision, which most programmers do.

The problem is that every software development project is a unique collection of thousands of tiny details that each have the potential to suck up enormous amounts of time. The traditional, PMI-sanctioned WBS technique forces developers to name all the activities that will be required, sequence them with dependencies, and then create an estimate of each one. The assumption is that if you did the planning right, you should just be able to follow the steps and come out the other end on time and on budget. This also implies that if you didn’t blindly follow the steps, the project plan was wrong, or you were too incompetent to follow the steps correctly. But with the fluid nature of software development, the project plan is always wrong. I used to think that precision would increase with finer granularity. The more lines in the project plan, the more accurate it would be. But now I think the opposite is true. The more tasks you add, the more guesses you make and the greater the overall variance. Even if you guessed every task right, there were probably just as many tasks that you forgot to add. And there are also lots of steps that you find you didn’t need to do after all.

While predicting a WBS is impossible, developers can get better at setting and meeting deadlines. There is a small nuance between setting a deadline and estimating tasks in a WBS. On the outside, the difference is so small that no one will notice. Nobody will care because they just want to know when the work will get done. But there is a difference. The WBS technique forces a linear accounting of all the work that needs to be done. Creating a deadline is more like adding a constraint (that you hope is reasonable) to help guide and prioritize the work that you wind up doing. Comparing the two is like comparing launching a rocket to flying a plane. PMI-style planning is like shooting a rocket: doing all the calculation at the beginning and then hoping that you accounted for everything before ignition. Setting a deadline turns the rocket into an airplane by adding a pilot that can steer. Realizing you can make adjustments after take-off transforms the pre-flight calculations from a fixed flight path to a map that you can use to make in-flight decisions. A deadline (either the final deadline or an intermediate milestone) is where you think you can be at a certain point of time (or after a certain amount of effort). When creating a deadline for yourself, you don’t try to think of every possible task it will take. It is more like eyeballing distances than counting steps.

I became conscious of this distinction the other day when I was on a bike ride. I take pride in the fact that I usually get home within a few minutes of the time I tell my wife I will be back. Lots of times I pull in right at the minute. Putting on my planner hat, if I was asked how long a bike ride would take, I would want to know the exact route and measure the distance and slope and windspeed and make assumptions about average speed. When I put on my cycling helmet, I realize that most of those variables are under my control. I can shorten the route. I can ride faster. I can take an alternate road to stay out of a headwind. Because I know my cycling ability and the terrain so well, I make these adjustments without even thinking about it.

I know you are thinking that software development is not like riding a bike. There are all these externally imposed requirements, constraints, and dependencies that need to be accounted for. But think back and ask yourself: how many of these factors are added specifically for the purpose of creating the WBS? I feel like developers work against themselves by asking for more and more estimation inputs and being more prescriptive of how they will work. There is no way that every detail can be accounted for and every detail that you do add will constrain your ability to make adjustments.

For estimation purposes, requirements should represent boundaries of an acceptable solution. With this understanding, a developer needs to produce a reasonable deadline based on similar work and explain any assumptions made. An overall deadline or intermediate milestone shouldn’t be overly ambitious. It should account for unknowns. If a deadline is not acceptable, scale back the scope until an acceptable deadline can be achieved. Through the course of the project, new information is going to present itself: the client is more particular than he was able to articulate; the available components are not as good as expected; new features are added to the scope. When any of these things happen, you make adjustments. You might be able to work a little more efficiently. You might be able to scale down scope in other areas. You might be able to delegate work back to the client. Or, you might just have to extend the deadline.

These adjustments require a decent partnership between the developer and the client where the deadline is jointly owned. It doesn’t work when one party feels like the other is obligated to deliver no matter what. In the bicycle analogy, when two people go for a ride, they decide where they want to go. Usually the conversation plays out where one rider asks the other what sort of ride he is up for. The second rider may say he needs to get back in 2 hours and wants to get in some climbing. The first rider will suggest a route that he is familiar with. When they encounter construction that makes a road impassable, they may be able to find an alternative route that is just as good; they can hammer home over a longer route in a paceline; or they can call home to say that they are going to be late. Whether the first rider should have known about the construction is debatable (Did the construction just start? Was the overall distance too ambitious? Did the route not allow for adjustments?) but debating is not going to get anyone home sooner.

With experience, you do get better at making more realistic deadlines. And, more importantly, you also get better with time management. You will build an awareness of where you are in the overall process and know early if you are falling behind schedule. In the cycling analogy, you periodically glance at the clock, your current speed, the slope of the road, and which way the wind is blowing. In software development, you are looking at things like the calendar, the productivity, and the rate of defect identification. With this information rolling around in your subconscious, you start thinking about options instinctively. The client perception is that you planned well. But you really didn’t. You managed time well. The up front estimate was just one of the many constraints that you juggled when developing the solution.

HTML production for CMS implementations

Monday, June 28th, 2010

Most new site CMS implementations (as opposed to site migrations from one CMS to another) start off with a set of HTML mockups. This can be a convenient starting place because, in addition to showing how the pages should look and informing the content model, having the HTML gives a good head start to presentation template development. Ideally the template developer just has to replace the sample “Lorem Ipsum” text with a tagging syntax that retrieves real content from the repository. There are even some graphical tools that help a developer map regions on the mockup with content from the repository. However, often moving from HTML mockups to presentation templates isn’t so smooth. Sometimes the HTML has to be re-written from the ground up.

The most common source of problems is when the HTML is too specific. This usually occurs when the designer/developer who produces the mockups is accustomed to building static HTML websites where she has full control over everything. HTML and CSS for a CMS implementation have to account for the fact that control is shared between the template and the content contributor. While the template controls the overall layout, the content contributor controls the navigation, text, images, and (with the help of a rich text editor) can even style body content. HTML code that is rigid and brittle breaks when stretched by unanticipated content. Here are some things to look out for.

  • Hard coded height and width dimensions on image tags. Most content contributors don’t know the first thing about aspect ratios. They upload a picture and don’t understand why it is squished on the page. While most CMSs these days can automatically scale images (and even if they can’t, the browser will), they can’t all reshape them. While some CMSs support cropping functionality for thumbnails, few content contributors know how to use it to precisely shape an image. I usually recommend setting only one dimension (usually width) and then letting the other dimension (usually the height) do what it needs to do. If you really need to control both, you can use this little background image trick:

       <div class="picture" style="background: url(<<horizontally scaled image path>>) no-repeat; height:150px;"></div>
    

    This uses the CMS’s image scaling to set the width and vertically crops the image after 150 pixels by making it a background image.

  • Overusing element ids. When you are only building a few pages and you want very direct control over elements, there is a temptation to code CSS to reference specific element ids rather than classes. In some cases, this makes sense, for example when there is only one global left navigation component. However, it makes less sense for anything that a content contributor might have control over, like items in that navigational menu or anything else that repeats. I haven’t used DreamWeaver (DreadWeaver, as I like to say) in years but I suspect that its HTML/CSS auto-generation prefers using IDs over classes because that is where I see it the most. The worst case I have seen was a sample search result page with every search result individually styled with element ids.
  • Over-complicated HTML. HTML is only going to get more complicated when it is infused with template syntax. It is best to start with HTML code that is as simple and terse as it can be. If a designer is still using nested tables to position things, have him work in Photoshop. The more styling you can do in CSS the better. This will make templates cleaner, more efficient, and easier to manage. Plus, your CSS will survive a migration to another CMS better than your template code will.
  • Using images rather than text headings. While the font control afforded by images is nice, avoid using images for anything dealing with the navigation or page names. Otherwise content contributors will not be able to create new pages or re-organize the navigation without a designer to produce images. If you have a top level navigation that is unlikely to change, you can compromise by building images just for the top level page names. A decent strategy is to code the HTML like
        <h1 class="section-heading <<dynamic section name in lowercase >>"><<sectionname>></h1>
    

    for example:

        <h1 class="section-heading about">About</h1>
    

    This way, if a content contributor introduces a new section that doesn’t have an image or style yet, there is a decent fallback of styled text.

  • Too many layouts. Most web content management systems prefer you to have an overall page layout template (also known as a master page) that is used for nearly all of the pages of the site and then content-type-specific templates that render in the “content area” in the center of the page. Things like the header, footer and global navigation components go in the page layout template. In many systems these two templates are not very much aware of each other because they are rendered at different times within the page generation process. The trick is to determine what portions of the page to put in the global template and what to put in the content-type specific templates. The more you put in the content-specific templates, the more flexibility you have but you also wind up having redundant code that adds management overhead. You also want to make sure that the design does not specify too many options for content presentation templates. In addition to adding to maintenance overhead, this also confuses the user. When lots of variability is required, it is a good technique to design the implementation to allow contributors to build pages with blocks of content. This way, the presentation template just has to define “slots” that contributors can fill (or not fill) with content.
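The section heading pattern from the list above can be sketched outside of any particular template language. This is a minimal illustration in Python, not the tagging syntax of a specific CMS: the helper names `section_class` and `section_heading` are hypothetical, and the rule for turning a section name into a class name (lowercase, runs of other characters collapsed to a hyphen) is an assumption that goes slightly beyond the "lowercase" rule in the example.

```python
import re
from html import escape


def section_class(name):
    """Lowercase the section name and collapse other characters to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def section_heading(name):
    """Render the heading with a per-section class and escaped text."""
    return '<h1 class="section-heading {0}">{1}</h1>'.format(
        section_class(name), escape(name))


print(section_heading("About"))
# <h1 class="section-heading about">About</h1>
```

A section that has no matching image rule in the CSS still gets the generic `section-heading` class, which is the styled text fallback described above.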

Most of these tips will come more naturally to an advanced HTML developer who really knows his stuff than to a pure designer with design tools that can create HTML. However, even the best HTML developers can have mental lapses when they get into a production groove. It is a good idea to understand the HTML producer’s skill-set before assigning the task of HTML production and set expectations. Otherwise, you will probably get a rude awakening when template development is scheduled to start. If this type of HTML production is new to your team and you would like them to learn it, account for this learning by holding frequent reviews of the HTML code as it is being produced. Start with the simplest content type (like a generic page) so you can focus on the global page layout and get alignment on static vs. variable components. Over time, your team will instinctively notice HTML code that works for the mockup but will be problematic in a presentation template.

Jeff Cram: Your website is not a project

Wednesday, June 9th, 2010

Jeff Cram started a blog series called “post launch paradigm” with a great post called “Your website is not a project.” The article lists all the ways companies fail when they think of a website as a project to be completed.

If a website is not a project, what is it? Jeff calls it an “ongoing process.” I call it a “product.” Website product management is becoming an increasingly important service offering for Content Here and it is a natural extension of the selection work that I have been doing over the first three years of the company. During a selection engagement I create a road map of functionality to be implemented over time and set expectations for user adoption and incremental improvement. Recently selection clients have been engaging Content Here after implementation to help them progress along that road map. This feels great on a number of levels: these clients realize that their websites are not projects, they have bought in to the concept of continuous improvement, and I get to see the clients working with the products that they have selected. I even get to go through code once in a while!

Tips for Web Product Management

Tuesday, May 11th, 2010

I am currently providing web product management services for two clients. One client is a start-up launching a new web-based product. The other is a 100 year old newspaper. While at face value these two clients couldn’t appear to be more different, they are actually quite similar. Both are trying to innovate a viable product. The startup is building a new concept. The newspaper is trying to re-imagine an old concept. In both cases the development backlog is a chaotic mess of items that range from little tweaks to major features. There is impatience for progress; but that urgency needs to be balanced with the need to build something that is scalable and sustainable if the business succeeds. The truth is most websites operate under these conditions to some degree. It is just that the ambition of these two businesses raises the stakes and the stress level.

To be successful in these projects, I have had to draw on lots of different skills and experiences. Many of the concepts and techniques come from agile methodologies like Scrum and Lean software development. What follows is a list of principles and practices that I have found to be effective.

  • Establish a regular (2-3 week) release cycle. Everyone benefits from a regular release cycle. Stakeholders get the satisfaction of seeing progress. They don’t panic if one of their requests doesn’t get into the current release if there is a chance that it will be addressed in a subsequent release. The sooner a new feature hits the production site, the sooner it can be measured and improved. Shorter development cycles also mean smaller releases that are easier to test. Site visitors perceive a constantly improving site as being vibrant.

  • Define and communicate prioritization criteria. In order to keep releases small, you need a clear and open scoping process. Enhancement requests need to be evaluated against the site goals (such as creating new revenue opportunities, cutting costs, maintaining credibility, etc.). Without this kind of guidance, development gets chaotic. Developer time is not concentrated on work that matters. The pipeline tends to get clogged with small tweaks; larger, more substantial improvements never get done.

  • Make each release a blend of stakeholder-focused improvements and code maintenance. When code is not regularly optimized and refactored, entropy takes over and it becomes less maintainable. Development teams that are exclusively driven by stakeholder requests don’t have time to keep the codebase clean. A broken window effect causes messy code to beget messy code. For this reason every release milestone should contain a balance of improvements that stakeholders see (new functionality, presentation template changes, etc.) and maintenance tasks (refactoring code, improving management scripts and infrastructure, etc.). By maintaining this discipline, the quality of the application improves (rather than degrades) over time.

  • Don’t forget the HotFix queue. Even though you might have a methodical development plan, emergencies happen. In addition to regularly spaced release milestones, I typically create a “HotFix” milestone with a rolling due date of “yesterday.” Emergency requests go into the HotFix queue and get addressed and deployed immediately. Of course, only I can put things into the HotFix queue and I base that decision on very specific criteria: current functionality is compromised, inaction is costing money (or some other measure of value like reputation), and it is a quick fix.

  • Write good tickets. Every change request gets entered in a ticket tracking system. Bug requests should be extremely descriptive: URLs, screenshots, steps to reproduce. Feature requests take the form of a full specification complete with annotated wireframes or mockups. Every new element shown needs an annotation describing the source of information and behavior. It is also a good idea to put in test conditions so that the QA staff know how to verify it is working.

  • Use your source code control system effectively. Create tags to remember milestones in the development history. Use branches only when you are simultaneously working on two versions of the application. The most likely reasons for branching are:

    • Having a production branch for hotfixes while development for the next release is done on trunk.
    • Using an experimentation branch for functionality that may or may not make it into the main code line.

    Don’t use branches for personal work areas or to manage environment-specific configurations. Merging will be a pain and it will delay any integration testing you will need to do.

  • Automate deployments. Deployments should be simple and mindless. There should be one step to push the same exact code that was tested on the QA environment to the production environments. If someone needs to manually copy individual files, you are doing it wrong. At a previous client (a very large magazine publisher), we used AnthillPro for continuous integration and deployments. Each build of the application was stored in a build artifact library where it could be deployed to different environments with a push of a button. There were cool reports that showed you what build number was deployed where. But that was for managing 50+ applications across hundreds of servers. Now I am using lighter weight tools like Fabric to script builds and deployments.

  • Build a talented and committed team. I strongly believe that there is no room for mediocrity on an agile development team. Working in this way requires a lot of trust. Stakeholders need to trust that developers are working efficiently and doing necessary things. Developers need to rely on each other to communicate and make good decisions. You don’t get that trust unless developers know the technology and are passionate about their craft.
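The "push the same exact code that was tested on QA" rule from the list above can be illustrated with a small bookkeeping sketch. This is not how AnthillPro or Fabric work internally; `BuildLibrary` and its methods are hypothetical names, the copy to the servers is left out entirely, and the only point is that production receives a build number that was recorded on QA rather than a fresh build or hand-copied files.

```python
import hashlib


class BuildLibrary:
    """Hypothetical in-memory artifact library keyed by build number."""

    def __init__(self):
        self.digests = {}    # build number -> SHA-256 of the stored artifact
        self.deployed = {}   # environment name -> build number

    def store(self, build_number, payload):
        """Record a build once; every environment gets these exact bytes."""
        self.digests[build_number] = hashlib.sha256(payload).hexdigest()

    def deploy(self, build_number, environment):
        """Mark a stored build as deployed to an environment."""
        if build_number not in self.digests:
            raise KeyError("unknown build %s" % build_number)
        self.deployed[environment] = build_number

    def promote(self, source="qa", target="production"):
        """Deploy to target exactly the build currently on source."""
        if source not in self.deployed:
            raise LookupError("nothing deployed to %s yet" % source)
        build_number = self.deployed[source]
        self.deploy(build_number, target)
        return build_number

    def report(self):
        """List which build number is deployed where."""
        return sorted(self.deployed.items())
```

A typical sequence would be `store(42, artifact_bytes)`, `deploy(42, "qa")`, and, once testing passes, a single `promote()` call; `report()` then answers the question of what build number is deployed where.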

If the website or web application that you manage is your product (or is critical to deliver your product), you need to manage it with this level of discipline and rigor. Otherwise the site will stagnate and you will be unprepared to respond to new market challenges and opportunities.

Supporting Internet Explorer 6

Wednesday, April 14th, 2010

IE6 not supported on Microsoft.com

Over the past few days, I have been involved in a number of conversations about supporting Internet Explorer 6. Arguing about when to drop support for outdated browsers is a sport that is as old as the web itself. There is nothing really new here but the IE6 support debate feels particularly emotional — not as charged as back when people were arguing for only supporting Internet Explorer, but close.

IE6 had a really long run. It was Microsoft’s browser offering for 5 years (late 2001 through late 2006). Up to that point, Microsoft was releasing a major version of IE every year. Now it looks like they are settling into a pace of every other year. That means that IE 6 was installed on a lot of computers. In particular, a lot of computers that were bought when internet usage was starting to get really ubiquitous. In many businesses and households, these computers were bought as an internet appliance with a really long expected lifespan — like a refrigerator or a telephone. Companies are hanging onto their old IE6 computers. Vista’s flop means that Windows XP is still the corporate standard and IE6 comes with XP. Unless you have a technical or information-intensive job or are working at a new company, chances are you are on a highly locked down, old Windows XP computer that your employer begrudgingly bought to give you access to email and the intranet. Your employer doesn’t want to upgrade your machine unless absolutely necessary. That usage pattern has caused IE6 to linger longer than other browsers. See how IE8 seems to eat up more of IE7’s market share than IE6’s?

Internet Explorer Browser Share

Not only does the number of IE6 users continue to be significant, the types of users seem to be desirable as well: internet n00bs that click on ads and buy what they see (with the money that was not taken by Nigerian 419 scams).

Technical people have little empathy for these types of users. The first thing we do when we boot up a relative’s computer for home tech support is stop the malware/adware processes, install Firefox, and hide the IE icon. As developers, we know that a requirement for IE6 support translates into maintaining two code bases: one that uses all the goodness of the latest HTML and CSS standards and fast JavaScript engines; and another that is a bundle of hacks to compensate for IE6’s quirks. Many web development firms I know are starting to charge an additional 20% – 30% to include IE6 support. They are not price gouging. This is probably less than the actual cost. The customer will probably invest an even larger percentage of additional resources to maintain the application.

For this reason, an increasingly large number of websites are discontinuing support for IE6. They have done the calculations and have decided that the convenience for the IE6 hold-outs is not worth the additional cost and drag on innovation. I don’t mean to sound like a jerk, but big web properties (like Google, Microsoft, and Content Here) dropping IE6 is a good thing for everyone (almost):

  • Visitors will have a greater incentive to upgrade. If they can’t upgrade on their own, they can make the case to their employers that running a 9 year old browser is not acceptable.
  • The more modern technology will increase overall security.
  • Web sites and applications can be developed more cheaply and with higher quality.
  • The spending to upgrade outdated equipment will be good for the economy. Companies and households don’t have to buy $2,000 laptops, they can probably get away with cheap NetBooks.

This site never supported IE6. If you are stuck on that browser, I am sorry for the inconvenience that I have caused. But, I figure you are used to browsing broken websites by now :)

Django Action Item Follow Up

Monday, April 5th, 2010

While moderating a comment on my “10 Django Master Class action items” post, I was inspired to evaluate how I am doing on these action items and whether they are helping. Below is a brief summary of my progress; but first a little background. Recently, I had the rare opportunity to rebuild (from the ground up) an application that I wrote for a client. The context was that the first version of the application was a prototype that I built to help demonstrate an idea to potential investors and customers. The prototype served its purpose excellently. It was able to evolve alongside the idea as my client got feedback and refined the value proposition. We came out of the prototyping phase with a strong vision and an excited group of investors and beta customers. To minimize costs I avoided refactoring the application and cut a lot of corners. By the end of the prototype phase, the idea had changed so much that we were really faking functionality by overloading different features. Still, for a ridiculously small investment, my client was able to develop and market test an idea. And now I get to build the application for real and apply the best practices that I learned about in the Django master class. Here is what I am doing and how it is working out.

  1. Use South for database migrations (adopted). I have grown so attached to South that I find it hard to imagine life without it. This is especially important because I am managing different environments and the object model is changing as I add new features.
  2. Use PostgreSQL rather than MySQL (adopted). I am steadily getting more comfortable with PostgreSQL. pgAdmin has been really helpful as I get up to speed with the syntactical differences from MySQL. So far, the biggest differences have been in user management and permissions.
  3. Use VirtualEnv (adopted). VirtualEnv + VirtualEnv Wrapper has been great. For a little while I was working on both the prototype and the actual application. VirtualEnv made it easy for me to switch back and forth. This will also be helpful when I upgrade to Django 1.2.
  4. Use PIP (adopted). I really like how you can do a “pip freeze” to create a requirements file that you can use to build up an environment.
  5. Break up functionality into lots of small re-usable applications (adopted). The prototype had one app. The production app that I am building has 6. One of the apps contains all the branding for the application and some tag libraries. Templates in other apps load a base template from my “skin” app. The best part of using this strategy is in testing and database migrations because you can test and migrate a project one app at a time. The hardest thing for me to figure out is how to manage inter-dependencies and coupling. One strategy that has worked well for me is to focus dependencies on just a couple of applications. For example, I have a profile application which manages user profiles (extended from the base django.contrib.auth.models.User model). I have other apps that relate to people but I am careful to create foreign key relationships to the User model rather than my profile model.
  6. Use Fabric for deployments (adopted). One word. AWESOME! I have scripts to set up a server and deploy my project without having to ssh onto the server. The scripts were not that hard to write. I took inspiration from some great posts (here and here). Now I can reliably push code (and media) with one local command. I am managing the development of another site running a PHP CMS and I am strongly considering having the team use Fabric for that as well.
  7. Use Django Fixtures (adopted). Managing fixtures in JSON has turned out to be really easy. I typically have two fixtures for each app: initial_data.json and <app_name>_test_data.json. initial_data.json mainly contains data for lookup tables. It is run automatically when syncdb (the Django command to update the database schema) is run. I typically create these files with the dumpdata command and then edit them manually.
  8. Look into the Python Fixture module (not adopted). I looked into this module but, to be honest, editing the JSON files is pretty easy so I don’t see the need for it.
  9. Use django.test.TestCase more for unit testing (adopted). I have been doing a considerable amount of test driven development (TDD). It all started when I wanted to rewrite the core functionality but I needed to wait for someone else to re-build the HTML in the presentation templates. Now I have around 130 unit tests that I run before I commit any code. Focusing on unit testing has made me write code that is more atomic and easier to test. Now I think “how will I test this?” before I write any code.
  10. Use the highest version of Python that you can get away with (adopted). A big motivator for me here was when I upgraded my workstation to Snow Leopard which ships with Python 2.6.3. Getting 2.6.3 on my server was a little more complicated. I wound up using a host that comes with Ubuntu Karmic Koala which also comes with 2.6.3. I am really pleased with Ubuntu and it seems like most of the Django community is going that way.
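
The fixture layout described in item 7 can be sketched roughly as follows. This is only an illustration: the “profiles” app, the “country” lookup model, and both helper functions are hypothetical and do not come from the actual project. What it relies on is that Django’s JSON fixtures are a list of records, each with “model”, “pk”, and “fields” keys.

```python
import json


def build_lookup_fixture(model_label, names):
    """Return Django-style fixture records for a simple lookup table.

    model_label is "<app_label>.<model_name>"; both parts are hypothetical here.
    """
    return [
        {"model": model_label, "pk": pk, "fields": {"name": name}}
        for pk, name in enumerate(names, start=1)
    ]


def dump_fixture(records):
    """Serialize records roughly the way a hand-edited initial_data.json looks."""
    return json.dumps(records, indent=2, sort_keys=True)


# Hypothetical lookup data for a hypothetical "profiles.country" model.
fixture = build_lookup_fixture(
    "profiles.country", ["Canada", "Mexico", "United States"]
)
fixture_text = dump_fixture(fixture)
```

In practice the file would more likely start life as dumpdata output and be edited by hand, as described above; the point here is just how little structure there is to maintain.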

I feel really lucky for the opportunity to rewrite an application and apply lessons learned. Too often you are stuck managing code that you (or someone else) wrote before you knew what you were doing. That is, before the functionality of the application was fully understood; before a feature of the API was available or known; before a more elegant solution was discovered. I am sure that I will continue to learn new things and want to apply them and I plan to continually refactor as long as I am involved with this project. But this full-reset has been a great experience.

The Onion’s Migration from Drupal to Django

Thursday, March 25th, 2010

There is a great Reddit thread on The Onion’s migration from Drupal to Django. The Onion was one of the companies that I interviewed for the Drupal for Publishers report. One of the things I mention in the report is that The Onion was running on an early version (4.7) of Drupal. The Onion was one of the first high traffic sites to adopt Drupal and the team had to hack the Drupal core to achieve the scalability that they needed. While versions 5 and 6 of Drupal made substantial performance improvements, The Onion’s version was too far forked to cleanly upgrade.

Still, The Onion benefited greatly from using Drupal. They were able to minimize up-front costs by leveraging Drupal’s native functionality and adapt the solution as their needs changed. Scalability was a challenge but it was a manageable one. Even though forking the code base was not ideal, it was a better alternative than running into a brick wall and having to migrate under duress. The Drupal community also benefited from the exposure and learning that came from The Onion using Drupal. Everybody won. How often can you say that?

I can understand the choice of Django 1.1 (current) over a hacked version of Drupal 4.7. Having built sites in both Drupal and Django, I can also see the appeal of using Django over Drupal 6.16 (current). Django is a more programming-oriented framework and The Onion has programmers. Django is designed to be as straightforward and “Pythonic” as possible. Drupal tries to make it possible to get things done without writing any code at all; and if you can avoid writing code in Drupal, you should. As a programming framework, Drupal has more indirection and asserts more control over the developer. The Onion’s staff of programmers clearly appreciate the programmatic control that Django affords and they are quite happy with their decision.

My Enterprise Text Editor

Tuesday, March 16th, 2010

The Productive Programmer (Theory in Practice (O’Reilly)) is a useful book on how to use your computer more efficiently. One of the several tips that I have adopted is to use one text editor (in my case, TextMate) for all text-oriented work. The idea behind this is that when you work in one tool, you get to know it really well and can take advantage of all its nifty time-saving features. Most software users, however, only use a tiny fraction of the useful features supported by the software.

I had gradually been moving in this direction for a while. At first, I just used a text editor to program in dynamic languages (JavaScript, Python, PHP, Perl, SQL), do HTML/XML markup, and edit large data files. About 6 months ago, I got so fed up with Eclipse’s clunkiness that I started to write Java in TextMate. Since reading the chapter in the book, though, I have started to use TextMate as a blogging tool. This was a big step for me because I was quite happy with Red Sweater’s MarsEdit software. Yes, I know that MarsEdit gives you the option to edit posts in TextMate, but I decided to go all in. I have not yet been able to get TextMate hooked up as my email editor. I always thought programmers who did everything in Emacs were silly. But since making the change, I have found a lot of powerful keyboard shortcuts and macros. My one hold-out is that I still use Oxygen for editing my DocBook documents.

My successful experience caused me to question whether there was any merit to the “One CMS to rule them all” ECM (Enterprise Content Management) vision that I have been battling over the past 7 years (a battle that I won, by the way, but I am not gloating). Would there be any benefit in having a knowledge worker become a true expert in one tool? Then I came to my senses and realized two key differences:

  • Web Content Management is about managing semi-structured data; Enterprise Content Management is about managing metadata. A WCMS primarily helps a user edit and assemble reusable, structured content. In a document-oriented ECM system, most of the documents are binary files that are edited using tools like MS Word. These ECM systems are used primarily for creating metadata, organizing, and managing permissions. Furthermore, most people organize their documents on a file system metaphor. Web content organization tends to be much more fluid and rule-based. Your website is not a file system. You will fail at web content management if you think that a website is a bunch of MS Word documents saved as HTML. Because there is so little functional overlap, one tool doesn’t make sense.
  • CMS users don’t define themselves as CMS users. Programmers, at least the good ones, care about their craft and take pride in how they work. They read books and blogs to continually hone their skills. They love their tools and treasure knowledge of obscure little tricks. Good designers tend to be the same way. Your average content contributor may be similarly inspired about their profession, but if they are, they don’t usually consider using a computer program as part of that quest. For them the computer software is a necessary evil. They are looking for intuitive tools that require no learning. They tend not to invest the time to achieve expertise. If I were to equate using a computer to driving a car, the average computer user drives around in 1st gear or reverse all day long. They discover a way to get the car to move and then leave it at that.

There is a direct relationship between specialization and intuitiveness of software. When the software designer knows exactly what the user will use the software for, he can be very explicit in the user interface. For example, when creating a blogging tool, the software designer can put in a big button that says “CREATE BLOG ENTRY.” A designer of a more generalized, multi-functional tool requires more compromise and negotiation with the user. The user needs to learn how to access lots of basic capabilities and string them together to get the result that he wants. Just look at the UNIX command line and piping together commands. TextMate is a little of both. The designer of TextMate knows that the user is going to want to enter text and save files. That is why the program opens with a big area to type in. But the designer doesn’t know whether the user will want to post this block of text to a blog, compile the text into executable software, or do any of hundreds of other things. This is why those functions need to be buried under cryptic key sequences like “control-command-p” (that’s post to blog) or “command-r” (that is compile and run). If a CMS were written for someone who wanted to be a CMS expert, it would probably look something like a command-line Sabre terminal. And this is why all-purpose tools fail for content managers.

NoSQL Deja Vu

Tuesday, February 23rd, 2010

Around thirteen years ago, I helped build a prototype for a custom CRM system that ran on an object database (ObjectStore). The idea isn’t quite as crazy as it sounds. The data was extremely hierarchical with parent companies and subsidiaries and divisions and then people assigned to the individual divisions. It was the kind of data model where nearly every query had several recursive joins and there were concerns about performance. Also, the team was really curious about object databases so it was a pretty cool project.
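
To make the “recursive joins” concrete, here is a small sketch of the kind of query such a hierarchy calls for in a relational database. The table and column names (org_unit, parent_id) and the sample rows are invented for illustration and have nothing to do with the original CRM. It uses SQLite through Python’s standard sqlite3 module, with a recursive common table expression doing the tree walk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org_unit (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES org_unit(id),
        name TEXT NOT NULL
    );
    INSERT INTO org_unit VALUES
        (1, NULL, 'Parent Co'),
        (2, 1, 'Subsidiary A'),
        (3, 1, 'Subsidiary B'),
        (4, 2, 'Division A1'),
        (5, 4, 'Team A1-x');
""")


def descendants(conn, root_id):
    """Return (name, depth) for every unit below root_id, nearest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            -- Anchor row: the unit we start from.
            SELECT id, name, 0 FROM org_unit WHERE id = ?
            UNION ALL
            -- Recursive step: children of anything already in the tree.
            SELECT o.id, o.name, t.depth + 1
            FROM org_unit AS o JOIN tree AS t ON o.parent_id = t.id
        )
        SELECT name, depth FROM tree WHERE depth > 0 ORDER BY depth, name
        """,
        (root_id,),
    )
    return rows.fetchall()


under_subsidiary_a = descendants(conn, 2)
```

Every question of the form “who is under this company?” needs a walk like this, which is why the query load on that data model was a concern and why an object graph looked attractive.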

One thing that I learned during that project is that (at least back then) the object database market was doomed. The problem was that when you said “database,” people heard “tables of information.” When you said “data” people wanted to bring the database administrator (DBA) into the discussion. An object database, which has no tables and was alien to most DBAs, broke those two key assumptions and created an atmosphere of fear, uncertainty and doubt. The DBA, who built a career on SQL, didn’t want to be responsible for something unfamiliar. The ObjectStore sales guy told me that he was only successful when the internal object database champion positioned the product as a “permanent object cache” rather than a database. By hiding the word “data,” projects were able to fly under the DBA radar.

Fast forward to the present and it feels like the same conflict is happening over NoSQL databases. All the same dynamics seem to be here. Programmers love the idea of breaking out of old-fashioned tables for their non-tabular data. Programmers also like the idea of data that is as distributed as their applications are. Many DBAs are fearful of the technology. Will this marginalize their skills? Will they be on the hook when the thing blows up?

I don’t know if NoSQL databases will suffer the same fate as object databases did back in the 90’s but the landscape seems to have shifted since then. The biggest change is that DBAs are less powerful than they used to be. It used to be that if you were working on any application that was even remotely related to data, you had to have at least a slice of the DBA’s time allocated to your project. Now, unless the application/business is very data-centric (like accounting, ERP, CRM, etc.), there may not even be a DBA in the picture. This trend is a result of two innovations. The first is object-relational mapping (ORM) technology, where schemas and queries are automatically generated based on the code that the programmer writes. With ORM, you work in an object model and the data model follows. This takes the data model out of the DBA’s hands. The second innovation is cheap databases. When databases were expensive, they were centrally managed and tightly controlled. To get access to a database, you needed to involve the database group. Now, with free databases, the database becomes just another component in the application. The database group doesn’t get involved.
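
As a rough illustration of “you work in an object model and the data model follows,” the toy below derives a CREATE TABLE statement from a Python class. It is not how Django or any real ORM is implemented; the Person class, the type mapping, and the create_table_sql helper are all hypothetical.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "INTEGER"}


@dataclass
class Person:
    id: int
    name: str
    email: str


def create_table_sql(model):
    """Build a CREATE TABLE statement from a dataclass's field annotations."""
    columns = []
    for f in fields(model):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":
            # Toy convention: a field named "id" becomes the primary key.
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {model.__name__.lower()} ({', '.join(columns)})"


schema = create_table_sql(Person)
```

The programmer only ever edits the class; the schema is a by-product. That is the sense in which the data model leaves the DBA’s hands.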

Now that the database is a decision made by the programmer, I think non-relational databases have a better chance of adoption. Writing non-SQL queries to modify data is less daunting for a programmer who is accustomed to working in different programming languages. Still, the programmer needs good tools to browse and modify data because he doesn’t want to write code for everything. Successful NoSQL databases will have administration tools. The JCR has the JCR Explorer. CMIS has a cool Adobe AIR-based explorer. Both of these are repository standards that sit above a (relational or non-relational) database, but these tools were critical for adoption. CouchDB has an administration client called Futon but most of the other NoSQL databases just support an API. You also want to have the data accessible to reporting and business intelligence tools. I think that a proliferation of administration/inspection/reporting tools will be a good signal that NoSQL is taking off.

Another potential advantage is the trend toward distributed applications which breaks the model of having a centralized database service. Oracle spent so much marketing force building up their database as being the centralized information repository to rule the enterprise. In this world of distributed services talking through open APIs, that monolithic image looks primitive. What is more important is minimal latency, fault tolerance, and the ability to scale to very large data sets. A large centralized (and generalized) resource is at a disadvantage along all three of these dimensions. When you start talking about lots of independent databases, the homogeneity of data persistence becomes less of a concern. It’s not like you are going to be integrating these services with SQL. If you did, your integration would be very brittle because these agilely-developed services are in a constant state of evolution. You just need to have strong, stable APIs to push and pull data in the necessary formats.

The geeky programmer in me (that loved working on that CRM project) is rooting for NoSQL databases. The recovering DBA in me cringes at the thought of battling data corruption with inferior, unfamiliar tools. In a perfect world, there will be room for both technologies: relational databases for relational data that needs to be centrally managed as an enterprise asset; NoSQL databases for data that doesn’t naturally fit into a relational database schema or has volumes that would strain traditional database technology.