Archive for the ‘development’ Category

My Enterprise Text Editor

Tuesday, March 16th, 2010

The Productive Programmer (Theory in Practice (O’Reilly)) is a useful book on how to use your computer more efficiently. One of the several tips that I have adopted is to use one text editor (in my case, TextMate) for all text-oriented work. The idea behind this is that when you work in one tool, you get to know it really well and can take advantage of all its nifty time-saving features. Most software users, however, only use a tiny fraction of the useful features supported by their software.

I had gradually been moving in this direction for a while. At first, I just used a text editor to program in dynamic languages (Javascript, Python, PHP, Perl, SQL), do HTML/XML markup, and edit large data files. About 6 months ago, I got so fed up with Eclipse’s clunkiness that I started to write Java in TextMate. Since reading the chapter in the book, though, I have started to use TextMate as a blogging tool. This was a big step for me because I was quite happy with Red Sweater’s MarsEdit software. Yes I know that MarsEdit gives you the option to edit posts in TextMate but I decided to go all in. I have not yet been able to get TextMate hooked up as my email editor. I always thought programmers that did everything in EMACS were silly. But since making the change, I have found a lot of powerful keyboard shortcuts and macros. My one hold-out is that I still use Oxygen for editing my DocBook documents.

My successful experience caused me to question whether there was any merit to the “One CMS to rule them all” ECM (Enterprise Content Management) vision that I have been battling over the past 7 years (a battle that I won, by the way, but I am not gloating). Would there be any benefit in having a knowledge worker become a true expert in one tool? Then I came to my senses and realized two key differences:

  • Web Content Management is about managing semi-structured data; Enterprise Content Management is about managing metadata. A WCMS primarily helps a user edit and assemble reusable, structured content. In a document-oriented ECM system, most of the documents are binary files that are edited using tools like MS Word. These ECM systems are used primarily for creating metadata, organizing, and managing permissions. Furthermore, most people organize their documents on a file system metaphor. Web content organization tends to be much more fluid and rule based. Your website is not a file system. You will fail at web content management if you think that a website is a bunch of MS Word documents saved as HTML. Because there is so little functional overlap, one tool doesn’t make sense.
  • CMS users don’t define themselves as CMS users. Programmers, at least the good ones, care about their craft and take pride in how they work. They read books and blogs to continually hone their skills. They love their tools and treasure knowledge of obscure little tricks. Good designers tend to be the same way. Your average content contributor may be similarly inspired about their profession, but if they are, they don’t usually consider using a computer program as part of that quest. For them the computer software is a necessary evil. They are looking for intuitive tools that require no learning. They tend not to invest the time to achieve expertise. If I were to equate using a computer to driving a car, the average computer user drives around in 1st gear or reverse all day long. They discover a way to get the car to move and then leave it at that.

There is a direct relationship between specialization and intuitiveness of software. When the software designer knows exactly what the user will use the software for, he can be very explicit in the user interface. For example, when creating a blogging tool, the software designer can put in a big button that says “CREATE BLOG ENTRY.” A designer of a more generalized, multi-functional tool requires more compromise and negotiation with the user. The user needs to learn how to access lots of basic capabilities and string them together to get the result that he wants. Just look at the UNIX command line and piping together commands. TextMate is a little of both: the designer of TextMate knows that the user is going to want to enter text and save files. That is why the program opens with a big area to type in. But the designer doesn’t know whether the user will want to post this block of text to a blog, compile the text into executable software, or do any of hundreds of other things. This is why those functions need to be buried under cryptic key sequences like “control-command-p” (that’s post to blog) or “command-r” (that is compile and run). If a CMS were written for someone who wanted to be a CMS expert, it would probably look something like a command line Sabre terminal. And this is why all-purpose tools fail for content managers.

NoSQL Deja Vu

Tuesday, February 23rd, 2010

Around thirteen years ago, I helped build a prototype for a custom CRM system that ran on an object database (ObjectStore). The idea isn’t quite as crazy as it sounds. The data was extremely hierarchical with parent companies and subsidiaries and divisions and then people assigned to the individual divisions. It was the kind of data model where nearly every query had several recursive joins and there were concerns about performance. Also, the team was really curious about object databases so it was a pretty cool project.

One thing that I learned during that project is that (at least back then) the object database market was doomed. The problem was that when you said “database,” people heard “tables of information.” When you said “data” people wanted to bring the database administrator (DBA) into the discussion. An object database, which has no tables and was alien to most DBAs, broke those two key assumptions and created an atmosphere of fear, uncertainty and doubt. The DBA, who built a career on SQL, didn’t want to be responsible for something unfamiliar. The ObjectStore sales guy told me that he was only successful when the internal object database champion positioned the product as a “permanent object cache” rather than a database. By hiding the word “data,” projects were able to fly under the DBA radar.

Fast forward to the present and it feels like the same conflict is happening over NoSQL databases. All the same dynamics seem to be here. Programmers love the idea of breaking out of old-fashioned tables for their non-tabular data. Programmers also like the idea of data that is as distributed as their applications are. Many DBAs are fearful of the technology. Will this marginalize their skills? Will they be on the hook when the thing blows up?

I don’t know if NoSQL databases will suffer the same fate as object databases did back in the 90’s but the landscape seems to have shifted since then. The biggest change is that DBAs are less powerful than they used to be. It used to be that if you were working on any application that was even remotely related to data, you had to have at least a slice of the DBA’s time allocated to your project. Now, unless the application/business is very data centric (like accounting, ERP, CRM, etc.), there may not even be a DBA in the picture. This trend is a result of two innovations. The first is object-relational mapping (ORM) technology, where schemas and queries are automatically generated based on the code that the programmer writes. With ORM, you work in an object model and the data model follows. This takes the data model out of the DBA’s hands. The second innovation is cheap databases. When databases were expensive, they were centrally managed and tightly controlled. To get access to a database, you needed to involve the database group. Now, with free databases, the database becomes just another component in the application. The database group doesn’t get involved.
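To make the ORM point concrete, here is a toy sketch of the idea that the data model follows the object model. It is not how any real ORM is implemented; the `Company` class, the `TYPE_MAP` table, and both helper functions are hypothetical names invented for this illustration, and it only uses Python's standard library with an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical mapping from Python types to SQLite column types.
TYPE_MAP = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Company:
    """The object model the programmer writes."""
    id: int
    name: str
    parent_id: int


def create_table_sql(model):
    """Derive a CREATE TABLE statement from the class definition."""
    columns = ", ".join(
        f"{f.name} {TYPE_MAP[f.type]}" for f in fields(model)
    )
    return f"CREATE TABLE {model.__name__.lower()} ({columns})"


def column_names(conn, table):
    """Read back the column names that SQLite actually created."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Company))
print(column_names(conn, "company"))
```

The programmer only ever edits `Company`; the table definition is generated from it, which is why the DBA is no longer the one drawing the schema.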

Now that the database is a decision made by the programmer, I think non-relational databases have a better chance of adoption. Writing non-SQL queries to modify data is less daunting for a programmer who is accustomed to working in different programming languages. Still, the programmer needs good tools to browse and modify data because he doesn’t want to write code for everything. Successful NoSQL databases will have administration tools. The JCR has the JCR Explorer. CMIS has a cool Adobe Air-based explorer. Both of these cases are repository standards that sit above a (relational or non-relational) database but they were critical for adoption. CouchDB has an administration client called Futon but most of the other NoSQL databases just support an API. You also want to have the data accessible to reporting and business intelligence tools. I think that a proliferation of administration/inspection/reporting tools will be a good signal that NoSQL is taking off.

Another potential advantage is the trend toward distributed applications which breaks the model of having a centralized database service. Oracle spent so much marketing force building up their database as being the centralized information repository to rule the enterprise. In this world of distributed services talking through open APIs, that monolithic image looks primitive. What is more important is minimal latency, fault tolerance, and the ability to scale to very large data sets. A large centralized (and generalized) resource is at a disadvantage along all three of these dimensions. When you start talking about lots of independent databases, the homogeneity of data persistence becomes less of a concern. It’s not like you are going to be integrating these services with SQL. If you did, your integration would be very brittle because these agilely-developed services are in a constant state of evolution. You just need to have strong, stable APIs to push and pull data in the necessary formats.

The geeky programmer in me (that loved working on that CRM project) is rooting for NoSQL databases. The recovering DBA in me cringes at the thought of battling data corruption with inferior, unfamiliar tools. In a perfect world, there will be room for both technologies: relational databases for relational data that needs to be centrally managed as an enterprise asset; NoSQL databases for data that doesn’t naturally fit into a relational database schema or has volumes that would strain traditional database technology.

Developers and Designers

Monday, February 8th, 2010

A few months ago I read Lukas Mathis’ thought-provoking essay “Designers are not Programmers” where he makes the case for a separation between designers and developers. To summarize his argument, thinking about implementation details distracts the designer from the user and results in applications (and websites) that are easy to build but hard to use. He makes a very thorough case (you should definitely read the full essay) but something just doesn’t sit well with me. In my practical experience, I find that teams are more efficient when roles overlap and people understand what is happening outside of their silo. Here are some reasons why:

  • A designer is often faced with lots of options of how to solve a user problem. When it is a coin toss between two solutions, why not choose the one that is easier to implement and apply the time and effort saved to something that really needs the additional complexity?
  • The static tools that pure designers use (e.g. Photoshop) have no way to express interactive functionality. All the details that the developer needs to know have to be captured in some sort of specification that can never be complete and is usually out of date. Making the developers wait until the specification is done is inefficient.
  • Good software cannot be achieved by brilliant designers alone. It takes iteration and feedback to get it right. A cold hand-off between the designers and developers lengthens the iteration cycle (so you get fewer of them in a fixed amount of time and budget) and creates more of an opportunity for information loss.

In an ideal world with infinite time and money (and omniscience too), it might be better to have designers whose minds are unencumbered by knowledge of implementation details. Anything that they dream of can be implemented… with enough time and resources, of course. But I don’t live in that world. In the world I live in, product managers and publishers have to make lots of compromises. They also need to be able to react efficiently to correct bad design decisions so that the product (or website) can continually improve. For that, you need an agile team that solves problems directly. This means staying out of a designer-only loop.

The most effective teams that I have worked on have all had a talented front end developer that can rapidly design in DHTML (leveraging javascript libraries and CSS) and knows enough server side scripting to make most user interface changes unassisted. With this mix of skills, it is truly amazing how quickly a small team can get a product in front of users where it can be improved by feedback.

CMS Architecture: Managing Presentation Templates

Monday, January 25th, 2010

Another geeky post…

In my last post, I described the relative merits of managing configuration in a repository vs. in the file system but excluded presentation templates even though how they are managed is just as interesting. Like configuration, presentation templates can be managed in the file system or in the content repository. Like with configuration, if you manage presentation templates in the repository, you need some way to deploy them from one instance of your site to another without moving the content over as well.

There are plenty of additional reasons why you would want to manage presentation templates on the file system. In particular, presentation templates are code and you want to be able to use proven coding tools and techniques to manage them. Good developers will be familiar with using a source code management system to synchronize their local work areas and branch/tag the source tree. Development tools (IDEs and text editors) are designed to work on files in a local file system. If you manage presentation templates in the repository you have to solve all sorts of problems like branching and merging and building a browser-based IDE or integrating with local IDEs. The latter can be done through WebDAV and I have also seen customers use an Ant builder in Eclipse to push a file every time it has changed. Still, the additional complexity can create frustrating issues when the deployment mechanism breaks.

As much as it complicates the architecture, there is one very good case when you would want to manage presentation templates in the repository: when you have a centralized CMS instance that supports multiple, independently developed sub-sites. For example, let’s say you are a university and each school or department has its own web developer who wants to design and implement his own site design. This developer is competent and trustworthy but you don’t want to give him access to deploy his own code directly to the filesystem of the production server. He could accidentally break another site or, worse, bring down the whole server. You could centralize the testing and deployment of code, but that would just create a bottleneck. You could do something like put the CSS and JS in the repository and have him go all CSS Zen Garden, but sooner or later he will want to edit the HTML in the presentation templates.

In this scenario of distributed, delegated development, presentation templates are like content in two very important respects:

  1. presentation templates need access control rules to determine who can edit what.
  2. presentation templates become user input (and user input should never be trusted).

The second point is really important. Just like you need to think twice when you allow a content contributor to embed potentially malicious javascript into pages, you need to worry that a delegated template developer can deploy potentially dangerous server side code. Once that code is on the filesystem of an environment it can create all sorts of mischief. It doesn’t matter if it was intentional or not, if a programmer codes an infinite loop or compromises security, you have a problem. Using templating languages (like Smarty or Velocity) rather than a full programming language (like PHP or Java in JSP) will mitigate that risk but you still have to worry about the developer uploading a script that can run on your server. With staging and workflow, CMSs are good at managing semi-trusted content like presentation templates from distributed independent developers. There is a clear boundary between the runtime of the site and the underlying environment.

If your CMS uses file-system based presentation templates and you delegate sub-site development to the departments who own them, you should definitely put in place some sort of automated deployment mechanism that keeps FTP and SSH access out of the developers’ hands and reduces the potential for manual error. The following guidelines are worth following:

  • Code should always be deployed out of a source code system (via a branch or a tag). That way you will know what was deployed and you can redeploy the same tested code to different environments.
  • Deployments should be scripted. The scripts can manage the logic of what should be put where.
  • Every development team should have an integration environment where they can test code.
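As a rough illustration of the first two guidelines, a deployment script can be as small as a function that maps a tag and an environment to one `svn export` command. This is only a sketch: it assumes Subversion with a conventional `tags/` directory, and the repository URL, tag name, and target paths are hypothetical placeholders. It defaults to a dry run that prints the command instead of executing it.

```python
import subprocess

# Hypothetical values for illustration only.
REPO_URL = "https://svn.example.edu/repos/dept-site"
TARGETS = {
    "integration": "/var/www/integration/templates",
    "production": "/var/www/production/templates",
}


def export_command(tag, environment):
    """Build the `svn export` command that deploys one tag to one environment."""
    if environment not in TARGETS:
        raise ValueError(f"unknown environment: {environment}")
    source = f"{REPO_URL}/tags/{tag}"
    return ["svn", "export", "--force", source, TARGETS[environment]]


def deploy(tag, environment, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    command = export_command(tag, environment)
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Test the tag on integration first, then redeploy the same tag to production.
deploy("release-1.2", "integration")
deploy("release-1.2", "production")
```

Because the script decides what goes where, the developer never needs a shell on the server, and the tag name records exactly what was deployed.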

One of my clients uses a product called AnthillPro for deployments of all web applications and also presentation templates. It has taken a while to standardize and migrate all of the development teams but now I don’t see how you can have a de-centralized development organization without it.

The other dimension to this problem is the coupling between the content model and the presentation templates. When you add an attribute to a content type, you need to update the presentation template to show it (or use it in some other way). The deployment of new presentation templates needs to be timed with content updates. Often content contributors will want to see the new attribute in preview when they are updating their content. Templates also need to fail gracefully when they request an attribute that does not yet exist or has not been populated yet. Typically, presentation templates evolve more rapidly than content models. After all, a change in a content model usually involves some manual content entry. In my scenario of the university, there is a benefit of centralizing the ownership of the content model. This allows content sharing across sites: if one department defines a news item differently than another department, it is difficult to have a combined news feed. Centralizing the content model will further slow its evolution because there needs to be alignment between the different departments.
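Failing gracefully on a missing attribute can be sketched with nothing more than Python's built-in `string.Template`. This is an illustration of the idea, not the mechanism of any particular CMS; the `NEWS_ITEM` template, the `subtitle` attribute, and the `BlankDefaults` helper are hypothetical.

```python
from string import Template

# Hypothetical template for a news item; `subtitle` is a newly added attribute.
NEWS_ITEM = Template("<h2>$title</h2><p class='subtitle'>$subtitle</p>")


class BlankDefaults(dict):
    """Dictionary that returns an empty string for attributes that are missing."""

    def __missing__(self, key):
        return ""


def render(template, content):
    """Render content, leaving a blank where an attribute is not populated."""
    return template.substitute(BlankDefaults(content))


old_item = {"title": "Enrollment opens"}  # entered before the attribute existed
new_item = {"title": "Enrollment opens", "subtitle": "Fall semester"}

print(render(NEWS_ITEM, old_item))
print(render(NEWS_ITEM, new_item))
```

With this approach the new template can be deployed before every content item has been updated, which loosens the timing between template releases and content entry.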

Wow, two geeky posts in a row. I promise the next one will be less technical.

10 Django Master Class action items

Tuesday, November 3rd, 2009

A couple of weeks ago I attended Jacob Kaplan-Moss’s Django Master Class in Springfield, Virginia. It was a great class and I walked out with a bunch of ideas for making better use of Django. What follows is a set of action items that I created for myself. Jacob was not this prescriptive in his presentation. These are just my personal decisions based on how he explained things.

  1. Use South for database migrations (complete). Unlike Rails, Django has no native system for synchronizing the database schema with code changes. Django will create your initial database schema for you but you need to modify the tables with SQL whenever your models change. South gives Django Rails-like migrations, which consist of methods to alter the database and also roll back changes. I ported a new application I am working on over to use South and am very impressed. Jacob gave some great advice to keep your schema migrations separate from your data migrations. For example, if you are renaming a field: you would create one migration to add the field; a second migration to move the data to the new field; and a third migration to delete the old field. Doing this will make your migrations safer and easier to roll back.
  2. Use PostgreSQL rather than MySQL (complete). Jacob didn’t talk disparagingly about MySQL but it was clear to me that PostgreSQL is what the cool kids are using. That is not to say there are not disagreements over what DB is best. I have been using MySQL for years but two things won me over. In the class, I learned that table alterations in MySQL are not transactional so if your South database migration fails, you can’t roll back so easily. The second factor came after the class when I was reading all these blog posts panicking about what will come of MySQL now that Oracle owns it. I agree with most pundits that Oracle doesn’t have a great reason to invest in MySQL. My comfort level working with PostgreSQL is growing but it’s going to take a while to get as comfortable with the commands and syntax as I am with MySQL.
  3. Use VirtualEnv (complete). One thing about Python that always seemed hacky to me was the whole “site-packages” thing. I don’t like how all your Python projects tend to share the same libraries. In Java, you are much more deliberate with your CLASSPATH. The class introduced me to virtualenv and its sister project virtualenvwrapper. This creates a virtual sandbox where you can manage libraries separately from your main Python installation. It is brilliant.
  4. Use PIP (complete). I was pretty haphazard about what tools I used to install Python packages. I admit that I didn’t really know the difference between setuptools and easy_install. The Master Class nicely explained the different options and it seems like PIP is emerging as the Python package manager of choice.
  5. Break up functionality into lots of small re-usable applications (in process). Much of the advice from the class is summarized in James Bennett’s DjangoCon 2008 talk: Reusable Apps. Watch the video and be convinced.
  6. Use Fabric for deployments (not started). My normal m.o. for deploying code has been to shell over to a server and svn export from my Subversion server. In multiple server environments, I would usually have some kind of rsync setup. However, in one of my client projects (using Java), I started using AntHill Pro (plus Ant) for both continuous integration and deployment. From that experience I saw the light on automated deployments. Fabric is primitive compared to AntHill Pro (it doesn’t have a cool web-based UI) but it does allow you to run scripts remotely on other hosts. It’s like Capistrano for Python. In the next phase of development, I will definitely be using this.
  7. Use Django Fixtures (not started). I am really embarrassed to say that I have avoided using Fixtures for loading lookup and test data. Instead, I have been doing horrid things with SQL and objects.create(). I am looking forward to reforming my errant ways. Fixtures allow you to create a data file that Django will load for you. It offers three format options: JSON, YAML, or XML. Jacob recommends YAML if you can be assured that you have access to PyYAML; otherwise go with JSON, which is nearly as readable.
  8. Look into the Python Fixture module (not started). This straight Python module seems to be an alternative to the Django fixtures system. It is more oriented towards test data and looks a little like using mock objects. I need to dig in a little more before I make up my mind about it.
  9. Use django.test.TestCase more for unit testing (not started). I need to do more with unit tests. I have had some good experiences with writing DocTests but I should use the Django unit test framework more. This will allow me to use fixtures more too! Plus with Django 1.1, startapp even creates a tests.py for you. How can I resist an empty .py file?
  10. Use the highest version of Python that you can get away with (in progress). In the class, Jacob made the good point that every version of Python gets feature and performance improvements. Why not go with the latest stable version like 2.6? Snow Leopard did it for me. I will try to upgrade my server as soon as I can get away with it.
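The three-step field rename from item 1 can be sketched without South at all. The example below is not South's API; it uses plain SQL against an in-memory SQLite database purely to show why the schema steps and the data step are separate units. The `person` table and its `fullname` and `display_name` columns are hypothetical.

```python
import sqlite3


def add_new_field(conn):
    """Migration 1 (schema): add the new column alongside the old one."""
    conn.execute("ALTER TABLE person ADD COLUMN display_name TEXT")


def copy_data(conn):
    """Migration 2 (data): copy values from the old column to the new one."""
    conn.execute("UPDATE person SET display_name = fullname")


def drop_old_field(conn):
    """Migration 3 (schema): rebuild the table without the old column.

    A rebuild is used here so the sketch does not depend on the SQLite
    version supporting ALTER TABLE ... DROP COLUMN.
    """
    conn.execute(
        "CREATE TABLE person_new (id INTEGER PRIMARY KEY, display_name TEXT)"
    )
    conn.execute("INSERT INTO person_new SELECT id, display_name FROM person")
    conn.execute("DROP TABLE person")
    conn.execute("ALTER TABLE person_new RENAME TO person")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO person (fullname) VALUES ('Ada Lovelace')")

# Each step is its own unit, so a failure in one can be handled on its own.
for migration in (add_new_field, copy_data, drop_old_field):
    migration(conn)

rows = conn.execute("SELECT id, display_name FROM person").fetchall()
print(rows)
```

If the data copy fails halfway, the old column is still intact because nothing has been deleted yet, which is the safety the advice is after.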

If you can make it to the next Django Master Class, I highly recommend you go. Otherwise, you should look into these resources and make your own educated decisions about whether to use them.
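For anyone curious about what a fixture from item 7 looks like, a JSON fixture is a list of records, each with a `model`, a `pk`, and a `fields` dictionary. The sketch below generates one with the standard library; the `crm.status` model label and the lookup values are hypothetical.

```python
import json

# Hypothetical lookup values for a hypothetical `crm.status` model.
STATUSES = ["Lead", "Prospect", "Customer"]


def build_fixture(model_label, names):
    """Build a list of records in the fixture layout: model, pk, fields."""
    return [
        {"model": model_label, "pk": pk, "fields": {"name": name}}
        for pk, name in enumerate(names, start=1)
    ]


fixture = build_fixture("crm.status", STATUSES)
print(json.dumps(fixture, indent=2))
```

Saved as a file in an application's `fixtures` directory, output like this can be loaded with `manage.py loaddata` or listed in the `fixtures` attribute of a `django.test.TestCase`.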

Attending Django Master Class

Friday, October 2nd, 2009

I am looking forward to attending a Django Master Class, taught by Jacob Kaplan-Moss, on October 16th in Springfield, Virginia. I have been building a prototype application for a client in Django over the last couple of months and have been very impressed with the framework. It looks like the application will be going to Beta soon so it will be great to pick up some expert tricks that will take me to the next level.

Snow Leopard Issues

Thursday, September 10th, 2009

For some reason, I upgraded to Snow Leopard at my first opportunity. To tell the truth, the end-user improvements are pretty modest and it is difficult to perceive Snow Leopard’s slight speed improvements. That is not to say that the upgrade did not have an impact. Some software that I regularly use stopped working. I am all up and working now and here is how.

| Problem | Solution |
| --- | --- |
| Oxygen 9.3 stopped working | I upgraded to Oxygen 10.3 for $147 |
| My macport MySQL, Apache, PHP setup broke | I tried to do port upgrade of MySQL and PHP and it failed. Compiling my own PHP didn’t work either. I wound up installing the MySQL package from the MySQL site and using the Apache and PHP (5.3) that comes with Snow Leopard. |
| My eZ Publish development environment broke | This was because I upgraded to PHP 5.3. This was easily fixed by re-installing eZ Publish because eZ Publish is compatible with PHP 5.3 |
| My Drupal development environment broke | Drupal is not compatible with PHP 5.3. Until it is, I am running it on MAMP. |
| Python MySQL did not easy_install for Python 2.6 | This is the problem that I struggled most with. I followed the instructions here. The compile and build worked but I could not import the library. I finally solved the problem by re-installing the 32 bit version of MySQL, setting Python to run in 32 bit, and following these simple instructions. |
| 1Password 2.x is incompatible with Snow Leopard | I pre-purchased version 3 and enrolled in the 3.0 beta program. The UI looks really slick and I am happy that I upgraded. |
| Cornerstone is incompatible with Snow Leopard | I am waiting for the next release of Cornerstone (due this month) that will be Snow Leopard compatible and also support Subversion 1.6. |

Looking back, it would have made more sense to wait on the upgrade and see what other people on the mailing lists said about their experiences. But since I didn’t, I figure I would share what I learned.

Ingredients for a good software development project

Wednesday, September 2nd, 2009

While nearly all of my consulting work is in software selection and technical strategy, I try to have at least one implementation project in the works so that I can stay relevant as a technologist. For my current project, I am building the technology infrastructure for a new web based startup. The project is going really well and I thought I would share some reasons why.

  • Great Client. My client is totally non technical and this is his first startup. He found me through a mutual friend that we both think very highly of. His lack of technology experience means he has not picked up any of the bad habits of technology owners. He didn’t force a detailed Gantt chart with a bunch of imaginary dates and tasks. His trust in me helped us to collaboratively prioritize and plan work around his schedule. I have followed through on all of my commitments so the trust has only gotten stronger.
  • Fluid Design. The concept has evolved substantially since its original inception. A critical part of the design process involves going through the application with prospective customers and then talking through how to incorporate their feedback. We have done a really good job of keeping things flexible without adding complexity. Some of the more speculative ideas are faked a bit until we get an indication of their viability. The underlying machinery is only reworked when we are really onto something. We have plans to do a holistic refactoring once the design is frozen and we are ready to start preparing the application for beta customers.
  • Strong Frameworks. Building the application on top of the Django web application framework makes development incredibly fast. Python has a great unit test system (doctest) where the tests are written in the documentation strings of the code. This makes them easy to maintain. All of the data administration interfaces are provided for free and, because everything is data driven, the client has a lot of control over what he shows potential investors.
  • Simple Tools. All of the work is planned using tickets and milestones in a service called Unfuddle (looks a lot like Trac). The information is very accessible and email notifications keep everyone up to date. I am probably going to bring another developer on board to help with the beta and I think that setting him or her up with a development environment is going to be really easy. Just pull everything down from Subversion and launch the text editor of choice.
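For readers who have not seen tests that live alongside the code, here is a minimal sketch using Python's standard `doctest` module. The `slugify` function and the `run_doctests` helper are hypothetical examples written for this post, not code from the project.

```python
import doctest


def slugify(title):
    """Turn a title into a URL-friendly slug.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  NoSQL   Deja Vu ")
    'nosql-deja-vu'
    """
    return "-".join(title.lower().split())


def run_doctests(func):
    """Run the examples in one function's docstring and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    tests = finder.find(
        func, name=func.__name__, module=False, globs={func.__name__: func}
    )
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(slugify)
print(results)
```

In day-to-day use the helper is unnecessary: running `python -m doctest` against the source file checks every docstring example in it.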

I am not fooling myself into thinking all projects can be like this. You don’t always get to choose your clients or your tools. Often other constraints are forced upon you. But you can make some adjustments to achieve incremental improvements. For example, on another project for a very large, established client, I used to say “we are going over a waterfall in an agile barrel” meaning that the upper management saw a waterfall model project plan but internally, we were being as agile as possible. We tried to be vague rather than guess and commit. We made decisions that didn’t preclude other options. We made opportunities to refactor code. So, rather than throwing up your hands and accepting a recipe for failure, think of how you can get the ingredients for success.

Drupal Panels Tutorial

Tuesday, August 25th, 2009

Readers of Drupal for Publishers know that Drupal lacks a native system for associating different page layouts with different sections or pages on a site. Sub-layouts are typically implemented with conditional logic in the template code (page.tpl.php). The Drupal Panels module puts this control in the administrative interface and is becoming widely used within the Drupal community. To learn more, check out GotDrupal’s excellent video tutorial on Drupal Panels. As you can see from the video, the Panels user interface is quite powerful but is also very complicated because you have to create rules to determine under what conditions a layout should display. You still probably want a developer or well-trained administrator to do the work on a staging environment and then migrate the configurations to the production environment.

What the Customer Really Needed

Friday, August 14th, 2009

This great cartoon showing how a product is understood and described by different stakeholders is definitely going in my toolbox for explaining project dysfunction. My explanation will be that you can’t start to achieve a common understanding of something until you get visual. If everyone got together and drew a picture of the solution and then regularly checked in as it was incrementally built, the potential for miscommunication would be nearly zero. Proofs of concept, prototypes, and pilots are useful for communicating more complex functionality, and they work much better than a lengthy, detailed written specification.