Thursday, November 2, 2006

Levels of Localization

Frequently the terms "Internationalization" or "Localization" (abbreviated to "I18N" and "L10N") are found in requirements or Request for Proposal documents. While companies are typically under-prepared to fully support a localized website, it is good that they are thinking ahead to when they are ready to reach out to these different markets. Unfortunately, too often I hear localization talked about in binary terms. As in, "Does this product support localization?" Or "Should we localize this site?" In reality it is not black and white; there are many gradations of gray.

Faced with the similar problem of determining whether a web site is "accessible," the World Wide Web Consortium (W3C), in its Web Content Accessibility Guidelines (WCAG), came up with a three-tiered structure of priorities that range from "must" have to "may" have. This allows people to qualify just how accessible a site claims to be. There are many similarities between accessibility and localization. After all, when we talk about localization, aren't we really talking about accessibility for people with different languages and customs? In both cases:

  • We are trying to reach out to an audience that is presently not able to access the content.
  • There is a cost-benefit trade-off as to how far we go to serve these audiences. Hopefully, this will not create a big ethical debate but it all depends on your audience and what their capabilities and sensitivities are. Note: if your website works well with a screen reader, it will also work well with a search engine spider. So you don't have to care about social responsibility to care about accessibility.
  • You might be mandated to serve a certain audience. In the U.S., there is Section 508 of the Rehabilitation Act. In the Canadian government, publications must be in English and French.

Surprisingly, there are no guidelines or evaluation criteria for localization. Not until now...

Later this month at the CM Professionals Summit, I am going to hold a round table to get feedback on these Levels of Localization. If the session is productive and we reach alignment, the working group will propose this to be a CM Professionals endorsed set of guidelines on localization. To get more feedback, I am going to post some initial ideas here.

Before I go too much further, it will be useful to define what I mean by localization. Localization means supporting specific alternative locales (geographic regions distinguished by language, government, and custom). Localization can be (and frequently is) part of an Internationalization strategy of reaching broader audiences and interacting in a global marketplace ("Globalization" is usually used to describe an economic process of regional economies merging into a global economy). Factors that go into the localization process include:

  • Text (including text on images) translated into local language and dialect
  • Prices and other money references converted into local currency
  • Formats (such as date and time) displayed according to local conventions
  • Weights and measures converted into locally accepted units
  • Culturally appropriate imagery and colors
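
The format-related factors above (dates, number separators, currency display) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `LocaleConventions` class, the `CONVENTIONS` table, and the three sample locales are hypothetical, and the conventions are hand-coded rather than drawn from a real locale database. A production site would more likely rely on a library backed by CLDR data (for example Babel or ICU). The sketch only formats values; converting a price into another currency is a separate step that needs exchange rates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleConventions:
    """Hypothetical, hand-coded display conventions for one locale."""
    date_format: str       # strftime pattern
    decimal_sep: str
    thousands_sep: str
    currency_symbol: str
    currency_pattern: str  # where the symbol sits relative to the amount


# Illustrative table only; real conventions should come from locale data.
CONVENTIONS = {
    "en-US": LocaleConventions("%m/%d/%Y", ".", ",", "$", "{symbol}{amount}"),
    "de-DE": LocaleConventions("%d.%m.%Y", ",", ".", "€", "{amount} {symbol}"),
    "fr-CA": LocaleConventions("%Y-%m-%d", ",", " ", "$", "{amount} {symbol}"),
}


def format_date(value: date, locale: str) -> str:
    """Render a date using the locale's assumed pattern."""
    return value.strftime(CONVENTIONS[locale].date_format)


def format_money(amount: float, locale: str) -> str:
    """Render an amount that is already in the locale's currency."""
    conv = CONVENTIONS[locale]
    # Start from the "1,234.50" shape and swap in the local separators.
    whole, frac = f"{amount:,.2f}".split(".")
    whole = whole.replace(",", conv.thousands_sep)
    number = f"{whole}{conv.decimal_sep}{frac}"
    return conv.currency_pattern.format(symbol=conv.currency_symbol, amount=number)


if __name__ == "__main__":
    for loc in CONVENTIONS:
        print(loc, format_date(date(2006, 11, 2), loc), format_money(1234.5, loc))
```

Even this toy table shows why "supports localization" is not a yes-or-no question: each locale adds a row of decisions that someone has to make and maintain.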

Not all websites and content are worth going to all that trouble for. Here are some intermediate steps that might serve the goal of increasing accessibility to new user groups but fall short of the true definition of localization.

  1. Level 1. This level assumes that the audience is proficient enough in the site's primary language to be able to navigate a site and find what they need. However, the specific assets that they seek are critical enough or the detail is important enough that they would feel more comfortable accessing them in their local language. Sites that reach this level have the following characteristics.
    • Selective translation. While all of the main site components (navigation, header, footer, etc.) and the bulk of the content are uni-lingual, certain important assets are translated into alternative languages. Typical examples of selectively translated content include product manuals and downloadable forms. A news agency might have a feed of news that is written in or translated into a localized language.
    • Transactive accessibility. Pages that require user input should be usable by people in the target locale. For example: different phone number and "postal code" formats, neutral labels for address fields (province vs. state), and (possibly) double-byte characters.

  2. Level 2. This level is achieved when the user is able to select a language and the whole site (including navigation and buttons) is presented in the selected language. Everything that is available in the primary language should be available in the localized languages. However, in cases where content is not translated, there should be a fall-back mechanism that notifies the user that the asset is not available in the selected language and provides access to the primary language version. There are many subtle nuances with this behavior that should be fully specified and understood. For example, when the site falls back to the primary language of an asset, does the whole site switch over as if the user selected another language? Or does the body of the primary language content appear within the selected locale's navigation? When a user selects another language, does the site refresh to the home page or to the localized version of the content currently being viewed? How divergent are different localizations of a site allowed to become? When a new version of one translation is published (or reverted), what happens to the other translations of that same content asset? Are they un-published until they are updated to reflect the new version of the primary translation? Is anyone warned? Or are they left up there with the potential of being out of sync? Level 2 localization is usually supported on a single site instance by the CMS's internationalization functionality, which maintains relationships between different translations of content, remembers the user's locale selection, and provides a translation framework for static text within presentation templates (for example, the word "search" on the search box). Usually the CMS's localization framework will have specific philosophies on these nuances, so it is important to understand how localization is implemented.
  3. Level 3. The highest level of localization has all of the characteristics of the localization definition described earlier in this article. This level requires a balance of power and control between local management (that knows the local audience) and corporate headquarters (that understands the global strategy and vision of the company). Sometimes it is difficult to tell the difference between full localization and a chaotic collection of renegade foreign offices. If the appropriate balance is achieved, local audiences receive information that the international company wants to communicate but delivered in a way that the local branch wants to express it. When it is out of balance, different audiences will receive intentionally or unintentionally conflicting information. Achieving this level of localization is largely a governance problem. Local variances need to be explained to and approved by headquarters in a business process that does not unduly obstruct the local division's need to conduct business. There needs to be a communication mechanism so that global and specific changes desired by corporate are communicated to and executed by the localized sites. These same forces complicate many other aspects of managing multi-national companies. Sometimes it is hard to shield customers from these dynamics. I am still waiting to see a perfect example of a single instance of a CMS supporting this complex network of control. In general, local business units require a certain degree of independence and agility that is difficult to achieve on a single, centrally managed platform. However, there are things to look for in a CMS that will help matters:
    • A solid notification system. When something changes, the appropriate people need to be notified of the nature and impact of the change. This could be workflow-based email notifications to external users. It could be cross-system workflow where the primary system initiates a task on a secondary system.
    • A localization-aware reporting system. In order to achieve adequate governance, it is necessary to understand how content is being translated and syndicated to localized sites. There should be some way for the system to know if a newly published content asset has been localized. This could be achieved by a post-publish workflow state. Web log analytics are useful here too. It is important to know of broken links coming from localized sites.
    • A replicable platform. While it is unrealistic (and frequently undesirable) to require every local business unit to use the same CMS, synergies may be achieved if a corporate standard were made available for localized sites. Presentation templates can be re-used and customized for a local market, users required to work on multiple systems will have less technology to learn, and it may be easier to orchestrate cross-system workflow (on the last point, lock-in risk might be mitigated by integrating with a third-party workflow engine). More importantly, the software acquisition costs should make it affordable to distribute the tools. This also holds when a local site manager needs to log into the primary CMS to access pre-published content. These occasional users need accounts and it should not be cost prohibitive to provide them.
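
The Level 2 fall-back behavior described above can be sketched in a few lines. This is only one possible policy, not how any particular CMS works: the `TRANSLATIONS` store, the `Resolved` record, the `resolve` function, and the choice of "en" as the primary language are all hypothetical. The sketch takes the second option raised in the Level 2 questions: the navigation stays in the language the user selected, and the primary-language body is shown with a notice.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PRIMARY_LANGUAGE = "en"  # assumption: the site's primary language

# Hypothetical content store: asset id -> {language code: body text}
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "product-manual": {"en": "Product manual", "fr": "Manuel du produit"},
    "press-release": {"en": "Press release"},  # not yet translated
}


@dataclass(frozen=True)
class Resolved:
    body: str
    body_language: str        # language the body is actually written in
    navigation_language: str  # language used for navigation and buttons
    notice: Optional[str]     # message shown when a fall-back happened


def resolve(asset_id: str, selected_language: str) -> Resolved:
    """Pick the version of an asset to show for the user's selected language."""
    versions = TRANSLATIONS[asset_id]
    if selected_language in versions:
        return Resolved(versions[selected_language], selected_language,
                        selected_language, None)
    # Policy choice: keep the selected locale's navigation and show the
    # primary-language body together with a notice to the user.
    notice = (f"This content is not available in '{selected_language}'; "
              f"showing the '{PRIMARY_LANGUAGE}' version instead.")
    return Resolved(versions[PRIMARY_LANGUAGE], PRIMARY_LANGUAGE,
                    selected_language, notice)


if __name__ == "__main__":
    print(resolve("product-manual", "fr"))
    print(resolve("press-release", "fr"))
```

Switching to the other policy (the whole site flips to the primary language) would mean returning `PRIMARY_LANGUAGE` as the navigation language as well, which is exactly the kind of nuance worth pinning down in a specification.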

Hopefully, these Levels of Localization will introduce some much-needed terminology to the discussion of an important topic that companies are increasingly considering in their CMS implementations. I am sure that a continued dialog will refine these ideas and drive toward a new standard that helps companies understand their real localization needs and communicate them in the specification of a CMS. If you find this concept interesting, please send feedback. Or, better yet, join me at the CM Professionals Summit!