Semantic tagging on the cheap with a WYSIWYG editor

I am surprised by how few companies use the simple trick of semantically tagging rich text fields with the WYSIWYG editor in their CMS. The general idea is to repurpose the editor's CSS class support by giving the classes semantically meaningful names.

Here is an example. Let's say you are publishing a business journal and you write a lot about companies and regulatory agencies. You might have a sentence like:

Apparently, following an investigation into the hacking of several dozen customer accounts, the SEC found LPL negligent.


The terms “SEC” and “LPL” are italicized in compliance with your style guide. To satisfy the style guide, your reporter probably just highlights the text and clicks the italics button. But what if your style guide changes to say that company names are bolded and regulatory agencies are in red text? Style classes give you much greater flexibility than “em” and “strong” tags, because a change in appearance means editing one CSS rule rather than every article. Most WYSIWYG editors can be configured to show a drop-down list of style classes. When a user highlights the text and selects a class, the editor writes it as a span tag with that class:

<span class="classname">LPL</span>

Now, what should you name your classes? Here is the trick. Rather than calling them “italics,” “bold,” or “red,” give them names that indicate the meaning of the text you are styling, such as “company” and “regulator.” In addition to giving you more flexibility in styling, this lets you do some really interesting things with your content. For example:

  • With a very simple XSL template you can have your rendering template put a list of mentioned companies and agencies on the page.
  • You could extend the logic of your CMS to automatically create metadata about the article, helping your search engine figure out that the article is about Apple computers rather than Mott's brand apple juice.
  • You could have your rendering templates insert the stock symbol of the company or create a link to their ticker page.
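
As a rough illustration of the first idea in the list, here is a minimal sketch that collects the tagged terms from an article body. It uses Python's standard html.parser module rather than an XSL template, and it assumes the two class names from the example above, “company” and “regulator.” The names MentionCollector and collect_mentions are hypothetical and chosen only for this sketch.

```python
from html.parser import HTMLParser

# Assumed class names, taken from the example in this post; replace them
# with whatever your editor's style list actually defines.
SEMANTIC_CLASSES = {"company", "regulator"}


class MentionCollector(HTMLParser):
    """Collect the text of every <span> that carries a semantic class."""

    def __init__(self):
        super().__init__()
        # One list of unique mentions per semantic class, in document order.
        self.mentions = {name: [] for name in sorted(SEMANTIC_CLASSES)}
        # One entry per open <span>: (semantic class or None, text pieces).
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in SEMANTIC_CLASSES), None)
        self._stack.append((match, []))

    def handle_data(self, data):
        # Text belongs to every semantic span that is currently open.
        for match, parts in self._stack:
            if match:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._stack:
            return
        match, parts = self._stack.pop()
        if match:
            text = "".join(parts).strip()
            if text and text not in self.mentions[match]:
                self.mentions[match].append(text)


def collect_mentions(html):
    """Return a dict mapping each semantic class to its unique mentions."""
    parser = MentionCollector()
    parser.feed(html)
    parser.close()
    return parser.mentions


article = (
    '<p>The <span class="regulator">SEC</span> found '
    '<span class="company">LPL</span> negligent.</p>'
)
print(collect_mentions(article))
```

A rendering template could loop over the returned lists to print a “mentioned in this article” box, and the same lists could feed the metadata or ticker-link ideas in the other two bullets.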

The possibilities multiply once your programming logic has access to the meaning of your content.

Using a WYSIWYG editor to do simple semantic tagging is not the only way to add meaning to your content. You can have your authors write in an XML editor. You can use a text mining engine to match words against a centrally managed controlled vocabulary. However, I have found that this approach is the least expensive, most practical way to get started. Authors want their articles to look nice. This approach captures that intent and, with no additional effort, creates additional value that the author is (at first) less aware of. Once your content starts to carry semantic tagging and your technology starts to leverage it, your authors will probably start to see the benefits and get excited about the possibilities.


3 Responses to “Semantic tagging on the cheap with a WYSIWYG editor”

  1. Hartvig says:

    Wow – great idea. I’ll cook up a sample of this for umbraco.

  2. Adrian Sutton says:

    This is fantastic advice – far too many people leave the graphic designers to create the CSS and don’t think of it as part of the authoring user interface. By choosing class names carefully and semantically you get a range of advantages:

    * Better semantic capabilities as mentioned above
    * A consistent look across the site with easy maintenance rather than authors using inline styling as they see fit
    * Much better looking content. You can design CSS styles so that authors can quickly style whole sections for specific purposes much more easily – from a pull quote box to different types of tables.

    I work for Ephox and we provide a couple of articles along these lines that should be useful regardless of which editor you happen to use:

    http://liveworks.ephox.com/hints-tips/driving-consistency-with-css
    and
    http://liveworks.ephox.com/hints-tips/creating-a-see-also-panel

    The second one is particularly good for showing how to make more significant formatting changes while requiring your users to apply only one class attribute, so you can style whole DIVs, lists or tables easily.

  3. Antonio Volpon says:

    Interesting idea. I am just wondering if there is a risk of putting too many tags (e.g. spans) around the content.
