A quick recap

A microformat is a way of marking up HTML (the stuff web pages are made of) so that it describes something. This might be a person, an event, a book, or any number of other things that are commonly represented across the web. When the information describing the thing follows the structure of a known microformat, other systems can digest it and derive meaning from it in a consistent manner.

So what does that mean in real terms?

One example might be that of an event. Suppose we have a web page that gives details of an upcoming book signing. If this is just presented using common or garden HTML, there's only so much information that computers can extract automatically from the page. Whilst a human can read the page and determine when the event is and where it is located, a computer will just see lots of (relatively) meaningless text. There may be many dates mentioned on the page - which one represents the date the event is to be held? There is no inherent vocabulary within HTML to convey this specific information.

If this same page were created in a manner that respected the microformat for events (hCalendar), this ambiguity would be removed and any compliant system could happily extract useful data from the page. The microformat has, in effect, created a vocabulary that such systems can make use of.
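
To make that a little more concrete, here is a minimal sketch in Python of how a program might pull the event details out of such a page. The class names (vevent, summary, dtstart, location) come from the hCalendar microformat; the page fragment, the event details and the names EventParser and extract_event are invented for illustration. It is not a complete microformat parser: it assumes a single event and no self-closing tags inside it.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. Note the first date is NOT part of the event.
PAGE = """
<p>Posted on <span>1 June 2011</span></p>
<div class="vevent">
  <span class="summary">Book signing</span> on
  <abbr class="dtstart" title="2011-07-14">14 July</abbr> at
  <span class="location">Main Street Books</span>
</div>
"""


class EventParser(HTMLParser):
    """Collect hCalendar properties found inside a vevent element."""

    FIELDS = {"summary", "dtstart", "location"}

    def __init__(self):
        super().__init__()
        self.event = {}
        self.depth = 0       # nesting depth inside the vevent element
        self.current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if self.depth:
            self.depth += 1
        elif "vevent" in classes:
            self.depth = 1
        if not self.depth:
            return  # ignore everything outside the event
        for name in classes & self.FIELDS:
            if "title" in attrs:
                # Machine-readable value carried in the title attribute
                self.event[name] = attrs["title"]
            else:
                self.current = name

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.event[self.current] = data.strip()


def extract_event(html):
    """Return the event properties found in the given HTML."""
    parser = EventParser()
    parser.feed(html)
    return parser.event


print(extract_event(PAGE))
```

The "Posted on" date is ignored because it sits outside the vevent element, whereas the dtstart value is picked up unambiguously as the date of the event.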

The end result is that you might, for example, be able to click a button to add the event to your own personal calendar without having to copy the details manually.

Hold your horses, microformats aren't the only game in town

Microformats aren't the only mechanism that can achieve this. Two other popular options that essentially serve the same purpose are microdata and RDFa.

Microdata is an extension to HTML5 (the latest incarnation of HTML that's getting web developers excited at the moment). Generally speaking it's a little more complicated than microformats but boils down to a very similar approach.
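
As a rough illustration of how similar the approach is, the sketch below reads an event marked up with microdata. The itemscope, itemtype and itemprop attributes are the ones microdata defines; the fragment itself, the simplified use of plain text for the location, and the names MicrodataParser and extract_item are assumptions made for this example. It handles only a single, non-nested item.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the meaning lives in attributes, not class names.
PAGE = """
<div itemscope itemtype="http://schema.org/Event">
  <span itemprop="name">Book signing</span>
  <time itemprop="startDate" datetime="2011-07-14">14 July</time>
  <span itemprop="location">Main Street Books</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect itemprop values from one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self.current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item["type"] = attrs["itemtype"]
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "time" and "datetime" in attrs:
            # A time element carries its value in the datetime attribute
            self.item[prop] = attrs["datetime"]
        else:
            self.current = prop

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.item[self.current] = data.strip()


def extract_item(html):
    """Return the properties of the microdata item in the given HTML."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item


print(extract_item(PAGE))
```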

RDFa (Resource Description Framework in Attributes) is an extension to XHTML (the previous thing that got web developers excited). It's a bit more formal than either microformats or microdata and allows for a more complicated representation of data.

Whilst each takes a slightly different approach, they all have the same goal - to allow greater semantic meaning to be derived from the web pages we create. They allow for the creation of more meaningfully structured data than can be achieved using HTML alone.

Enter stage left: schema.org

Different systems and different people have supported and evangelised these competing formats over the years. Generally speaking, though, they have stayed under the radar of most people, and the majority of sites have either ignored them or made only limited use of them.

This might be about to change

The big three search engines (Google, Bing, and Yahoo!) have recently announced a new joint venture to standardise the use of structured data. Such data is important to search engines as it allows them to offer a more useful search interface. In a rare case of co-operation the three companies have decided to support the same formats and launched the schema.org website to detail these.

Interestingly, they have decided to support microdata as the preferred method of defining the vocabularies for different types of structured data. This is bound to be a contentious decision. Microformats have been around longer and have a relatively large community behind them. RDFa is more complex and can represent some information that can't be directly expressed in microdata.

Historically, Google has supported all three formats and states that it will continue to do so, although you have to assume that microdata will get preferential treatment going forward. Time will tell.

Yahoo! has supported microformats and RDFa (along with many other acronyms) over the years in a variety of ways.

Bing hasn't previously done anything (visible) with any of the formats so microdata support will be a new thing for them.

How do search engines use the data?

One of the most noticeable examples of structured data in the wild is Google's recipe search (www.google.com/landing/recipes/). This allows you to state specific characteristics for the types of recipes you want to find. So, for example, you can search for recipes that take less than 15 minutes to cook and contain fewer than 300 calories. This is only possible because the web pages that contain the recipes have used structured data to make this information available in a consistent way. Google can then catalogue this information and use it to power the search.
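
Once the data has been extracted, this kind of filtering is straightforward. The sketch below is purely illustrative and says nothing about how Google actually implements it: the recipe records are invented, the field names are assumptions, and the cooking times are written as simple ISO 8601 durations (PT10M for ten minutes), which is how structured data commonly expresses them.

```python
import re

# Invented records standing in for data extracted from marked-up pages.
CATALOGUE = [
    {"name": "Thai green mango salad", "cookTime": "PT10M", "calories": 180},
    {"name": "Slow-roast lamb", "cookTime": "PT4H", "calories": 650},
    {"name": "Cheese omelette", "cookTime": "PT8M", "calories": 320},
]


def minutes(duration):
    """Convert a simple ISO 8601 duration such as PT1H30M into minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    hours, mins = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + mins


def search(recipes, max_minutes, max_calories):
    """Return recipes that cook in under max_minutes and max_calories."""
    return [
        recipe for recipe in recipes
        if minutes(recipe["cookTime"]) < max_minutes
        and recipe["calories"] < max_calories
    ]


matches = search(CATALOGUE, max_minutes=15, max_calories=300)
print([recipe["name"] for recipe in matches])
```

The point is that the hard part is not the filtering, it is getting consistent values for the cooking time and calories in the first place. That is exactly what the structured data provides.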

This extra information may well be available on web pages that don't use structured data, but without the extra semantics that structured data provides it is practically impossible to catalogue the information automatically in a way that supports this level of filtering.

What else?

Structured data can be used for much more than recipes. The list is long and growing: CreativeWorks, Books, Movies, Music Recordings, TV Series, Audio, Images, Videos, Events, Organisations, People, Places, Local Businesses, Restaurants, Products, Reviews, and more.

Some of these are already used to decorate how search results are returned and the assumption must be that this will become more prevalent. Google refers to these extended search result entries as rich snippets.

For example, at the time of writing if you search for "thai green mango salad" on Google the first listing is decorated with some extra information showing the average review for that recipe on about.com.

It's not all about search engines

As mentioned above, search engines are not the only systems that can benefit from the use of structured data. Examples are numerous and include the following.

A web browser based tool could scan a page being viewed for addresses and show them on a map - even if the original page didn't offer this feature.

Details about a person on a page might be decorated with options to automatically add the person to your address book at the click of a button. Without the structured data in place you would have to manually copy/paste each field of information into your address book.

A collection of websites offering product reviews that make use of a suitable structured data format could be aggregated to give a broader picture of how people rate the product in question.
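
As a sketch of that last idea, assume the ratings have already been extracted from several sites into a common shape. The site names, products, ratings and the function name average_ratings below are all invented for illustration.

```python
from collections import defaultdict

# Invented (site, product, rating out of 5) records gathered from
# several hypothetical review sites.
REVIEWS = [
    ("site-a.example", "Acme Kettle", 4.0),
    ("site-b.example", "Acme Kettle", 5.0),
    ("site-c.example", "Acme Kettle", 3.0),
    ("site-a.example", "Acme Toaster", 2.0),
]


def average_ratings(reviews):
    """Return the mean rating for each product across all sites."""
    totals = defaultdict(lambda: [0.0, 0])  # product -> [sum, count]
    for _site, product, rating in reviews:
        totals[product][0] += rating
        totals[product][1] += 1
    return {
        product: total / count
        for product, (total, count) in totals.items()
    }


print(average_ratings(REVIEWS))
```

The aggregation itself is trivial; it only becomes possible because every site describes a rating in the same way.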

And so on. All of this can be done without the existence of structured data… it's just harder to achieve and so less likely to happen.

Conclusion

There has always been an inherent chicken and egg problem with structured data. Until more systems make use of it what's the point in generating it… and until you generate it why should more systems use it?

Maybe, with increasingly tangible benefits coming from the adoption of things like microdata, the day of structured data has come. And where the search engines lead, it's entirely possible that the other potential uses of structured data will blossom. In the past, end users have generally had to install browser plug-ins to make use of structured data; it may well be that more support becomes available by default.

And from there maybe things will snowball.

Maybe...
