Monday, November 26, 2012

On the talismanic fight between RDFa and microdata

A new fight has broken out in specland, between the supporters of RDFa and supporters of microdata. Observers may be wondering why; both are methods of adding extra markup to existing content in order that machines may better understand the content. Semantic Web proponents (note capital letters) dream of a Web where all content is linked by said machines. Semantic Web sceptics have more humble aspirations of search engines better understanding micro-content (is this string of digits a book ISBN, or a phone number?).
RDFa was part of XHTML 2. It became a W3C standard (or, in their vernacular, a “recommendation”) in 2008. microdata was invented by Ian Hickson as part of HTML5 because he identified deficiencies in RDFa. microdata was subsequently modularised out of W3C HTML5, but microdata is part of HTML5; it validates, whereas RDFa doesn’t.
Note the history. Like football fights that break out because one guy called an opposing team fan’s pint “a pouff”, this isn’t about the actual slight at all, this is about the past, allegiances and alliances; it’s a clash of world views. This is XML versus non-XML; it’s the XHTML 2 gang against the uncouth young turks of HTML5. This is Rangers vs Celtic; it’s Blur vs Oasis; it’s Tiswas vs Swap Shop.
What follows is the observation of a layman; I’ve not used much structured content, so am not an expert (I once tried to use microformats for events at the Law Society, but their accessibility problems prevented it).
In my opinion, the primary deficiency of Classic RDFa is that it’s too hard to write. For professional metadata-ologists it may be simple (but, hey, those guys understand Dublin Core!). The difficulties for me as an HTML wrangler were namespacing, CURIEs, and triples. This is XML land, and most web authors are not particularly adept with XML.
There’s also the problem that in order to use RDFa properly, you need an xmlns attribute which is separate from the content you’re actually marking up. In a world where lots of content is syndicated via machine, or copied and pasted by authors (many of whom don’t really understand what they’re copying and pasting), this leads to breakage as not all of the necessary moving parts get transferred to their new environment. Hixie wrote:

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.
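
To make that brittleness concrete, here is a minimal Python sketch of what a consumer has to do with a prefixed term. Everything in it is an assumption for illustration: the `ex` prefix, the `http://example.org/vocab#` namespace, and the `expand_curie` helper are hypothetical, and real RDFa processing has more rules than this.

```python
# Hypothetical illustration: the prefix declaration lives on an ancestor
# element (e.g. xmlns:ex on <html>), far away from the fragment that uses it.
PAGE_PREFIXES = {"ex": "http://example.org/vocab#"}

# The piece of markup an author is likely to copy on its own.
FRAGMENT = '<span property="ex:title">On the talismanic fight</span>'


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' to a full IRI, or return None if the prefix is undeclared."""
    prefix, _, reference = curie.partition(":")
    base = prefixes.get(prefix)
    return base + reference if base is not None else None


# In the original page the declaration is in scope, so the term resolves.
in_original_page = expand_curie("ex:title", PAGE_PREFIXES)

# After pasting only FRAGMENT elsewhere, no declaration travels with it.
after_copy_paste = expand_curie("ex:title", {})
```

In the original page the term expands to a full IRI; once the fragment is pasted somewhere without the declaration, the same lookup returns nothing, and the markup silently loses its meaning.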

microdata solves this problem. It’s also easier to write (in my opinion) although I’m still mystified by the itemid attribute. I intend to start using microdata on this site soon (in order to plug the holes left by removal of the HTML5 pubdate attribute).
I’ve been recommending that people use microdata. Its main advantages:

Manu Sporny understood the problem that RDFa is hard to author for those of us who find the best ontology is a don’t-ology. Almost a year ago, he set about simplifying RDFa and came up with RDFa Lite. RDFa Lite greatly simplifies RDFa; in fact, you can search and replace microdata terms with RDFa terms (see his post Mythical Differences: RDFa Lite vs. Microdata).
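
As a rough sketch of that search-and-replace idea, the Python below swaps microdata attributes for RDFa Lite ones in a tiny snippet. The mapping used (itemscope plus itemtype becoming vocab plus typeof, and itemprop becoming property) is my reading of the RDFa Lite attributes rather than something quoted from Manu’s post, and the `microdata_to_rdfa_lite` helper and sample markup are hypothetical. A regex swap like this is only good for a demonstration; real pages would want a proper HTML parser.

```python
import re

# Hypothetical microdata snippet describing a person with schema.org.
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Bruce</span>'
    '</div>'
)


def microdata_to_rdfa_lite(html):
    """Naive textual swap of microdata attributes for RDFa Lite ones."""
    # itemscope itemtype="<vocab>/<Type>"  ->  vocab="<vocab>/" typeof="<Type>"
    html = re.sub(
        r'itemscope\s+itemtype="([^"]*/)([^"/]+)"',
        r'vocab="\1" typeof="\2"',
        html,
    )
    # itemprop="..."  ->  property="..."
    html = re.sub(r'\bitemprop=', 'property=', html)
    return html


RDFA_LITE = microdata_to_rdfa_lite(MICRODATA)
print(RDFA_LITE)
```

The element structure is untouched; only the attribute names change, and the type URL is split into a vocabulary and a type name.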
RDFa Lite has multiple advantages, too:

  • it’s compatible with existing RDFa data on the Web (which is why it uses many of the same patterns as microdata but uses a different syntax)

  • you can use different vocabularies in the same item, which you can’t in microdata – see Jeni Tennison’s Using Multiple Vocabularies in Microdata. This allows you to support both schema.org and Facebook’s Open Graph Protocol (OGP) using a single markup language

  • you can easily switch to full-fat RDFa in the future if you feel the need

  • RDFa will also be supported by schema.org (although it’s unclear to me whether it already is)


It seems to me that developers should just choose the one that meets their project’s needs. Need valid code, don’t care about Facebook? Choose microdata. Care about Facebook, don’t care about a JavaScript API? Use RDFa Lite.
The current fight, however, won’t allow that. The RDFa gang want to stop microdata going further in the standardisation process because RDFa became a Recommendation first, and microdata is quite similar to it.
While I completely understand that two competing standards make it harder for developers in the short term, I agree with Marcos Caceres (who isn’t a WHATWG/HTML5 zealot), who counters Manu Sporny’s objection to microdata progressing thus:

I don’t see what it being a “Recommendation” has to do with anything – just because it’s a W3C Recommendation does not mean that RDFa has a monopoly on structured data in HTML. So, just because that spec reached Rec first doesn’t mean that it’s somehow better or preferable to any other future solution (including micro data). That would be like objecting to Javascript because assembler (or punch cards) already meet all the use cases…
I hope you will instead focus your energy on convincing the world that RDFa is the “correct technology” on its own merits and not place your bets on a mostly meaningless label (“Recommendation”) given by some (much loved, but) random standard organisation.

So, developers; which tickles your fancies? RDFa Lite or microdata?
