XML vs. RDF :: N × M vs. N + M

2003-11-09

Or, Questioning Why People Can Only See the Semantic Web AI Strawman

Clay Shirky criticizes the Semantic Web in his article, The Semantic Web, Syllogism, and Worldview, which Sam Ruby accurately assesses as "Two parts brilliance, one part strawman."

Joe Gregorio responds to Shirky's piece with this very concrete statement:

This is exactly the point I made in The Well-Formed Web, that the value that the proponents of the Semantic Web were offering could be achieved just as well with just XML and HTTP, and we are doing it today with no use of RDF, no need to wait for ubiquitous RDF deployment, no need to wait for RDF parsing and querying tools.

Yet, in the "just XML" world, no one that I know of is working on a "layer" that lets applications access a variety of XML formats (schemas) and treat similar or even logically equivalent elements or structures as if they were the same. This means each XML application developer has to do all of the work of integrating each XML format (schema) alone: N × M.

Forget the AI strawman, forget even the RDF model and format for a moment, and tell me that's not a problem — today or foreseeable.

The RDF model, along with the logic and equivalency languages like OWL (née DAML+OIL), together referred to as "the Semantic Web", is the current W3C effort to address that problem. Factoring those equivalencies into a common layer lets application developers work with an already-integrated model, and lets the libraries do the work of mapping each schema to the integrated model using a shared schema definition: N + M.
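
To make the arithmetic concrete, here is a minimal sketch that counts the pieces of integration work under each approach. The application and format names are hypothetical placeholders, and "canonical-model" stands in for whatever shared model the common layer provides; this is a counting illustration, not a description of any real library.

```python
from itertools import product


def pairwise_integrations(apps, formats):
    """Every application integrates every format itself: N x M pieces of work."""
    return list(product(apps, formats))


def shared_layer_integrations(apps, formats, canonical="canonical-model"):
    """Each format is mapped once to a shared model, and each application
    binds once to that model: N + M pieces of work."""
    mappings = [(fmt, canonical) for fmt in formats]   # M mappings
    bindings = [(app, canonical) for app in apps]      # N bindings
    return mappings + bindings


# Hypothetical example: N = 3 applications, M = 4 formats.
apps = ["aggregator", "search-index", "calendar"]
formats = ["format-a", "format-b", "format-c", "format-d"]

print(len(pairwise_integrations(apps, formats)))      # 3 x 4 = 12
print(len(shared_layer_integrations(apps, formats)))  # 3 + 4 = 7
```

The gap widens as either number grows: adding a fifth format costs three new integrations in the pairwise case but only one new mapping with the shared layer.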

One can take potshots at RDF for how it addresses the problem, and at the Semantic Web for possibly reaching too far too quickly in making logical assertions based on relations modeled in RDF. But those who dismiss it out of hand, or resort to strawmen to attack it, without recognizing the problem it addresses or offering an alternative solution simply tell me that they don't see the problem, and therefore have no credibility in knocking RDF or the Semantic Web for trying to solve it.

P.S. Although I did well in math, maybe it doesn't come back to me as readily: the M term in the second amount is actually M × 2, M × 1, or even M × (O + 1), where N is the number of applications reading various formats, M is the number of formats, and O is the potential number of canonical formats. RDF is an example of a canonical format. If you had M different XML formats, each with a transform to RDF, you would have M × 2 formats plus a transform for each. If the M formats were already in RDF or another canonical form, you would have M × 1. Otherwise, in general, you have M formats plus transforms from each to one or more canonical formats, or M × (O + 1). In his article, Dare suggests that XSLT can be used to transform to a canonical format, but doesn't suggest what that format should be, or that anyone is working on a common, public repository of those transforms.
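
The three cases in the P.S. can be checked with a short sketch. The function names are hypothetical, and the transform count of M × O is my reading of "a transform for each" extended to O canonical formats, not something stated outright above.

```python
def format_count(m_formats, o_canonical):
    """Formats in play: each of the M originals plus one copy per canonical
    format it is transformed into, i.e. M x (O + 1)."""
    return m_formats * (o_canonical + 1)


def transform_count(m_formats, o_canonical):
    """Assumed: one transform per (format, canonical format) pair, i.e. M x O."""
    return m_formats * o_canonical


M = 4  # hypothetical number of formats

# Already canonical (no separate canonical target, O = 0): M x 1.
print(format_count(M, 0), transform_count(M, 0))  # 4 0

# One canonical format such as RDF (O = 1): M x 2, plus a transform for each.
print(format_count(M, 1), transform_count(M, 1))  # 8 4

# Several canonical formats (here O = 3): M x (O + 1).
print(format_count(M, 3), transform_count(M, 3))  # 16 12
```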