One of the blessings of living in Austin — and it’s important to remember them this time of year when you feel your eyeballs melting out when you step into mid-afternoon sun — is its legacy of work in machine learning and AI. Here we have a very active interest group, Semantic Web Austin run by Juan Sequeda, who has, over the last year or so, brought some very visible researchers in Semantic Web to town to teach hands-on tutorials.
If the concept of “Semantic Web” is foreign to you, let me try to capture its essence succinctly. Presently one can conceive of the Web as a web of documents: presentation and data are represented as web pages. My web document points to Ryan’s document and Lauren’s document. Now imagine a résumé on the Web. This résumé is a series of facts (and gross exaggerations). These facts have nothing per se to do with the document construct you learn from books called a résumé: the thing with a name at the top, horizontal rules under section headings, etc. that, purportedly, employers like to read. Non-human examiners of my résumé web page care only about the facts, not the prettiness of the artifact. Thus, the Semantic Web is one in which meaningful data is presented (as a résumé) for humans, but also presented (as the essential facts of the résumé) for machines, such that relationships between the various data can be utilized by semantically-aware web applications.
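To make the "résumé as facts" idea concrete, here is a minimal Python sketch. Every identifier in it (the `ex:` names, the employer, the person) is invented for illustration and is not taken from the actual résumé; real RDF uses full URIs rather than these placeholders.

```python
# A résumé reduced to machine-readable facts: (subject, predicate, object).
# All identifiers below are hypothetical placeholders, not real vocabulary terms.
facts = [
    ("ex:me", "ex:name", "Example Person"),
    ("ex:me", "ex:workedAt", "ex:AcmeCorp"),
    ("ex:AcmeCorp", "ex:locatedIn", "Austin"),
]


def objects_of(subject, predicate, triples):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def cities_worked_in(person, triples):
    """Follow two facts in a row: person -> employer -> city."""
    cities = []
    for employer in objects_of(person, "ex:workedAt", triples):
        cities.extend(objects_of(employer, "ex:locatedIn", triples))
    return cities


if __name__ == "__main__":
    print(cities_worked_in("ex:me", facts))
```

The point is that a program can chain facts together without caring how the page looks, which is the kind of relationship a semantically-aware application would exploit.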
Both Tom Heath and Peter Mika gave great presentations full of ideas and hands-on activities to the Semantic Web Austin group. From Tom I learned the basics of RDF, the language for enumerating data-facts to machines, and how to build a basic RDF document. Peter showed us RDFa and illustrated that HTML and RDF data can be written into the same document. That was a “whoa” moment for me.
Because I hadn’t had a chance to integrate these lessons from the SemWeb Austin sessions, my understanding was a bit shaky. The only way to actually figure this out, I decided, was to find a project that would give me the opportunity to work with these ideas.
About this time my yearly review concluded and I was about to update my résumé, an activity I exhort you to do after reviews. Yet résumé-writing had always irritated me: writing a document and then trying to port it to various formats, and then Mithras help you if you need to “skew” these documents to particular employers quickly.
Thus I decided I needed to write my résumé in some sort of meta-language so that I could publish to both LaTeX and HTML and “skew” to particular employers quickly. This was the goal of m4resume.
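The single-source idea can be sketched in a few lines of Python. This is not how m4resume works (it uses M4 macros); it is only an illustration of one data source feeding two output formats, with a "skew" filter. The entries, tags, and function names are all hypothetical.

```python
from html import escape

# Hypothetical résumé entries; "tags" lets us skew the output toward an employer.
ENTRIES = [
    {"title": "Sendmail administration", "tags": {"ops"}},
    {"title": "RDFa markup", "tags": {"web"}},
    {"title": "M4 macros", "tags": {"ops", "web"}},
]


def select(entries, skew):
    """Keep only the entries tagged for the requested skew."""
    return [e for e in entries if skew in e["tags"]]


def to_html(entries):
    """Render the entries as an HTML unordered list."""
    items = "".join(f"<li>{escape(e['title'])}</li>" for e in entries)
    return f"<ul>{items}</ul>"


def to_latex(entries):
    """Render the same entries as a LaTeX itemize environment."""
    items = "\n".join(f"  \\item {e['title']}" for e in entries)
    return "\\begin{itemize}\n" + items + "\n\\end{itemize}"


if __name__ == "__main__":
    print(to_html(select(ENTRIES, "web")))
    print(to_latex(select(ENTRIES, "ops")))
```

The sketch does not escape LaTeX special characters, so it is only safe for plain titles like the ones shown.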
The output is Steven Harms’s XHTML+RDFa résumé. If you’re interested in how I learned RDFa well enough to be able to embed it into XHTML, and are curious how I was able to break that down into a series of M4 macros, you may want to read on in this exceedingly technical post. Oh, by the way, this very post also has an RDF / Semantic Web payload: check it out.
Phase 1: Get familiar with the RDF specifications
There is really no way around it: you need to get familiar and comfortable with the RDF and RDFa specifications. I wound up needing to consult them so often that I created local copies (for offline access). If you run:
    git clone git://github.com/sgharms/m4resume.git
you will be given the default ‘master’ branch. If you want to view the branch that generates my résumé, execute:
    git checkout -b demo origin/sgharms_example
You’ll find my reference documentation in m4resume/reference in your freshly created “demo” branch. You’ll want to read:
1. RDF Primer.webarchive
2. notes from rdf primer.txt
This should give you familiarity with the basic terms, and my notes (in the “notes from rdf primer.txt” file) should give you a few salient summary points.
Build validating RDF documents
At this point, I spent a lot of time playing with the W3C’s RDF Validator. If I’ve learned anything about writing things that produce other things, it’s that it helps to first produce the thing you want by hand, so that you can test whether your producing thing actually produces something identical. As such, I wrote out my résumé by hand, in RDF. I slowly built it up block by block in RDF/XML and fed it through the validator. I took advantage of a number of ontologies:
- cv=”http://purl.org/captsolo/resume-rdf/0.2/cv#”
- dc=”http://dublincore.org/2008/01/14/dcelements.rdf#”
- dc1=”http://purl.org/dc/terms/”
- doap=”http://usefulinc.com/ns/doap#”
- foaf=”http://xmlns.com/foaf/0.1/”
- geo=”http://www.w3.org/2003/01/geo/wgs84_pos#”
- rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#”
- rdfs=”http://www.w3.org/2000/01/rdf-schema#”
- xhv=”http://www.w3.org/1999/xhtml/vocab#”
- xml=”http://www.w3.org/XML/1998/namespace”
- doap
- ResumeRDF Ontology Specification.webarchive
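The prefixes listed above are shorthand: a name like `foaf:name` stands for the namespace URI with the local name appended. A small Python sketch of that expansion follows, using three of the namespaces from the list; the function name `expand_qname` is an assumption made for this example.

```python
# Prefix-to-namespace map, copied from three entries in the list above.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cv": "http://purl.org/captsolo/resume-rdf/0.2/cv#",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand_qname("foaf:name"))
```

This is why getting the namespace URIs exactly right matters: a typo in the prefix declaration silently produces a different URI, and therefore a different property.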
Embed RDF into XHTML, thus making RDFa
Now that you have working RDF, your battle is half-won: next you need to integrate it into XHTML. How to do so is defined in the XHTML+RDFa DTD. Here are the sources that I used to make an RDFa-ized version of the RDF-Resume (again, in $GITROOT/reference).
- RDFa - Wikipedia, the free encyclopedia.webarchive
- RDFa Primer.webarchive
- RDFa Use Cases: Scenarios for Embedding RDF in HTML.webarchive
- RDFa for HTML Authors.webarchive
- RDFa in XHTML: Syntax and Processing.webarchive
- Tip Use rdf about and rdf ID effectively in RDF XML.webarchive
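As a rough illustration of what "embedding" means here, the Python sketch below emits a fragment of XHTML whose attributes (`about`, `typeof`, `property`, `rel`) carry the RDF statements alongside the visible text. The function name and the sample values are assumptions made for this example; only the RDFa attribute names and the FOAF terms are real.

```python
from html import escape


def rdfa_person(subject, name, homepage):
    """Render a person as XHTML whose attributes also carry RDF statements."""
    return (
        f'<div xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        f'about="{escape(subject)}" typeof="foaf:Person">'
        f'<span property="foaf:name">{escape(name)}</span> '
        f'<a rel="foaf:homepage" href="{escape(homepage)}">homepage</a>'
        f"</div>"
    )


if __name__ == "__main__":
    # Hypothetical values, used only to show the shape of the markup.
    print(rdfa_person("#me", "Example Person", "http://example.org/"))
```

A browser shows the name and a link; an RDFa-aware parser reads the same markup as statements about the resource `#me`.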
M4Resume
Why M4?
First question, why M4? It dates from the 1970s, has arcane syntax, and, in the words of one SWiK IRC-er: “does anyone use M4 for any serious programming anymore?” Here was my thinking… First, I come from a Sendmail admin background, so knowing M4 (sorta) is not optional for me, and the syntax wasn’t that baffling to get into. Second, M4 has the ability to be entirely self-contained: no libraries, no external dependencies, no gems (despite the very generous leg-up Dave Coupland tried to give my stubborn self; he underestimated my block-headedness). Third, it’s philosophically sexy. I have a philosophy degree, and holders of these are not the most pragmatically-minded people. There’s something very attractive about this example of M4 code:
    divert(-1)`'dnl
    ------------------------------------------------------------------------------
    The above makes sure definitions don't get put on the output stream, we're
    going to define the macros below.
    ------------------------------------------------------------------------------
    define(`foo',`FOO')
    define(`bar', `BAR')
    define(`FOOBAR', `M4 all the way down')
    ------------------------------------------------------------------------------
    Next, we'll get back on the main output stream
    ------------------------------------------------------------------------------
    divert`'dnl
    "M4's philosophical coolness BEGIN"
    dnl Put a closing "tag" in a buffer, handy if M4 needs to generate markup
    divert(2)dnl
    "The End!"
    divert`'dnl
    indir(foo`'bar)
    dnl You don't need the following, the temporary 2 buffer is
    dnl automatically dumped
    undivert(2)
Because of the simple stream-based replace, it allows you to embed the following in your sources:
    define(`__RDFA_CANDIDATE_NAME', `ifdef(`do_rdfa', `more metadata', `less metadata')')
Lastly, because of its recursion-friendly design, M4 feels a lot more like programming a text-stream-editing LISP.
Unlike imperative paradigms, you don’t have to know how many iterations you need: you just let expansions happen until they stop, and then send the result to STDOUT. That was attractive to me. M4 is tail-recursion capable, so all the iteration you need is there, and there are enough decision structures to allow rich application logic. How often are you going to be tweaking your résumé? That said, if I ever rewrite this with a specific eye towards RDFa, I would think about using Ruby objects that map to RDF/XML blocks. Live and learn.
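The "expand until nothing changes" idea can be sketched loosely in Python. This is a simplification and not how M4 is implemented: M4 rescans its input stream as it goes and supports quoting and arguments, none of which appear here. The macro table and the `limit` safety cap are assumptions made for this example.

```python
import re

# A macro name is any identifier-like word.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_macros(text, macros, limit=100):
    """Replace macro names with their definitions until nothing changes."""
    for _ in range(limit):
        expanded = WORD.sub(lambda m: macros.get(m.group(0), m.group(0)), text)
        if expanded == text:
            return text  # fixed point reached: no macro names remain
        text = expanded
    raise RecursionError("macros did not settle; possible definition loop")


if __name__ == "__main__":
    # Hypothetical macro table: each definition may mention other macros.
    table = {
        "GREETING": "Hello, NAME",
        "NAME": "TITLE Example",
        "TITLE": "Dr.",
    }
    print(expand_macros("GREETING!", table))
```

Nothing in the caller says how many passes are required; the loop simply stops when a pass produces no change, which is the property described above.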