Toward Easier RDF
Introduction
I’ve ground to a halt on this project. In part, other things came up (like a trip to England and Thanksgiving). In greater part, the technology stack continues to stymie me at many turns.
This week, David Booth posted to the semantic web mailing list a challenging assault on the status quo, “Toward easier RDF: a proposal,” which, to my mind, is one of the most direct challenges made to the current state of the stack since Manu Sporny launched his invectives via the JSON-LD project. In particular, Booth caught my eye with this quote:
The value of RDF has been well proven, in many applications, over the 20+ years since it was first created. At the same time, a painful reality has emerged: RDF is too hard for average developers. By “average developers” I mean those in the middle 33 percent of ability. And by “RDF”, I mean the whole RDF ecosystem – including SPARQL, OWL, tools, standards, etc. – everything that a developer touches when using RDF.
In my day job at Flatiron School, we’re always discussing how to make content more accessible and more readable, with less accidental complexity to confuse and distract from our lessons’ objectives.
This is not a value that I often see demonstrated in the SW ecosystem. Not feeling I had the standing (“imposter syndrome”) to speak to the power of this group, I never laid the problems out. But I felt like Booth had created a moment, so here’s my follow-up.
The Post
Esteemed SW Community,

I've been silent on this list because I am not a practising ontologist. I'm (just a) "middle 33% developer" who thought that making a graph of knowledge about books would be interesting[0]. I've tried to document[1] my experiences, up to the point a few weeks ago that I ground to a halt.

When I saw David's post[2], I was excited because I thought it might occasion discussion around the simple, pragmatic problems that stymied me.

I'd like to list a few signals that RDF* sends in the first hour of exploration to the pragmatic 33%-er (me) that suggest that the explorer's further time won't pay off. I've also tried a near-identical (hand-wave) competitor, Airtable[9], where I was able to get my prototype up and running in under 2 hours[10].

Based on these criticisms and comparison with the marketplace, a developer curious about RDF* receives ample signal to "close tabs, move on, and drop out of the funnel."

A. Lack of a Clear Entry Point
==============================

Compare "How do I write React" Google results with "How do I write RDF" Google results.

* React's first hit[3] is served by its authority (reactjs.org). It links to a description that is compelling, welcoming, and relatively easily scanned. It's visually attractive and modern as well. It looks maintained.

Versus:

* RDF's first hit is hosted by w3schools.com[4] and feels scanty (NB: Not even a W3C link!)
* RDF's second hit is hosted by a site whose look and feel is akin to a textbook[5] and is equally exciting
* RDF's third hit[6] is the same
* RDF's fourth hit[7] is the first link that starts educating on the Jena API

These sites look state of the art for the pre-Clinton era. Should one actually find the W3C spec, the look-and-feel there (to say nothing of the writing style and tone) suggests "Keep moving, peasant."

As a pragmatic 33%-er, my intuition is screaming "Close tabs; abort."

B. Lack of Technology Framing
=============================

Compare the React home[3] to any of those previous links[4][5][6][7]. The navigational tree hits topics that provide "big picture," "tools required," "help if you get stuck," "what is this technology," and "when is it an optimal choice?" To the pragmatic 33%-er, the React site says "You're welcome here, prepare to be awesome." By comparison, I don't have any idea what RDF* thinks its use or chief benefits are.

C. A Highly Fractured Ecosystem
===============================

Said Booth:

> a painful reality has emerged: RDF is too hard for *average* developers. By
> "average developers" I mean those in the middle 33 percent of ability. And by
> "RDF", I mean the whole RDF ecosystem -- including SPARQL, OWL, tools,
> standards, etc. -- everything that a developer touches when using RDF.

While RDF is wonderfully graspable in its simplicity (triples that can be serialized into multiple formats), its ecosystem of clever acronyms and backronyms is tedious, over-precious, and opaque. RDF* requires the learner to hold too many cognitive circuits open before anything starts to resolve. React avoids this by doing complete layers (e.g. no classes, classes without JSX, classes with JSX) where complete, albeit small, artifacts are created repeatedly.

Most of these technologies' defining document is a W3C standard written in the opaque style of W3C standards (see Sporny, at length). While these standards cover cases exhaustively, it's difficult to understand how to apply them to a toy example. React makes tic-tac-toe, from which I can extrapolate Twitter integrations or JavaScript widgets. RDF* has no such entry point.

Supposing one finds a canonical entry point, RDF* feels like it solves someone else's problem and not mine (close tab; bye!).

D. Lack of Automated Feedback
=============================

One of the greatest things that happened in learning HTML (1994, in my case) was the existence of validators to provide feedback on whether I was doing it right. The RDF* suite provides me no feedback as to whether I'm doing it right. When I get a serialization to parse, I can see a really pretty graph. Is that _right_? Is that _recommended_? No idea. It's like learning German, going to Germany, speaking German, and finding out that no one there will (patiently) correct you when you use the wrong article.

In all seriousness, I used Juan Sequeda et al.'s Grafo[8] in order to have something generate an artifact that I could use to confirm my use of hand-coded RDF* and OWL. Booth's comparison to Assembly is apt; many times developers let `gcc` spit out Assembly code to get validation of their tedious-to-write, difficult-to-edit hand codings. I say more about tooling in H and I, below.

Where tooling is unavailable (or engineering effort costly in time / money), a suitable shim is possible with one or more canonical examples.

E. Lack of a Canonical Example
==============================

In the dawn of the JavaScript frameworks (2014-ish) _everyone_ did a TODO app. One could compare Angular to Ember to Knockout to BatmanJS ('memba that?) and see what trade-offs the various implementers made. It was a problem with a trivial domain but from whose implementation one could project the technology learning ladder.

RDF* lacks a consistent example. Where it is consistent, it is trivially small. The most consistent example (in my experience) is using a `foaf:` ontology to make some boring and fairly shallow statement, e.g. "Alice knows Bob." Great. So what? How do I start building classes and predicates (schemas) and start creating graphs based on my ideas? "Read more specs, pleb." Sigh.

While it's readily obvious that we could use (the fractured ecosystem of) ontology providers to assert more about Alice and Bob, creating a schema is an entirely opaque process that isn't "ramped to" based on grokkable atoms. Where do I go to get more properties? Should I mix multiple ontologies? Is there an example? No.

F. Lack of Intermediate Canonical Example
=========================================

This is really an extension of E, but there's a huge gulf between some foaf-y triviality and "Model a Medical Product Ontology." Uhm, how about something obvious and fun (modeling board games, or card games, books, plays... anything?)

G. Curiously Strong Rejection of SQL and OO as Metaphors
========================================================

RDF* is neither SQL nor Object-Oriented programming, but dear Mithras, SQL and OO are powerful, pervasive metaphors that most RDF* learners' mental models appeal to when they're learning. Why aren't we translating trivial OO code or trivial DB modeling in those metaphors to RDF*?

Considering the blood, sweat, tears, and bile I lost learning to write SQL construction commands, I'm galled to type the following: It's easier to learn to write SQL tables by hand (schema as well as content) than it is to design an RDF* schema and load it up. (To say nothing of the gigabytes of tutorial material, StackOverflow posts, etc. to help correct and steer you out of the gutter.)

I re-read this now and am staggered. RDF* is a data format that's conceptually _simpler_ than SQL but which is _orders of magnitude_ harder to learn (see A-F, above).

H. Lack of Tools
================

Beginners drown in the options. Booth's suggestion of a default stack (even better if we could get it in http://repl.it) is very much needed. Give me a canonical (even dumbed down) version of tools that let me work through the canonical examples and then I'll write Python or Ruby or use GUI abstractions to get out of the, per Booth, assembly language verbosity of the RDF* stack.

Many UNIX tutorials, for example, use nano (these days; I used pico back in the '90s). This is sensible. Trust that the learner will soon tire of the tool (or not) and decide to upgrade their tooling (unto `vim`, say). But by all means, make them effective!

Why not use Turtle or N3 or (better yet!) JSON (because people know it) consistently? Whichever is simpler and more neatly fits in code samples. Because of the hesitancy to voice a strong opinion or a good starting point, beginners don't know where to start and drown in the undifferentiated murk. Close tabs; move on.

I. Obvious Moribundity of Tools
===============================

I first started learning about RDF* technology in Austin, TX at Cyc under the organizational passion of one Juan Sequeda in 2008-9. Can you imagine how staggered I was to find that the tooling ecosystem has made no appreciable progress in a literal decade? Name any other software that can see so little growth and still be called "vibrant."

The majority of tools I downloaded required a JVM and / or failed to start when installed locally. Web options were poor as well. I rather enjoyed my trial of Grafo[8] as it's the first twitch of life I've seen in this space since before the Obama administration.

J. Faster, More-Than "Just Barely Good Enough" Competitors
==========================================================

By way of comparison, I _just now_ used Airtable[9] to build my book cataloging proposal[1] in 2 intuitive, friendly hours and I can readily see how to extend it to serve my problem domain.

I grant that I'm losing the advanced query structure of SPARQL (which confuses me to no end and promises hours of delightful spec reading; no loss) and the hopes for inference, but in roughly the same time it takes to grok one of the 1-5 standards one has to read to use RDF*, I have something that I can provide as a read-only share to anyone reading this post:

https://airtable.com/shrJILw0CTILV0My2

(*and* Airtable features like collaboration, note history sans RCS, read-only sharing, etc.)

Airtable has existed substantially less time than RDF* and has solved a majority of the tool-chain, reference implementation, and bootstrapping hurdles. React has done the same. Why has RDF*'s ecosystem so fundamentally failed to meet the quality, ease, and friendliness of these latecomer technologies?

Conclusion
==========

I'm sure I stepped on some toes here. I'm sorry if I hurt YOUR feelings. No one likes to have tech they wrote, or tech that they labored to get up and over the learning curve on, whipped like this. I also know that I'm dismissible with:

* "Just RTFM better"
* "If it was meant to be easy we wouldn't be getting PhDs in it"
* "It's a specification, precision and authority outrank ease of use."
* "Your dumb book logging idea is too simple a domain for technology this powerful, use an Excel sheet, peasant."

But I hope this can be a clarion call: commercial entities are doing similar work with beautiful interfaces that are intuitive and running laps around the RDF* universe. If the bar for RDF* remains as high as it is, the future of the web will be _theirs_ to decide; Facebook squashed foaf, Facebook / Google squashed OpenID, and something like Airtable, if not Airtable itself, will squash RDF* at this rate.

Kathy Sierra said one of the most profound things I ever heard at SXSW in the early aughts (about the time I was dabbling with SW): "When tools are great, users say 'This tool is awesome'; when tools or docs are awful, users say 'I suck.'" After 10 years of feeling like "I suck" in RDF* land, I'm starting to wonder why I'm still trying.

Footnotes
=========

*: Booth has overloaded "RDF" to mean an ecosystem. I'll be using "RDF" similarly.

References
==========

[0]: https://stevengharms.com/research/semweb/problem_statement/
[1]: https://stevengharms.com/research/semweb/
[2]: https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0036.html
[3]: https://reactjs.org/tutorial/tutorial.html
[4]: https://www.w3schools.com/xml/xml_rdf.asp
[5]: http://www.linkeddatatools.com/introducing-rdf-part-2
[6]: http://www.linkeddatatools.com/introducing-rdf
[7]: https://jena.apache.org/tutorials/rdf_api.html
[8]: https://gra.fo/
[9]: https://airtable.com
[10]: https://airtable.com/shrJILw0CTILV0My2