Turtle Macros

Improving the Conciseness of Turtle and SPARQL suggests the use of macro and/or template processors to shorten code and allow efficient use of Semantic Web Design Patterns. This page explores these ideas in a bit more depth.

Motivation

Aside from being a fertile source of "cut and paste" errors, repetitive bodies of code violate the principle of "Don't Repeat Yourself":

"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."

-- Don't repeat yourself (WP)

Large bodies of code, repetitive or not, hinder comprehension and hide errors. In short, large and/or repetitive bodies of code are a maintenance problem.

This makes me wonder why I don't see any suggestions on how to DRY out Turtle and SPARQL code. SPIN has templates, to be sure, but their capabilities appear to be heavily constrained by SPARQL syntax. I'd like something quite a bit more general.

Example: FHEO Data

Allemang and Hendler's excellent book on Semantic Web techniques bases one of its Challenges on a Data.gov data set. The data set (FHEO Filed Cases) lists Title VIII fair housing cases filed by the United States Office of Fair Housing/Equal Opportunity:

    Semantic Web for the Working Ontologist:
      Effective Modeling in RDFS and OWL (2e)
    Dean Allemang, Jim Hendler
    Morgan Kaufmann, 2011, ISBN 978-0-12-385965-5

    Chapter 9: Using RDFS-Plus in the Wild, pp. 189-190
    Challenge 25: How can RDFS help us organize and process FHEO data?

The data set entries, drawn from a spreadsheet table, describe specific cases:

:entry1
  a             dgtwc:DataEntry  ;
  :case_number  "02-06-0270-8"   ;
  :color        "0"              ;
  :disability   "1"              ;
...

As the Challenge suggests, "A more flexible way to represent information of this sort is to define a class of complaints based on each factor, along with a class for complaints in general":

FHEO:Asian       rdfs:subClassOf  FHEO:Complaint  .
...
FHEO:Disability  rdfs:subClassOf  FHEO:Complaint  .
...
FHEO:White       rdfs:subClassOf  FHEO:Complaint  .

It then suggests that we define :entry1 as a case of disability discrimination:

:entry1  a  FHEO:Disability  .

and define 19 queries of the form:

CONSTRUCT {  ?e  a             :Disability         }
WHERE     {  ?e  a             dgtwc:DataEntry  .
             ?e  :disability  "1"               .  }
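To make the effect of one of these queries concrete, here's a minimal Ruby sketch that applies the same rule to an in-memory list of triples. The array-of-triples representation and the construct_complaints method are hypothetical stand-ins for an RDF store, which would of course evaluate the SPARQL directly.

```ruby
# Triples as [subject, predicate, object] arrays of prefixed names/literals.
# This in-memory list is a hypothetical stand-in for an RDF store.
TRIPLES = [
  [':entry1', 'a',           'dgtwc:DataEntry'],
  [':entry1', ':disability', '"1"'],
  [':entry1', ':color',      '"0"'],
]

# Mimics: CONSTRUCT { ?e a :<klass> }
#         WHERE     { ?e a dgtwc:DataEntry . ?e :<pred> "1" . }
def construct_complaints(triples, klass, pred)
  # All subjects typed as dgtwc:DataEntry
  entries = triples.select { |_s, p, o| p == 'a' && o == 'dgtwc:DataEntry' }
                   .map(&:first)
  # Keep those whose flag is "1", and emit the new type triple
  entries.select { |e| triples.include?([e, ":#{pred}", '"1"']) }
         .map { |e| [e, 'a', ":#{klass}"] }
end

p construct_complaints(TRIPLES, 'Disability', 'disability')
# prints [[":entry1", "a", ":Disability"]]
```

Running the same method with 'Color' and 'color' yields an empty list, since :entry1 has :color "0".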

RDFS and Turtle Magic

The Challenge code already uses SPARQL and Turtle syntax, so it avoids the verbosity of RDF/XML. However, there's still plenty of room for improvement.

By creating a superClassOf predicate and using some of Turtle's triple abbreviation magic, we can refactor the class definitions to eliminate quite a bit of repetition:

my:superClassOf  owl:inverseOf    rdfs:subClassOf  .

FHEO:Complaint   my:superClassOf
  FHEO:Asian, ..., FHEO:Disability, ..., FHEO:White .
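Note that a store will only treat my:superClassOf as the inverse of rdfs:subClassOf if its reasoner supports owl:inverseOf (part of RDFS-Plus, but not of plain RDFS). The following sketch, in plain Ruby with hypothetical names and a three-class sample standing in for the elided list, shows the inference such a reasoner makes: every my:superClassOf triple yields a reversed rdfs:subClassOf triple.

```ruby
# Hypothetical in-memory triples; only three of the subclasses are shown,
# standing in for the "..." elisions in the Turtle above.
TRIPLES = [
  ['FHEO:Complaint', 'my:superClassOf', 'FHEO:Asian'],
  ['FHEO:Complaint', 'my:superClassOf', 'FHEO:Disability'],
  ['FHEO:Complaint', 'my:superClassOf', 'FHEO:White'],
]

# Given `inv owl:inverseOf pred`, infer `o pred s` from every `s inv o`.
def invert(triples, inv, pred)
  triples.select { |_s, p, _o| p == inv }
         .map { |s, _p, o| [o, pred, s] }
end

invert(TRIPLES, 'my:superClassOf', 'rdfs:subClassOf').each do |t|
  puts t.join('  ') + '  .'
end
# Output:
# FHEO:Asian  rdfs:subClassOf  FHEO:Complaint  .
# FHEO:Disability  rdfs:subClassOf  FHEO:Complaint  .
# FHEO:White  rdfs:subClassOf  FHEO:Complaint  .
```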

Macro Processing Magic

Next, let's look at the code that defines our SPARQL queries. Although each definition is only three lines long, there are 19 of them. So, with comments and intervening newlines, we'd be editing at least 75 lines of brittle, repetitive code. Feh.

Let's create a macro that abstracts the general form of the definition. We can then generate each query with a (much shorter) macro call.

Note: The following code is a proof of concept, using Ruby here documents and methods. Many other implementations are possible, using domain-specific languages, template processing, etc.

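# Convert a CamelCase class name to the matching snake_case
# predicate name, e.g. "CamelCase" -> "camel_case".
def snake(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Return the SPARQL CONSTRUCT query for one Complaint subclass.
def entry(name)
  <<-EOT
CONSTRUCT {  ?e  a                  :#{ name }          }
WHERE     {  ?e  a                  dgtwc:DataEntry  .
             ?e  :#{ snake(name) }  "1"              .  }
  EOT
end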

The entry method returns a text string containing the SPARQL code for a single CONSTRUCT query. For convenience, it converts the class name from CamelCase to snake_case to get the matching predicate name. A straightforward way to use this method would be as follows:

def entries_1
  entry('Asian') + ... + entry('White')
end

Although this is much shorter, there's still a lot of overhead. So, let's define a list of Complaint names, map them into entries, and join the results into a single string:

def entries_2
  names = %w[ Asian ... White ]
  names.map {|name| entry(name) }.join
end
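Taking DRY one step further, a single list can drive both the Turtle class declarations and the SPARQL queries, so each Complaint name is written down exactly once. The sketch below is self-contained, so it repeats a compact version of entry; the three-name list is only a sample (the elided names are not filled in), and snake and class_decls are hypothetical helpers.

```ruby
# Sample subset only; the full list of Complaint names is elided above.
NAMES = %w[Asian Disability White].freeze

# Hypothetical helper: "CamelCase" -> "camel_case".
def snake(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Compact version of `entry`: one CONSTRUCT query per Complaint subclass.
def entry(name)
  <<~EOT
    CONSTRUCT { ?e a :#{name} }
    WHERE     { ?e a dgtwc:DataEntry . ?e :#{snake(name)} "1" . }
  EOT
end

# Hypothetical helper: Turtle class declarations from the same list.
def class_decls(names)
  members = names.map { |n| "FHEO:#{n}" }.join(', ')
  "FHEO:Complaint  my:superClassOf\n  #{members} .\n"
end

turtle  = class_decls(NAMES)
queries = NAMES.map { |name| entry(name) }.join

puts turtle, queries
```

Adding a new complaint class then means editing one list, not two files.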

This is pretty concise, but it raises an embarrassing question. We already told the RDF Store the names of the Complaint classes; why don't we just ask it for them?

def entries_3
  sub_classes('Complaint').map {|name| entry(name) }.join
end
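sub_classes isn't defined on this page. Here's one minimal way it might work, assuming the class hierarchy is available as in-memory triples; a real implementation would instead send the store a SPARQL query such as SELECT ?c WHERE { ?c rdfs:subClassOf FHEO:Complaint }. The data, the PREFIX constant, and the method body are all hypothetical. Because a store without owl:inverseOf support won't infer rdfs:subClassOf from my:superClassOf, the sketch checks both predicates.

```ruby
# Hypothetical in-memory triples. Some subclass links are stated directly,
# others only via my:superClassOf (as in the refactored Turtle above).
TRIPLES = [
  ['FHEO:Asian',     'rdfs:subClassOf', 'FHEO:Complaint'],
  ['FHEO:Complaint', 'my:superClassOf', 'FHEO:Disability'],
  ['FHEO:Complaint', 'my:superClassOf', 'FHEO:White'],
  ['FHEO:Complaint', 'rdfs:subClassOf', 'owl:Thing'],
]

PREFIX = 'FHEO:'

# Return local names of the direct subclasses of FHEO:<parent>, honoring
# my:superClassOf in case the store does not infer it via owl:inverseOf.
def sub_classes(parent, triples = TRIPLES)
  full = PREFIX + parent
  triples.filter_map do |s, p, o|
    if p == 'rdfs:subClassOf' && o == full
      s
    elsif p == 'my:superClassOf' && s == full
      o
    end
  end.uniq.map { |c| c.delete_prefix(PREFIX) }
end

p sub_classes('Complaint')
# prints ["Asian", "Disability", "White"]
```

With something like this in place, entries_3 produces one query per subclass found, with no hand-maintained list at all.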

Discussion

Although I am comfortable with Ruby syntax, many ontologists are not. So, some of the coding techniques shown above might seem a bit scary. However, as noted above, there are many ways of doing macro processing; surely one of them can achieve similar results in a digestible manner.

With that issue out of the way, let's look at the results. In place of 75+ lines of static code, we have about 10 lines of code that automatically adapt to new complaint classes, etc.

And, just as we can use macros (etc.) to generate SPARQL and Turtle, we can use similar techniques to generate the macros themselves. This would let a single "macro wizard" support many ontologists, who could then define the macros they need using simple method calls.
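Here is a rough Ruby sketch of what "generating the macros themselves" could look like. The def_macro method, and its use of Ruby's format placeholders (%{...}) as template syntax, are assumptions for illustration, not an existing tool.

```ruby
# Hypothetical "macro wizard": define a new macro method from a template.
# Placeholders use Ruby's format syntax, e.g. %{klass}. The method is
# defined on Object for simplicity, so it can be called at top level.
def def_macro(macro_name, template)
  Object.send(:define_method, macro_name) do |**args|
    format(template, args)
  end
end

# An ontologist defines a query macro with one call...
def_macro(:flag_query, <<~EOT)
  CONSTRUCT { ?e a :%{klass} }
  WHERE     { ?e a dgtwc:DataEntry . ?e :%{flag} "1" . }
EOT

# ...and then uses it like any other method.
puts flag_query(klass: 'Disability', flag: 'disability')
```

Because the generated flag_query is an ordinary method, it can be mapped over a list of names just like entry.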

With the proper sort of infrastructure, ontologists would be able to define and use macros for any repetitive bits of Turtle. This would let them express idioms and design patterns at a higher level than they could using only Turtle syntax.


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r4 - 07 Aug 2011, RichMorin