Serialization

Using tree-structured data and reference data types, programmers can represent arbitrary graphs. This is very convenient within a single process, but it isn't well supported (let alone standardized) by most serialization formats. Nor is there common support for (say) multiply-indexed data. So, the reader and writer have to agree -- out of band -- on ways to specify the necessary post-load processing.

Consequently, programmers commonly find themselves defining ad hoc idioms (and writing the code to support them). This seems like a massive waste of effort, with many opportunities for confusion and error.
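
As an illustration of such an ad hoc idiom, here is a minimal Python sketch that pushes a cyclic graph through JSON. The "$ref" marker, the flat node table, and the function names dump_graph and load_graph are all assumptions made up for this example; they are not part of JSON or of any standard. Note that the re-linking pass only works because the reader knows the writer's convention.

```python
import json

REF_KEY = "$ref"  # hypothetical marker; reader and writer must agree on it out of band


def dump_graph(nodes):
    """Write nodes as a flat table; each edge becomes a {"$ref": name} placeholder."""
    names = {id(node): name for name, node in nodes.items()}
    table = {
        name: {
            "value": node["value"],
            "links": [{REF_KEY: names[id(target)]} for target in node["links"]],
        }
        for name, node in nodes.items()
    }
    return json.dumps(table)


def load_graph(text):
    """Read the flat table, then run the post-load pass that re-links the nodes."""
    table = json.loads(text)
    # First pass: create every node without its edges.
    nodes = {name: {"value": raw["value"], "links": []} for name, raw in table.items()}
    # Second pass: swap each placeholder for the node object it names.
    for name, raw in table.items():
        nodes[name]["links"] = [nodes[ref[REF_KEY]] for ref in raw["links"]]
    return nodes


# A two-node cycle: a -> b -> a
a = {"value": 1, "links": []}
b = {"value": 2, "links": []}
a["links"].append(b)
b["links"].append(a)

restored = load_graph(dump_graph({"a": a, "b": b}))
print(restored["a"]["links"][0] is restored["b"])
```

A reader that does not know the "$ref" convention would still load this file without complaint, but it would see small dictionaries where the edges should be. That silent failure is the kind of confusion described above.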

Approaches

Here are some approaches I've encountered; please let me know about anything interesting that I've missed:

edn-format

Rich Hickey's edn-format (extensible data notation) proposal has no provision for graphs or reference types. However, its extensibility can be used to generate them (and much more). Basically, the idea is that the writer can precede any readable element by a tag that indicates how the reader should process the element:

... A tag indicates the semantic interpretation of the following element. It is envisioned that a reader implementation will allow clients to register handlers for specific tags. Upon encountering a tag, the reader will first read the next element (which may itself be or comprise other tagged elements), then pass the result to the corresponding handler for further interpretation, and the result of the handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged element yields one value. This value is the value to be returned to the program and is not further interpreted as edn data by the reader.

-- https://github.com/edn-format/edn#tagged-elements

Because the writer is limited to tags that the reader supports, this seems both flexible and safe. However, a possible downside is that readers and writers have to agree in advance about the tags that they intend to use. So, perhaps we need a "standard vocabulary" of tags.
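
The handler mechanism can be sketched in Python as a toy model. This is not an edn parser: the Tagged class stands in for an already-parsed "#tag element" pair, and the Reader class, its method names, and the myapp/... tags are hypothetical names chosen for this example. It only shows the order of operations from the quoted passage: read the tagged element first, then pass the result to the registered handler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tagged:
    """Stand-in for a parsed `#tag element` pair (toy model, not real edn)."""
    tag: str
    element: Any


class Reader:
    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Any], Any]] = {}

    def register(self, tag: str, handler: Callable[[Any], Any]) -> None:
        """Let client code supply the interpretation for one tag."""
        self.handlers[tag] = handler

    def read(self, form: Any) -> Any:
        if isinstance(form, Tagged):
            # Read the tagged element first; it may contain tagged elements itself.
            inner = self.read(form.element)
            if form.tag not in self.handlers:
                raise KeyError(f"no handler registered for tag #{form.tag}")
            # The handler's result is returned as-is and not interpreted again.
            return self.handlers[form.tag](inner)
        if isinstance(form, list):
            return [self.read(item) for item in form]
        if isinstance(form, dict):
            return {key: self.read(value) for key, value in form.items()}
        return form


reader = Reader()
reader.register("myapp/point", lambda xy: tuple(xy))
reader.register("myapp/path", lambda points: {"kind": "path", "points": points})

# Models the text: #myapp/path [#myapp/point [0 0] #myapp/point [3 4]]
form = Tagged("myapp/path", [Tagged("myapp/point", [0, 0]), Tagged("myapp/point", [3, 4])])
value = reader.read(form)
print(value)
```

In this sketch an unregistered tag raises an error. That is one possible policy, and it makes the need for advance agreement on tags visible at read time.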

JSON-LD

JSON-LD (see also JSON-LD 1.0) has various approaches (e.g., Data Indexing, Named Graphs) that look promising. I'm still looking into this, so I'll withhold comment for now.

YAML

YAML References provide a direct, if somewhat limited, solution to the problem:

foo:
  bar: &a
    - 1
    - 2
    - 3
baz:   *a

In this YAML snippet, the key baz refers to the same list that the key path foo/bar defines. Unfortunately, YAML does not support forward references: an alias (*a) may only appear after the anchor (&a) that it names. Cycles can still be written, because the anchor on a collection precedes that collection's contents, so an alias inside the collection can point back to it. Even so, the ordering limitation may result in awkward code.
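
The ordering rule can be made concrete with a toy resolver in Python. This is a sketch under stated assumptions, not how any YAML library works: Anchor and Alias are hypothetical stand-ins for &a and *a in an already-parsed document, and resolve walks the structure once, in document order.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Anchor:
    """Stand-in for `&name node` (hypothetical class for this sketch)."""
    name: str
    node: Any


@dataclass
class Alias:
    """Stand-in for `*name` (hypothetical class for this sketch)."""
    name: str


def resolve(node, anchors=None):
    """Resolve anchors and aliases in a single pass, in document order."""
    if anchors is None:
        anchors = {}
    if isinstance(node, Anchor):
        # Toy simplification: the anchor is recorded once its node is resolved.
        value = resolve(node.node, anchors)
        anchors[node.name] = value
        return value
    if isinstance(node, Alias):
        if node.name not in anchors:
            raise ValueError(f"alias *{node.name} appears before its anchor")
        return anchors[node.name]  # the same object, not a copy
    if isinstance(node, dict):
        # Python dicts keep insertion order, which models document order here.
        return {key: resolve(value, anchors) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, anchors) for item in node]
    return node


# Mirrors the snippet above: foo/bar carries the anchor, baz is the alias.
data = resolve({"foo": {"bar": Anchor("a", [1, 2, 3])}, "baz": Alias("a")})
print(data["baz"] is data["foo"]["bar"])

# The same content with baz written first cannot be resolved in one pass.
try:
    resolve({"baz": Alias("a"), "foo": {"bar": Anchor("a", [1, 2, 3])}})
    forward_reference_failed = False
except ValueError:
    forward_reference_failed = True
```

The second call shows why a writer may have to reorder keys, or move the full definition to wherever the node is first mentioned, just to satisfy the reader.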


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r2 - 16 Mar 2014, RichMorin