Transit

Transit is a data serialization format developed by Cognitect. Transit has the same abstract syntax as JSON (nested hashes and lists, with scalar values such as strings and numbers as leaf nodes). However, unlike JSON, it offers a standardized yet extensible way to encode type information within the data stream. For example, it can encode simple values (eg, dates, UUIDs) as well as data structures (eg, sets, tuples).

Transit uses JSON and MessagePack as low-level encodings, so its deserialization libraries can take advantage of well-tuned, existing parsers. This greatly reduces the effort of supporting Transit in new languages and also tends to yield very efficient implementations.

Motivation

JSON was "discovered" as a useful subset of JavaScript by Douglas Crockford. JSON is primarily used to communicate with JavaScript-based applications, but JSON libraries have become nearly universal in programming languages. Unfortunately, JSON's JavaScript origins have also given it a very limited repertoire of data types:

  • signed decimal number
  • character string (Unicode)
  • boolean truth value
  • null (empty value)

  • array (ordered list)
  • object (associative array)

Consequently, values of other types must be converted to and from these types. Enough metadata must travel with them so that both the original type and the value can be reproduced. This places a burden on the application programmer, makes application programs more complex, and (because of special-case logic) makes systems more brittle.
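The following minimal Python sketch makes that burden concrete. It shows the sort of hand-rolled convention an application might adopt with plain JSON. The `{"__type__": ..., "value": ...}` wrapper and the `encode`/`decode` names are hypothetical, invented for illustration. They are not part of JSON or Transit.

```python
import json
from datetime import datetime, timezone

# Hypothetical ad hoc convention (not part of JSON or Transit): any value JSON
# cannot represent is wrapped as {"__type__": <name>, "value": <payload>}.


def encode(value):
    """Convert value into JSON-compatible types, recording the original type."""
    if isinstance(value, datetime):
        return {"__type__": "datetime", "value": value.isoformat()}
    if isinstance(value, set):
        return {"__type__": "set", "value": [encode(v) for v in value]}
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v) for v in value]
    return value  # str, int, float, bool, None pass through unchanged


def decode(value):
    """Reverse encode(), using the recorded type names."""
    if isinstance(value, dict):
        if set(value) == {"__type__", "value"}:
            kind, payload = value["__type__"], value["value"]
            if kind == "datetime":
                return datetime.fromisoformat(payload)
            if kind == "set":
                return {decode(v) for v in payload}
        return {k: decode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode(v) for v in value]
    return value


record = {"when": datetime(2016, 4, 4, tzinfo=timezone.utc), "tags": {"a", "b"}}
text = json.dumps(encode(record))
assert decode(json.loads(text)) == record
```

Every new type needs a matching case in both functions. An ordinary map that happens to have exactly the keys `__type__` and `value` would also be misread. This is the special-case brittleness described above.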

Transit handles these problems in an automatic, efficient, and largely invisible fashion. To a large degree, programs can simply exchange data structures, without worrying about the details of (de)serialization. Of course, the programs still have to be cognizant of higher-order (ie, semantic) details.

Data Flow

In its simplest form, connecting programs by means of intermediate data encodings looks something like this:

  Program A -> encode -> write -> (data) -> read -> decode -> Program B

That is, Program A encodes and writes some data which is then read and decoded by Program B. Both the format and transmission method are variable: we could use a JSON message, a YAML file, etc. Libraries are available to perform both encoding and decoding, but they assume that the data involved uses only the data types that the formats directly support.

Transit augments these libraries, adding extensible (tag-based) support for type information. It also adds features for performance (eg, binary encoding, caching) and programmer convenience (eg, a human-readable JSON format). Data type mapping is performed by collections of read and write "handlers".

Using introspection, the Write Library determines the original data type and invokes the appropriate handler. Other handlers may be invoked (recursively) until only Transit's base data types remain. The Read Library operates in a similar manner, using tags to determine the encoded data type.
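The dispatch described above can be sketched in a few lines of Python. This is a hypothetical, heavily simplified model, not the API of any Transit library:

- Write handlers map a Python type to a (tag, representation) pair.
- Read handlers map a tag back to a value.
- Every tagged value is written as a one-entry `{"~#tag": rep}` map.

Real Transit uses a richer notation (eg, `~u`-prefixed strings for UUIDs), plus escaping and caching.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical, simplified handler tables (NOT the transit-python API).
# Write handler: Python type -> (tag, representation).
WRITE_HANDLERS = {
    uuid.UUID: lambda u: ("u", str(u)),
    datetime: lambda d: ("m", int(d.timestamp() * 1000)),  # ms since epoch
    set: lambda s: ("set", list(s)),
}
# Read handler: tag -> function rebuilding the value from its representation.
READ_HANDLERS = {
    "u": uuid.UUID,
    "m": lambda ms: datetime.fromtimestamp(ms / 1000, tz=timezone.utc),
    "set": set,
}


def write(value):
    """Reduce value to base types: str, numbers, bool, None, list, dict."""
    handler = WRITE_HANDLERS.get(type(value))  # introspection on the type
    if handler is not None:
        tag, rep = handler(value)
        return {"~#" + tag: write(rep)}  # rep may need handlers too: recurse
    if isinstance(value, list):
        return [write(v) for v in value]
    if isinstance(value, dict):
        return {k: write(v) for k, v in value.items()}
    return value


def read(data):
    """Rebuild values, choosing read handlers by the tags in the data."""
    if isinstance(data, dict):
        if len(data) == 1:
            key, rep = next(iter(data.items()))
            tag = key[2:]
            if key.startswith("~#") and tag in READ_HANDLERS:
                return READ_HANDLERS[tag](read(rep))
        return {k: read(v) for k, v in data.items()}
    if isinstance(data, list):
        return [read(v) for v in data]
    return data


doc = {"when": datetime(2016, 4, 4, tzinfo=timezone.utc),
       "ids": {uuid.UUID(int=1), uuid.UUID(int=2)}}
assert read(json.loads(json.dumps(write(doc)))) == doc
```

Note the recursion in `write`. The set handler returns a list of UUIDs, and that list is written again so that each UUID gets its own tag.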

If no appropriate Read Handler is available, Transit simply passes along the tagged, encoded data. This allows the reading application to deal with the data in a minimal manner (eg, copying it to an output stream), even if it cannot "understand" the data type(s) involved. This could be useful, for example, in message routing.
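Below is a sketch of that pass-through behavior, using a simplified and hypothetical `{"~#tag": rep}` notation. When the reader finds a tag it has no handler for, it wraps the raw representation in a small `TaggedValue` holder, a stand-in class defined here. The writer then re-emits that holder unchanged. A router can therefore forward a message whose body it cannot decode.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TaggedValue:
    """Hypothetical holder for data whose tag has no read handler."""
    tag: str
    rep: Any


READ_HANDLERS = {"set": set}  # this reader only knows about sets


def read(data):
    if isinstance(data, dict):
        if len(data) == 1:
            key, rep = next(iter(data.items()))
            if key.startswith("~#"):
                tag = key[2:]
                handler = READ_HANDLERS.get(tag)
                if handler is None:
                    return TaggedValue(tag, rep)  # unknown tag: keep it as-is
                return handler(read(rep))
        return {k: read(v) for k, v in data.items()}
    if isinstance(data, list):
        return [read(v) for v in data]
    return data


def write(value):
    if isinstance(value, TaggedValue):
        return {"~#" + value.tag: value.rep}  # re-emit exactly as received
    if isinstance(value, set):
        return {"~#set": [write(v) for v in value]}
    if isinstance(value, dict):
        return {k: write(v) for k, v in value.items()}
    if isinstance(value, list):
        return [write(v) for v in value]
    return value


# A router that knows nothing about "point" can still forward the message.
message = {"to": "queue-7", "body": {"~#point": [1, 2]}}
routed = read(message)
assert routed["body"] == TaggedValue("point", [1, 2])
assert write(routed) == message
```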

Because handlers only have to deal with single data elements, they tend to be small and simple (hence, easy to write and test). So, adding new data types to Transit is much easier than it would be for (say) raw JSON or YAML. See the main Transit page for detailed information on architecture, data flow, etc.
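Because a handler sees only one value, it can be written and tested on its own, without any serialization machinery. Here is a hypothetical Python example for an application-defined `Point` type. The type, the `"point"` tag, and the function names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical application type with no JSON equivalent."""
    x: int
    y: int


def write_point(p):
    """Write handler: return a tag and a representation built from base types."""
    return "point", [p.x, p.y]


def read_point(rep):
    """Read handler: rebuild a Point from its representation."""
    x, y = rep
    return Point(x, y)


# Round-trip test of the handler pair, in isolation.
tag, rep = write_point(Point(3, 4))
assert tag == "point" and rep == [3, 4]
assert read_point(rep) == Point(3, 4)
```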

Transit Maps

Transit is well suited to "green field" use cases, where compliant data encodings can be generated as needed. In many cases, however, the input data may be non-compliant. So, I'd like to see a way to map this data into compliance (eg, using a Transit Map file and a Transit Mapper utility).

To provide language independence, the Transit Map should be encoded as a data structure (eg, JSON, YAML). The Transit Mapper will typically be run on a server, so it could be written in any Transit-supported language (eg, Clojure, Ruby). And, because it will be a rather simple program, it could easily be implemented in multiple languages.

Format

I'm not at all sure what format would be best for the Transit Map, but here is one possibility. Assume that we have this JSON content as our input:

"big_dec_1":     "123.456"
"big_dec_2":     "234.567"
"map_1": {
  "big_int_1":     "123"
  "big_int_2":     "234"
}

And we want to produce this Transit-compliant JSON (in Transit's JSON-Verbose form, which writes maps as ordinary JSON objects):

"big_dec_1":     "~f123.456"
"big_dec_2":     "~f234.567"
"map_1": {
  "big_int_1":   "~n123"
  "big_int_2":   "~n234"
}
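Because this output is ordinary JSON, a reader can reuse a stock JSON parser and interpret the tagged strings afterwards. The minimal Python sketch below does this on top of the standard `json` module. It handles only the two tags used here: `~f` (arbitrary-precision decimal) and `~n` (arbitrary-precision integer). A real Transit reader also handles many other tags, escaped strings, and caching.

```python
import json
from decimal import Decimal

# Only the two scalar tags from the example above; everything else is
# returned unchanged (a real reader would also unescape "~~" strings, etc.).
SCALAR_READERS = {"~f": Decimal, "~n": int}


def decode_scalars(value):
    """Recursively convert "~f..." and "~n..." strings into numbers."""
    if isinstance(value, str) and value[:2] in SCALAR_READERS:
        return SCALAR_READERS[value[:2]](value[2:])
    if isinstance(value, dict):
        return {k: decode_scalars(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_scalars(v) for v in value]
    return value


text = '{"big_dec_1": "~f123.456", "map_1": {"big_int_1": "~n123"}}'
decoded = decode_scalars(json.loads(text))
assert decoded["big_dec_1"] == Decimal("123.456")
assert decoded["map_1"]["big_int_1"] == 123
```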

We can specify these mappings using some simple rules (based on Perl 5 regular expressions):

"big_dec_\d+":   "~f"
"map_1":         { ".+":  ~n" }

Obviously, there will be cases where this sort of declarative syntax will not be sufficient, but even an 80% solution can be useful...
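A Transit Mapper for rules of this kind could be quite small. The Python sketch below assumes one possible reading of the rule format; it is not an existing tool. Under that reading:

- Each rule key is a regular expression that must match an entire map key.
- A string rule value is a tag prefix to prepend to the matching string.
- A nested map of rules applies to a nested map.

Python's `re` module stands in for Perl 5 regexes, which behave the same for simple patterns like these. When the rules are stored as JSON text, the regex backslash has to be doubled (`\\d`).

```python
import json
import re


def apply_map(data, rules):
    """Return a copy of data with tag prefixes added according to rules."""
    result = {}
    for key, value in data.items():
        # Use the first rule whose pattern matches the whole key, if any.
        rule = next((r for pattern, r in rules.items()
                     if re.fullmatch(pattern, key)), None)
        if isinstance(rule, dict) and isinstance(value, dict):
            result[key] = apply_map(value, rule)  # nested rules, nested map
        elif isinstance(rule, str) and isinstance(value, str):
            result[key] = rule + value            # eg, "123.456" -> "~f123.456"
        else:
            result[key] = value                   # no applicable rule
    return result


source = json.loads("""{
  "big_dec_1": "123.456",
  "big_dec_2": "234.567",
  "map_1": {"big_int_1": "123", "big_int_2": "234"}
}""")
rules = json.loads(r'{"big_dec_\\d+": "~f", "map_1": {".+": "~n"}}')
print(json.dumps(apply_map(source, rules), indent=2))
```

Unmapped strings are copied through as-is. A real mapper would also need to escape any that happen to begin with a character Transit reserves, such as `~`.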


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r8 - 04 Apr 2016, RichMorin