Overview

Arborescences

Arborescences (ie, heterogeneous tree structures) are commonly used for semi-structured data (eg, abstract syntax trees, configuration settings, working data):

In graph theory, an arborescence is a directed graph in which, for a vertex u called the root and any other vertex v, there is exactly one directed path from u to v.

Equivalently, an arborescence is a directed, rooted tree in which all edges point away from the root. Every arborescence is a directed acyclic graph (DAG), but not every DAG is an arborescence.

-- Arborescence (graph theory) (WP)

Note: I find it a bit surprising that such a popular data structuring idiom is known only by such an obscure (if lovely) term. Thanks to Matthias Neeracher for pointing it out!

Building Blocks

Historically, most programmers created data structures from low-level elements (eg, fixed-length scalars and homogeneous arrays, pointers). Although some programmers still do this, the practice is mostly restricted to low-level programming in languages such as assembly language and C.

In higher-level languages (eg, JavaScript, Perl, Lisp, Python, Ruby), programmers generally start from much higher-level building blocks, eg:

  • Dynamically-allocated, garbage-collected data

  • Character strings, references, symbols

  • Heterogeneous, numerically-indexed arrays
    that can also be treated as stacks, queues, etc.

Combinations

These building blocks can be combined in a variety of ways. For example, because collections can contain other collections, it is easy to create heterogeneous tree structures. The indexes may be any sort of objects, including integers, strings, symbols, etc:

foo[42]['string'][:symbol][object]

If a tree includes indexes, keys, or references into other parts of the data structure, it can be used to represent arbitrary graphs. In short, practically any desired data structure can be created and managed with minimal effort. However, these pages concentrate (mostly) on trees.

Usage

The arborescence is found in many programs. For example, language compilers and interpreters typically use an abstract syntax tree to store information on program structure. Most data serialization formats (eg, JSON, property lists, XML, YAML) are also based on this structure. The arborescence is typically encoded as a tree of hashes and lists, using scalars (e.g., numbers, strings) as leaf nodes.

Although ordered lists can be used for internal nodes, I don't (generally) consider this to be a Best Practice. Using hashes for most nodes minimizes the use of lists, trading (undocumented and error-prone) Connascence of Position for (self-documenting and tractable) Connascence of Name. So, I prefer to use hashes unless naming simply isn't appropriate.


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r5 - 18 Mar 2014, RichMorin
This site is powered by Foswiki Copyright © by the contributing authors. All material on this wiki is the property of the contributing authors.
Foswiki version v2.1.6, Release Foswiki-2.1.6, Plugin API version 2.4
Ideas, requests, problems regarding CFCL Wiki? Send us email