Arborescences (ie, heterogeneous tree structures)
are commonly used for semi-structured data
(eg, abstract syntax trees, configuration settings, working data):
In graph theory, an arborescence is a directed graph in which,
for a vertex u called the root and any other vertex v,
there is exactly one directed path from u to v.
Equivalently, an arborescence is a directed, rooted tree
in which all edges point away from the root.
Every arborescence is a directed acyclic graph (DAG),
but not every DAG is an arborescence.
-- Arborescence (graph theory) (WP)
I find it a bit surprising that such a popular data structuring idiom
is known only by such an obscure (if lovely) term.
Thanks to Matthias Neeracher for pointing it out!
Historically, most programmers created data structures from low-level elements
(eg, fixed-length scalars and homogeneous arrays, pointers).
Although some programmers still do this,
the practice is mostly restricted to low-level programming
in languages such as assembly language
In higher-level languages (eg,
programmers generally start from much higher-level building blocks, eg:
- Dynamically-allocated, garbage-collected data
- Character strings, references, symbols
- Heterogeneous, numerically-indexed arrays
that can also be treated as stacks, queues, etc.
These building blocks can be combined in a variety of ways.
For example, because collections can contain other collections,
it is easy to create heterogeneous tree structures.
The indexes may be any sort of objects,
including integers, strings, symbols, etc:
If a tree includes indexes, keys, or references
into other parts of the data structure,
it can be used to represent arbitrary graphs.
In short, practically any desired data structure
can be created and managed with minimal effort.
However, these pages concentrate (mostly) on trees.
The arborescence is found in many programs.
For example, language compilers and interpreters
typically use an abstract syntax tree
to store information on program structure.
Most data serialization formats (eg, JSON
are also based on this structure.
The arborescence is typically encoded as a tree of hashes and lists,
using scalars (e.g., numbers, strings) as leaf nodes.
Although ordered lists can be used for internal nodes,
I don't (generally) consider this to be a Best Practice.
Using hashes for most nodes minimizes the use of lists,
trading (undocumented and error-prone) Connascence of Position
for (self-documenting and tractable) Connascence of Name
So, I prefer to use hashes unless naming simply isn't appropriate.
This wiki page is maintained by Rich Morin
an independent consultant specializing in software design, development, and documentation.
Please feel free to email
comments, inquiries, suggestions, etc!