Data - Abstract

This page summarizes some of Clojure's abstract concepts regarding data (eg, manipulation, structuring) and provides links to more detailed information.

Code as Data

As part of its Lisp heritage, Clojure is a homoiconic language. So, Clojure's real "source code" is encoded in its own data types (eg, lists, scalars, vectors). This enables automatic programming, program transformation, syntactic macros, etc.

While processing a file of program source code, Clojure's reader performs simple parsing of the input code (ie, "reader forms"), generating a data structure. During evaluation, each structure is compiled into Java bytecode (if need be), then executed by a Java Virtual Machine (JVM).

As discussed in Architecture, this is not the only possible data path. For example:

  • Parsed code may be generated by other parts of the program, read in from other files or programs, etc.

  • Parsed code may be edited (possibly multiple times) by macros before it is compiled and evaluated.
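The separation between reading and evaluating can be sketched with the core functions read-string and eval. This is only an illustration, not a description of how the compiler is organized; the macro name swap-args is hypothetical.

```clojure
;; The reader turns text into ordinary Clojure data; nothing is evaluated yet.
(def form (read-string "(+ 2 3 4)"))

(def form-is-list (list? form))     ; true: the code is a list
(def operator (first form))         ; the symbol +

;; Evaluation is a separate step.
(def result (eval form))            ; 9

;; Because code is data, a program can edit it before evaluating it.
(def edited (cons '* (rest form)))  ; (* 2 3 4)
(def edited-result (eval edited))   ; 24

;; A macro receives its argument as unevaluated data and returns new code.
;; `swap-args` is a hypothetical name used only for this illustration.
(defmacro swap-args [[op a b]]
  (list op b a))

(def swapped (swap-args (- 1 10)))  ; expands to (- 10 1), so 9
```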

Data as API

Clojure encourages the idea of using data structures as the Application Programming Interface (API) for software systems. Rather than hiding data structures behind a limited set of accessor methods, Clojurists argue, why not expose and document them? Here is an oft-seen quote:

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

-- Alan J. Perlis, Epigrams in Programming

Rich Hickey's version of the epigram folds in Clojure's emphasis on data abstractions (eg, Sequences):

It is better to have 100 functions operate on one data abstraction than 10 functions on 10 data structures.

-- Rich Hickey, quoted in The Joy of Clojure
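As a rough sketch of this idea, a plain map can serve as the documented interface, with generic core functions doing the work. The keys and values below are hypothetical, chosen only for illustration.

```clojure
;; A plain map is the interface; there are no accessor methods.
;; The keys (:name, :langs) and values are hypothetical.
(def person {:name "Ada" :langs ["Clojure" "Java"] :year 2013})

;; Generic core functions work on it directly.
(def person-name (:name person))                   ; "Ada"
(def updated (update person :langs conj "Lisp"))   ; a new map; person is unchanged
(def summary (select-keys updated [:name :langs]))

;; The same functions work on any other map-shaped data.
(def names (map :name [person {:name "Bob"}]))     ; ("Ada" "Bob")
```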

Destructuring

Clojure's Destructuring (aka Binding Form) syntax provides powerful and concise ways to bind symbols to specified parts of data structures (eg, Lists, Maps, Sequences, Vectors). This syntax is supported in let binding lists, fn parameter lists, and any macro that expands into a let or fn.
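A few common destructuring forms, as a minimal sketch. The function describe and the sample data are hypothetical.

```clojure
;; Sequential destructuring: bind by position; & collects the remainder.
(def by-position
  (let [[a b & more] [1 2 3 4]]
    [a b more]))                          ; [1 2 (3 4)]

;; Map destructuring: bind by key. :or supplies defaults; :as keeps the whole map.
(def by-key
  (let [{:keys [x y] :or {y 0} :as point} {:x 5}]
    [x y point]))                         ; [5 0 {:x 5}]

;; The same syntax works in fn parameter lists, and forms can be nested.
;; `describe` is a hypothetical function used only for this illustration.
(defn describe [{:keys [name] [lat lon] :coords}]
  (str name " is at " lat ", " lon))

(def description (describe {:name "HQ" :coords [3 4]}))  ; "HQ is at 3, 4"
```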

Immutability

As is common in functional programming languages, Clojure's data tends to be immutable. That is, once a value (eg, a map or vector) has been created, it never changes; "modifying" it produces a new value instead. This means that data can be passed to other functions or even to other execution threads without concern about changes being made to it.

Clojure recognizes that mutable data is sometimes necessary, but it strongly discourages unconstrained changes to data. See References and Transactions for more details.
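A small sketch of what immutability looks like in practice. The function mark-seen is hypothetical.

```clojure
(def v [1 2 3])
(def v2 (conj v 4))        ; a new vector, [1 2 3 4]

(def m {:a 1})
(def m2 (assoc m :b 2))    ; a new map, {:a 1, :b 2}

;; A function cannot alter the value its caller passed in; it can only
;; return a new value. `mark-seen` is a hypothetical example function.
(defn mark-seen [record]
  (assoc record :seen true))

(def marked (mark-seen m))

;; v and m still hold exactly what they held before.
```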

Namespaces

Clojure's Namespaces are mappings from simple (unqualified) symbols to values (eg, Clojure Vars, Java Class objects). So, for example, foo/bar refers to the symbol bar in the foo namespace. Namespaces are not hierarchical, but various special characters (eg, period) are commonly used to simulate path separators (eg, foo.bar/baz).

The scoping of namespaces is dynamic: executing the in-ns function (eg, in the REPL) or the ns macro (eg, at the top of a file) changes the current namespace (ie, the value of *ns*), creating a new namespace if needed. However, note that this namespace will be "popped off the stack" at the end of any file loading form (eg, load, require, use), restoring the namespace that was current before the load.
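The following sketch shows qualified and unqualified lookups across two namespaces. It is meant to be evaluated one top-level form at a time (eg, in a REPL or as a script); the namespace names demo.shapes and demo.main and the function area are hypothetical.

```clojure
;; `demo.shapes`, `demo.main`, and `area` are hypothetical names.
(ns demo.shapes)

(defn area [w h] (* w h))

;; Switch to a second namespace; `area` is no longer visible unqualified.
(ns demo.main)

(def current-ns-name (ns-name *ns*))               ; demo.main
(def qualified-result (demo.shapes/area 2 3))      ; 6
(def unqualified (resolve 'area))                  ; nil: no such mapping here

;; A namespace maps simple symbols to Vars and Java classes.
(def looked-up (ns-resolve 'demo.shapes 'area))    ; #'demo.shapes/area
(def string-class (ns-resolve 'demo.main 'String)) ; java.lang.String
```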

Persistence

When Clojurists speak about persistent data structures, they are referring to data structures that:

  • preserve previous versions when modified copies are produced

  • maintain performance characteristics in the original and the copy

Clojure provides persistent implementations of several popular data structures (eg, list, map, set, vector). These implementations allow Clojurists to use familiar and convenient idioms, while achieving acceptable performance.
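A brief sketch of both properties: the old version survives the "modification", and unchanged parts are shared between versions rather than copied. The names below are hypothetical.

```clojure
;; Both versions remain fully usable after the "modification".
(def original (vec (range 100000)))
(def modified (assoc original 0 :changed))

(def original-first (nth original 0))   ; 0
(def modified-first (nth modified 0))   ; :changed

;; Unchanged parts are shared between versions, not copied.
(def config  {:data original :flag false})
(def config2 (assoc config :flag true))

(def shared (identical? (:data config) (:data config2)))  ; true
```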

References

Many programming languages have References. However, Clojure expands on the idea, providing mutable references to immutable data. So, although the referenced data may become obsolete, external actions cannot cause it to become internally inconsistent. See the References page for more information.
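As a minimal sketch, an atom (one of Clojure's reference types) can hold an immutable map. The names account and snapshot are hypothetical.

```clojure
;; A mutable reference (here, an atom) pointing at an immutable map.
(def account (atom {:balance 100}))

;; Dereferencing yields the current immutable value.
(def snapshot @account)

;; swap! points the reference at a new value computed from the old one.
(swap! account update :balance + 50)

(def current @account)   ; {:balance 150}
;; snapshot is now obsolete, but still a consistent {:balance 100}.
```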

Sequences

Clojure's Sequence ("seq") interface allows a variety of data types (eg, files, lists, maps, sets, strings) to be accessed in the same manner. So, they can all benefit from a common (and massive!) collection of functions.

Most Clojure sequences support lazy evaluation; some also act as infinite data structures. See the References and Sequences pages for more information.
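A short sketch of the common interface and of lazy, infinite sequences; all names below are hypothetical.

```clojure
;; The same function works across different data types.
(def firsts
  [(first [1 2 3])         ; 1
   (first '(1 2 3))        ; 1
   (first "abc")           ; \a
   (first {:a 1})])        ; [:a 1], a key/value entry

(def mapped-set (map inc #{5}))          ; (6)

;; (range) with no arguments is infinite; laziness means that only
;; the elements actually requested are computed.
(def evens (filter even? (range)))
(def first-evens (take 5 evens))               ; (0 2 4 6 8)

(def powers (take 4 (iterate #(* 2 %) 1)))     ; (1 2 4 8)
```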

Transactions

Clojure's Software Transactional Memory (STM) system adapts the notion of database transactions as a way to allow multiple execution threads to share, and make coordinated changes to, mutable references (ie, refs).

Within an STM transaction, data access is guaranteed to be atomic, consistent, and isolated (but not durable, because the data lives only in memory). Still, supporting three out of four of the ACID properties isn't bad!
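A minimal sketch of a coordinated update using ref, dosync, and alter. The names checking, savings, and transfer! are hypothetical.

```clojure
;; Two refs that must change together. All names here are hypothetical.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [from to amount]
  ;; Both alterations commit together or not at all.
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)

(def balances [@checking @savings])   ; [70 30]

;; Refs may only be altered inside a transaction.
(def outside-attempt
  (try
    (alter checking + 1)
    :altered
    (catch IllegalStateException _ :no-transaction)))
```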

Transparency

An expression is referentially transparent if it can be replaced with its value without changing the behavior of a program. For example, a referentially transparent function call will always generate the same output when given the same inputs. This has several benefits, including:

  • Expressions can be analyzed and tested independently.

  • Expression results can be cached (ie, memoized) for performance.
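The caching benefit can be sketched with the core function memoize. The call counter below is a deliberate side effect, added only to make the caching observable; all names are hypothetical.

```clojure
;; Counts how many times the underlying function actually runs.
(def call-count (atom 0))

;; Apart from the counter, this function depends only on its argument.
(defn slow-square [n]
  (swap! call-count inc)
  (* n n))

;; memoize caches results, keyed by the arguments.
(def fast-square (memoize slow-square))

(def first-call  (fast-square 4))   ; computed: 16
(def second-call (fast-square 4))   ; served from the cache: 16

(def calls @call-count)             ; 1
```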


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r8 - 07 Mar 2013, RichMorin