puffy (comparative puffier, superlative puffiest)
-- puffy, in Wiktionary
- Of or pertaining to puffs or puffiness; being pillow-like, exhibiting swelling, inflated
- Speaking or writing in an exaggeratedly eloquent and self-important manner
(e.g., in message
bodies) into persistent data structures
Like cloud computing
, it relies on shared resources
such as cached binaries
and lightweight server processes (i.e., actors
Conveniently, the name sounds a bit like "PF'y data"
(i.e., akin to purely functional
Of course, both clauses of the definition above may apply (:-)
Elixir has all of Erlang's data structures, and more.
However, there are still possibilities for improvement in capabilities and efficiency.
has very powerful data management capabilities.
In particular, Clojure's reference types (
provide safe, efficient ways for colocated processes to share state.
This seems like a useful complement to state-retaining processes (e.g., Elixir's Agents
Erlang Term Storage (ETS
), and the Mnesia DBMS
Based on these foundations, it may be possible to add other interesting types.
For example, Datomic
has some very powerful capabilities
for time-sensitive data storage and retrieval,
massively scalable read access, etc.
Some of Clojure's data types have much better performance than their Elixir alternatives.
For example, the
type allows O(1) appending and indexing
(as opposed to O(n) for a
type is also very convenient and efficient for certain tasks.
Also, deep copying
of inter-process message data isn't free.
The copying operation takes processor time and each copy of the data takes up memory.
Saša Jurić discusses this in Section 5.4.3 of Elixir in Action
(Shared nothing concurrency):
Deep copying is an in-memory operation, so it should be reasonably fast.
Occasionally sending a big message shouldn't present a problem.
But having many processes frequently sending big messages may affect the performance of the system.
The notion of small versus big is subjective.
Simple data, such as a number, an atom, or a tuple with few elements, is obviously small.
A list of a million complex structures is big.
The border lies somewhere in between and depends on your special case.
Persistent Data Structures
Persistent data structures
aren't a new concept for Elixir,
but introductions to Elixir's data structures tend to gloss over this fact.
As described in Does Elixir have persistent data structures similar to Clojure?
implementations all use this approach.
There is a wealth of literature on this topic (see Resources
but David Karger's lecture notes
provide a very concise and readable description:
Once changes have been made to an ephemeral data structure,
no mechanism exists to revert to previous states.
Persistent data structures are really data structures with archaeology.
Partial persistence lets you make modifications only to the present data structure,
but allows queries of any previous version.
These previous versions might be accessed via a timestamp.
Full persistence lets you make queries and modifications
to all previous versions of the data structure.
With this type of persistence, the versions don't form a simple linear path –
they form a version tree.
The obvious way to provide persistence is to make a copy
of the data structure each time it is changed.
This has the drawback of requiring space and time proportional
to the space occupied by the original data structure.
In turns out that we can achieve persistence with O(1) additional space
and O(1) slowdown per operation for a broad class of data structures.
The use of the term "modification" in the text above could confuse the unwary.
So, for the record, persistent data structures are immutable
no in-place modification is ever performed on them.
The magic behind persistent data structures is "structural sharing".
Each "copy" of the structure uses as much of the old structure as possible,
adding only the new values and some overhead.
(See the Implementation
page for more information.)
Clojure and Datomic take great advantage of this idea.
Clojure makes it the basis of a set
of concurrency-friendly (e.g., transactional) "reference data types".
Datomic uses it to allow massive scaling of database query operations,
convenient temporal analysis of data, and much more.
Bringing It Together
Persistent data structures are supported by a number
of functional programming
(e.g., Clojure, Haskell
and have also been ported to more conventional languages
So, it seems like time for Elixir to get on the bus.
If, at the same time, we could extend Elixir's set of data types
and make large-scale message passing more convenient and efficient,
it begins to sound like a no-brainer.
So, I'm looking into it...
The Erlang environment already provides some related facilities.
Let's take a look.
Erlang Term Storage
Erlang Term Storage (ETS
) is the traditional way to share large amounts of data:
This module is an interface to the Erlang built-in term storage BIFs.
These provide the ability to store very large quantities of data in an Erlang runtime system,
and to have [in general] constant access time ...
Data is organized as a set of dynamic tables, which can store tuples.
Each table is created by a process.
When the process terminates, the table is automatically destroyed.
Every table has access rights set at creation.
Tables are divided into four different types,
ordered_set table can only have one object associated with each key.
duplicate_bag can have many objects associated with each key.
The number of tables stored at one Erlang node is limited. ...
Note that there is no automatic garbage collection for tables.
Even if there are no references to a table from any process,
it will not automatically be destroyed unless the owner process terminates.
It can be destroyed explicitly by using
This module provides some limited support for concurrent access.
All updates to single objects are guaranteed to be both atomic and isolated. ...
No other support is available within ETS that would guarantee consistency between objects. ...
The Binary Heap
Saša Jurić describes an interesting and useful optimization
that might provide an opportunity for another solution:
A special case where deep copying doesn't take place involves binaries (including strings)
that are larger than 64 bytes.
These are maintained on a special shared binary heap
and sending them doesn't result in a deep copy.
This can be useful when you need to send information to many processes
and the processes don't need to decode the string.
If the same data structure is being used by multiple processes,
this optimization will reduce both copying time and data storage.
However, the use of global data interferes with per-process garbage collection.
Target Data Types
Clojure supports a wide variety of well-documented data types.
have their own pages.
Only a few of these data types are present in Elixir, but any of them could be added to the language,
by appropriate use of macros and protocols.
My intention is to start with Elixir's existing types (e.g., lists, maps, sets),
then move on to some simple structures from Clojure (e.g., vectors).
Clojure's reference data types (e.g., Agents, Atoms, Refs, and Vars)
would seem to be philosophically at odds with the Actor model
as used in Elixir, Erlang, etc.
They are also limited to colocated (as opposed to distributed) processes.
(See Message Passing and Actors
for Clojure's rationale.)
However, they offer an interesting alternative (with great performance and safety benefits)
to message passing and other state-sharing approaches.
So, even though integrating them may be challenging,
we can't really leave them out of the design.
This wiki page is maintained by Rich Morin
an independent consultant specializing in software design, development, and documentation.
Please feel free to email
comments, inquiries, suggestions, etc!