Although a typical modeling experiment uses only a few dozen programs,
the number of processes and files may be far larger (e.g., thousands).
This produces a great deal of incidental complexity
that is seldom of interest to researchers (or even to developers).
We need a way to manage this complexity and hide it from casual view,
while still providing visibility into it, and control over it, when needed.

When specifying an experiment, developers and researchers
need to define the general patterns of processing and data flow.
However, many specific details can be handled automatically, including:
- File management (e.g., archiving, conversion, indexing)
- Model structure generation and model evaluation
- Analysis and presentation of modeling results
- Process management (e.g., parallel execution)
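
Parallel execution is one example of a detail that can be derived from the data flow instead of being specified by hand. The sketch below is only an illustration and is not IPM Lab's scheduler. It assumes a hypothetical map from each process to the processes whose output files it reads. It then uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group the processes into batches whose members could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical example: each process maps to the processes whose output files it reads.
DEPENDS_ON = {
    "convert_a": set(),
    "convert_b": set(),
    "index": {"convert_a", "convert_b"},
    "evaluate": {"index"},
    "report": {"evaluate", "index"},
}


def parallel_batches(depends_on):
    """Group processes into batches; every process in a batch has all of its inputs ready."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the data flow contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        batches.append(ready)
        sorter.done(*ready)  # simplification: treat the whole batch as finished at once
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(parallel_batches(DEPENDS_ON)):
        print(step, batch)
```

To keep the example short, the sketch marks a whole batch as finished before it releases the next one. A real process manager would call `done()` as each process completes, which would let downstream work start sooner.
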

Graph-based data structures (e.g., nodes with properties, linked by typed relations)
are well suited to encoding this kind of complexity.
Graph databases (e.g., Neo4j) provide robust storage for these structures,
as well as tools (e.g., libraries, query languages) for working with them.
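
The following minimal in-memory sketch of a directed property graph makes the vocabulary of nodes, properties, and relations concrete. It is not the Neo4j API. The class, the labels (`Program`, `FileType`), and the relation types (`READS`, `WRITES`) are all hypothetical, chosen only to echo the data-flow graphs discussed here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                # node type, e.g. "Program" or "FileType"
    name: str                                 # unique key within the graph
    props: dict = field(default_factory=dict)  # arbitrary key/value properties


class PropertyGraph:
    """A tiny directed property graph: named nodes plus typed, directed relations."""

    def __init__(self):
        self.nodes = {}                     # name -> Node
        self.out_edges = defaultdict(list)  # source name -> [(relation type, target name)]

    def add_node(self, label, name, **props):
        self.nodes[name] = Node(label, name, props)
        return self.nodes[name]

    def relate(self, src, rel, dst):
        # Relations are directed: src -[rel]-> dst.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.out_edges[src].append((rel, dst))

    def targets(self, src, rel):
        """Names of nodes reached from `src` by relations of type `rel`."""
        return [dst for r, dst in self.out_edges.get(src, []) if r == rel]


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node("Program", "tokenize", version="1.2")  # hypothetical property
    g.add_node("FileType", "raw_text")
    g.add_node("FileType", "tokens")
    g.relate("tokenize", "READS", "raw_text")
    g.relate("tokenize", "WRITES", "tokens")
    print(g.targets("tokenize", "WRITES"))  # prints ['tokens']
```

In a graph database, the `targets` lookup would become a query. In Neo4j's Cypher it might read `MATCH (p:Program {name: 'tokenize'})-[:WRITES]->(f:FileType) RETURN f.name`, again assuming these hypothetical labels and property names.
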

Here are some examples of directed graphs
that a graph database might handle for IPM Lab:
- abstract data flow (e.g., programs, file types)
- concrete data flow (e.g., processes, files)
- model information (e.g., entities, scores)
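
The relationship between the first two graphs can be sketched in a few lines of Python. The sketch makes a simplifying assumption (ours, not necessarily IPM Lab's): each program in the abstract graph reads one file type and writes one file type. Under that assumption, the concrete graph can be derived by creating one process per matching input file. The program names, the file types, and the output-naming scheme are all hypothetical.

```python
from collections import deque
from pathlib import PurePath

# Hypothetical abstract data flow: program -> (input file type, output file type).
ABSTRACT = {
    "tokenize": ("txt", "tok"),
    "score": ("tok", "scores"),
}


def concretize(abstract, files):
    """Expand the abstract graph into concrete edges: (input file, process, output file).

    A program runs once per file of its input type, including files produced by
    earlier processes, so chains of programs are followed to completion. The
    `seen` set keeps any file from being expanded twice.
    """
    edges = []
    pending = deque(files)
    seen = set(files)
    while pending:
        path = PurePath(pending.popleft())
        ftype = path.suffix.lstrip(".")
        for program, (in_type, out_type) in abstract.items():
            if ftype != in_type:
                continue
            out = str(path.with_suffix("." + out_type))  # hypothetical naming scheme
            process = f"{program}[{path.name}]"          # one process per (program, input file)
            edges.append((str(path), process, out))
            if out not in seen:
                seen.add(out)
                pending.append(out)
    return edges


if __name__ == "__main__":
    for edge in concretize(ABSTRACT, ["a.txt", "b.txt"]):
        print(edge)
```

Here two programs and two input files already yield four processes and four derived files. This shows how a few dozen programs can fan out into thousands of concrete processes and files.
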