IPM Lab

Motivation

Inductive Process Modeling (IPM) research has been progressing for more than a decade. Dozens of papers have been published and several experimental software systems have been developed. IPM Lab is intended as a way to merge the best aspects of these systems, producing a convenient and extensible "laboratory workbench" for research on IPM and related topics.

Each of these systems introduced new approaches and/or capabilities.

Although the current system (SC-IPM) embodies most of the useful capabilities developed over the past decade, some useful features have been lost along the way. For example, PROMETHEUS accepted infix (ie, algebraic) notation for equations, whereas SC-IPM requires prefix (ie, Lisp) notation. PROMETHEUS also generated diagrams of model structures, showing the relationships among entities and processes. Arguably, these sorts of diagrams can help to "explain" the model structures.

In addition, the SC-IPM code base has become a substantial maintenance burden. Including modified and/or unsupported libraries, it contains about 20K lines of minimally-documented Common Lisp code. Whenever changes are proposed, their benefits must be weighed against the difficulty of folding them into the existing code base. Experience has shown us that this difficulty can be substantial, even for (seemingly) minor changes.

As research software, it's not surprising that SC-IPM has accumulated a certain amount of technical debt. However, nothing says that we have to continue paying off the interest. Our current research aims will require substantial enhancements (eg, support for finite element methods and/or partial differential equations). Adding these to the current software base would be difficult, at best. Fortunately, this is not the only option open to us.

Approach

IPM Lab is envisaged as a modular "laboratory", composed of pluggable components (ie, programs, intermediate files). It would retain SC-IPM's key capabilities, regain lost features, and add new ones, while eliminating most of the current inflexibility and limitations. Perhaps surprisingly, the basic approach is very simple:

  • Replicate SC-IPM's key capabilities (eg, model generation and evaluation).

  • Package the capabilities as programs (ie, Unix commands, data filters).

  • Communicate by means of standardized data formats (eg, JSON, YAML).

  • Add programs for new capabilities (eg, FEM, PDE), as desired.

Much of the current code base (eg, McCLIM-based GUI support) will simply go away. In other cases (eg, FEM, ODE, and PDE solvers), we may be able to use externally supported libraries and programs. So, we gain access to large bodies of well-supported code, while dramatically reducing the amount of code that we need to maintain.
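As a rough illustration of the "capabilities as data filters" idea, here is a minimal Python sketch of one such program. It reads a JSON list of model structures, annotates each with a score, and writes JSON back out. The record fields (`name`, `processes`, `score`) and the scoring rule are placeholders invented for this example; they are not SC-IPM's actual data model or evaluation method.

```python
import json
import sys


def score_model(model):
    """Placeholder evaluation (an assumption, not a real IPM metric):
    the score is simply the number of processes in the model."""
    return len(model.get("processes", []))


def run_filter(in_stream, out_stream):
    """Read a JSON list of models, add a 'score' field to each, write JSON."""
    models = json.load(in_stream)
    for model in models:
        model["score"] = score_model(model)
    json.dump(models, out_stream, indent=2)
    out_stream.write("\n")


# Used as a Unix-style filter, the entry point would be:
#     run_filter(sys.stdin, sys.stdout)
```

Because the function only depends on a pair of text streams, the same code can sit in a shell pipeline, be called from a test, or be driven by a workflow system.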

Data formats

Much of the complexity in SC-IPM (and subsystems such as MISC) has to do with cross-language communication (eg, parsing and generating Lisp-encoded data structures in other languages, supporting foreign function interfaces for C libraries, translating Fortran libraries to Lisp).

Standardized data formats eliminate a great deal of this complexity from IPM Lab. Instead of translating (and supporting) needed libraries, we can use them "as-is". In many cases, existing libraries will handle data encoding issues for us. Finally, these formats make it trivial to interact with web-based technologies such as D3, as well as software used by scientific collaborators.

The default serialization format will be YAML, a human-friendly, standardized syntax with support in many languages. Programs in languages that lack good YAML support may use the data in JSON format, by means of (trivial) conversion filters. Transit encoding may be used to preserve data type information.
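The type information issue can be seen with a small example. Plain JSON has no date type, so a date must be written as a string, and the reader cannot tell it apart from any other string. The Python sketch below shows the general idea of tagging values so that their types survive a round trip. It is a simplified stand-in, not the Transit format or library: the `__date__` tag and the record fields are hypothetical, and only plain dates are handled.

```python
import json
from datetime import date

# Hypothetical tag chosen for this sketch; Transit defines its own tags.
DATE_TAG = "__date__"


def encode(value):
    """Replace date objects with tagged one-key dicts that JSON can carry."""
    if isinstance(value, date):
        return {DATE_TAG: value.isoformat()}
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [encode(item) for item in value]
    return value


def decode(value):
    """Reverse encode(): turn tagged dicts back into date objects."""
    if isinstance(value, dict):
        if set(value) == {DATE_TAG}:
            return date.fromisoformat(value[DATE_TAG])
        return {key: decode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode(item) for item in value]
    return value


record = {"site": "example", "sampled": date(2016, 4, 4), "values": [1.5, 2.0]}
text = json.dumps(encode(record))      # ordinary JSON text
restored = decode(json.loads(text))    # 'sampled' is a date again
```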

Components

The specific division of components is still a bit unclear, but there are some obvious candidates for extraction or replication from SC-IPM, adoption from external sources, or (if need be) development:

  • validation of model set specifications (specs)

  • generation and evaluation of model structures

  • generation and relaxation of constraints

  • visualization (eg, models, results, specs)

Plumbing

Because the Lab's components are connected by data files, plumbing them together should require very little effort. For example, evaluating models in parallel becomes trivial. This capability is likely to be very useful as we enter the (computationally expensive) realm of FEM, PDEs, etc.
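A minimal Python sketch of parallel evaluation, assuming that each model structure lives in its own JSON file and that the evaluator is an external command which reads a model on standard input and writes a JSON result on standard output. The function names and the stand-in evaluator are hypothetical; a real evaluator would run a simulation rather than count processes.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def evaluate_file(command, path):
    """Run one evaluator process with the model file on stdin; parse its JSON output."""
    with open(path, "rb") as handle:
        result = subprocess.run(command, stdin=handle, capture_output=True, check=True)
    return json.loads(result.stdout)


def evaluate_all(command, paths, workers=4):
    """Evaluate every model file, running up to `workers` processes at once.

    The threads only wait on external processes, so the real work happens
    in parallel. Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda path: evaluate_file(command, path), paths))


# Stand-in evaluator (hypothetical): reports the number of processes in a model.
DEMO_EVALUATOR = [
    sys.executable, "-c",
    "import json, sys; m = json.load(sys.stdin); "
    "json.dump({'name': m['name'], 'score': len(m['processes'])}, sys.stdout)",
]
```

Calling `evaluate_all(DEMO_EVALUATOR, list_of_model_files)` would return one result dictionary per file. Spreading the same work across several machines would need a job scheduler, which this sketch does not attempt.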

Fundamental architectural changes also become possible to explore. For example, we may be able to feed model evaluation results back to the generation code, improving the quality of model structures and adding to the knowledge contained in the model set specification.

Workflow

Although the Lab's programs can be linked together by simple scripts, this is not the only possibility. Existing scientific workflow systems such as Kepler are worth considering; a custom system is also an intriguing possibility.

Kepler

Kepler can be used to assemble experimental systems, providing a number of benefits:

Kepler is a free software system for designing, executing, reusing, evolving, archiving, and sharing scientific workflows. Kepler's facilities provide process and data monitoring, provenance information, and high-speed data movement. Workflows in general, and scientific workflows in particular, are directed graphs where the nodes represent discrete computational components, and the edges represent paths along which data and results can flow between components.

-- Kepler scientific workflow system (WP)
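The directed-graph view of a workflow described in the quotation can be sketched in a few lines of Python. This is not Kepler's API; it only illustrates nodes as computational steps and edges as data dependencies, using the standard library's `graphlib` module (Python 3.9 or later). The step names echo the candidate components listed earlier, and the step bodies are placeholders invented for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, depends_on, initial):
    """Run each step after all of the steps it depends on.

    steps:      maps a step name to a function taking the results so far
    depends_on: maps a step name to the set of step names it needs
    initial:    starting data, keyed by name
    """
    results = dict(initial)
    for name in TopologicalSorter(depends_on).static_order():
        if name in steps:
            results[name] = steps[name](results)
    return results


# Placeholder steps (hypothetical): each one just transforms the previous output.
steps = {
    "validate": lambda r: sorted(r["spec"]),
    "generate": lambda r: [name.upper() for name in r["validate"]],
    "evaluate": lambda r: {model: len(model) for model in r["generate"]},
    "visualize": lambda r: ", ".join(f"{m}={s}" for m, s in r["evaluate"].items()),
}
depends_on = {
    "generate": {"validate"},
    "evaluate": {"generate"},
    "visualize": {"evaluate"},
}
results = run_workflow(steps, depends_on, {"spec": ["growth", "decay"]})
```

A workflow system adds monitoring, provenance tracking, and data movement on top of this basic ordering.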

Neo4jc

It may be possible to gain many of these benefits, and more, by creating a custom job control framework. The Neo4jc page explores a design concept based on Git, Neo4j, Rake, and other open source software infrastructure.


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r51 - 04 Apr 2016, RichMorin