STEMS Project

Note: Little of the following has been implemented, so any assertions should be regarded with extreme suspicion ("an ounce of implementation trumps a pound of rhetoric"). However, it is a working design document, so getting it out for review and comment still has merit.

STEMS (Statement Tracking, Extrapolation, and Maintenance System) is, as the acronym suggests, a system for tracking, extrapolating, and maintaining sets of statements. A statement, in STEMS' use of the term, can be a collection of mechanically-harvested data, human-entered text (eg, code, facts, rules), and/or extrapolations from these starting points.

Motivation

The initial motivation for developing STEMS was the need for a flexible Data Store for use in mechanized documentation projects. Such systems may contain large numbers (eg, millions) of data points, covering hundreds of types of entities and relationships.

To allow developers to identify sources of errors, the system must make it easy to examine the basis for each statement. When this basis changes (whether in code or data), the system should be able to recalculate any extrapolated statements. In addition, to enable experimentation and comparisons, it would be useful to be able to maintain multiple "views" of reality.

Finally, although we cannot be sure that our input data or calculations are correct, we should have confidence in our bookkeeping (eg, which calculations were performed on which input data to yield a given result). Bookkeeping may not be glamorous, but it's important to get it right!

Approach

STEMS is designed to do all this, and more, by way of its peculiar approach to the problem. STEMS uses an RDBMS to manage all "metadata" about statements. The actual content of each statement, however, is stored as a "blob" in a Git repository.

This is an extremely flexible, powerful, and reliable combination. An RDBMS is very good at storing and querying relationship information. The Git suite excels at storing disparate "views" (aka "branches") of data. Git imposes little space and time overhead, and because it names every object by a hash of its content, it provides cryptographic assurances for both data and metadata.
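To make the division of labor concrete, here is a minimal sketch, not the STEMS design itself. It assumes SQLite as the RDBMS and uses an in-memory dictionary as a stand-in for the Git object store. The table names (`statement`, `basis`), the `kind` values, and the `Store` class are all hypothetical. The one piece taken directly from Git is the blob ID calculation: a SHA-1 hash over the header `blob <size>\0` followed by the content.

```python
import hashlib
import sqlite3


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical metadata schema: one row per statement, plus "basis" rows
# recording which statements an extrapolated statement was derived from.
SCHEMA = """
CREATE TABLE statement (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,   -- eg, 'harvested', 'entered', 'extrapolated'
    blob_id TEXT NOT NULL    -- Git blob ID of the statement's content
);
CREATE TABLE basis (
    statement_id INTEGER NOT NULL REFERENCES statement(id),
    based_on_id  INTEGER NOT NULL REFERENCES statement(id)
);
"""


class Store:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)
        self.blobs = {}  # stand-in for a Git object store, keyed by blob ID

    def add(self, kind, content, based_on=()):
        """Store content under its blob ID and record the metadata row."""
        blob_id = git_blob_id(content)
        self.blobs[blob_id] = content
        cur = self.db.execute(
            "INSERT INTO statement (kind, blob_id) VALUES (?, ?)",
            (kind, blob_id),
        )
        sid = cur.lastrowid
        self.db.executemany(
            "INSERT INTO basis VALUES (?, ?)", [(sid, b) for b in based_on]
        )
        return sid

    def dependents(self, sid):
        """IDs of all statements resting, directly or indirectly, on sid."""
        rows = self.db.execute(
            """
            WITH RECURSIVE dep(id) AS (
                SELECT statement_id FROM basis WHERE based_on_id = ?
                UNION
                SELECT b.statement_id
                FROM basis b JOIN dep d ON b.based_on_id = d.id
            )
            SELECT id FROM dep ORDER BY id
            """,
            (sid,),
        ).fetchall()
        return [r[0] for r in rows]


store = Store()
fact = store.add("harvested", b"hello\n")
rule = store.add("entered", b"rule text\n")
derived = store.add("extrapolated", b"derived text\n", based_on=[fact, rule])
second = store.add("extrapolated", b"second-order\n", based_on=[derived])

# Statements that would need recalculation if `fact` changed.
print(store.dependents(fact))
```

The recursive query is the bookkeeping half of the idea: when a harvested or entered statement changes, the metadata alone is enough to list every extrapolated statement that needs to be recalculated. A real implementation would write the blobs to a Git repository rather than a dictionary.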

If we are willing to give up some of STEMS' more powerful features (eg, archive compression, cryptographic assurances), we can avoid the need for Git. This can be useful for early prototyping, debugging, etc. See STEMS Lite for more information.
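As a rough illustration of what a Git-free store could look like (the STEMS Lite page remains the authority; this is only a guess at the general shape), the sketch below keeps each statement's content in an uncompressed file named after its statement ID. The `PlainFileStore` class, the `.blob` suffix, and the directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


class PlainFileStore:
    """Keep statement content in plain files: no compression, no hashing."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, statement_id: int) -> Path:
        return self.root / f"{statement_id}.blob"

    def put(self, statement_id: int, content: bytes) -> Path:
        """Write (or overwrite) the content for one statement."""
        path = self._path(statement_id)
        path.write_bytes(content)
        return path

    def get(self, statement_id: int) -> bytes:
        """Read back the content for one statement."""
        return self._path(statement_id).read_bytes()


# Usage: a throwaway directory is enough for prototyping and debugging.
with tempfile.TemporaryDirectory() as tmp:
    lite = PlainFileStore(tmp)
    lite.put(1, b"a harvested fact\n")
    print(lite.get(1))
```

Files like these can be inspected with ordinary tools, which suits debugging, but nothing here detects a file that has been altered or corrupted after it was written.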

Available Information


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r30 - 19 Jul 2008, RichMorin