Coding idioms (eg, programming idioms, design patterns, data serialization formats)
are commonly recognized as being valuable.
However, there may be opportunities to improve their usage.
(See the Background page
for an extended discussion of these opportunities.)
Programmers often re-implement coding idioms,
either independently or based on published design patterns.
This produces repetitive code which can be needlessly difficult to maintain.
These pages explore the possibility of creating tooling
to help maintainers identify repetitive code
and replace it with calls to suitable abstractions (eg, functions, macros).
The Basic Notion
The Don't Repeat Yourself (DRY) principle
is frequently espoused by programmers.
However, the practice of DRYing out code is difficult, tedious, and error-prone.
In order to detect and replace repetitive code, one must:
- recognize and match up uses of abstract patterns
- locate (or develop) a formal abstraction (eg, macro)
- replace the repetitive code, using the abstraction
It would be lovely to hand these tasks off to a computer,
but there are some daunting challenges.
In the general case, each of the tasks above may be AI-complete.
Fortunately, it may be possible to simplify some tasks
and leave others to be done by the programmer.
For example, a program could "harvest" repeated patterns from source code,
presenting them for evaluation.
If an idiom seemed useful and intuitive,
the programmer could apply it to the reported use cases,
possibly with some mechanized assistance.
Many programming and data representation languages have complex syntax.
This isn't a show-stopper for code analysis and manipulation,
but it does present assorted difficulties.
So, let's begin with more tractable code bases.
Lisp dialects use a regular, simple syntax
which maps readily to an abstract syntax tree (AST).
By targeting Lispish code bases initially,
we can defer dealing with issues related
to a target language's concrete syntax.
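To illustrate how directly parenthesized syntax maps to a tree, here is a minimal sketch of a reader. It is written in Python purely for illustration (the tooling itself is planned in Clojure), and it assumes a simplified syntax of parentheses and bare atoms only: no strings with spaces, vectors, comments, or reader macros. The function name `read_forms` and the sample forms are hypothetical.

```python
import re

# A token is a single parenthesis, or a run of characters that are
# neither whitespace nor parentheses (ie, an atom).
TOKEN = re.compile(r"[()]|[^\s()]+")


def read_forms(text):
    """Parse S-expression text into nested Python lists of string atoms."""
    stack = [[]]  # stack[0] collects the top-level forms
    for token in TOKEN.findall(text):
        if token == "(":
            stack.append([])          # start a new nested list
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()    # close the current list ...
            stack[-1].append(finished)  # ... and attach it to its parent
        else:
            stack[-1].append(token)   # plain atom
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


# Illustrative input only: one code-like form and one axiom-like form.
forms = read_forms("(defn square (x) (* x x)) (subclass Dog Mammal)")
print(forms)
```

Each top-level form becomes one nested list, so later steps can work on trees rather than on text.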
Clojure is a Lisp dialect.
It is well suited to symbolic computation,
which will be the heart of any tooling we create.
Because it is also homoiconic
(and matches our input data format),
the syntax found in the target code bases will be a natural fit.
In particular, we can take advantage of Clojure's facilities for code manipulation.
Like many Lisp dialects, Clojure supports syntactic macros,
which can be used to encode idioms in a clear and concise way.
There is also a Clojure library (kibit)
which can perform template-based refactoring of code
to replace hard-coded idioms with macro invocations.
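The general idea of template-based rewriting can be sketched as follows. This is not kibit's API or implementation; it is a small Python illustration that assumes forms are nested lists of string atoms and that atoms beginning with `?` are pattern variables. The names `match`, `substitute`, and `rewrite`, and the sample rule, are hypothetical.

```python
def match(pattern, form, bindings):
    """Return extended bindings if form matches pattern, else None."""
    if isinstance(pattern, str):
        if pattern.startswith("?"):          # pattern variable
            if pattern in bindings:          # must agree with earlier use
                return bindings if bindings[pattern] == form else None
            return {**bindings, pattern: form}
        return bindings if pattern == form else None  # literal atom
    if not isinstance(form, list) or len(form) != len(pattern):
        return None
    for sub_pattern, sub_form in zip(pattern, form):
        bindings = match(sub_pattern, sub_form, bindings)
        if bindings is None:
            return None
    return bindings


def substitute(template, bindings):
    """Fill a template's pattern variables from bindings."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return [substitute(part, bindings) for part in template]


def rewrite(form, pattern, template):
    """Rewrite bottom-up: children first, then the form itself."""
    if isinstance(form, list):
        form = [rewrite(part, pattern, template) for part in form]
    bindings = match(pattern, form, {})
    return form if bindings is None else substitute(template, bindings)


# Illustrative rule: an 'if' whose else-branch is nil becomes a 'when'.
pattern = ["if", "?test", "?then", "nil"]
template = ["when", "?test", "?then"]
code = ["defn", "f", ["x"], ["if", ["pos?", "x"], ["inc", "x"], "nil"]]
print(rewrite(code, pattern, template))
```

A harvested idiom could be applied in the same way: the idiom's expanded form serves as the pattern and the macro invocation serves as the template.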
My initial plan is to harvest idioms from two substantial code bases.
One (Clojure) is a collection of program code;
the other (SUMO) is a collection of ontological axioms.
This should provide a significant amount of diversity
in application areas, coding idioms, and language structures.
Clojure's coding style encourages the use
of small, independent functions and macros.
In general, each of these can be analyzed on its own,
eliminating the need for global context.
In addition, Clojure's emphasis on simple, composable abstractions
meshes well with our overall objectives.
Quite a bit of Clojure code is available,
including base distributions that target assorted platforms
(eg, JVM bytecode)
and numerous contributed libraries.
There is also an active community of Clojure developers and users,
some of whom may view this as an interesting project.
The Suggested Upper Merged Ontology (SUMO)
is intended as a foundation ontology
for a variety of computer information processing systems.
It encodes logical axioms (eg, classes, relationships, rules) in SUO-KIF,
a variant of the
Knowledge Interchange Format (KIF).
As hinted above, the plan is to start with (relatively) easy problems.
Specifically, we'll look for small-scale programming idioms
found within AST-friendly code bases.
The basic idea is to build an idiom harvester
that can identify and tally idioms,
as represented in Clojure's abstract syntax tree format.
The proposed approach is to generate signatures of possible idioms,
tally their frequency of occurrence, and generate a summary report.
An interactive program (based on kibit)
could then allow a programmer to accept and polish idioms
(eg, as functions or macros),
then use the idioms to replace their expanded forms in the code base.
If the code base can be split into small, syntactically independent line sequences,
we can avoid the need to deal with global context.
Conveniently, Clojure's functions and macros,
like SUMO's axioms, meet this requirement.
We then convert each line sequence into some number
of snippets (ie, candidate use cases of idioms).
Each snippet captures one or more levels of the AST,
ignoring surrounding and/or subsidiary code as need be.
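One possible reading of this step is sketched below, in Python for illustration. It assumes forms are nested lists of string atoms, takes every list node as the root of a snippet (which ignores surrounding code), and cuts each snippet off below a fixed depth (which ignores subsidiary code). The names `truncate` and `snippets`, the depth limit, and the `"..."` placeholder are all hypothetical choices.

```python
def truncate(form, depth):
    """Copy form, replacing any list nested below `depth` levels with '...'."""
    if not isinstance(form, list):
        return form               # atoms are kept as they are
    if depth == 0:
        return "..."              # subsidiary code is elided
    return [truncate(part, depth - 1) for part in form]


def snippets(form, depth):
    """Yield a depth-limited snippet rooted at every list node in form."""
    if isinstance(form, list):
        yield truncate(form, depth)
        for part in form:
            yield from snippets(part, depth)


form = ["if", ["pos?", "x"], ["inc", "x"], "nil"]
for snippet in snippets(form, 1):
    print(snippet)
```

Running the harvester at several depths would yield both shallow and deeper candidates from the same form, at the cost of more snippets to tally.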
The next step is to remove specific details from each snippet,
turning it into an abstract signature.
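What counts as a "specific detail" is a design decision. The Python sketch below shows one assumed choice: keep the head symbol of each list (the operator) and replace every other atom with a placeholder for its rough kind. The names `classify` and `signature` and the placeholder labels are hypothetical.

```python
def classify(atom):
    """Map an atom to a coarse placeholder (an assumed, simplistic scheme)."""
    if atom.lstrip("-").isdigit():
        return "<num>"
    if atom.startswith('"'):
        return "<str>"
    return "<sym>"


def signature(form):
    """Keep each list's head symbol; abstract all other atoms.

    Tuples are returned so that signatures are hashable and can be counted.
    """
    if not isinstance(form, list):
        return classify(form)
    if not form:
        return ()
    head, *rest = form
    head_sig = head if isinstance(head, str) else signature(head)
    return (head_sig, *(signature(part) for part in rest))


a = signature(["if", ["pos?", "x"], ["inc", "x"], "nil"])
b = signature(["if", ["pos?", "count"], ["inc", "count"], "nil"])
print(a)
print(a == b)  # same shape, different variable names
```

This scheme discards the fact that the same variable appears twice in a snippet. A richer signature could number the variables instead, so that repeated uses stay visible.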
Finally, we tally occurrences of signatures,
for use in (say) a summary report.
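The tallying step is the simplest of the three. As a sketch in Python, assuming signatures are hashable values (eg, nested tuples), a counter can rank them and drop those seen too rarely to be interesting. The function name `summarize`, the `min_count` threshold, and the sample signatures are hypothetical.

```python
from collections import Counter


def summarize(signatures, min_count=2):
    """Return (signature, count) pairs, most frequent first.

    Signatures seen fewer than `min_count` times are omitted, since a
    pattern that occurs once is not (yet) a repeated idiom.
    """
    counts = Counter(signatures)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


# Illustrative signatures, as might be produced by an earlier step.
when_like = ("if", "<sym>", "<sym>", "nil")
add_one = ("+", "<sym>", "<num>")
rare = ("str", "<str>")

report = summarize([when_like, add_one, when_like, rare, add_one, when_like])
for sig, n in report:
    print(n, sig)
```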
My initial plans are based on the following technologies:
- core.logic - A Logic Programming library for Clojure & ClojureScript
- kibit - There's a function for that!