Coding Idioms

Coding idioms (eg, programming idioms, design patterns, data serialization formats) are commonly recognized as being valuable. However, there may be opportunities to improve their usage. (See the Background page for an extended discussion of these opportunities.)

Programmers often re-implement coding idioms, either independently or based on published design patterns. This produces repetitive code which can be needlessly difficult to maintain. These pages explore the possibility of creating tooling to help maintainers identify repetitive code and replace it with calls to suitable abstractions (eg, functions, macros).

The Basic Notion

The Don't Repeat Yourself (DRY) mantra is frequently espoused by programmers. However, the practice of DRYing out code is difficult, tedious, and error-prone. In order to detect and replace repetitive code, one must:

  • recognize and match up uses of abstract patterns

  • locate (or develop) a formal abstraction (eg, macro)

  • replace the repetitive code, using the abstraction

It would be lovely to hand these tasks off to a computer, but there are some daunting challenges. In the general case, each of the tasks above may be AI-complete. Fortunately, it may be possible to simplify some tasks and leave others to be done by the programmer.

For example, a program could "harvest" repeated patterns from source code, presenting them for evaluation. If an idiom seemed useful and intuitive, the programmer could apply it to the reported use cases, possibly with some mechanized assistance.

Language Issues

Many programming and data representation languages have complex syntax. This isn't a show-stopper for code analysis and manipulation, but it does present assorted difficulties. So, let's begin with more tractable code bases.

Lisp dialects use a regular, simple syntax which maps readily to an abstract syntax tree (AST). By targeting Lispish code bases initially, we can defer dealing with issues related to a target language's concrete syntax.
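
To illustrate how directly such syntax maps to a tree, here is a minimal sketch of a reader, written in Python purely for illustration. It assumes a drastically simplified syntax (parentheses and whitespace-separated atoms only, with no strings, comments, vectors, or maps); the name `read_forms` and both sample forms are made up for this example.

```python
import re

# Simplifying assumption: a token is a parenthesis or a run of characters
# that are neither whitespace nor parentheses.
TOKEN = re.compile(r"[()]|[^\s()]+")

def read_forms(text):
    """Parse text into a list of top-level forms (nested tuples of atoms)."""
    stack = [[]]                      # stack[0] collects the top-level forms
    for token in TOKEN.findall(text):
        if token == "(":
            stack.append([])          # start a new nested form
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            form = tuple(stack.pop()) # finish the current form
            stack[-1].append(form)
        else:
            stack[-1].append(token)   # atoms are kept as plain strings
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

# One Clojure-like form and one SUO-KIF-like form (both hypothetical).
forms = read_forms("(defn inc2 (x) (+ x 2)) (instance Fido Dog)")
for form in forms:
    print(form)
```

Each top-level form comes back as an independent tree, so the same few lines serve both program code and axioms. A reader for real Clojure or SUO-KIF would need to handle far more (eg, string literals, comments, reader macros).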

Implementation Language

Clojure is a Lisp dialect that supports functional programming. It is well suited to symbolic computation, which will be the heart of any tooling we create. Because Clojure is also homoiconic (ie, its code is written as ordinary data structures) and its syntax matches our input data format, the target code bases can be read and manipulated directly as data.

In particular, we can take advantage of Clojure's facilities for code manipulation. Like many Lisp dialects, Clojure supports syntactic macros, which can be used to encode idioms in a clear and concise way. There is also a Clojure library (kibit) which can perform template-based refactoring of code to replace hard-coded idioms with macro invocations.
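
The general idea of template-based rewriting can be sketched as follows. This is a Python illustration of the technique only, not kibit's implementation or API; the function names, the `?name` convention for pattern variables as used here, and the two sample rules are assumptions made for the example. Code is modeled as nested tuples of strings.

```python
def match(pattern, form, bindings):
    """Return bindings extended so that pattern matches form, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:       # a repeated variable must match equally
            return bindings if bindings[pattern] == form else None
        return {**bindings, pattern: form}
    if isinstance(pattern, tuple) and isinstance(form, tuple):
        if len(pattern) != len(form):
            return None
        for p, f in zip(pattern, form):
            bindings = match(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == form else None

def substitute(template, bindings):
    """Fill a template's variables from bindings."""
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template)

def rewrite(form, rules):
    """Rewrite bottom-up, applying the first matching rule at each node."""
    if isinstance(form, tuple):
        form = tuple(rewrite(f, rules) for f in form)
    for pattern, template in rules:
        bindings = match(pattern, form, {})
        if bindings is not None:
            return substitute(template, bindings)
    return form

# Illustrative (pattern, replacement) rules.
RULES = [
    (("if", "?test", "?then", "nil"), ("when", "?test", "?then")),
    (("+", "?x", "1"), ("inc", "?x")),
]

code = ("if", ("pos?", "n"), ("+", "n", "1"), "nil")
print(rewrite(code, RULES))
```

Here the inner `(+ n 1)` is rewritten first, then the enclosing `if` form. A production tool would also have to decide when a rewrite is safe and how to present it to the programmer.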

Target Languages

My initial plan is to harvest idioms from two substantial code bases. One (Clojure) is a collection of program code; the other (SUMO) is a collection of ontological axioms. This should provide a significant amount of diversity in application areas, coding idioms, and language structures.

Clojure

Clojure's coding style encourages the use of small, independent functions and macros. In general, each of these can be analyzed on its own, eliminating the need for global context. In addition, Clojure's emphasis on simple, composable abstractions meshes well with our overall objectives.

Quite a bit of Clojure code is available, including base distributions that target assorted platforms (eg, JVM bytecode, JavaScript, Python) and numerous contributed libraries. There is also an active community of Clojure developers and users, some of whom may view this as an interesting project.

SUMO

The Suggested Upper Merged Ontology (SUMO) is intended as a foundation ontology for a variety of computer information processing systems. It encodes logical axioms (eg, classes, relationships, rules) in SUO-KIF, a variant of Knowledge Interchange Format (KIF).

Implementation

As hinted above, the plan is to start with (relatively) easy problems. Specifically, we'll look for small-scale programming idioms found within AST-friendly code bases.

Approach

The basic idea is to build an idiom harvester that can identify and tally idioms, as represented in Clojure's abstract syntax tree format. The proposed approach is to generate signatures of possible idioms, tally their frequency of occurrence, and generate a summary report.

An interactive program (based on kibit) could then allow a programmer to accept and polish idioms (eg, as functions or macros), then use the idioms to replace their expanded forms in the code base.

Sequences

If the code base can be split into small, syntactically independent line sequences, we can avoid the need to deal with global context. Conveniently, Clojure's functions and macros, like SUMO's axioms, meet this requirement.

Snippets

We then convert each line sequence into some number of snippets (ie, candidate use cases of idioms). Each snippet captures one or more levels of the AST, ignoring surrounding and/or subsidiary code as need be.
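
One possible way to generate snippets is sketched below in Python, with code modeled as nested tuples. The depth limit, the `_` placeholder for ignored subsidiary code, and the names `truncate` and `snippets` are all assumptions made for illustration; the actual snippet rules remain to be worked out.

```python
def truncate(form, depth):
    """Copy form, replacing lists nested deeper than `depth` levels with "_"."""
    if not isinstance(form, tuple):
        return form
    if depth == 0:
        return "_"
    return tuple(truncate(f, depth - 1) for f in form)

def snippets(form, max_depth=2):
    """Yield depth-limited views of every list node in form."""
    if not isinstance(form, tuple):
        return
    seen = []
    for depth in range(1, max_depth + 1):
        view = truncate(form, depth)
        if view not in seen:          # skip views that add no new detail
            seen.append(view)
            yield view
    for child in form:                # then descend, ignoring surrounding code
        yield from snippets(child, max_depth)

form = ("if", ("pos?", "n"), ("inc", "n"), "nil")
for snippet in snippets(form):
    print(snippet)
```

Each node contributes a shallow view and, where it differs, a deeper one, so a single sequence yields several candidate use cases at different levels of detail.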

Signatures

The next step is to remove specific details from each snippet, turning it into an abstract signature. Finally, we tally occurrences of signatures, for use in (say) a summary report.
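
As a rough sketch of this step (in Python, with snippets modeled as nested tuples), the code below keeps the outermost operator and a few literals, replaces every other atom with `?`, and tallies the results. Which details to keep is an open design question; the `KEEP` set, the `?` placeholder, and the sample snippets are assumptions made for this example.

```python
from collections import Counter

# Assumption: these literals are treated as part of an idiom's shape.
KEEP = frozenset({"nil", "true", "false"})

def blank(form, keep=KEEP):
    """Replace every atom not in `keep` with "?", preserving nesting."""
    if isinstance(form, tuple):
        return tuple(blank(f, keep) for f in form)
    return form if form in keep else "?"

def signature(snippet, keep=KEEP):
    """Abstract a snippet, retaining only its outermost operator."""
    if not isinstance(snippet, tuple) or not snippet:
        return blank(snippet, keep)
    head, *rest = snippet
    head = head if isinstance(head, str) else blank(head, keep)
    return (head, *(blank(r, keep) for r in rest))

# Hypothetical snippets, as a harvester might produce them.
sample_snippets = [
    ("if", ("pos?", "n"), ("inc", "n"), "nil"),
    ("if", ("empty?", "xs"), ("first", "ys"), "nil"),
    ("if", ("pos?", "k"), ("dec", "k"), "nil"),
    ("+", "a", "b"),
]

tally = Counter(signature(s) for s in sample_snippets)
for sig, count in tally.most_common():
    print(count, sig)
```

The three `if` snippets collapse into one signature, which would surface in the report as a candidate idiom. Blanking less (eg, keeping nested operators) would produce more specific signatures with lower counts.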

Infrastructure

My initial plans are based on the following technologies:

  • core.logic - A Logic Programming library for Clojure & ClojureScript

  • kibit - There's a function for that!
Topic revision: r36 - 14 Aug 2012, RichMorin