Comments

Overview

Erlang's :epp.parse_file/2 function generates an abstract syntax tree (AST) that captures everything needed to execute the code. Unfortunately, comments aren't captured. This is understandable from the perspective of the BEAM: comments aren't executable code, so why bother with them? However, dropping comments is unacceptable in a translation that humans will need to maintain and study.

In order to ignore comments, parse_file must know how to detect them. So, in principle, it could handle them in some other manner. For example, it could encode them as AST entries for phony functions. However, modifying parse_file seems a bit too challenging at the moment, so I'm taking an indirect approach: preprocess the Erlang source code in Elixir.

The downstream code I'm writing will be able to handle the phony AST entries, regardless of how they are generated. Thus, if I decide to modify parse_file (or whatever) later on, the downstream changes should be minimal.

Specific Goals

Here are some plausible goals for comment handling. Some look harder to achieve than others...

  • All comments should appear in the translation.

  • Comments should be positioned appropriately
    with respect to the surrounding source code.

  • Questionable comments should be flagged.

  • Some comment blocks should be translated
    into @doc or @moduledoc declarations.

Detection

Erlang comments begin with a percent sign (%) and extend to the end of the line. There is no explicit support for multi-line comments (i.e., comment blocks), so a series of single-line comments and/or blank lines must be used instead.

Although detecting and extracting comments would seem to be trivial, it is not. Specifically, we need to detect and avoid reporting false positives. For example, a percent sign that is embedded in a string should not be treated as the start of a comment (the same goes for a percent sign in a quoted atom or in the character literal $%):

% This is an inter-line comment.
X = "barfle".  % This is an inline comment.
Y = "Despite the %, this isn't a comment.".

So, to be reliable, a comment extractor would need to understand a large fraction of Erlang's concrete syntax. Fortunately, there are some possible hacks and workarounds.
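
As a rough illustration of what even a minimal extractor has to track, here is a sketch that finds the start of a comment on a single line. It is written in Python purely for illustration (the preprocessor itself is meant to be in Elixir), the name find_comment_start is hypothetical, and it assumes that no string or quoted atom spans more than one line.

```python
def find_comment_start(line):
    """Return the index of the '%' that starts a comment, or None.

    Tracks only three lexical contexts: "strings", 'quoted atoms',
    and $c character literals. Assumption: no string or quoted atom
    continues onto another line.
    """
    quote = None              # the currently open quote character, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1        # skip the escaped character
            elif ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch        # opening quote of a string or atom
        elif ch == "$":
            # $% is a character literal; $\" is an escaped one.
            i += 2 if line[i + 1:i + 2] == "\\" else 1
        elif ch == "%":
            return i
        i += 1
    return None
```

Applied to the three sample lines above, this reports a comment on the first two and none on the third. Longer escape sequences after $\ are not modelled, so this is a starting point rather than a complete scanner.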

Inter-function comments

The "false positive" problem mainly affects comments within functions (and within any attribute declarations that contain strings). So, we can handle inter-function comments in a straightforward manner. Here's a possible approach:

  • Generate an AST from the raw Erlang source code.

  • Determine the lowest and highest line numbers for each function.

  • Generate a list of tuples defining boundaries of comment sections.

  • Collect lists of inter-function comments, trimming blank lines
    between blocks and at the start and end of each list.

  • Generate a phony function for each "meaningful" list.
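
The collection step might be sketched as follows. This is Python for illustration only; the function name inter_function_blocks and the spans argument are hypothetical. It assumes that the (lowest, highest) line numbers for each function have already been extracted from the AST, and that a blank line or a line of code ends a block.

```python
def inter_function_blocks(lines, spans):
    """Collect comment blocks that lie outside every (first, last) span.

    `lines` is the source split into lines; `spans` holds 1-based,
    inclusive line ranges, assumed to come from the AST.
    Returns a list of (starting_line_number, [comment_text, ...]).
    """
    covered = set()
    for first, last in spans:
        covered.update(range(first, last + 1))

    blocks, current, start = [], [], None
    for number, text in enumerate(lines, start=1):
        stripped = text.strip()
        if number not in covered and stripped.startswith("%"):
            if not current:
                start = number
            current.append(stripped)
        elif current:
            # A blank line or a line of code closes the block.
            blocks.append((start, current))
            current = []
    if current:
        blocks.append((start, current))
    return blocks
```

Because blank lines close a block and are never stored, the trimming between blocks and at the ends of each list falls out of the loop without a separate pass.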

So, the following set of comment lines:


% This is a comment.
% This is another.

% And yet another.

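might be turned into these sorts of phony functions (which only need to parse; two separate definitions of erlex_phony/1 in one module would not compile):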

erlex_phony(F_0042a) ->
  [ "% This is a comment.",
    "% This is another." ].
erlex_phony(F_0042b) ->
  [ "% And yet another." ].
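
Generating this text is mostly a matter of quoting each comment line as an Erlang string. A minimal sketch in Python (for illustration only) follows; the helper names are hypothetical, and the label is passed in because the naming scheme behind F_0042a is not pinned down here.

```python
def erlang_string(text):
    """Quote `text` as an Erlang string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def phony_function(label, comments):
    """Render one block of comment lines as a phony Erlang function."""
    items = ",\n    ".join(erlang_string(c) for c in comments)
    return f"erlex_phony({label}) ->\n  [ {items} ].\n"


print(phony_function("F_0042a", ["% This is a comment.",
                                 "% This is another."]))
```

Backslashes and double quotes inside a comment are escaped so that the generated string still parses; other characters are passed through unchanged.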

Intra-function comments

Intra-function comments (especially inline ones) look more challenging. Aside from being harder to detect, we need ways to encode them in the AST and represent them in Elixir code. So, this is still an open problem.

@doc, etc.

Erlang doesn't have anything equivalent to Elixir's @doc and @moduledoc declarations (EDoc tags live inside ordinary comments), but some comment blocks could be translated into them. Some heuristics may allow us to perform this translation:

  • A series of single-line comments and/or blank lines
    can be treated as a comment block.

  • A comment block preceding an attribute declaration
    can be translated into an @moduledoc declaration.

  • A comment block found between a pair of functions
    can be translated into an @doc for the second function.

Unfortunately, this leaves us with some outliers, including:

  • comment blocks between the last attribute declaration
    and the first function

  • comment blocks after the last function

Experimentation with real-world code bases should help us find ways to resolve these and similar issues.
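
The heuristics and outliers above can be written down as a small decision function, which makes the unresolved cases explicit. This is a Python sketch for illustration only; the names classify_block and to_doc, and the "attribute" / "function" labels for the neighbouring forms, are assumptions.

```python
def classify_block(previous_form, next_form):
    """Return "moduledoc", "doc", or "unresolved" for a comment block.

    Each argument is "attribute", "function", or None (no such form).
    """
    if next_form == "attribute":
        return "moduledoc"
    if previous_form == "function" and next_form == "function":
        return "doc"
    # Outliers: attribute -> function, or nothing after the block.
    return "unresolved"


def to_doc(kind, comments):
    """Render comment lines as an Elixir @doc or @moduledoc heredoc.

    Strips the leading percent signs and one following space.
    Does not escape triple quotes or interpolation inside the text.
    """
    body = []
    for comment in comments:
        text = comment.lstrip("%")
        if text.startswith(" "):
            text = text[1:]
        body.append(text.rstrip())
    return f'@{kind} """\n' + "\n".join(body) + '\n"""\n'
```

Keeping "unresolved" as an explicit result leaves room to refine the rules once real-world code bases have been examined.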

Other Tooling

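The Syntax Tools application might be of use in parsing Erlang code (its erl_comment_scan module extracts comments from source files, and erl_recomment attaches them to syntax trees), although it doesn't use the same AST format. The parsing could also be performed by the Parse Tools application (i.e., leex, yecc) or even a hand-written parser.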

Note: Thanks to Robert Virding for his patience and helpful suggestions.


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r7 - 17 Jul 2015, RichMorin