generates an abstract syntax tree (AST),
which captures everything needed to execute the code.
Unfortunately, comments aren't included in the AST.
This is understandable, from the perspective of the BEAM:
comments aren't executable code, so why bother with them?
However, ignoring comments is unacceptable
in a translation that humans will be maintaining, studying, etc.
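To illustrate the point, here is a minimal sketch in Elixir that calls the standard erl_scan and erl_parse modules on a one-line function with a trailing comment. The module name AstDemo and the sample source are made up for this example. With the default scanner options, the resulting form should carry no trace of the comment text.

```elixir
defmodule AstDemo do
  # Hypothetical helper: turn the source text of a single Erlang form
  # into its abstract-format term, using the standard tokenizer and parser.
  def parse_form(source) do
    {:ok, tokens, _end_location} = :erl_scan.string(String.to_charlist(source))
    {:ok, form} = :erl_parse.parse_form(tokens)
    form
  end
end

# The trailing comment is dropped by the tokenizer, so it never reaches the AST.
form = AstDemo.parse_form("foo() -> ok. % done\n")
IO.inspect(form)
```

This helper only handles a single form; a whole file would need to be split into forms first.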
In order to ignore comments,
the parser must know how to detect them.
So, it should be able to handle them in some other manner.
For example, it could encode them as AST entries for phony functions.
However, this seems a bit too challenging at the moment,
so I'm taking an indirect approach: preprocess the Erlang source code in Elixir.
The downstream code I'm writing will be able to handle the phony AST entries,
regardless of how they are generated.
Thus, if I decide to modify the parser
(or whatever) later on,
the downstream changes should be minimal.
Here are some plausible goals for comment handling.
Some look harder to achieve than others...
- All comments should appear in the translation.
- Comments should be positioned appropriately
with respect to the surrounding source code.
- Questionable comments should be flagged.
- Some comment blocks should be translated
into Elixir documentation attributes (such as @doc).
Erlang comments begin with a percent sign (%)
and extend to the end of the line.
There is no explicit support for multi-line comments (i.e., comment blocks),
so a series of single-line comments and/or blank lines must be used instead.
Although detecting and extracting comments would seem to be trivial, it is not.
Specifically, we need to detect and avoid reporting false positives.
For example, a percent sign that is embedded in a string
should not be treated as the start of a comment:
% This is an inter-line comment.
X = "barfle". % This is an inline comment.
Y = "Despite the %, this isn't a comment.".
So, to be reliable, a comment extractor would need
to understand a large fraction of Erlang's concrete syntax.
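The sketch below contrasts two ways of finding comments in the three sample lines above. The first is a naive "first percent sign" rule, which reports the false positive. The second asks Erlang's own tokenizer (erl_scan, with its return_comments option) to hand back comment tokens. The module and function names (CommentScan, naive, scan) are assumptions made for illustration; this is not necessarily the approach this page ends up taking.

```elixir
defmodule CommentScan do
  # Naive rule: everything from the first "%" on a line is a "comment".
  # Returns nil when the line contains no percent sign.
  def naive(line) do
    case String.split(line, "%", parts: 2) do
      [_code, rest] -> "%" <> rest
      [_no_percent] -> nil
    end
  end

  # Tokenizer-based rule: let :erl_scan decide what is a comment.
  # Returns a list of {line_number, comment_text} tuples.
  def scan(source) do
    {:ok, tokens, _end_location} =
      :erl_scan.string(String.to_charlist(source), 1, [:return_comments])

    for {:comment, anno, text} <- tokens do
      {:erl_anno.line(anno), List.to_string(text)}
    end
  end
end

source = """
% This is an inter-line comment.
X = "barfle". % This is an inline comment.
Y = "Despite the %, this isn't a comment.".
"""

# The naive rule wrongly reports part of the string on the third line.
IO.inspect(CommentScan.naive("Y = \"Despite the %, this isn't a comment.\"."))

# The tokenizer should report only the comments on lines 1 and 2.
IO.inspect(CommentScan.scan(source))
```

The tokenizer route reports each comment's text and line number, but it does not by itself say how a comment should be positioned relative to the surrounding code in the translation.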
Fortunately, there are some possible hacks and workarounds.
The "false positive" problem only affects comments within functions.
So, we can handle inter-function comments in a straightforward manner.
Here's a possible approach:
- Generate an AST from the raw Erlang source code.
- Determine the lowest and highest line numbers for each function.
- Generate a list of tuples defining boundaries of comment sections.
- Collect lists of inter-function comments, trimming blank lines
between blocks and at the start and end of each list.
- Generate a phony function for each "meaningful" list.
So, the following set of comment lines:
% This is a comment.
% This is another.
% And yet another.
might be turned into these sorts of phony functions:
[ "% This is a comment.",
"% This is another." ].
[ "% And yet another." ].
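The collection steps above might be sketched as follows. This assumes the function line ranges have already been extracted from the AST and are passed in as a list of {first, last} tuples; the module name InterFunctionComments and the function blocks/2 are hypothetical. Blank lines, function bodies, and any other non-comment lines all act as separators between blocks.

```elixir
defmodule InterFunctionComments do
  # lines:           the source file as a list of strings
  # function_ranges: [{first_line, last_line}], 1-based and inclusive,
  #                  assumed to have been derived from the AST
  # Returns a list of comment blocks, each a list of comment lines.
  def blocks(lines, function_ranges) do
    lines
    |> Enum.with_index(1)
    |> Enum.map(fn {text, number} ->
      classify(String.trim(text), number, function_ranges)
    end)
    |> Enum.chunk_by(&(&1 == :break))
    |> Enum.reject(fn chunk -> hd(chunk) == :break end)
  end

  # A line counts as a comment only if it lies outside every function
  # and starts with "%"; anything else separates blocks.
  defp classify(text, number, ranges) do
    inside? =
      Enum.any?(ranges, fn {first, last} -> number >= first and number <= last end)

    if not inside? and String.starts_with?(text, "%"), do: text, else: :break
  end
end

lines = [
  "% This is a comment.",
  "% This is another.",
  "",
  "% And yet another.",
  "foo() ->",
  "    % inside a function, so skipped here",
  "    ok."
]

IO.inspect(InterFunctionComments.blocks(lines, [{5, 7}]))
```

Because blank lines are treated as separators, they are trimmed from the start, end, and middle of each list automatically. Wrapping each resulting list in a phony function is left out, since the encoding of those functions is a separate decision.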
Intra-function comments (especially inline ones) look more challenging.
Besides being harder to detect,
they need to be encoded in the AST and represented in Elixir code.
So, this is still an open problem.
Erlang doesn't have anything equivalent
to Elixir's documentation attributes (such as @doc),
but some comment blocks could be translated into them.
Some heuristics may allow us to perform this translation:
- A series of single-line comments and/or blank lines
can be treated as a comment block.
- A comment block preceding an attribute declaration
can be translated into a module-level documentation attribute.
- A comment block found between a pair of functions
can be translated into an
@doc for the second function.
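As a rough illustration of the last heuristic, the sketch below turns a comment block (a list of Erlang comment lines) into the text of an Elixir @doc attribute. The module name DocBlock and the function to_doc/1 are made up for this example, and the marker-stripping rule (leading whitespace, any run of percent signs, one optional space) is an assumption about what typical comments look like.

```elixir
defmodule DocBlock do
  # Convert a list of Erlang comment lines into the source text
  # of an Elixir @doc attribute that uses a heredoc string.
  def to_doc(comment_lines) do
    body =
      comment_lines
      |> Enum.map(&strip_marker/1)
      |> Enum.join("\n")

    "@doc \"\"\"\n" <> body <> "\n\"\"\""
  end

  # Remove leading whitespace, the "%" marker(s), and one optional space.
  defp strip_marker(line) do
    String.replace(line, ~r/^\s*%+ ?/, "")
  end
end

IO.puts(DocBlock.to_doc(["% This is a comment.", "%% This is another."]))
```

A real translator would also need to escape any text that is special inside an Elixir heredoc, such as a triple quote or an interpolation sequence; this sketch does not.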
Unfortunately, this leaves us with some outliers, including:
- comment blocks between the last attribute declaration
and the first function
- comment blocks after the last function
Experimentation with real-world code bases
should help us find ways to resolve these and similar issues.
The Syntax Tools
might be of use in parsing Erlang code,
although it doesn't use the same AST format.
The parsing could also be performed
by the Parse Tools
or even a hand-written parser.
Thanks to Robert Virding for his patience and helpful suggestions.
This wiki page is maintained by Rich Morin,
an independent consultant specializing in software design, development, and documentation.
Please feel free to email
comments, inquiries, suggestions, etc!