Pygments

Handling indentation and similar details for various programming languages is a critical part of our handling of Monospace text. Fortunately, the Pygments library solves a large part of this problem nicely. Pygments is a syntax highlighter and pretty-printer written in Python; it handles 300+ languages, as well as other text formats.

As discussed in the Pygments documentation page "Write your own lexer", adding lexical analysis for a new language appears to be fairly easy. Lexers are written largely in a declarative style: associative arrays (Python dicts) map lexer state names to lists of tuples, each holding a regular expression, a token type (or, if need be, a reference to a callback), and optionally a new state.
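To make the declarative style concrete, here is a minimal sketch of a RegexLexer subclass. The language it lexes (TinyLexer, with let and print keywords) is hypothetical, invented only for illustration; RegexLexer, the tokens dict, and get_tokens() are part of Pygments' documented lexer API.

```python
from pygments.lexer import RegexLexer
from pygments.token import Comment, Keyword, Name, Number, Operator, Text


class TinyLexer(RegexLexer):
    """Lexer for a hypothetical toy language; not part of Pygments."""

    name = "Tiny"
    aliases = ["tiny"]

    # Each lexer state maps to a list of (regex, token type) rules.
    # At each position the rules are tried in order; the first match wins.
    tokens = {
        "root": [
            (r"#.*", Comment.Single),         # comment up to end of line
            (r"\b(let|print)\b", Keyword),    # must precede the Name rule
            (r"[A-Za-z_]\w*", Name),
            (r"\d+", Number.Integer),
            (r"[=+\-*/]", Operator),
            (r"\s+", Text),
        ],
    }


# Print each (token type, text) pair for a one-line sample.
for ttype, value in TinyLexer().get_tokens("let x = 42  # answer\n"):
    print(ttype, repr(value))
```

Because rules are tried in order, moving the keyword rule below the identifier rule would cause let to be tagged as a plain Name.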

Several output formats are available, including HTML, RTF, LaTeX, and ANSI escape code sequences. We're using HTML, along with a CSS file generated by Pygments. In this format, a div (of class highlight) contains the code, with tokens wrapped in span elements whose class indicates the token type and thus its display style.
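The following sketch shows the HTML path using Pygments' public highlight() function and HtmlFormatter class; the sample code is arbitrary. Passing cssclass="highlight" only restates the formatter's default, so the wrapper div gets the class described above.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = 'def greet(name):\n    return "Hello, " + name\n'

# "highlight" is already the default wrapper class; spelled out for clarity.
formatter = HtmlFormatter(cssclass="highlight")

# highlight() lexes the source and formats the tokens: the result is a
# <div class="highlight"><pre>...</pre></div> block in which styled tokens
# are wrapped in <span> elements with short classes such as "k" (Keyword).
html = highlight(code, PythonLexer(), formatter)

# get_style_defs() returns the matching CSS rules, scoped to the wrapper div;
# this is the text one would save as the page's stylesheet.
css = formatter.get_style_defs(".highlight")

print(html)
```

Writing css to a file once and linking it from each page keeps the per-page HTML small; the stylesheet only changes if we switch Pygments styles.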

Expected Usage

Pygments recognizes and delineates keywords and other tokens in about 300 languages and formats. We can use this information in a number of ways, including code coloring, code folding, and indexing.
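As a rough illustration of what Pygments hands back, the sketch below lexes a small Python sample with the public lex() and get_lexer_by_name() functions and keeps only keyword and name tokens. The sample source and the choice of which token types count as interesting are assumptions for the example. Token types support the in operator, so ttype in Name also matches subtypes such as Name.Class.

```python
from pygments import lex
from pygments.lexers import get_lexer_by_name
from pygments.token import Keyword, Name

source = "class Greeter:\n    def hello(self):\n        return 'hi'\n"

# get_lexer_by_name() finds a lexer by one of its short aliases.
lexer = get_lexer_by_name("python")

# lex() yields (token_type, text) pairs that together cover the input.
tokens = list(lex(source, lexer))

# Keep keywords and names (including subtypes such as Name.Class).
interesting = [(str(ttype), value) for ttype, value in tokens
               if ttype in Keyword or ttype in Name]

for ttype, value in interesting:
    print(f"{ttype:25} {value}")
```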

Code Coloring

Code coloring has many benefits for sighted readers. In a static context, such as our code reading tool, styling choices (e.g., color, font, size) can make certain elements easier to recognize, speeding up some kinds of visual scanning. Color choices can also help give the page a desired "feel".

In a dynamic context, such as a text editor, code coloring can alert a sighted programmer to mismatched quotes and other issues whose effects ripple through the code. However, it's not clear how a blind programmer could gain similar value.

It might be useful to add naive introspection and navigation capabilities based on Pygments tokens. For example, by selecting an item and requesting a contextual menu, the user could find out what kind of token it is, search for other uses of the item, etc.
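One way to prototype this kind of introspection is sketched below, assuming the reader's selection can be mapped to a character offset in the source. The helper names (tokens_with_offsets, describe) are hypothetical; the lexer method get_tokens_unprocessed() is part of the Pygments lexer API and yields (offset, token type, text) triples. "Other uses" here just means other tokens with the same type and text, which matches the naive level of analysis described above.

```python
from pygments.lexers import get_lexer_by_name


def tokens_with_offsets(source, alias):
    """List (offset, token_type, text) triples; offsets index into source."""
    lexer = get_lexer_by_name(alias)
    return list(lexer.get_tokens_unprocessed(source))


def describe(source, alias, offset):
    """Return (token_type, text, other_offsets) for the token at offset.

    other_offsets lists the starts of other tokens with the same type and
    text, a naive stand-in for "other uses of this item". Returns None if
    no token covers the offset.
    """
    tokens = tokens_with_offsets(source, alias)
    for start, ttype, value in tokens:
        if start <= offset < start + len(value):
            others = [s for s, t, v in tokens
                      if t == ttype and v == value and s != start]
            return ttype, value, others
    return None


source = "total = 0\nfor n in items:\n    total = total + n\n"
print(describe(source, "python", 0))
```

For this sample, the token at offset 0 is the name total, and its two later occurrences start at offsets 30 and 38. A real tool would need scope information to tell two unrelated variables with the same name apart.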

Code Folding

Amanda's Get_Ruby_Code() module scans Ruby code for tokens (e.g., keywords). It uses this information to determine the line ranges of Ruby methods and associated code. Finally, it returns a data structure (list of hashes) describing each foldable section.

The scanning code, which uses regular expressions and hand-coded logic, only handles a few tokens for a single language. Using Pygments as a code scanning front end would let us handle a much larger range of tokens, covering about 300 languages and other text formats.

Note, however, that Pygments is not a complete replacement for our scanning code. For example, it tags each end as a keyword, but it can't tell which end terminates a method (as opposed to, say, an if or do block). Similar issues will exist for other languages and formats, so we'll need a way to recognize sections for each language we wish to support.
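To show how Pygments might serve as the scanning front end, here is a minimal sketch that recovers method line ranges from Ruby tokens by pairing block-opening keywords with end on a stack. It is not Amanda's Get_Ruby_Code(); the function name ruby_method_folds and the keyword sets are assumptions, and treating if, unless, while, and until as openers only at the start of a line is a heuristic for skipping modifier forms. It ignores several Ruby corner cases (e.g., while ... do, which needs only one end). Like the original module, it returns a list of hashes (here, Python dicts).

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Keyword, Name

# Keywords assumed to always open a block closed by `end` (heuristic list).
ALWAYS_OPEN = {"def", "class", "module", "do", "begin", "case"}
# These open a block only at the start of a statement; as trailing
# modifiers (`return x if y`) they have no matching `end`.
STATEMENT_OPEN = {"if", "unless", "while", "until"}


def ruby_method_folds(source):
    """Return one dict per method: name plus first and last line (1-based)."""
    lexer = get_lexer_by_name("ruby")
    folds = []
    stack = []                 # open blocks: [keyword, first_line, name]
    line, at_line_start = 1, True
    for _, ttype, value in lexer.get_tokens_unprocessed(source):
        if not value.strip():  # pure whitespace: only track line breaks
            if "\n" in value:
                line += value.count("\n")
                at_line_start = True
            continue
        if ttype in Keyword and (value in ALWAYS_OPEN or
                                 (value in STATEMENT_OPEN and at_line_start)):
            stack.append([value, line, None])
        elif (ttype in Name.Function and stack and stack[-1][0] == "def"
              and stack[-1][2] is None):
            stack[-1][2] = value   # first function name after `def`
        elif ttype in Keyword and value == "end" and stack:
            keyword, first_line, name = stack.pop()
            if keyword == "def":
                folds.append({"name": name, "first_line": first_line,
                              "last_line": line})
        at_line_start = False
        line += value.count("\n")  # multi-line tokens, e.g. strings
    return folds


SAMPLE = '''class Greeter
  def hello(name)
    if name == ""
      return "hi"
    end
    "hello " + name
  end

  def each_word(text)
    text.split.each do |w|
      yield w
    end
  end
end
'''

for fold in ruby_method_folds(SAMPLE):
    print(fold)
```

Supporting another language would mean swapping in its lexer alias and its own opener/closer rules, which is exactly the per-language work noted above.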

Indexing

By scanning the marked-up code, we can generate an index of significant items (e.g., classes, methods, modules). This could, for example, be used to generate a page-level Table of Contents. Alternatively, it could help to support navigation.
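A first cut at such an index might simply collect class, function, and namespace names together with their line numbers, as sketched below. The mapping from token types to index categories, the function name build_index, and the Python sample are assumptions for illustration. Note that the flat token stream does not say which class a method belongs to; recovering that would need the same kind of extra, per-language logic discussed under Code Folding.

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Name

# Which token types count as "significant", and what to call them,
# is an assumption made for this sketch.
INDEXED = {
    Name.Class: "class",
    Name.Function: "function",
    Name.Namespace: "module",
}


def build_index(source, alias):
    """Return (name, kind, line) entries sorted by name; lines are 1-based."""
    lexer = get_lexer_by_name(alias)
    entries, line = [], 1
    for _, ttype, value in lexer.get_tokens_unprocessed(source):
        for indexed_type, kind in INDEXED.items():
            if ttype in indexed_type:   # also matches subtypes
                entries.append((value, kind, line))
                break
        line += value.count("\n")       # advance past any line breaks
    return sorted(entries)


SAMPLE = (
    "import os\n"
    "\n"
    "class Store:\n"
    "    def load(self, path):\n"
    "        return os.path.exists(path)\n"
    "\n"
    "def main():\n"
    '    Store().load("data.txt")\n'
)

for entry in build_index(SAMPLE, "python"):
    print(entry)
```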

Classes and Tokens

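The following table is adapted from the STANDARD_TYPES dictionary in Pygments' token.py and from the Pygments "Builtin Tokens" documentation page. For each token type, it gives the short CSS class used in HTML output and a brief description.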

Token type              CSS class  Notes
Comment                 c          any comment
Comment.Hashbang        ch         hashbang (aka shebang) comments
Comment.Multiline       cm         multiline comments
Comment.Preproc         cp         preprocessor comments
Comment.Single          c1         comments that end at the end of a line
Comment.Special         cs         special data in comments
Error                   err        represents lexer errors
Escape                  esc        ?
Generic                 g          generic, unstyled token
Generic.Deleted         gd         marks token as deleted
Generic.Emph            ge         marks token as emphasized
Generic.Error           gr         marks token as an error message
Generic.Heading         gh         marks token as a heading
Generic.Inserted        gi         marks token as inserted
Generic.Output          go         marks token as program output
Generic.Prompt          gp         marks token as a command prompt
Generic.Strong          gs         marks token as bold
Generic.Subheading      gu         marks token as a subheading
Generic.Traceback       gt         marks token as a part of an error traceback
Keyword                 k          any kind of keyword
Keyword.Constant        kc         keywords that are constants
Keyword.Declaration     kd         keywords used for variable declarations
Keyword.Namespace       kn         keywords used for namespace declarations
Keyword.Pseudo          kp         keywords that aren’t really keywords
Keyword.Reserved        kr         reserved keywords
Keyword.Type            kt         builtin types that can’t be used as identifiers
Literal                 l          any literal
Literal.Date            ld         date literals
Name                    n          any name
Name.Attribute          na         names of attributes (e.g., in HTML)
Name.Builtin            nb         builtin names
Name.Builtin.Pseudo     bp         builtin names that are implicit
Name.Class              nc         names of classes
Name.Constant           no         names of constants
Name.Decorator          nd         names of decorators
Name.Entity             ni         special entities (e.g., in HTML)
Name.Exception          ne         names of exceptions
Name.Function           nf         names of functions or methods
Name.Label              nl         names of statement labels
Name.Namespace          nn         names of namespaces
Name.Property           py         names of properties
Name.Tag                nt         names of tags (e.g., in HTML)
Name.Variable           nv         names of variables
Name.Variable.Class     vc         names of class variables
Name.Variable.Global    vg         names of global variables
Name.Variable.Instance  vi         names of instance variables
Number                  m          any number
Number.Bin              mb         binary number
Number.Float            mf         floating point number
Number.Hex              mh         hexadecimal number
Number.Integer          mi         integer number
Number.Integer.Long     il         long integer number
Number.Oct              mo         octal number
Operator                o          any punctuation operator
Operator.Word           ow         any operator that is a word
Other                   x          data not matched by the parser
Punctuation             p          any punctuation which is not an operator
String                  s          any string literal
String.Backtick         sb         strings enclosed in backticks
String.Char             sc         single characters
String.Doc              sd         documentation strings
String.Double           s2         strings enclosed in double quotes
String.Escape           se         escape sequences in strings
String.Heredoc          sh         "here document" strings
String.Interpol         si         interpolated parts in strings
String.Other            sx         other strings
String.Regex            sr         regular expression literals
String.Single           s1         strings enclosed in single quotes
String.Symbol           ss         symbols (interned strings)
Whitespace              w          spaces and tabs
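Code that works with these classes (for example, to style or search for tokens in the generated HTML) can look them up directly, since STANDARD_TYPES is importable from pygments.token, as is the string_to_tokentype() helper. The sketch below walks up to the nearest listed ancestor for token types that have no entry of their own; that fallback is our own simplification and may not match the class names HtmlFormatter emits for such subtypes. The helper name css_class is hypothetical.

```python
from pygments.token import STANDARD_TYPES, Token, string_to_tokentype


def css_class(ttype):
    """Short CSS class for a token type, using its nearest listed ancestor.

    Every token type descends from Token, which is itself listed in
    STANDARD_TYPES, so the loop always terminates.
    """
    while ttype not in STANDARD_TYPES:
        ttype = ttype.parent
    return STANDARD_TYPES[ttype]


for name in ("Comment.Single", "Name.Builtin.Pseudo", "Keyword.Type"):
    print(name, css_class(string_to_tokentype(name)))
```

Because token types nest, a test such as Token.Comment.Single in Token.Comment is true, which makes it easy to treat every row of a family (all Comment.* types, say) alike.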


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r12 - 17 Dec 2016, RichMorin