Edit Hacks - awsb

This page shows a "proof of concept" implementation of a script (awsb) to analyze and display white space boundaries. It prints each line in the input (with tabs expanded into spaces), then prints a line that indicates the location of "boundary" character positions.

Note: I wrote this script in a procedural dialect of Ruby, because that works well for simple, stand-alone utilities. However, it wouldn't surprise me to see it being recoded in other languages (e.g., Elisp, Elixir, Elm) for use in other execution contexts.

Code

I've split up the code into sections, for convenience. The full script is shown at the bottom of this section.

main, etc.

Here's a high-level view of the script. The shebang line (#!/usr/bin/env ruby) causes the script to be interpreted as Ruby code.

#!/usr/bin/env ruby
#
# awsb - analyze white space boundaries
#
#   main
#   |   get_blob
#   |   put_report
#   |   |   get_weight
#   |   |   put_mark
#
# Written by Rich Morin, CFCL, 2016

  def main
    get_blob
    put_report
  end

  def get_blob ...
  def get_weight(char_ndx, line_ndx, range) ...
  def put_mark(char, min_wt, weight) ...
  def put_report ...

  main

get_blob

This function gets and processes an input "blob" (e.g., the contents of a file, as in Git). It uses the expand(1) command to read standard input and expand tabs, saving the result as an array of strings (@lines). It then creates an array of character position weights for the blob (@blob_wts).

  def get_blob
  #
  # Get and process an input blob.

    @lines      = (`expand` + "\n").lines
    @num_lines  = @lines.length
    @blob_wts   = Array.new(@num_lines)

    @lines.each_with_index do |line, line_ndx|
      chars       = line.chars
      num_chars   = chars.length
      @blob_wts[line_ndx] = Array.new(num_chars, 0.0)

      # Calculate weights of character positions,
      # based on the count of preceding spaces.

      cnt_before  = 0
      chars.each_with_index  do |char, char_ndx|
        if (char == ' ')
          cnt_before  += 1
        else
          # Store the weight in the table; a local copy would be discarded.
          @blob_wts[line_ndx][char_ndx] += [ cnt_before, 2 ].min
          cnt_before   = 0
        end
      end
    end
  end
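
As a rough illustration of the per-line calculation, here is a minimal sketch written as a pure function that returns the raw weights for one line instead of filling @blob_wts. The name raw_weights is hypothetical and is not part of awsb; the rule it applies (a non-space character gets the count of preceding spaces, capped at 2) is the one described above.

```ruby
# Hypothetical helper (not part of awsb): raw weights for a single line.
# A non-space character gets min(preceding spaces, 2); a space gets 0.0.
def raw_weights(line)
  cnt_before = 0
  line.chars.map do |char|
    if char == ' '
      cnt_before += 1
      0.0
    else
      wt = [cnt_before, 2].min.to_f
      cnt_before = 0
      wt
    end
  end
end

p raw_weights("a  b c")   # prints [0.0, 0.0, 0.0, 2.0, 0.0, 1.0]
```

In this sample, "b" follows two spaces and gets 2.0, while "c" follows a single space and gets 1.0.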

get_weight

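This function scans the corresponding character position in nearby lines (up to range lines above and below, clipped at the ends of the blob). The resulting weight starts from a base value of 0.3 and adds the raw weights of those positions, each divided by the number of lines scanned.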

  def get_weight(char_ndx, line_ndx, range)
  #
  # Calculate the weight for this character position.

    min_ndx   = [ line_ndx - range, 0              ].max
    max_ndx   = [ line_ndx + range, @num_lines - 1 ].min
    range     = 1.0 + max_ndx - min_ndx
    weight    = 0.3

    min_ndx.upto(max_ndx) do |windex|
      tmp_wt  = @blob_wts[windex][char_ndx]
      weight += (tmp_wt ? tmp_wt : 0) / range
    end

    weight
  end
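
To make the windowing concrete, here is a small sketch that applies the same arithmetic to a hand-built weight table. The name window_weight, its base parameter, and the sample table are assumptions made for illustration; only the clipping, the 0.3 starting value, and the division by the number of lines scanned come from get_weight.

```ruby
# Hypothetical helper (not part of awsb): weight of one column,
# averaged over the lines within `range` of `line_ndx`.
# `wts` is an array of per-line weight arrays.
def window_weight(wts, char_ndx, line_ndx, range, base = 0.3)
  min_ndx = [line_ndx - range, 0].max
  max_ndx = [line_ndx + range, wts.length - 1].min
  span    = 1.0 + max_ndx - min_ndx

  (min_ndx..max_ndx).reduce(base) do |weight, ndx|
    # Lines that are too short have no entry (nil) at this column.
    weight + (wts[ndx][char_ndx] || 0.0) / span
  end
end

wts = [
  [0.0, 0.0, 2.0],
  [0.0, 0.0, 2.0],
  [0.0, 1.0]         # shorter line: column 2 is missing
]
puts window_weight(wts, 2, 1, 1).round(3)   # prints 1.633
```

Here three lines are scanned, so the result is 0.3 + 2/3 + 2/3 + 0. The short third line contributes nothing, which is the case the nil check in get_weight handles.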

put_mark

This function selects and outputs a mark (e.g., ^, -), based on the number of preceding spaces, the position weight, etc.

  def put_mark(char, min_wt, weight)
  #
  # Select and output a mark.

    if (char == ' ')
      print '-'
      @cnt_before  += 1
    else
      tmp_wt  = weight * @cnt_before
      if (   @cnt_before  > 2  ||
           ( @cnt_before  > 0  && tmp_wt >= min_wt) )
        print '^'
      else
        print '-'
      end
      @cnt_before   = 0
    end
  end
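
The same decision rule can be sketched as a pure function that returns the whole marker string for a line, which is easier to experiment with than code that prints as it goes. The name mark_line is hypothetical, and passing a constant weight of 0.3 (the base value used in get_weight) is an assumption chosen only to keep the example small.

```ruby
# Hypothetical helper (not part of awsb): build the marker string for a line.
# `weights` holds one weight per character position.
def mark_line(line, weights, min_wt = 0.4)
  cnt_before = 0
  line.chars.each_with_index.map do |char, ndx|
    if char == ' '
      cnt_before += 1
      '-'
    else
      hit = cnt_before > 2 ||
            (cnt_before > 0 && weights[ndx] * cnt_before >= min_wt)
      cnt_before = 0
      hit ? '^' : '-'
    end
  end.join
end

line = "      @cnt_before  = 0"
puts line
puts mark_line(line, Array.new(line.length, 0.3))
# prints:
#       @cnt_before  = 0
# ------^------------^--
```

With these inputs, the "@" is marked because more than two spaces precede it, the "=" is marked because 0.3 * 2 reaches the 0.4 threshold, and the "0" is not marked because 0.3 * 1 falls short. The marker line matches the one shown for this source line in the Example section.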

put_report

This function performs the final data processing and outputs the report.

  def put_report
  #
  # Process data and output a report.

    min_wt   = 0.4
    range    = 9

    @lines.each_with_index do |line, line_ndx|
      chars       = line.chars
      num_chars   = chars.length
      chars.each { |char| print char }

      # Calculate character position weights, based on nearby
      # and current content, then indicate probable boundaries.

      @cnt_before  = 0
      0.upto(num_chars-2) do |char_ndx|
        weight = get_weight(char_ndx, line_ndx, range)

        char  = @lines[line_ndx][char_ndx]
        put_mark(char, min_wt, weight)
      end
      puts
    end
  end

Full Script

Here is the full script, assembled from the sections above.

#!/usr/bin/env ruby
#
# awsb - analyze white space boundaries
#
#   main
#   |   get_blob
#   |   put_report
#   |   |   get_weight
#   |   |   put_mark
#
# Written by Rich Morin, CFCL, 2016

  def main
    get_blob
    put_report
  end


  def get_blob
  #
  # Get and process an input blob.

    @lines      = (`expand` + "\n").lines
    @num_lines  = @lines.length
    @blob_wts   = Array.new(@num_lines)

    @lines.each_with_index do |line, line_ndx|
      chars       = line.chars
      num_chars   = chars.length
      @blob_wts[line_ndx] = Array.new(num_chars, 0.0)

      # Calculate weights of character positions,
      # based on the count of preceding spaces.

      cnt_before  = 0
      chars.each_with_index  do |char, char_ndx|
        if (char == ' ')
          cnt_before  += 1
        else
          # Store the weight in the table; a local copy would be discarded.
          @blob_wts[line_ndx][char_ndx] += [ cnt_before, 2 ].min
          cnt_before   = 0
        end
      end
    end
  end


  def get_weight(char_ndx, line_ndx, range)
  #
  # Calculate the weight for this character position.

    min_ndx   = [ line_ndx - range, 0              ].max
    max_ndx   = [ line_ndx + range, @num_lines - 1 ].min
    range     = 1.0 + max_ndx - min_ndx
    weight    = 0.3

    min_ndx.upto(max_ndx) do |windex|
      tmp_wt  = @blob_wts[windex][char_ndx]
      weight += (tmp_wt ? tmp_wt : 0) / range
    end

    weight
  end


  def put_mark(char, min_wt, weight)
  #
  # Select and output a mark.

    if (char == ' ')
      print '-'
      @cnt_before  += 1
    else
      tmp_wt  = weight * @cnt_before
      if (   @cnt_before  > 2  ||
           ( @cnt_before  > 0  && tmp_wt >= min_wt) )
        print '^'
      else
        print '-'
      end
      @cnt_before   = 0
    end
  end


  def put_report
  #
  # Process data and output a report.

    min_wt   = 0.4
    range    = 9

    @lines.each_with_index do |line, line_ndx|
      chars       = line.chars
      num_chars   = chars.length
      chars.each { |char| print char }

      # Calculate character position weights, based on nearby
      # and current content, then indicate probable boundaries.

      @cnt_before  = 0
      0.upto(num_chars-2) do |char_ndx|
        weight = get_weight(char_ndx, line_ndx, range)

        char  = @lines[line_ndx][char_ndx]
        put_mark(char, min_wt, weight)
      end
      puts
    end
  end


  main

Example

Here's an (abridged) example of awsb processing its own source code:

$ chmod +x awsb
$ awsb < awsb
...
      @cnt_before  = 0
------^------------^--
      0.upto(num_chars-2) do |char_ndx|
------^--------------------------------
        get_meta(char_ndx, line_ndx, range)
--------^----------------------------------


        char  = @lines[line_ndx][char_ndx]
--------^-----^---------------------------
        put_mark(char, min_wt)
--------^---------------------
      end
------^--
      puts
------^---
    end
----^--
  end
--^--
...


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r4 - 28 Jun 2016, RichMorin