Parse Trees

Getting Them

Scanning Ruby code (eg, for method definitions and uses) is a bit tricky. Fortunately, most of the hard work has already been done. After parsing the code and turning it into an S-expression (aka sexp), we can walk through this Ruby data structure and pick up the needed info. MRI (Matz's Ruby Interpreter) does not provide a way to generate sexps, but Ryan Davis's ParseTree gem does:

ParseTree is a C extension (using RubyInline) that extracts the parse tree for an entire class or a specific method and returns it as a s-expression (aka sexp) using ruby's arrays, strings, symbols, and integers.

http://rubyforge.org/forum/forum.php?forum_id=17624

So, it should be possible to grab arbitrary Ruby files (and even erb files, with some work) and extract sexps from them. The Bad News is that YARV is reputed to break ParseTree irreparably; the Good News is that Rubinius is coming along fast. So, it should be possible to use ParseTree for a while, then (if need be) segue to Rubinius. For that matter, we'll probably want to jump to Rubinius in any case, as it enables an amazing range of Cool Tricks...

Using Them

Ruby sexps are encoded as hierarchies of nested arrays, populated by scalars (eg, Fixnums, Strings, and Symbols). Although these can certainly be traversed by hand-written code, the SexpProcessor class (included with ParseTree) provides some useful tools.

Method Binding

One of our principal tasks, in Arti, is determining Method Binding (ie, Dependency Analysis). Fortunately, a dependency analyzer is provided as an example of using ParseTree and SexpProcessor. With luck, this code shouldn't be too hard to adapt to our needs.

Rails "Declarations"

Although Ruby doesn't really contain any declarative statements, Rails likes to pretend that it does. For example, the ActiveRecord class provides a variety of methods such as belongs_to, has_many, and has_one. Because these methods are generally called at the top level of a class, using Symbols as arguments, the parse tree will contain unambiguous information about the "declaration" being made. For example:

 
$ irb --simple-prompt
>> %w{rubygems parse_tree pp}.each {|i| require i }
=> ["rubygems", "parse_tree"]

>> c = 'has_many :wombats'
=> "has_many :wombats"
>> pp ParseTree.translate(c)
[:fcall, :has_many, [:array, [:lit, :wombats]]]
=> nil

Amusingly, the example above uses require in a manner that would not be particularly amenable to this sort of analysis. Fortunately, most Rails programs do not contain the following sort of code:

 
>> c = '[ :kumquats, :radishes, :wombats ].each {|i| has_many i }'
=> "[ :kumquats, :radishes, :wombats ].each {|i| has_many i }"
>> pp ParseTree.translate(c)
[:iter,
 [:call,
  [:array, [:lit, :kumquats], [:lit, :radishes], [:lit, :wombats]],
  :each],
 [:dasgn_curr, :i],
 [:fcall, :has_many, [:array, [:dvar, :i]]]]
=> nil

Also, even if our code cannot analyze these sorts of "dynamic declarations", it can certainly flag them for a human to look at, analyze, and describe in terms that Arti can use.

public, protected, and private

Calls to public, protected, and private act very much like declarations. When made without arguments, moreover, their placement in the code affects the interpretations of any method definitions that follow.

One of the things we can do with a parse tree is to use the placement of these methods to get clues about the visibility of other methods:

 
>> c="
class Foo
              def pub_fun;  puts 'pub';  end
  protected;  def pro_fun;  puts 'pro';  end
  private;    def pri_fun;  puts 'pri';  end
end"; nil
=> nil
>> pp ParseTree.translate(c)
[:class,
 :Foo,
 nil,
 [:scope,
  [:block,
   [:defn,
    :pub_fun,
    [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "pub"]]]]]],
   [:vcall, :protected],
   [:defn,
    :pro_fun,
    [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "pro"]]]]]],
   [:vcall, :private],
   [:defn,
    :pri_fun,
    [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "pri"]]]]]]]]]
=> nil

Extraction Script

This section presents an example script which extracts information from Ruby files and outputs the result as YAML. In fact, this code is very nearly ready to be used by Arti...

Input Files

The input files for this example are tiny, but they cover most of the common scenarios for method definition. They use both classes and modules, use protected and private declarations, and define both class and instance methods:

 
$ cat x1c.rb
class Foo
  private :f1_pub
              def      f1_pub;  puts      'f1_pub';  end
              def self.f2_pub;  puts 'self.f2_pub';  end
  protected;  def      f1_pro;  puts      'f1_pro';  end
  private;    def      f1_pri;  puts      'f1_pri';  end
end 

class Bar < Foo
              def      b1_pub;  puts      'b1_pub';  end
end


$ cat x1m.rb
module Baz
              def      b1_pub;  puts      'b1_pub';  end
  protected;  def      b1_pro;  puts      'b1_pro';  end
  private;    def      b1_pri;  puts      'b1_pri';  end
end

Output

The script produces two sets of output, separated here for clarity of exposition. The first part of the output is really just a debugging trace; it's the S-expression for the parsed Ruby code:

 
[:block,
 [:class,
  :Foo,
  nil,
  [:scope,
   [:block,
    [:fcall, :private, [:array, [:lit, :f1_pub]]],
    [:defn,
     :f1_pub,
     [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "f1_pub"]]]]]],
    [:defs,
     [:self],
     :f2_pub,
     [:scope,
      [:block, [:args], [:fcall, :puts, [:array, [:str, "self.f2_pub"]]]]]],
    [:vcall, :protected],
    [:defn,
     :f1_pro,
     [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "f1_pro"]]]]]],
    [:vcall, :private],
    [:defn,
     :f1_pri,
     [:scope,
      [:block, [:args], [:fcall, :puts, [:array, [:str, "f1_pri"]]]]]]]]],
 [:class,
  :Bar,
  [:const, :Foo],
  [:scope,
   [:defn,
    :b1_pub,
    [:scope,
     [:block, [:args], [:fcall, :puts, [:array, [:str, "b1_pub"]]]]]]]]]
[:module,
 :Baz,
 [:scope,
  [:block,
   [:defn,
    :b1_pub,
    [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "b1_pub"]]]]]],
   [:vcall, :protected],
   [:defn,
    :b1_pro,
    [:scope, [:block, [:args], [:fcall, :puts, [:array, [:str, "b1_pro"]]]]]],
   [:vcall, :private],
   [:defn,
    :b1_pri,
    [:scope,
     [:block, [:args], [:fcall, :puts, [:array, [:str, "b1_pri"]]]]]]]]]

The second part of the script's output is vanilla YAML, save that we are forcing hashes to be sorted by their keys. Aside from making the output a bit easier to read, This fits Arti's desire to have identical data represented in identical files:

 
--- 
path: 
  ./x1c.rb: 
    class: 
      Bar: 
        from: 
        - :const
        - :Foo
        public: 
          b1_pub: {}

      Foo: 
        private: 
          f1_pri: {}

        protected: 
          f1_pro: {}

        public: 
          f1_pub: {}

          self.f2_pub: {}

    call: 
      :private: 
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :lit
          - :f1_pub
      :protected: []

      :puts: 
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - f1_pub
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - self.f2_pub
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - f1_pro
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - f1_pri
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - b1_pub
  ./x1m.rb: 
    module: 
      Baz: 
        private: 
          b1_pri: {}

        protected: 
          b1_pro: {}

        public: 
          b1_pub: {}

    call: 
      :private: []

      :protected: []

      :puts: 
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - b1_pub
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - b1_pro
      - !seq:Sexp 
        - :array
        - !seq:Sexp 
          - :str
          - b1_pri

Source Code

Finally, here is the source code for the script. Because we use SexpProcessor, the code is very modular and simple. The main method reads the files, parses them, and processes the resulting sexps. The process_* methods define behavior for SexpProcessor; mostly, this records information in some global variables. Finally, main emits the YAMLized content of out1:

 
#!/usr/bin/env ruby
#
# x1 - read file, parse code, walk sexp, emit YAML; rinse, repeat...  

%w{rubygems parse_tree pp sexp_processor yaml}.each {|i| require i } 

def main

  out1                = {}
  out2 = out1['path'] = {} 

  dir   = '.'
  files = %w{ x1c.rb x1m.rb }

  files.each do |file|

    $path = "#{dir}/#{file}"
    code  = File.open($path).read
    sexp  = ParseTree.translate(code) 
    pp sexp #T

    $out  = out2[$path]  = {} 
    sp    = SP.new.process(sexp) 
  end

  puts out1.to_yaml
end 

class Hash

  # Force hash keys to be output in sorted order.
  #
  def to_yaml( opts = {} )
    YAML::quick_emit( self, opts ) do |out|
      out.map( taguri, to_yaml_style ) do |map|
        self.keys.sort_by {|a| a.to_s }.each do |k|
          map.add( k, self[k] )
        end
      end
    end
  end
end 

class SP < SexpProcessor

  def initialize
    super
    self.strict          = false
    self.auto_shift_type = true
  end

  # Classes and Modules

  def process_class(exp)                # class ...
    name  =         exp.shift
    from  =         exp.shift
    $f_cl = from
    $n_cl = name.to_s
    $t_cl = 'class'
    $vis  = 'public'
    body  = process exp.shift
    return s(:class, name, body)
  end 

  def process_module(exp)               # module ...
    name  =         exp.shift
    $f_cl = nil
    $n_cl = name.to_s
    $t_cl = 'module'
    $vis  = 'public'
    body  = process exp.shift
    return s(:module, name, body)
  end 

  # Method Definitions

  def process_defn(exp)                 # def foo ...
    name  =         exp.shift
    args  = process exp.shift
    body  = process exp.shift
    save_def("#{name}")
    return s(:defn, name, args, body)
  end 

  def process_defs(exp)                 # def self.bar ...
    recv  =         exp.shift
    name  =         exp.shift
    args  = process exp.shift
    body  = process exp.shift
    save_def("#{recv}.#{name}")
    return s(:defs, recv, name, args, body)
  end 

  # Method Calls

  def process_fcall(exp)                # foo(42)
    name  =         exp.shift
    args  = process exp.shift
    save_call(name, args)
    return s(:fcall, name, args)
  end 

  def process_vcall(exp)                # bar()
    name  =         exp.shift
    $vis  = name.to_s \
      if name.to_s =~ /^(public|protected|private)$/
    save_call(name, nil)
    return s(:vcall, name)
  end 
end 

def save_def(name)
  $out[$t_cl]                     ||= {}
  $out[$t_cl][$n_cl]              ||= {}
  $out[$t_cl][$n_cl]['from']      ||= $f_cl if ($f_cl)
  $out[$t_cl][$n_cl][$vis]        ||= {}
  $out[$t_cl][$n_cl][$vis][name]  ||= {}
end 

def save_call(name, args)
  $out['call']                    ||= {}
  $out['call'][name]              ||= []
  $out['call'][name]  <<  args if (args)
end 

main

Resources


This wiki page is maintained by Rich Morin, an independent consultant specializing in software design, development, and documentation. Please feel free to email comments, inquiries, suggestions, etc!

Topic revision: r12 - 28 Jun 2008, RichMorin
This site is powered by Foswiki Copyright © by the contributing authors. All material on this wiki is the property of the contributing authors.
Foswiki version v2.1.6, Release Foswiki-2.1.6, Plugin API version 2.4
Ideas, requests, problems regarding CFCL Wiki? Send us email