Linkage Types

Much of the information that Meta will manage is in the form of "linkage" between items, as opposed to free-standing "facts". For example, the ownership (UID) of a file can be used as linkage information, but the size cannot. This page attempts to summarize Meta's linkage needs and strategies.

The early focus of Meta development was on FreeBSD. Consequently, the discussion below is focused on the needs of a Meta implementation that handles only the basic FreeBSD system (including everything but X11). A full-scale Meta implementation (e.g., including all Open Source offerings) would require far more linkage information. See the end of the page for a rough estimate.

It is possible, if one knows a rule, to generate a set of explicit links. For example, it would be trivial to generate explicit links that tie files in /usr/bin to corresponding man pages and build directories. Assuming that disk space and batch compute time are free, this seems like a good "brute force" solution.

In some cases, however, this strategy fails. For example, it is impossible (or at least, unreasonable) to generate static information for each possible file in /var/spool/mqueue. On the other hand, a given file name (e.g., dfVAB20435) can easily be recognized and parsed by a regular expression. In short, some rules must be left as rules, rather than being instantiated as explicit links.
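As a rough illustration of a rule that stays a rule, the sketch below recognizes and parses a queue file name with a regular expression. The pattern follows the [dqx]f[AAA]nnnnn scheme listed later on this page; the function name and the field names (kind, seq, num) are hypothetical labels chosen for this example, not part of Meta.

```python
import re

# Pattern based on the page's own scheme: [dqx]f[AAA]nnnnn.
# The group names (kind, seq, num) are illustrative assumptions.
MQUEUE_NAME = re.compile(r"(?P<kind>[dqx])f(?P<seq>[A-Z]{3})(?P<num>[0-9]{5})")

def parse_mqueue_name(name):
    """Return the fields of a queue file name, or None if it does not match."""
    match = MQUEUE_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

A name such as dfVAB20435 yields its three fields, while any name that does not fit the scheme yields None, so no static record per file is needed.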

In the production Meta system, these rules might be implemented as objects within the database. In the current prototype, they can be coded into the CGI script and/or encoded within the file tree.

Commands

  • commands to error messages (and explanations)

    The source code for commands includes the code to generate error messages. By extracting and annotating error information, Meta can help to clarify ambiguous diagnostics.

  • commands to manual pages and build directories

    • Commands in certain directories generally have manual pages (in corresponding sections) and build directories with corresponding names:
          Command Directory        Section            Source Directory
          -----------------        -------            ----------------
      
          /bin                     1                  /usr/src/bin
          /sbin                    8                  /usr/src/sbin
          /usr/bin                 1                  /usr/src/usr.bin
          /usr/games               6                  /usr/src/games
          /usr/sbin                8                  /usr/src/usr.sbin
         
    • In some cases, the source code for a command (e.g., sendmail(8)) is stored in a separate "contrib" directory (e.g., /usr/src/contrib/sendmail/src). The Makefile in the command's source directory contains the necessary linkage information.

    • The /usr/obj tree contains relocatable object files, "built" executables, and manual pages for the source code subtree(s). Thus, /usr/obj/usr/src/bin/cat contains assorted files that are related to /bin/cat.

  • commands to files, devices, and directories

    Some commands "know about" given parts of the file system. For instance, sendmail(8) knows about /var/spool/mqueue. A Meta system should be able to explain which files are used by which commands, how, and why. In general, the needed linkage information can be elicited from the command's source code, but the process requires the participation of a competent programmer.

  • commands to function (and other) definitions

    Source code files for commands may contain references to data structures, functions, include files, and macros. A rather large number of relationships may arise out of a single external reference. When a C function, for instance, references a function or macro that is defined elsewhere, several links may be generated:

        function_a                 calls              function_b
        function_a                 is_in              file_a
        function_b                 is_in              file_b
        function_b                 is_in              library_c
        function_b                 has_man_page       b(3)
        function_b                 calls              syscall_d
        syscall_d                  has_man_page       d(2)
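The directory table earlier in this section can be expanded into explicit links mechanically. The following is a minimal sketch, assuming links are stored as (subject, relation, object) triples in the style of the listing above. The relation name has_build_dir and the function name are hypothetical; only has_man_page appears on this page.

```python
import posixpath

# Rule table copied from this page: command directory -> (man section, source directory).
COMMAND_RULES = {
    "/bin": ("1", "/usr/src/bin"),
    "/sbin": ("8", "/usr/src/sbin"),
    "/usr/bin": ("1", "/usr/src/usr.bin"),
    "/usr/games": ("6", "/usr/src/games"),
    "/usr/sbin": ("8", "/usr/src/usr.sbin"),
}

def links_for_command(path):
    """Return candidate (subject, relation, object) triples for one command path."""
    directory, name = posixpath.split(path)
    rule = COMMAND_RULES.get(directory)
    if rule is None:
        return []  # no directory rule applies to this path
    section, src_dir = rule
    return [
        (path, "has_man_page", f"{name}({section})"),
        # "has_build_dir" is an assumed relation name, used here for illustration.
        (path, "has_build_dir", posixpath.join(src_dir, name)),
    ]
```

The results are only candidates: a command whose source lives in a "contrib" directory, such as sendmail(8), would still need the linkage information from its Makefile.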
       

Control Files

  • control files to manual pages

    System control files (e.g., in the /etc subtree) often have manual pages in section 5, but the correspondence is rather loose. A hand-edited list (in progress) performs this mapping.

  • linkage information within control files

    Many control files contain linkage information. For example, the /etc/passwd and /etc/group files relate uids and gids, respectively, to user and group names.
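Both /etc/passwd and /etc/group are colon-separated, with the name in the first field and the numeric ID in the third, so one small routine can extract links from either. This is a sketch only; the function name and the relation strings are assumptions made for the example.

```python
def id_name_links(text, relation):
    """Return (numeric_id, relation, name) triples from passwd- or group-format text."""
    links = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed entries rather than guessing
        links.append((int(fields[2]), relation, fields[0]))
    return links
```

For instance, calling it with the contents of /etc/passwd and a relation such as "has_user_name" gives one triple per account, and the same call with /etc/group and "has_group_name" gives one per group.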

File System

  • device nodes to manual pages

    The device nodes in /dev generally have manual pages, but the correspondence between device names and manual pages can be rather loose (e.g., nrwt0 is documented by wt(4)). There are also conventions which map node names to minor numbers, etc. A hand-edited list (in progress) performs this mapping.

  • file system navigation

    EclecticSystems have notions of file system navigation (e.g., parent and child relationships, hard and soft links) which allow programs to traverse file trees. Access rules, based on ownership and permission data, restrict the things that programs are able to do. Linkage information about the file system can let Meta predict system behavior and (perhaps) explain it after the fact.

  • miscellaneous name formation schemes

    Like /dev, several other directories and sub-trees have semi-regular to very regular name formation schemes, such as:

        /proc                  name is PID of process
        /usr/share/groff_font  names signify font families, etc.
        /usr/share/calendar    names signify calendar types
        /usr/share/info        names signify relevant commands
        /usr/share/man         names signify man sections and pages
        /usr/share/me          names signify groff_me(7) macro families
        /usr/share/tmac        names signify troff(1) macro families
        /usr/share/zoneinfo    names signify time zones
        /var/spool/mqueue      [dqx]f[AAA]nnnnn
       

  • manual pages to files

    As hinted above, a manual page foo(#) will have corresponding files in /usr/share/man:

        cat#/foo.#.gz          formatted (nroff output)   version
        man#/foo.#.gz          raw       ([nt]roff input) version
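The mapping from a manual page reference to its files is regular enough to compute rather than store. A minimal sketch follows, assuming a single-digit section number (a simplification) and the /usr/share/man layout shown above; the function name and dictionary keys are hypothetical.

```python
import re

MAN_ROOT = "/usr/share/man"

# Matches references of the form name(section); a single-digit section is assumed.
MAN_REF = re.compile(r"(?P<name>[^\s()]+)\((?P<section>[0-9])\)")

def man_page_files(ref):
    """Return the formatted and raw file paths for a reference like 'cat(1)'."""
    match = MAN_REF.fullmatch(ref)
    if match is None:
        return None  # not a recognizable manual page reference
    name, section = match.group("name"), match.group("section")
    return {
        "formatted": f"{MAN_ROOT}/cat{section}/{name}.{section}.gz",
        "raw": f"{MAN_ROOT}/man{section}/{name}.{section}.gz",
    }
```

The paths are derived from the naming scheme alone; whether the files actually exist on a given system would still have to be checked.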
       

Storage Requirements

As noted above, early development focused on FreeBSD. The 4.1 system (including everything but X11) contained about 50 K files and directories. The "src" components accounted for perhaps 2/3 of this.

Assuming that each item in the distribution generates 20 links, we would need about 1 M links to handle everything. This assumption is a bit high for most files, but low for many control files, source files, etc.

A full-scale Meta implementation might cover the 10 K distinct packages that are listed in FileWatcher. If each of these contains 50 files, each needing 20 links, we would need about 10 M links to handle everything.
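The two estimates above are simple products, restated here as a short sketch so the assumptions can be varied. The figures are the ones given on this page; the function and variable names are illustrative.

```python
def estimated_links(items, links_per_item=20):
    """Rough link count: number of items times the assumed links per item."""
    return items * links_per_item

# FreeBSD 4.1 without X11: about 50 K files and directories.
freebsd_links = estimated_links(50_000)

# Full-scale case: 10 K packages, each assumed to contain 50 files.
full_scale_links = estimated_links(10_000 * 50)
```

With 20 links per item, these work out to 1 M and 10 M links respectively, matching the estimates in the text.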

Note: Mac OS X (including the Developer Tools and X11) contains in excess of 200K files and directories.

-- Main.RichMorin - 16 Jun 2003
Topic revision: r5 - 08 Jun 2003, WikiGuest