2009-12-03

GraphViz for Fun and Profit

A long time ago, in a data center far, far away, we had a NIS server. It was a good little NIS server. It knew of usernames, and passwords, and automounter maps, and netgroups. Ah, the joys of NIS netgroups. There is one basic form of netgroup that we used:

([computer],[user],-)

Specifies a computer, user, or computer/user combination ("-" indicates "any user"). In addition, netgroups can contain other netgroups, so you get a nice graph of which netgroups are part of other netgroups.

Along with this wonderful little NIS server, there was an old Python script which would run through and generate a set of webpages to graphically display netgroup relations. If netgroup A contains netgroups B and C, and C contains D, we could check the generated graph to see:

    A
   / \
  B   C
      |
      D

Fast forward a decade or so to an insanely large NIS tree, with netgroups for things we've long since thrown away (e.g. line-printer-access). When we upgraded from NIS to LDAP as the backend directory service, we kept the netgroups around, though. Still in their NIS format. Of course, our wonderful little Python app broke because of the transition to LDAP, but nobody wanted to admit to knowing enough Python to fix it.

So in working on the replacement for our Python app, I got to learn a whole new set of tools. Graphviz is a suite of tools for displaying graphs and trees. A friend of mine showed me his Perl program to generate "dot" diagrams for Tic-tac-toe (turns out the only winning move really is not to play), and I realized that just redoing the generator in ruby or Perl would be much easier than trying to tackle Python's LDAP bindings (not to mention debugging Python).

The "dot" language spec, while exhaustive, isn't very helpful for learning the language. The User Manual, though slightly dated, serves as a much gentler introduction. The syntax allows for a lot of room for error, once you get the basics down: semi-colons are (nearly) optional; whitespace isn't counted for or against you; and nodes that aren't pre-declared are created without complaint.

The most basic directed graph is something like:

digraph G {
    foo -> bar;
    bar -> blort;
    foo -> quux;
}

Generating that from a list of netgroups in ruby is also quite simple:

puts "digraph G {"

netgroups.each do |netgroup|
    netgroup.children.each do |child|
        puts "\"#{netgroup.name}\" -> \"#{child.name}\";"
    end
end

puts "}"

Which can then be used to generate a nice image by:

./generator.rb | dot -Tpng -o image.png

This is exactly what we used to have (except in dia) for our Python/NIS version, except that that had a page for the subgraph at every node. So, clicking on C (using dot-generated HTML image maps) should pop up a page with:

    C
    |
    D

The original script had all the busy work to figure out which nodes to include on subgraphs done in native Python, and tons of generated dia files that got passed on to generate the images and image maps. Graphviz, however, includes "gvpr," an awk-like tool for graphs written in "dot."

BEGIN{} and END{} blocks function exactly the same as in awk.
BEG_G{} and END_G{} blocks are called at the beginning and end of each graph to be processed.
N{} blocks are called for each node, and
E{} blocks are called for each edge.

Just like awk, there is an optional test parameter (inside [], not //) to specify which "things" to operate on. For example:

N [ color == "blue" ] { color = "red" }
E [ color == "red" ] { color = "blue" }

turns all the blue nodes red and all the red edges blue. There is a whole API built in to work with graphs, nodes, and edges within gvpr, which includes a rich set of tools for "subgraphs."

Say we want to generate a graph from a couple nodes in the original graph we were given (known as "root" in gvpr).

graph_t g = subg( root, "SubGraph" ); /* create a new subgraph */
subnode( g, node( root, "C" ) );  /* add node "C" to the subgraph */
subnode( g, node( root, "D" ) );  /* add node "D" to the subgraph */

Now, this is just the nodes. To copy the edges, we could loop through all the edges from "C" (there are for and while in gvpr), or, we can "induce" the edge relations:

induce( g );

Which the man page states "extends g to its node-induced subgraph extension in its root graph." Roughly translated to English, this means "draw all the edges as they are in the original graph."

Add in some writeG() calls to output to files, and several Python files are now around 15 lines of gvpr, 40 lines of ruby, and a shell script to fire it all off.

:wq

No comments:

Post a Comment