2010-04-26

Monitoring Your Infrastructure with Zabbix

The last time I had an enterprise system monitoring implementation to work with, it was already set up and the organization had been using it for several years.

This time, I got to do a little research and set up a system at home to monitor my various "machines." After playing around with Cacti, I decided that seeing my logs fill with PHP deprecation warnings was a little too scary and started up a zabbix server.

Zabbix definitely takes more resources, both in terms of the server and the machines being monitored. Like everything else, it uses a database backend and has a nice clean web frontend, so no real differences there. By default, you can graph any piece of data you gather (including the version of Zabbix, if you want it), and you can create more complex graphs fairly easily. It took me a few minutes to get a cheap load average graph that works quite well. Paging is actually a more robust "notification" system that supports Email, SMS, Jabber, and custom scripts, so you can extend it to do whatever you can think of (fax? log to a file? print a .pdf from your boss' printer whenever a server goes down?).

For the truly lazy, Zabbix offers a .vmdk built on SuSE that you can just fire up and have a running Zabbix server.

Monitoring, graphing, paging...there are all the same tasks that any monitoring system does.

Where Zabbix's architecture differs from Nagios, Zenoss, Cacti, et. al., is that it's a client-server system. This took me a while to grok, since "client" and "server" aren't what you'd normally expect them to be (think iSCSI). The "Zabbix Server" box dials out to an (access-list restricted) open port on the "client" and retrieves the data from a running service. There's even a Windows version of the client available, if you're into that sort of thing.

This means that there is a daemon running on the client to gather data and send it to the server and a daemon running on the server to periodically go and nab the data from the clients. The clients need to know where to trust connections from (by IP, so watch out if you have AAAA's), and then the hosts have to be manually added to the Zabbix server. For rolling out to an existing infrastructure this is more work than a pure SNMP-monitoring solution that can just scan your network and learn all your hosts.

A side-effect of this is that Zabbix has a proxy server available. You could have a host on each subnet (or at each site) gather all the information, and then submit bulk updates to a central Zabbix server, or implement even more tiers of proxying if your infrastructure requires it.

The biggest "gotcha" to just getting a Zabbix server running is that the front-end isn't necessarily bundled with the server. The only requirement for the web front-end is that it can talk to the database, so you can run the front-end, Zabbix server, and database on 3 different systems if you want to. The front-end is all PHP, so your existing Apache server is more than adequate.

Zabbix also does some basic host-based intrusion detection. If you're going to be running something more beefy (Samhain/Beltane, for example), then this is likely just a waste of cycles. Still, this is leaps and bounds above the option of no host-based intrusion detection.

:wq

2010-04-25

Duplicate IPv6 Address Detection in Karmic

I've finally gotten my little empire of virtual machines somewhat stable, but a little wrench in the works has been the extra flag I keep needing for ssh: -4.

Sure, I can log in without it, but the systems will try using IPv6 first, since they all have IPv6 addresses in DNS, and then fall back to IPv4 when that times out. I could use my .ssh/config file to force IPv4, but that gets rid of the nagging reminder that I need to fix it.

Turns out IPv6 is broken by default in Karmic (and hopefully fixed in Lucid, but I'm not holding my breath). One of the "features" of IPv6 is Duplicate Address Detection; you should never have the problem of having the same IPv6 address allocated to 2 computers unless you do it intentionally. Of course, EUI-64 addressing should take care of that anyway...

Unfortunately, Karmic's implementation of DAD is horribly broken. Even with unique entries in DNS and a properly configured rtadvd, I kept getting logs filled with:

Apr 25 13:49:59 vm1 kernel: [ 6851.010394] eth0: IPv6 duplicate address detected!
Apr 25 13:50:01 vm2 kernel: [147906.360484] eth0: IPv6 duplicate address detected!
Apr 25 13:50:00 vm3 kernel: [315943.970527] eth0: IPv6 duplicate address detected!

So, I tried to configure IPv6 manually. But, Ubuntu is smarter than me (or so it thinks). Once it's detected that the address is a duplicate, you're not allowed to actually assign it to an interface. All the packets were attempting to be sent out with a source ip of ::1, so they all went out the lo interface which effectively firewalls the whole IPv6 stack from the outside world.

The only solution is to turn off Duplicate Address Detection.

sysctl net.ipv6.conf.all.dad_transmits=0
sysctl net.ipv6.conf.all.accept_dad=0

Perfect!

Well, not quite. See, "all" in sysctl doesn't actually mean "all." I'm not sure what it means, but it seems to be the exact opposite of "all," i.e. "none."

What I actually had to do was explicitely disable DAD on both lo and eth0. lo shouldn't need it, since all IPs on loopback are the same box anyway. If I see an IPv6 loopback packet storm, maybe I'll think about turning DAD back on for lo.

The final sysctl.conf:

net.ipv6.conf.all.dad_transmits = 0
net.ipv6.conf.all.accept_dad = 0
net.ipv6.conf.lo.dad_transmits = 0
net.ipv6.conf.lo.accept_dad = 0
net.ipv6.conf.eth0.dad_transmits = 0
net.ipv6.conf.eth0.accept_dad = 0

Phew! Good thing for me I have this all in Puppet, so it's easy to replicate across all my hosts:

exec { "sysctl -p":
  cwd         => '/',
  path        => '/bin:/sbin:/usr/bin:/usr/sbin',
  refreshonly => true,
  subscribe   => File[sysctl_config]
}

:wq

2010-04-22

Copy-on-write Semantics on DABL

I've been thinking of a way to do this for a while, and finally got some time to hack out a prototype yesterday.

First, the problem statement. We have a piece of data (street address) that needs to be associated with multiple entities. Basically, I want a many:many relationship between them, but with a catch. Once the address is attached to an entity, it should be manageable on a per-entity basis.

The obvious solution is to make a copy of the data and just attach the copy to the second entity. That's fine, but it's not unrealistic to have 1,000 entities with the same information. That means 1,000 copies of the data, just in case it might change.

So, I implemented a copy-on-write save() method for DABL, our pet ORM/MVC.

(editor's note: DABL is now hosted on github)

The basic idea is that we need to generate a new ID for the record, then pass that new record off to a subset of the related records. If all the related records are to be modified, there's nothing to be gained (except revision history; if you want that, it's trivial) from creating a new record. Rather than letting the database update all the related records, we have to do it ourselves since we only want to CASCADE some of them.

For the source of my prototype pet project, see DABL cow-save() or fork my nonblocking-random repository on github. The basic database setup is a Userdata table which has many Users. Userdata holds the equivalent of GECOS information (firstname, lastname), while User holds the login information (username, password). The production version of this has much more data in the related table and is written in Propel, which makes it significantly uglier.

:wq

2010-04-03

Managing your Checking Account with PocketMoney

Balancing your checkbook is one of those chores, like taxes, that everyone hates but does anyway. Of course, unlike taxes, you slack off for a month and the feds don't come busting down your door. So you slack off for another month...

If you're like me, you have several abortive attempts to use the check register that came with your checkbook to manage your money. My current check register even includes a handy three-year calendar for 2003, 2004, and 2005. The only real use I have for this ancient piece of banking technology is to track when I've overdrawn my account. After each time that happens, there's another 1/4 to 1/2 page of studiously usage until I start forgetting again.

Part of the problem with using a check register is that hardly anyone even takes checks anymore. I write around 30 or 40 checks a year; all of them are for paying bills, so I don't even bring my checkbook with me when I go out. I also don't generally carry cash, so that means a lot of miscellaneous debit and credit card transactions. And, lets be honest, who actually wants to carry a checkbook just to record all of those so they can balance their checkbook at the end of the month?

Of course, one thing I do always have with me is my iPhone. And, as they say, there's an app for that (several of them, actually). PocketMoney is the one that I finally settled on. There's a free trial, so really there's no excuse not to at least take it for a test drive. Training myself to use this was definitely easier than training to use a check register, and the built-in budgeting tools are impressive. When I've got the app handy, "I'll record this $1.40 at Starbucks when I get home" is really hard to justify.

One of PocketMoney's most useful features is the different budgeting schedules. If you get paid bi-weekly, enter that as the repeat period. The program will automatically pro-rate your budget based on how many weeks (or partial weeks) are in the given month. Enter Rent monthly, Groceries weekly, Salary bi-weekly, and Auto Insurance quarterly, and the app will take care of all the calculations to give you correct budget breakdowns by any time range you choose.

For me, though, the real power of this app comes when combined with online banking and online bill-pay. The recurring transactions feature lets you schedule automatically repeating transactions, so you never forget about an automatic withdrawal again. Online banking means that I can balance my "checkbook" against my bank account daily, weekly, or just whenever I remember and have an extra 10 minutes to spend on it. For me, this is a whole lot more convenient than remembering at the beginning of the month when I get my bank statement in the mail. Plus, since I have everything synchronized, I know how much money I don't have and can keep from overdrawing yet again.

:wq

2010-01-31

Do the Wave...

My first abortive attempt at a Google Wave bot was an IRC <-> Wave bot that would let me look at IRC over Wave (or, more likely, Wave over IRC).

The second bot I've attempted actually works, though, so naturally I'm more proud of that one. Inspired in part by an Ars Technica article, my D&D group decided to investigate gaming over Wave. I still think that OpenRPG is a better overall solution, but in a pinch, Wave could work. So we added the bot that they used for the article and played around with it a little bit.

For those not familiar with how D&D die rolls work, I recommend D&D for Dummies, but the basics should suffice for this example: NdM indicates rolling N M-sided dice. For monopoly, you use 2d6 (two, six-sided dice). For D&D, the standard roll is 1d20, plus or minus some modifiers. From there, you build up more and more complicated expressions until you're a level 10 Rogue/Warlock with 22 Dex waving your +3 Dagger at the zombie and you roll up 1d20+5+6+1+3+1+3+1 vs Reflex (and yet still miss!).

But I digress. All those modifiers you add to a die should be cumulative and per total roll: 2d6+3 is two, six-sided dice, plus 3. The die rolling wave bot linked by Ars Technica treats the modifier as modifying each die, so you get two, six-sided-dice-plus-three, effectively doubling the modifier (1d6+3 + 1d6+3).

Well, I had a dice-roller before that kind of worked (from my IRC adventures), so I decided to take my work on version 2 of that and port it to Java. Why Java? Because I hate Python. And for a Wave robot, those are the only choices.

So, I loaded up my least favorite IDE, reminded myself why I hate Eclipse, switched to NetBeans, lost a hard drive (which means I owe the internet the obligatory "Back Up Your Data" blog post), forgot all about it for a couple days, installed NetBeans on a different computer, and settled in to write myself a bot. Ah, Java. Good times.

Normally, I'd use a Linux as my IDE of choice, but Netbeans and Eclipse both have plugins for deploying a project to Google App Engine (where all Wave bots call home), and I didn't want to have to worry about that part. Plus, Java is a library language, so integrated docs for the libraries is rather nice.

A wave robot is pretty simple, you register for events, and then write a handler for them. The handler in this case is a little compiler (yes, it lexes and parses the die roll into a syntax tree; I'm a bit of a masochist, apparently), and an "interpreter" that actually rolls the result. Borrowing from OpenRPG, it looks for a roll inside square braces and replaces the contents with the result:

[1d20+5+6+1+3+1+3+1] becomes [1d20+5+6+1+3+1+3+1 = (37)] (I rolled well). There's always more to do with it, but this is a good start for a weekend. To play with the bot, add die-roller@appspot.com to your Wave contacts and then invite it to your favorite dragon-slaying waves. The author asks only a modest share of the XP from any dragon slayed with the help of this tool.

Now for the ugly:

NOWHERE on the internet could I find this little "feature," so I'm going to warn people here. When replacing text in the Java API (TextView.replace()), Annotations get "shifted" by the difference in text length you're inserting. As do Range objects which completely enclose the area. For example:

Range r = new Range(0, 10); // a range from 0 to 1
textview.replace(new Range(1,9), "a bunch more text");
// r is now the range [0, 18] because 8 more characters were inserted

Unfortunately, this happens on Annotations that have already been added (in this jsonrpc call) as well, but incorrectly. They get shifted twice (or one-and-a-half times, or something...they always end up beyond where they should), so this code acts very strangely indeed:

while (!stack.empty()) {
  Range r = stack.pop();
  ...
  textview.replace(r, a_string);
  textview.setAnnotation(r, "style/color", "#336699");
  ...
}

The "correct" way to do this is to keep track of an offset and then shift the Annotations yourself:

int offset;

for(Range r: stack) {
  ...
  textview.replace(new Range(r.getStart()+offset, r.getEnd()+offset), a_string);
  textview.setAnnotation(new Range(r.getStart()+offset, r.getEnd()+offset), "style/color", "#336699");
  offset += a_string.length() - (r.getEnd() - r.getStart());
  ...
}

Obviously, the more Java way of doing this would involve several dozen more Factorys, but at least Google embraces Python and so doesn't do things the "Java way."

Oh, and back up your data.

:wq

2010-01-30

Building a jQuery color picker with CSS-3

One of the apps I work on has a new feature that requires a color picker applet. I went searching around on the internet for something useful, only to find that most of the solutions for color pickers are way overblown for our needs. A simple select box with a dozen or so colors would probably work just fine, but I wanted something a little flashier. The smallest one that would kind-of work that I found out in the wild was 6K. And then it didn't work with the jQuery version we run on the app. An hour or so of Javascript-golf later and I have a basic color-picker applet that fits neatly into a single packet (including the HTTP headers).

This is the un-minified version; see /packer/ for a simple online Javascript-packer.

(function($){
var colors = [
  '#f7b', '#fb3', '#fb7', '#fbb',
  '#f77', '#bf3', '#bf7', '#3f7',
  '#37f', '#7f3', '#7bf', '#b3f',
  '#b7f', '#9af', '#7fb', '#bfb'
];

$(function() {
  $('input.colorpicker').each(function() {
    var $input = $(this),
      $div = $('<div />'),
      $div3 = $('<div />')
        .addClass('colorpicker-bin')
        .hide(),
      $div2 = $('<div />')
        .addClass('colorpicker-button')
        .css({
          'background-color': this.value,
          'cursor': 'pointer'
        })
        .html('&nbsp;')
        .click(function() {
          $div3.toggle("fast");
        });

    $input.css({
      'background-color': this.value,
      'color': '#000'
    })
      .attr('disabled','disabled');

    $(colors).each(function() {
      var color=this;
      var $div4 = $('<div />')
        .addClass('colorpicker-color')
        .css({
          'background-color' : color,
          'cursor': 'pointer'
        })
        .html('&nbsp;')
        .click(function(){
          $div2.css('background-color',color);
          $input.css('background-color',color)
            .val(color);
          $('div.colorpicker-color')
            .removeClass('colorpicker-selected');

          $(this).addClass('colorpicker-selected');
        });

      if (color==$input.val()) {
        $div4.addClass('colorpicker-selected');
        $div3.append($div4);
      }
    });

    $input.wrap($div).after($div2.append($div3));
  });
});
})(jQuery);

On page load (DOM ready, actually), it adds a color-picker to all input elements with class colorpicker. It even works with hidden input boxes. The (function($) { ... })(jQuery); hack lets me use $ throughout the plugin, even with jQuery in no conflict mode (as we do use it in this particular app). Most of the rest of the code is just build up to the onClick() events that do the heavy-lifting. The value of the input box is replaced with the background-color of whichever swatch the user clicks on, so form submission works like normal.

I wired it up with some simple css-3 (shamelessly stolen from Zurb) and I'm in business. @webkit-keyframes paired with the colorpicker-selected class is very neat. Just make sure not to forget about Gecko-based browsers and Internet Explorer.

:wq

2010-01-26

Nested Transactions in MySQL

Transactions are awesome. They can keep your database in a consistent state, without resorting to nasty business logic to cleanse the input first.

A couple months ago, I switched jobs and now I'm working as a Web Developer with a side of System Administration. This has recently meant that I've gotten up close and personal with some of the nastier quirks of MySQL. One of the things I wish it supported better is transactions.

The default storage engine doesn't support them at all, but we mostly use InnoDB tables anyway, so we get "transactions" for free. I use quotes here because these aren't quite the same as what I would expect. For example,

(editor's note: the default storage engine in MySQL 5.5+ is InnoDB, which does support transactions)

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE foo (...);
Query OK, 0 rows affected (0.00 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

Rolling back a transaction should undo everything in the transaction, right?

Not quite so in MySQL. Observe:

mysql> SHOW TABLES LIKE 'foo';
+----------------------+
| Tables_in_test (foo) |
+----------------------+
| foo                  |
+----------------------+
1 row in set (0.00 sec)

Why do I now have a table `foo`? MySQL COMMITs a transaction, and puts you back into auto-commit mode whenever you add or drop a table. This is a good caveat to know, as you can do destructive things thinking you're safely in a transaction when you're not.

Unfortunately, this means the following doesn't work as intended:

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLE `foo` ADD FOREIGN KEY `bar_id` REFERENCES `bar` (`bar_id`);
Query OK, 5 rows affected (0.02 sec)

mysql> DELETE * FROM `bar`;
Query OK, 3 rows affected (0.01 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

Here, I have ninja-truncated `bar`. and by "ninja," I mean, "sh*t, where's my backup of that database?"

ALTER TABLE works by first renaming the original table, then creating a new table with the original name and new schema, and then inserting all the records again. This means there's a hidden CREATE TABLE statement lurking in every ALTER TABLE. Which commits the transaction, and puts me back into auto-commit mode. Even though it's between a BEGIN and a ROLLBACK, the DELETE isn't in a transaction at all. And this can spell trouble.

Is there a way around all this? Sure. Is it easy? Not so much.

The naive approach runs about like this:

<?

class MyPDO extends PDO {
  protected $transactions;

  function __construct($dsn) {
    $this->transactions = 0;
    parent::__construct($dsn);
  }

  /**
   * Push a transaction on the stack.
   * @return bool
   */
  function beginTransaction() {
    if (!$this->transactions) {
      if (!parent::beginTransaction()) return false;
    }
    ++$this->transactions;
    return true;
  }

  /**
   * Commit if we're the outermost transaction.
   * @return bool
   */
  function commit() {
    --$this->transactions;
    if (!$this->transactions) {
      return parent::commit();
    }
    return true;
  }

  /**
   * Roll back if we're the outermost transaction.
   * @return bool
   */
  function rollback() {
    --$this->transactions;
    if (!$this->transactions) {
      return parent::rollback();
    }
    return true;
  }
}

This follows the basic contract for PDO transactions. Of course, an actual implementation would have to recognize when database-altering statements destroy your transaction; probably by querying the database's current auto-commit mode setting after every query.

Also, note that this makes the outermost transaction win. In the code I see at work, this makes sense, because things generally follow the contract:

<?

try {
  // function with query that may fail
} catch (PDOException $e) {
  $conn->rollback();
  throw $e;
}

The problem is that our save methods do this same thing. And most of them call other objects' save methods. So nested transactions are the rule, not the exception.

Notice that throw $e;? I think that's what keeps the system from coming crashing down in a raging inferno; instead it's just a small, constant flame...biding its time...waiting to doom us all.

This works fine for a regular stack of calls, but fails utterly once you throw in looping constructs. For example, removing a node from the middle of a tree:

<?
try {
  foreach($this->childen as $c) {
    $c->setParent($this->parent);
    $c->save();
  }
  $this->delete();
} catch (Exception $e) {
  $conn->rollback();
}

All these steps need to occur atomically. But, the save() method starts and commits its own transaction, so the database isn't guaranteeing atomicity (hint: that's the reason for transactions in the first place). What happens when the 3rd child fails? Children 1 and 2 are already committed, so the database is now in an inconsistent state. That's not a happy place to be, and the logic to prevent it could break horribly because of concurrency issues. Remember, the <? and ?> tags mean this goes on the internets; there are going to be concurrency issues to deal with.

So to fix it, we'll have to hack PDO. There needs to be a single transaction, and the outermost transaction has to "win" instead of the innermost. MySQL just doesn't support these semantics out of the box, otherwise it'd be a non-issue.

I'm just glad that our app doesn't have any schema-altering queries in it...

:wq