2010-07-04

Restoring a System Image from Windows Backup

Previously, I mentioned that I had lost a hard drive, and the importance of backing up files.

If you're running a Windows 7 system, go do yourself a favor and turn on Windows Backup to an external CIFS server. I'll wait.

Everybody backing up regularly now? Good.

Yesterday, I finally threw a new hard drive in my dead system and flipped the switch again. As expected, the system complained that the hard drive was unbootable, and I started rooting around for my Windows 7 disc.

Rather than installing, there's a Repair your computer link. Tell it you want to restore a system image, let it connect to the network, select your system image, and go grab a book. This took me about 40 minutes over a 100Mbit link, so it was comparable, time-wise, to a full install.

That's it. The system will reboot automagically, at which point you probably want to pull out the disc.

Once it's back up, log in, run Windows Update a few times, and you're back in business.

:wq

2010-06-20

Blogger Editing via ${EDITOR} - GoogleCL

GoogleCL allows the use of Google tools from the command line. This allows amazing things like editing Google Docs in vim (or emacs, if you're into that sort of thing). There's also Blogger and Picasa functionality, and you can dump your entire Google Calendar to stdout (in csv format, so start brushing up on your awk).
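For a taste, editing a doc in your editor is a one-liner (the document title here is made up):

google docs edit --title "todo" --editor vim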

One of the things that's blatantly missing, however, is Blogger integration with your favorite editor. Hopefully, this post will fix that, as it's doubling as my test of google-cl blogger integration and my blogger script.

#!/bin/sh

GOOGLE=/usr/bin/google
BLOGFILE=`mktemp`

# Prefer $VISUAL, fall back to $EDITOR, then vim.
ed=${VISUAL-${EDITOR-/usr/bin/vim}}

# Remember the mtime so we can tell whether the file was ever saved.
timestamp=`date -r "${BLOGFILE}" +%s`

${ed} "${BLOGFILE}"
status=$?

newtime=`date -r "${BLOGFILE}" +%s`

# Post only if the editor exited cleanly and the file was actually modified.
if [ ${status} -eq 0 -a "${newtime}" -gt "${timestamp}" ]; then
  # The first line becomes the post title; the rest is the body.
  title=`head -1 "${BLOGFILE}"`
  /bin/sed -i '1d' "${BLOGFILE}"
  echo 'Posting to blog "fmt >/dev/internets"'
  ${GOOGLE} blogger post --title "${title}" "${@}" "${BLOGFILE}"
fi

rm "${BLOGFILE}"

File inclusion courtesy of :r. Now that makes me happy.

:wq

2010-06-15

Get the Keys of a Hash in Javascript

I found myself today needing to do something that should be really easy in nearly any language: get a key of a given hash. (In the actual problem, any random key would do.)

After trying all kinds of different jQuery and array hacks, I finally found the solution; sometimes, you spend so much time looking in libraries you forget about the actual language:

var key = (function() { for (var k in obj) { return k; } })();

And that's exactly what I used; since any key is good enough, the first key is the easiest one to grab. Making it an anonymous function means I don't have to muck with break; or the like, and running it immediately means I can save the return value into a variable directly.

You can also use the for(key in Object) construct to get all the keys of any given Object. This is mostly academic, since you could always just use for directly (and without the overhead of a function). It may come in handy for debugging routines, however.

/**
 * Returns the keys of a given object.
 *
 * @param Object obj
 * @return Array
 */
function keys(obj) {
  var ks = [];
  for (var k in obj) { ks.push(k); }
  return ks;
}
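A quick sanity check:

var obj = { a: 1, b: 2 };
keys(obj); // ["a", "b"]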

:wq

2010-06-09

Git Trick: Multiple remotes

git is a wonderful little vcs. It took a while for me to warm up to it, but now I could never go back to svn or cvs. RCS still holds a special place in my heart, and is so damn useful I can't give it up completely. But these days, my vcs world is all git-based.

I'm sure there's a way to do this multiple-remote trick in bzr and hg, but I don't use them, so haven't bothered to figure it out. If I need hg, there's always hg-git, which may earn its own entry soon.

git makes it easy to track multiple remote branches, which is ostensibly used for sharing work between colleagues. i.e. I can 'merge steve/master' to bring all Steve's unpublished changes to master into my current local branch. This is great for hacking in small groups or pairs without needing a central server. However, the true power of this, I think, is that any git url can be a remote.

So, for example, I'm building a site based on dabl, and I want to keep it up to date with the latest changes. Now, up until last week, that meant I would check out dabl, check out my code, and then copy the files in and commit. Something like this:

git clone dabl
git clone my-project
cp -R dabl/{libraries,helpers} my-project/
cd my-project
git commit -a -v -m 'Updated dabl'
git push origin master

Now, of course, that `cp -R` followed by `git commit -a` wipes out any local changes. That means that, for example, anything in helpers/ that gets modified in both my-project and dabl will be replaced by the dabl version. Usually, that's fine because I've pushed the changes from my-project upstream. Where there would logically be a conflict, it is happily ignored in favor of the upstream version. Of course, pushing upstream is the same procedure in the reverse direction, with the same pitfalls.

The solution is to recognize that these are both git repositories in their own right. Just add the dabl repository as a remote of my-project (one-time setup shown below); updating is then as easy as:

git fetch dabl
git merge dabl/master
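And the one-time setup, for reference (the URL is wherever your dabl clone lives; this one is made up):

git remote add dabl git://example.com/dabl.git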

Now, I have the latest version of my upstream code, with all the benefits of merging it in. This is what I really wanted in the first place. I love cp, but it's never going to give me an octopus merge.

But, that's only one half of the problem. Is there any way pushing upstream could be aided by this?

Okay, it would be pretty anti-climactic if the answer to that were no. As it turns out, pushing upstream is the reverse of pulling from upstream (like with the cp "solution"); you just have to be a little more careful. Let's say I need to push HEAD on master upstream:

git checkout dabl/master
git cherry-pick master
git push dabl HEAD:master

Yes, it really is that easy. "master" is a valid commit-ish, so you can cherry-pick it. You can even do exciting things like master^ and master~3. And of course, you can name a commit by its hash or tag if you want to be really safe. In practice, it generally helps to be really safe.

A word of caution: DO NOT make your other remote call its refs "origin." This will give you and everyone you work with headaches as you will switch repositories instead of merging. And I mean "switch" in the svn sense of the word (i.e. "rebase"). Name it something sane and unique, like the name of the project.

:wq

2010-06-07

Setting Up a pacman Repository for Archlinux

Last night, I finally got around to recompiling my own vim binary through abs. I wanted X title bar support, and the python interpreter so that one of these days I can set up PHP debugging with x-debug.

Users of emacs can kindly redirect themselves to /dev/null. </religion>

Everything went off without a hitch; package compiled and installed fine, and I got a bonus gvim package to go with it. (On an unrelated note, anybody want a mint condition gvim package with ruby and python support compiled in? Has never even been opened; I don't even know if it works.) The dependencies are a little off (now requires libxt, ruby 1.9, and python 2.6), but I already had everything installed, so it was fine for me.

Then I went through and installed an ftp server on my latest test box and got the package added to the repository. It turns out that the name of the repo is important, which is a double-edged sword. On the plus side, you can host multiple repositories on the same server without the huge directory tree required by other distros (*cough*ubuntu*cough*). On the other hand, it means that it took some trial and error to get the pacman.conf entry right:

[repository-name]
Server = ftp://hostname.example.com/

This actually downloads ftp://hostname.example.com/repository-name.db.tar.gz, which isn't what I would have expected. It's succinct, but means that you have to know the server's internal name for the repo (as opposed to, say, http://archive.ubuntu.com/ubuntu/dists/lucid/main/).

The other gotcha that wasn't clear from the Arch wiki is that the packages and the db file must live in the same directory. It looks like the following should work:

root@mirror:/srv/ftp# repo-add repository.db.tar.gz pkgs/vim-7.2-1-i686.pkg.tar.xz

and everything appears to work perfectly until you try and actually download repository/vim. The db entry doesn't store the full path, so you can simply move pkgs/*.tar.xz into the root of the ftp server and everything will magically start working.
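That is, the fix is a one-liner:

root@mirror:/srv/ftp# mv pkgs/*.pkg.tar.xz .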

Finally, pure-ftpd >> GNU inetutils ftpd. It supports IPv6, has rate-limiting, throttling, and chroot() built in, and permits anonymous, password-less logins. I'm not crazy about the motd at client connect, but that doesn't show up in pacman, so I don't much care. IPv6, on the other hand, means not having to deal with silly things like NAT. And we all know how well ftp works with NAT (hint: PASV is a hack that can now go the way of IE6).

So, now I have a native IPv6 Arch repository that I can push packages I compile from abs into. Let the hacking begin!

For those interested in what I'm up to:

[tingar]
Server = ftp://ftp.tingar.uni.cx/

:wq

2010-04-26

Monitoring Your Infrastructure with Zabbix

The last time I had an enterprise system monitoring implementation to work with, it was already set up and the organization had been using it for several years.

This time, I got to do a little research and set up a system at home to monitor my various "machines." After playing around with Cacti, I decided that seeing my logs fill with PHP deprecation warnings was a little too scary and started up a zabbix server.

Zabbix definitely takes more resources, both in terms of the server and the machines being monitored. Like everything else, it uses a database backend and has a nice clean web frontend, so no real differences there. By default, you can graph any piece of data you gather (including the version of Zabbix, if you want it), and you can create more complex graphs fairly easily. It took me a few minutes to get a cheap load average graph that works quite well. Paging is actually a more robust "notification" system that supports Email, SMS, Jabber, and custom scripts, so you can extend it to do whatever you can think of (fax? log to a file? print a .pdf from your boss' printer whenever a server goes down?).

For the truly lazy, Zabbix offers a .vmdk built on SuSE that you can just fire up and have a running Zabbix server.

Monitoring, graphing, paging...these are all the same tasks that any monitoring system does.

Where Zabbix's architecture differs from Nagios, Zenoss, Cacti, et al. is that it's a client-server system. This took me a while to grok, since "client" and "server" aren't what you'd normally expect them to be (think iSCSI). The "Zabbix Server" box dials out to an (access-list restricted) open port on the "client" and retrieves the data from a running service. There's even a Windows version of the client available, if you're into that sort of thing.

This means that there is a daemon running on the client to gather data and send it to the server and a daemon running on the server to periodically go and nab the data from the clients. The clients need to know where to trust connections from (by IP, so watch out if you have AAAA's), and then the hosts have to be manually added to the Zabbix server. For rolling out to an existing infrastructure this is more work than a pure SNMP-monitoring solution that can just scan your network and learn all your hosts.
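On the agent side, that trust list is the Server directive in zabbix_agentd.conf (the address here is hypothetical):

# /etc/zabbix/zabbix_agentd.conf
Server=192.0.2.10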

A side-effect of this is that Zabbix has a proxy server available. You could have a host on each subnet (or at each site) gather all the information, and then submit bulk updates to a central Zabbix server, or implement even more tiers of proxying if your infrastructure requires it.

The biggest "gotcha" to just getting a Zabbix server running is that the front-end isn't necessarily bundled with the server. The only requirement for the web front-end is that it can talk to the database, so you can run the front-end, Zabbix server, and database on 3 different systems if you want to. The front-end is all PHP, so your existing Apache server is more than adequate.

Zabbix also does some basic host-based intrusion detection. If you're going to be running something more beefy (Samhain/Beltane, for example), then this is likely just a waste of cycles. Still, this is leaps and bounds above the option of no host-based intrusion detection.

:wq

2010-04-25

Duplicate IPv6 Address Detection in Karmic

I've finally gotten my little empire of virtual machines somewhat stable, but a little wrench in the works has been the extra flag I keep needing for ssh: -4.

Sure, I can log in without it, but the systems will try using IPv6 first, since they all have IPv6 addresses in DNS, and then fall back to IPv4 when that times out. I could use my .ssh/config file to force IPv4, but that gets rid of the nagging reminder that I need to fix it.
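(For reference, forcing IPv4 is a two-line stanza in ~/.ssh/config:

Host *
    AddressFamily inet

But where's the fun in that?)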

Turns out IPv6 is broken by default in Karmic (and hopefully fixed in Lucid, but I'm not holding my breath). One of the "features" of IPv6 is Duplicate Address Detection: you should never end up with the same IPv6 address allocated to two computers unless you do it intentionally. Of course, EUI-64 addressing should take care of that anyway...

Unfortunately, Karmic's implementation of DAD is horribly broken. Even with unique entries in DNS and a properly configured rtadvd, I kept getting logs filled with:

Apr 25 13:49:59 vm1 kernel: [ 6851.010394] eth0: IPv6 duplicate address detected!
Apr 25 13:50:01 vm2 kernel: [147906.360484] eth0: IPv6 duplicate address detected!
Apr 25 13:50:00 vm3 kernel: [315943.970527] eth0: IPv6 duplicate address detected!

So, I tried to configure IPv6 manually. But Ubuntu is smarter than me (or so it thinks). Once it has detected that the address is a duplicate, you're not allowed to actually assign it to an interface. All the packets were being sent out with a source IP of ::1, so they all went out the lo interface, which effectively firewalls the whole IPv6 stack from the outside world.

The only solution is to turn off Duplicate Address Detection.

sysctl net.ipv6.conf.all.dad_transmits=0
sysctl net.ipv6.conf.all.accept_dad=0

Perfect!

Well, not quite. See, "all" in sysctl doesn't actually mean "all." I'm not sure what it means, but it seems to be the exact opposite of "all," i.e. "none."

What I actually had to do was explicitly disable DAD on both lo and eth0. lo shouldn't need it, since all IPs on loopback are the same box anyway. If I see an IPv6 loopback packet storm, maybe I'll think about turning DAD back on for lo.

The final sysctl.conf:

net.ipv6.conf.all.dad_transmits = 0
net.ipv6.conf.all.accept_dad = 0
net.ipv6.conf.lo.dad_transmits = 0
net.ipv6.conf.lo.accept_dad = 0
net.ipv6.conf.eth0.dad_transmits = 0
net.ipv6.conf.eth0.accept_dad = 0

Phew! Good thing for me I have this all in Puppet, so it's easy to replicate across all my hosts:

exec { "sysctl -p":
  cwd         => '/',
  path        => '/bin:/sbin:/usr/bin:/usr/sbin',
  refreshonly => true,
  subscribe   => File[sysctl_config]
}
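The File[sysctl_config] resource it subscribes to is just the config file itself; something like this, with the source path depending on your module layout:

file { 'sysctl_config':
  path   => '/etc/sysctl.conf',
  source => 'puppet:///modules/sysctl/sysctl.conf',
}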

:wq

2010-04-22

Copy-on-write Semantics on DABL

I've been thinking of a way to do this for a while, and finally got some time to hack out a prototype yesterday.

First, the problem statement. We have a piece of data (street address) that needs to be associated with multiple entities. Basically, I want a many:many relationship between them, but with a catch. Once the address is attached to an entity, it should be manageable on a per-entity basis.

The obvious solution is to make a copy of the data and just attach the copy to the second entity. That's fine, but it's not unrealistic to have 1,000 entities with the same information. That means 1,000 copies of the data, just in case it might change.

So, I implemented a copy-on-write save() method for DABL, our pet ORM/MVC.

(editor's note: DABL is now hosted on github)

The basic idea is that we need to generate a new ID for the record, then pass that new record off to a subset of the related records. If all the related records are to be modified, there's nothing to be gained (except revision history; if you want that, it's trivial) from creating a new record. Rather than letting the database update all the related records, we have to do it ourselves since we only want to CASCADE some of them.
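In rough PHP terms, the idea looks something like this; the method names are illustrative, not DABL's actual API:

<?

/**
 * Copy-on-write: clone the shared Userdata row and re-point only the
 * selected Users at the new copy. (Sketch only; names are made up.)
 */
function cowSave(Userdata $shared, array $usersToMove) {
  $copy = clone $shared;
  $copy->setId(null);   // a fresh ID gets generated on save
  $copy->save();

  foreach ($usersToMove as $user) {
    $user->setUserdataId($copy->getId());
    $user->save();
  }

  return $copy;
}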

For the source of my prototype pet project, see DABL cow-save() or fork my nonblocking-random repository on github. The basic database setup is a Userdata table which has many Users. Userdata holds the equivalent of GECOS information (firstname, lastname), while User holds the login information (username, password). The production version of this has much more data in the related table and is written in Propel, which makes it significantly uglier.

:wq

2010-04-03

Managing your Checking Account with PocketMoney

Balancing your checkbook is one of those chores, like taxes, that everyone hates but does anyway. Of course, unlike taxes, you slack off for a month and the feds don't come busting down your door. So you slack off for another month...

If you're like me, you have several abortive attempts to use the check register that came with your checkbook to manage your money. My current check register even includes a handy three-year calendar for 2003, 2004, and 2005. The only real use I have for this ancient piece of banking technology is to track when I've overdrawn my account. After each time that happens, there's another 1/4 to 1/2 page of studious usage until I start forgetting again.

Part of the problem with using a check register is that hardly anyone even takes checks anymore. I write around 30 or 40 checks a year; all of them are for paying bills, so I don't even bring my checkbook with me when I go out. I also don't generally carry cash, so that means a lot of miscellaneous debit and credit card transactions. And, let's be honest, who actually wants to carry a checkbook just to record all of those so they can balance their checkbook at the end of the month?

Of course, one thing I do always have with me is my iPhone. And, as they say, there's an app for that (several of them, actually). PocketMoney is the one that I finally settled on. There's a free trial, so really there's no excuse not to at least take it for a test drive. Training myself to use this was definitely easier than training to use a check register, and the built-in budgeting tools are impressive. When I've got the app handy, "I'll record this $1.40 at Starbucks when I get home" is really hard to justify.

One of PocketMoney's most useful features is the different budgeting schedules. If you get paid bi-weekly, enter that as the repeat period. The program will automatically pro-rate your budget based on how many weeks (or partial weeks) are in the given month. Enter Rent monthly, Groceries weekly, Salary bi-weekly, and Auto Insurance quarterly, and the app will take care of all the calculations to give you correct budget breakdowns by any time range you choose. (A $70-per-week grocery budget in a 30-day month, for instance, pro-rates to $70 × 30/7 = $300.)

For me, though, the real power of this app comes when combined with online banking and online bill-pay. The recurring transactions feature lets you schedule automatically repeating transactions, so you never forget about an automatic withdrawal again. Online banking means that I can balance my "checkbook" against my bank account daily, weekly, or just whenever I remember and have an extra 10 minutes to spend on it. For me, this is a whole lot more convenient than remembering at the beginning of the month when I get my bank statement in the mail. Plus, since I have everything synchronized, I know how much money I don't have and can keep from overdrawing yet again.

:wq

2010-01-31

Do the Wave...

My first abortive attempt at a Google Wave bot was an IRC <-> Wave bot that would let me look at IRC over Wave (or, more likely, Wave over IRC).

The second bot I've attempted actually works, though, so naturally I'm more proud of that one. Inspired in part by an Ars Technica article, my D&D group decided to investigate gaming over Wave. I still think that OpenRPG is a better overall solution, but in a pinch, Wave could work. So we added the bot that they used for the article and played around with it a little bit.

For those not familiar with how D&D die rolls work, I recommend D&D for Dummies, but the basics should suffice for this example: NdM indicates rolling N M-sided dice. For Monopoly, you use 2d6 (two, six-sided dice). For D&D, the standard roll is 1d20, plus or minus some modifiers. From there, you build up more and more complicated expressions until you're a level 10 Rogue/Warlock with 22 Dex waving your +3 Dagger at the zombie and you roll up 1d20+5+6+1+3+1+3+1 vs Reflex (and yet still miss!).

But I digress. All those modifiers you add to a roll should be cumulative and applied to the total: 2d6+3 is two six-sided dice, plus 3. The die-rolling Wave bot linked by Ars Technica treats the modifier as modifying each die, so you get two six-sided-dice-plus-three, effectively doubling the modifier (1d6+3 + 1d6+3).

Well, I had a dice-roller before that kind of worked (from my IRC adventures), so I decided to take my work on version 2 of that and port it to Java. Why Java? Because I hate Python. And for a Wave robot, those are the only choices.

So, I loaded up my least favorite IDE, reminded myself why I hate Eclipse, switched to NetBeans, lost a hard drive (which means I owe the internet the obligatory "Back Up Your Data" blog post), forgot all about it for a couple days, installed NetBeans on a different computer, and settled in to write myself a bot. Ah, Java. Good times.

Normally, I'd use Linux as my IDE of choice, but NetBeans and Eclipse both have plugins for deploying a project to Google App Engine (where all Wave bots call home), and I didn't want to have to worry about that part. Plus, Java is a library language, so integrated docs for the libraries are rather nice.

A Wave robot is pretty simple: you register for events, and then write a handler for them. The handler in this case is a little compiler (yes, it lexes and parses the die roll into a syntax tree; I'm a bit of a masochist, apparently), and an "interpreter" that actually rolls the result. Borrowing from OpenRPG, it looks for a roll inside square braces and replaces the contents with the result:

[1d20+5+6+1+3+1+3+1] becomes [1d20+5+6+1+3+1+3+1 = (37)] (I rolled well). There's always more to do with it, but this is a good start for a weekend. To play with the bot, add die-roller@appspot.com to your Wave contacts and then invite it to your favorite dragon-slaying waves. The author asks only a modest share of the XP from any dragon slayed with the help of this tool.
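For the curious, the skeleton of a robot in the Wave Java API looks roughly like this; I'm reciting the class and method names from memory, so treat it as a sketch rather than gospel:

import com.google.wave.api.AbstractRobotServlet;
import com.google.wave.api.Event;
import com.google.wave.api.RobotMessageBundle;
import com.google.wave.api.TextView;

public class DieRollerServlet extends AbstractRobotServlet {
  @Override
  public void processEvents(RobotMessageBundle bundle) {
    // Called for whatever events were registered in capabilities.xml.
    for (Event event : bundle.getEvents()) {
      TextView doc = event.getBlip().getDocument();
      // ...find [NdM+k] expressions, lex/parse, roll, and replace...
    }
  }
}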

Now for the ugly:

NOWHERE on the internet could I find this little "feature," so I'm going to warn people here. When replacing text in the Java API (TextView.replace()), Annotations get "shifted" by the difference in text length you're inserting. As do Range objects which completely enclose the area. For example:

Range r = new Range(0, 10); // a range from 0 to 10
textview.replace(new Range(1,9), "a bunch more text");
// r is now the range [0, 19] because 9 more characters were inserted

Unfortunately, this happens on Annotations that have already been added (in this jsonrpc call) as well, but incorrectly. They get shifted twice (or one-and-a-half times, or something...they always end up beyond where they should), so this code acts very strangely indeed:

while (!stack.empty()) {
  Range r = stack.pop();
  ...
  textview.replace(r, a_string);
  textview.setAnnotation(r, "style/color", "#336699");
  ...
}

The "correct" way to do this is to keep track of an offset and then shift the Annotations yourself:

int offset = 0;

for(Range r: stack) {
  ...
  textview.replace(new Range(r.getStart()+offset, r.getEnd()+offset), a_string);
  textview.setAnnotation(new Range(r.getStart()+offset, r.getEnd()+offset), "style/color", "#336699");
  offset += a_string.length() - (r.getEnd() - r.getStart());
  ...
}

Obviously, the more Java way of doing this would involve several dozen more Factorys, but at least Google embraces Python and so doesn't do things the "Java way."

Oh, and back up your data.

:wq

2010-01-30

Building a jQuery color picker with CSS-3

One of the apps I work on has a new feature that requires a color picker applet. I went searching around on the internet for something useful, only to find that most of the solutions for color pickers are way overblown for our needs. A simple select box with a dozen or so colors would probably work just fine, but I wanted something a little flashier. The smallest one that would kind-of work that I found out in the wild was 6K. And then it didn't work with the jQuery version we run on the app. An hour or so of Javascript-golf later and I have a basic color-picker applet that fits neatly into a single packet (including the HTTP headers).

This is the un-minified version; see /packer/ for a simple online Javascript-packer.

(function($){
var colors = [
  '#f7b', '#fb3', '#fb7', '#fbb',
  '#f77', '#bf3', '#bf7', '#3f7',
  '#37f', '#7f3', '#7bf', '#b3f',
  '#b7f', '#9af', '#7fb', '#bfb'
];

$(function() {
  $('input.colorpicker').each(function() {
    var $input = $(this),
      $div = $('<div />'),
      $div3 = $('<div />')
        .addClass('colorpicker-bin')
        .hide(),
      $div2 = $('<div />')
        .addClass('colorpicker-button')
        .css({
          'background-color': this.value,
          'cursor': 'pointer'
        })
        .html('&nbsp;')
        .click(function() {
          $div3.toggle("fast");
        });

    $input.css({
      'background-color': this.value,
      'color': '#000'
    })
      .attr('disabled','disabled');

    $(colors).each(function() {
      var color=this;
      var $div4 = $('<div />')
        .addClass('colorpicker-color')
        .css({
          'background-color' : color,
          'cursor': 'pointer'
        })
        .html('&nbsp;')
        .click(function(){
          $div2.css('background-color',color);
          $input.css('background-color',color)
            .val(color);
          $('div.colorpicker-color')
            .removeClass('colorpicker-selected');

          $(this).addClass('colorpicker-selected');
        });

      if (color==$input.val()) {
        $div4.addClass('colorpicker-selected');
      }
      // Always append the swatch; the class above just marks the current color.
      $div3.append($div4);
    });

    $input.wrap($div).after($div2.append($div3));
  });
});
})(jQuery);

On page load (DOM ready, actually), it adds a color-picker to all input elements with class colorpicker. It even works with hidden input boxes. The (function($) { ... })(jQuery); hack lets me use $ throughout the plugin, even with jQuery in noConflict mode (as we do use it in this particular app). Most of the rest of the code is just build-up to the onClick() events that do the heavy lifting. The value of the input box is replaced with the background-color of whichever swatch the user clicks on, so form submission works like normal.
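Usage is just a class on an input element (the name and default value here are arbitrary):

<input type="text" class="colorpicker" name="bgcolor" value="#37f" />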

I wired it up with some simple CSS-3 (shamelessly stolen from Zurb) and I'm in business. @-webkit-keyframes paired with the colorpicker-selected class is very neat. Just make sure not to forget about Gecko-based browsers and Internet Explorer.

:wq

2010-01-26

Nested Transactions in MySQL

Transactions are awesome. They can keep your database in a consistent state, without resorting to nasty business logic to cleanse the input first.

A couple months ago, I switched jobs and now I'm working as a Web Developer with a side of System Administration. This has recently meant that I've gotten up close and personal with some of the nastier quirks of MySQL. One of the things I wish it supported better is transactions.

The default storage engine doesn't support them at all, but we mostly use InnoDB tables anyway, so we get "transactions" for free. I use quotes here because these aren't quite the same as what I would expect. For example,

(editor's note: the default storage engine in MySQL 5.5+ is InnoDB, which does support transactions)

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE foo (...);
Query OK, 0 rows affected (0.00 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

Rolling back a transaction should undo everything in the transaction, right?

Not quite so in MySQL. Observe:

mysql> SHOW TABLES LIKE 'foo';
+----------------------+
| Tables_in_test (foo) |
+----------------------+
| foo                  |
+----------------------+
1 row in set (0.00 sec)

Why do I now have a table `foo`? MySQL implicitly COMMITs the current transaction, and puts you back into auto-commit mode, whenever you add or drop a table. This is a good caveat to know, as you can do destructive things thinking you're safely in a transaction when you're not.

Unfortunately, this means the following doesn't work as intended:

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLE `foo` ADD FOREIGN KEY (`bar_id`) REFERENCES `bar` (`bar_id`);
Query OK, 5 rows affected (0.02 sec)

mysql> DELETE FROM `bar`;
Query OK, 3 rows affected (0.01 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

Here, I have ninja-truncated `bar`. And by "ninja," I mean, "sh*t, where's my backup of that database?"

ALTER TABLE works by first renaming the original table, then creating a new table with the original name and new schema, and then inserting all the records again. This means there's a hidden CREATE TABLE statement lurking in every ALTER TABLE. Which commits the transaction, and puts me back into auto-commit mode. Even though it's between a BEGIN and a ROLLBACK, the DELETE isn't in a transaction at all. And this can spell trouble.

Is there a way around all this? Sure. Is it easy? Not so much.

The naive approach runs about like this:

<?

class MyPDO extends PDO {
  protected $transactions;

  function __construct($dsn) {
    $this->transactions = 0;
    parent::__construct($dsn);
  }

  /**
   * Push a transaction on the stack.
   * @return bool
   */
  function beginTransaction() {
    if (!$this->transactions) {
      if (!parent::beginTransaction()) return false;
    }
    ++$this->transactions;
    return true;
  }

  /**
   * Commit if we're the outermost transaction.
   * @return bool
   */
  function commit() {
    --$this->transactions;
    if (!$this->transactions) {
      return parent::commit();
    }
    return true;
  }

  /**
   * Roll back if we're the outermost transaction.
   * @return bool
   */
  function rollback() {
    --$this->transactions;
    if (!$this->transactions) {
      return parent::rollback();
    }
    return true;
  }
}

This follows the basic contract for PDO transactions. Of course, an actual implementation would have to recognize when database-altering statements destroy your transaction; probably by querying the database's current auto-commit mode setting after every query.
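To make the counting concrete, the intended usage looks like this (DSN hypothetical):

<?

$conn = new MyPDO('mysql:host=localhost;dbname=test');

$conn->beginTransaction(); // depth 0 -> 1: issues the real BEGIN
$conn->beginTransaction(); // depth 1 -> 2: just counted, no SQL issued
$conn->exec('UPDATE foo SET n = n + 1');
$conn->commit();           // depth 2 -> 1: just counted, no SQL issued
$conn->commit();           // depth 1 -> 0: issues the real COMMIT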

Also, note that this makes the outermost transaction win. In the code I see at work, this makes sense, because things generally follow the contract:

<?

try {
  // function with query that may fail
} catch (PDOException $e) {
  $conn->rollback();
  throw $e;
}

The problem is that our save methods do this same thing. And most of them call other objects' save methods. So nested transactions are the rule, not the exception.

Notice that throw $e;? I think that's what keeps the system from crashing down in a raging inferno; instead it's just a small, constant flame...biding its time...waiting to doom us all.

This works fine for a regular stack of calls, but fails utterly once you throw in looping constructs. For example, removing a node from the middle of a tree:

<?
try {
  foreach($this->children as $c) {
    $c->setParent($this->parent);
    $c->save();
  }
  $this->delete();
} catch (Exception $e) {
  $conn->rollback();
}

All these steps need to occur atomically. But, the save() method starts and commits its own transaction, so the database isn't guaranteeing atomicity (hint: that's the reason for transactions in the first place). What happens when the 3rd child fails? Children 1 and 2 are already committed, so the database is now in an inconsistent state. That's not a happy place to be, and the logic to prevent it could break horribly because of concurrency issues. Remember, the <? and ?> tags mean this goes on the internets; there are going to be concurrency issues to deal with.

So to fix it, we'll have to hack PDO. There needs to be a single transaction, and the outermost transaction has to "win" instead of the innermost. MySQL just doesn't support these semantics out of the box, otherwise it'd be a non-issue.

I'm just glad that our app doesn't have any schema-altering queries in it...

:wq