2010-04-26

Monitoring Your Infrastructure with Zabbix

The last time I had an enterprise system monitoring implementation to work with, it was already set up and the organization had been using it for several years.

This time, I got to do a little research and set up a system at home to monitor my various "machines." After playing around with Cacti, I decided that seeing my logs fill with PHP deprecation warnings was a little too scary and started up a zabbix server.

Zabbix definitely takes more resources, both in terms of the server and the machines being monitored. Like everything else, it uses a database backend and has a nice clean web frontend, so no real differences there. By default, you can graph any piece of data you gather (including the version of Zabbix, if you want it), and you can create more complex graphs fairly easily. It took me a few minutes to get a cheap load average graph that works quite well. Paging is actually a more robust "notification" system that supports Email, SMS, Jabber, and custom scripts, so you can extend it to do whatever you can think of (fax? log to a file? print a .pdf from your boss' printer whenever a server goes down?).

For the truly lazy, Zabbix offers a .vmdk built on SuSE that you can just fire up and have a running Zabbix server.

Monitoring, graphing, paging...there are all the same tasks that any monitoring system does.

Where Zabbix's architecture differs from Nagios, Zenoss, Cacti, et. al., is that it's a client-server system. This took me a while to grok, since "client" and "server" aren't what you'd normally expect them to be (think iSCSI). The "Zabbix Server" box dials out to an (access-list restricted) open port on the "client" and retrieves the data from a running service. There's even a Windows version of the client available, if you're into that sort of thing.

This means that there is a daemon running on the client to gather data and send it to the server and a daemon running on the server to periodically go and nab the data from the clients. The clients need to know where to trust connections from (by IP, so watch out if you have AAAA's), and then the hosts have to be manually added to the Zabbix server. For rolling out to an existing infrastructure this is more work than a pure SNMP-monitoring solution that can just scan your network and learn all your hosts.

A side-effect of this is that Zabbix has a proxy server available. You could have a host on each subnet (or at each site) gather all the information, and then submit bulk updates to a central Zabbix server, or implement even more tiers of proxying if your infrastructure requires it.

The biggest "gotcha" to just getting a Zabbix server running is that the front-end isn't necessarily bundled with the server. The only requirement for the web front-end is that it can talk to the database, so you can run the front-end, Zabbix server, and database on 3 different systems if you want to. The front-end is all PHP, so your existing Apache server is more than adequate.

Zabbix also does some basic host-based intrusion detection. If you're going to be running something more beefy (Samhain/Beltane, for example), then this is likely just a waste of cycles. Still, this is leaps and bounds above the option of no host-based intrusion detection.

:wq

No comments:

Post a Comment