Saturday, September 6, 2008

Building a Performance Monitoring Solution with Nagios and NDOUtils, Part 2: PerfNagios

Welcome to part 2 of this series. It took a long time, but it was worth the effort. In this part I describe how to save performance data and how to integrate a sample monitoring script for performance monitoring. In the earlier post I wrote about how to install and configure NDOUtils for database support in Nagios. We will use the same setup for storing data, but will add a few more tables to serve as the performance repository. Remember: NDOUtils deletes data periodically!

Let's first discuss how to save performance data.

PerfNagios is a SourceForge project that I started to implement this idea. Currently only the parsing code is stable; reporting and dashboard functionality will be added later. Watch this blog for future updates to the project.

PerfNagios is basically a small web interface for displaying Nagios-related data easily. Eventually it will have good reporting capabilities. Currently it only shows performance data for the last 1000 readings. You can see screenshots below.




The following are the tables I used for storing performance data.


CREATE TABLE `nagios_metrics` (
`metric_id` int(11) NOT NULL auto_increment,
`instance_id` smallint(6) NOT NULL ,
`host_object_id` smallint(6) NOT NULL ,
`service_object_id` smallint(6) ,
`unit` varchar(60),
`label` varchar(255),
PRIMARY KEY (`metric_id`)
) ENGINE=InnoDB;


CREATE TABLE `nagios_metric_data` (
`metric_data_id` int(11) NOT NULL auto_increment,
`metric_id` int(11) NOT NULL,
`value` double not null,
`warn` double,
`critical` double,
`min` double,
`max` double,
`date` datetime not null,
PRIMARY KEY (`metric_data_id`, `metric_id`)
) ENGINE=InnoDB;


CREATE TABLE `nagios_perf_batches` (
`date` datetime not null,
`last_service_check_id` int(11) not null,
`last_host_check_id` int(11) not null,
`host_checks` int(11) not null,
`service_check` int(11) not null
) ENGINE=InnoDB;
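As the schema suggests, nagios_metrics holds one row per metric label and nagios_metric_data holds one row per reading of that metric. The following is a minimal sketch of that relationship, not the PerfNagios code itself. It assumes an in-memory SQLite database as a stand-in for the MySQL database used by NDOUtils, and the helper names (open_demo_db, get_or_create_metric, add_reading, last_readings) are hypothetical.

```python
import sqlite3

# SQLite version of two of the tables above (assumption: the real
# tables live in the MySQL database that NDOUtils writes to).
SCHEMA = """
CREATE TABLE nagios_metrics (
    metric_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    instance_id       INTEGER NOT NULL,
    host_object_id    INTEGER NOT NULL,
    service_object_id INTEGER,
    unit              TEXT,
    label             TEXT
);
CREATE TABLE nagios_metric_data (
    metric_data_id INTEGER PRIMARY KEY AUTOINCREMENT,
    metric_id      INTEGER NOT NULL,
    value          REAL NOT NULL,
    warn           REAL,
    critical       REAL,
    "min"          REAL,
    "max"          REAL,
    date           TEXT NOT NULL
);
"""


def open_demo_db():
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def get_or_create_metric(conn, instance_id, host_id, service_id, label, unit=""):
    """Return the metric_id for this label, inserting the row if needed."""
    row = conn.execute(
        "SELECT metric_id FROM nagios_metrics"
        " WHERE instance_id = ? AND host_object_id = ?"
        " AND service_object_id IS ? AND label = ?",
        (instance_id, host_id, service_id, label),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO nagios_metrics"
        " (instance_id, host_object_id, service_object_id, unit, label)"
        " VALUES (?, ?, ?, ?, ?)",
        (instance_id, host_id, service_id, unit, label),
    )
    return cur.lastrowid


def add_reading(conn, metric_id, value, warn, critical, date):
    """Store one reading for an existing metric."""
    conn.execute(
        "INSERT INTO nagios_metric_data (metric_id, value, warn, critical, date)"
        " VALUES (?, ?, ?, ?, ?)",
        (metric_id, value, warn, critical, date),
    )


def last_readings(conn, metric_id, limit=1000):
    """Return the newest readings first, capped at `limit` rows."""
    return conn.execute(
        "SELECT value, date FROM nagios_metric_data"
        " WHERE metric_id = ? ORDER BY date DESC LIMIT ?",
        (metric_id, limit),
    ).fetchall()
```

The default limit of 1000 mirrors the number of readings PerfNagios currently displays; the object IDs passed in would come from the NDOUtils object tables.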


Let's take an example. Say we need to keep track of how the CPU is being used. For that we need to add a service called CPU to the monitored host. Below is a small script that outputs the important CPU performance metrics: run queue length (rl), user (us), system (sys) and wait-on-I/O (wa). The output looks like:

CPU : OK 0 | rl=0;2;5 us=2;85;85 sys=0;10;20 wa=0;5;10 total=3;80;90
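Everything after the | in that line is performance data, written as label=value;warn;critical. A rough sketch of how such a line can be split into the columns of the metric tables follows. This is not the PerfNagios parser; the function name parse_plugin_output is hypothetical, and the sketch assumes plain numeric thresholds (range expressions are stored as None).

```python
import re

# A value is a number optionally followed by a unit, e.g. "3" or "12.5ms".
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def _num(text):
    """Convert a threshold field to float, or None if empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_plugin_output(line):
    """Split one plugin output line into (status_text, list of metric dicts)."""
    status_text, _, perf = line.partition("|")
    metrics = []
    for token in perf.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value token
        # Pad so that warn, critical, min and max are always present.
        fields = rest.split(";") + [""] * 4
        match = _VALUE.match(fields[0])
        if not match:
            continue  # value is not numeric
        metrics.append({
            "label": label,
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": _num(fields[1]),
            "critical": _num(fields[2]),
            "min": _num(fields[3]),
            "max": _num(fields[4]),
        })
    return status_text.strip(), metrics
```

Each dict maps onto one row of nagios_metric_data, with label and unit identifying the row in nagios_metrics.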


  • check_cpu.sh

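#!/bin/sh
# Take 3 vmstat samples, 3 seconds apart, and let cpu.awk build the output line.
value=`vmstat 3 3 | awk -f /opt/nagios/libexec/ubuntu-sys-plugins/cpu.awk`
# The fourth field of the output line is the Nagios return code (0, 1 or 2).
returnvalue=`echo $value | awk '{print $4}'`
echo $value
exit $returnvalue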


  • and cpu.awk


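{
#print $1,$13,$14,$15,$16;
# vmstat 3 3 prints two header lines and three samples. The first sample
# holds averages since boot, so only the last two samples are summed.
if(NR>=4)
{
rl = rl + $1;
us = us + $13;
sys = sys + $14;
wa = wa + $16;
}
#print NR,"--->", $0;
}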
END {

rl = rl /2;
us= us/2;
sys = sys/2;
wa = wa /2;
total = us + sys + wa;

#print "Total is ", total;

if(total <= 75) { msg = "CPU : OK"; returnvalue = 0; }
else if(total > 75 && total <= 85) { msg = "CPU : WARNING"; returnvalue = 1; }
else if(total > 85)
{
msg = "CPU : CRITICAL";
returnvalue = 2;
}
msg = sprintf("%s %d | rl=%d;2;5 us=%d;85;85 sys=%d;10;20 wa=%d;5;10 total=%d;80;90",msg,returnvalue,rl,us,sys,wa,total);
print msg;
}
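The averaging and threshold logic that the awk script is aiming for can be restated compactly. The sketch below is only an illustration in Python, not part of the plugin; the function names average_samples and classify_cpu are hypothetical, and the 75 and 85 thresholds are the ones used in the awk code.

```python
def average_samples(samples):
    """Average a list of (rl, us, sys, wa) tuples column by column."""
    count = len(samples)
    return tuple(sum(column) / count for column in zip(*samples))


def classify_cpu(us, sys_, wa):
    """Return (message, return code) for the combined CPU usage."""
    total = us + sys_ + wa
    if total <= 75:
        return "CPU : OK", 0
    if total <= 85:
        return "CPU : WARNING", 1
    return "CPU : CRITICAL", 2
```

The return code is what Nagios uses to decide the service state, so each usage band has to map to exactly one code.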


  • To include this monitoring script you need to add a service for localhost. In Nagios 3.x, configuration is based on templates. To define the service, we first need to add a check command to the file /opt/nagios/etc/objects/commands.cfg


# 'check_cpux' command definition - tushar
define command{
command_name check_cpux
command_line /opt/nagios/libexec/ubuntu-sys-plugins/check_cpu.sh
}

  • Once this is done, you need to add a service definition to /opt/nagios/etc/objects/localhost.cfg.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description CPU
check_command check_cpux
notifications_enabled 0
}






Restart Nagios with /etc/init.d/nagios restart and you are done! Your system is now able to monitor CPU information.
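If the service does not show the expected state, it can help to run the plugin by hand and check the two things Nagios looks at: the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output. A small sketch of such a check follows; run_plugin is a hypothetical helper and is not part of Nagios or PerfNagios.

```python
import subprocess

# Standard Nagios plugin return codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=30):
    """Run a plugin command and summarise what Nagios would see."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return {
        "state": STATES.get(proc.returncode, "INVALID"),
        "output": first_line,
        # Performance data, if any, follows the first "|" on the line.
        "has_perfdata": "|" in first_line,
    }
```

For the script above, the call would be run_plugin(["/opt/nagios/libexec/ubuntu-sys-plugins/check_cpu.sh"]), and a healthy result has a state of OK with has_perfdata set to True.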

Below is the graph drawn in PerfNagios for the same script.




