Scheduling conditional statements with crontab

 

Recently I tried to auto-restart a daemon whenever it was down, detecting its state with ps aux and an if condition, but the command did not work as intended when run from crontab.

I used the following command, which runs perfectly from the command line but not through crontab:

if [ `ps aux | grep nrpe | grep -v grep | wc -l` -eq 0 ]; then service nagios-nrpe-server restart ;fi

After trying different commands, the following worked for me:

pgrep nrpe; [ $? != 0 ] && /etc/init.d/nagios-nrpe-server restart

Here pgrep returns a non-zero exit code if no nrpe process is running, and $? holds the exit code of the previous command (in this case pgrep). When that exit code is not 0, the daemon is restarted.
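The same idea can be sketched as a small shell function. This is only an illustration: restart_if_down is a hypothetical helper name, and the restart command is passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# Sketch: run a restart command only when no process matches a name.
# restart_if_down is a hypothetical helper, not part of pgrep or Nagios.
restart_if_down() {
    name="$1"
    shift
    # pgrep exits with 0 when at least one process matches, non-zero otherwise.
    if pgrep "$name" >/dev/null 2>&1; then
        echo "$name is running"
    else
        # "$@" holds the restart command supplied by the caller.
        "$@"
    fi
}

# Example call (commented out so the sketch has no side effects):
# restart_if_down nrpe /etc/init.d/nagios-nrpe-server restart
```

In a crontab the check could also be written as a single entry such as `*/5 * * * * pgrep nrpe >/dev/null || /etc/init.d/nagios-nrpe-server restart`, where the five-minute schedule is an arbitrary choice for illustration.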

Nagios – Check Ping /bin/ping Unknown status problem

Recently, while setting up Nagios on Ubuntu 14.04, I got an error saying "/bin/ping Unknown status".

After debugging for a while, I found that the issue was caused by the permissions on the /bin/ping binary.

To resolve this issue, run the following command (as root or with sudo):

$ chmod u+s /bin/ping

After running the above command, the /bin/ping permissions look like this:

$ ls -l /bin/ping

-rwsr-xr-x 1 root root 44168 Mar 15  2014 /bin/ping*

After a while, Nagios should be able to ping your servers without any issue.
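To confirm the change without reading the ls output by eye, the setuid bit can be tested directly. A minimal sketch, where has_setuid is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: report whether a file has its set-user-ID bit set.
# has_setuid is a hypothetical helper name used only for illustration.
has_setuid() {
    # test -u is true when the file exists and its set-user-ID bit is set.
    [ -u "$1" ]
}

# Example call (commented out; the path is the one from the post):
# has_setuid /bin/ping && echo "setuid bit is set" || echo "setuid bit is missing"
```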

Perl/Nagios – Can’t locate utils.pm in @INC

This is one more issue related to a Nagios plugin written as a Perl script.

While trying to use a Nagios plugin, I got an error saying “Can’t locate utils.pm in @INC”.

The complete error follows:

Can’t locate utils.pm in @INC (@INC contains: /root /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .)

This issue occurs because of a wrong library path in the Nagios plugin (Perl script).

When I checked the library path in the script, it was given as 'use lib "/usr/local/nagios/libexec";', but the path /usr/local/nagios/libexec does not exist on our OS.

All the libraries are available in "/usr/lib/nagios/plugins", so I changed the lib path in my script to "/usr/lib/nagios/plugins". After changing the path, the Nagios plugin worked without any issue.
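The path change can also be scripted. The sketch below assumes the plugin contains the line exactly in the form quoted above (straight double quotes, no extra spaces); fix_lib_path is a hypothetical helper name, and the file is rewritten in place so its permissions are kept.

```shell
#!/bin/sh
# Sketch: point a Perl plugin's "use lib" line at /usr/lib/nagios/plugins.
# fix_lib_path is a hypothetical helper; it assumes the exact line format
# use lib "/usr/local/nagios/libexec";
fix_lib_path() {
    script="$1"
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy the content back so the
    # original file keeps its owner and executable bit.
    sed 's|use lib "/usr/local/nagios/libexec";|use lib "/usr/lib/nagios/plugins";|' \
        "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call (commented out; the plugin name is a placeholder):
# fix_lib_path /usr/lib/nagios/plugins/my_plugin.pl
```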

Perl/Nagios – Can’t locate Sys/Statistics/Linux.pm in @INC

While trying to use a Nagios plugin I got an error saying that “Can’t locate Sys/Statistics/Linux.pm in @INC”.

The complete error follows:

Can’t locate Sys/Statistics/Linux.pm in @INC (@INC contains: /usr/lib/nagios/plugins /root /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .)

After investigating the error, I found that the Perl module for collecting Linux system statistics, packaged as “libsys-statistics-linux-perl”, was missing on my machine.

Since I am using Ubuntu, I installed the libsys-statistics-linux-perl package with apt-get using the following command:

$ apt-get install libsys-statistics-linux-perl

After installing the libsys-statistics-linux-perl package, the issue was resolved.

Install nagios nrpe daemon and nagios plugins in CentOS

To install the Nagios NRPE daemon on CentOS, use the following command:

$ yum install nagios-nrpe

To install the Nagios plugins, use the following command:

$ yum install nagios-plugins nagios-plugins-nrpe

To start, stop, or restart the Nagios NRPE daemon, use the following commands:

$ service nrpe start

$ service nrpe stop

$ service nrpe restart

To start the nrpe daemon automatically at boot, use the following command:

$ chkconfig nrpe on

-Sany

Install Nagios nrpe client and plugins in Ubuntu/Debian

To install the Nagios NRPE client and plugins on Ubuntu/Debian, run the following command:

$ apt-get install nagios-nrpe-server nagios-plugins

After installing the client and plugins, you need to change the allowed_hosts setting in the nrpe.cfg file.

By default, the allowed_hosts value in nrpe.cfg is 127.0.0.1:

allowed_hosts=127.0.0.1

Replace 127.0.0.1 with the IP address of the Nagios master.

If your Nagios master IP address is 192.168.2.10, the allowed_hosts value looks like this:

allowed_hosts=192.168.2.10
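If the same change has to be made on several slaves, it can be scripted. A minimal sketch, assuming the file has a single line starting with allowed_hosts=; set_allowed_hosts is a hypothetical helper name and the nrpe.cfg path in the example call is an assumption that may differ on your system.

```shell
#!/bin/sh
# Sketch: set the allowed_hosts value in an nrpe.cfg-style file.
# set_allowed_hosts is a hypothetical helper name used for illustration.
set_allowed_hosts() {
    cfg="$1"
    master_ip="$2"
    tmp=$(mktemp) || return 1
    # Replace the whole allowed_hosts line; every other line passes through unchanged.
    sed "s|^allowed_hosts=.*|allowed_hosts=$master_ip|" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call (commented out; the config path is an assumption):
# set_allowed_hosts /etc/nagios/nrpe.cfg 192.168.2.10
```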

After updating this value, restart the nagios-nrpe-server daemon with the following command:

$ service nagios-nrpe-server restart

Now check whether the slave is accessible from the master with the following command:

$ /usr/lib/nagios/plugins/check_nrpe -H 192.168.2.11

Output:

NRPE v2.12

where 192.168.2.11 is the slave IP address and NRPE v2.12 is the NRPE version running on the slave server.

If you don't get the NRPE version, the problem is most likely in the nrpe.cfg file, or the firewall on the slave is blocking access.


Nagios notification interval

 

One of the most useful features in Nagios is the notification interval.

The notification_interval directive determines the interval at which notifications are sent while this escalation is valid.

If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the host.

If notification_interval is greater than 0 (e.g. notification_interval 10), notifications are sent out every 10 minutes until the host recovers.

Note: If multiple escalation entries for a host overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used.

In general, the configuration looks like this:

# Generic host definition template – This is NOT a real host, just a template!

define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
check_command check-host-alive
max_check_attempts 10
notification_interval 0
notification_period 24x7
notification_options d,u,r
contact_groups admins
register 0 ; DONT REGISTER THIS DEFINITION – ITS NOT A REAL HOST, JUST A TEMPLATE!
}

In the above example notification_interval is 0, which means the contacts in the admins group are notified once when the host status changes, and no repeat notifications are sent while the problem persists.


By Sandeep Posted in Nagios

NRPE: Command ‘check_command’ not defined

Recently, while trying to monitor a process with Nagios, I ran the following command from the Nagios server and got an error:

$ check_nrpe -H host.example.com -c check_command

Output:

NRPE: Command ‘check_command’ not defined

The output says that check_command is not defined in the nrpe_local.cfg file on host.example.com.

When I cross-checked, the command was in fact defined in nrpe_local.cfg, yet I was still getting the error.

Thinking it over, I realized that what I had most probably missed was restarting the nagios-nrpe-server daemon.

After restarting the nagios-nrpe-server daemon on host.example.com with the following command, I was able to detect that the process was running.

$ service nagios-nrpe-server restart

After the restart, when I ran the same command from the Nagios server, I got the expected output:

$ check_nrpe -H host.example.com -c check_command

Output:

status ok
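When this error appears, a quick first step on the monitored host is to check whether the command is present in the config file at all. If it is, as in the case above, the daemon probably has not re-read its configuration. A minimal sketch, where nrpe_command_defined is a hypothetical helper name, the command name is assumed to contain no regex metacharacters, and the config path in the example call is an assumption:

```shell
#!/bin/sh
# Sketch: check whether an NRPE command is defined in a config file.
# nrpe_command_defined is a hypothetical helper name used for illustration.
nrpe_command_defined() {
    cfg="$1"
    name="$2"
    # NRPE command definitions take the form: command[<name>]=<command line>
    grep -q "^command\[$name]=" "$cfg"
}

# Example call (commented out; the config path is an assumption):
# if nrpe_command_defined /etc/nagios/nrpe_local.cfg check_command; then
#     echo "defined in the file; restart nagios-nrpe-server if NRPE still reports it missing"
# fi
```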


Warning duplicate definition found for host – nagios

Recently, while setting up Nagios on one of my machines, I got the following error message:

Warning: Duplicate definition found for host 'host.example.com' (config file '/etc/nagios/hosts/host.cfg', starting on line 25)
Error: Could not add object property in file '/etc/nagios/servers/host.cfg' on line 30.
Error processing object config files!

After debugging for some time, I found that the problem was not in my host.cfg file but in the /etc/nagios/nagios.cfg file.

The issue was that I had added the following lines to nagios.cfg:

cfg_dir=/etc/nagios/servers
cfg_file=/etc/nagios/servers/host.cfg

Because of these lines, Nagios reads host.cfg twice when processing nagios.cfg: once through cfg_dir and once through cfg_file.

So I removed cfg_file=/etc/nagios/servers/host.cfg from nagios.cfg and validated the configuration again. After that it worked without any issue.
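This kind of overlap can be spotted before Nagios complains. The sketch below lists every cfg_file entry whose path already sits under one of the cfg_dir entries. find_double_loaded is a hypothetical helper name, and the sketch assumes the paths contain no spaces and that the entries are written without spaces around the equals sign.

```shell
#!/bin/sh
# Sketch: list cfg_file entries that are already covered by a cfg_dir entry.
# find_double_loaded is a hypothetical helper name used for illustration.
find_double_loaded() {
    cfg="$1"
    # Directories named by cfg_dir= lines (assumed to contain no spaces).
    dirs=$(sed -n 's/^cfg_dir=//p' "$cfg")
    # Walk over the files named by cfg_file= lines.
    sed -n 's/^cfg_file=//p' "$cfg" | while read -r file; do
        for dir in $dirs; do
            case "$file" in
                # The file is a .cfg file somewhere below this directory.
                "$dir"/*.cfg)
                    echo "$file"
                    break
                    ;;
            esac
        done
    done
}

# Example call (commented out; the path is the one from the post):
# find_double_loaded /etc/nagios/nagios.cfg
```

Any path it prints is loaded twice, so one of the two entries can be removed before validating the configuration again.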
