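I. Installation
1. How Nagios, nagios-plugin, and nrpe relate
The actual monitoring is done by nagios-plugin; for example, the executable check_load monitors system load. Nagios organizes and schedules the programs that nagios-plugin provides. Without nagios-plugin, Nagios itself can do nothing.
We install Nagios and nagios-plugin on a server that runs Apache, and install nagios-plugin and nrpe on the other monitored machines. This way, on 10.15.3.170, local checks such as disk and load (check_disk, check_load) are executed by nrpe, which then sends the results to the Nagios server.
Checks of network services, and local checks on the Nagios server itself, do not need to be relayed through nrpe.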
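2. Installing nagios, nagios-plugin, and nrpe
For installing Nagios, see the official Nagios documentation at http://nagios.sourceforge.net/docs/2_0/toc.html
The LDF help documentation also covers installing nagios, nagios-plugin, and nrpe.
First make sure Apache and gdlib are installed (Apache is installed in /usr/local/apache22, and /usr/lib contains libgd.so.*), then install Nagios.
2.1 Installing nagios
Unpack the archive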
#tar zxvf nagios-2.8.tar.gz
#cd nagios-2.8
#adduser nagios
#mkdir /usr/local/nagios
#chown nagios:nagios /usr/local/nagios
Determine which user Apache runs as
#grep "^User" /usr/local/apache22/conf/httpd.conf
User daemon
Add daemon to the nagios group
#usermod -a -G nagios daemon
#./configure --prefix=/usr/local/nagios
#make all
#make install
Install the init script into /etc/init.d/
#make install-init
Install the sample configuration files
#make install-config
#make install-commandmode
Configure Apache:
Add the following to the Apache configuration file:
#vi /usr/local/apache22/conf/httpd.conf
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
To avoid conflicts with the existing server configuration, these directives can be placed in a virtual host instead.
#vi /usr/local/apache22/conf/extra/httpd-vhosts.conf
<VirtualHost 10.15.5.145:80>
DocumentRoot /usr/local/apache22/htdocs
ServerName nagios.mj.dalian
ErrorLog logs/dummy-host2.example.com-error_log
CustomLog logs/dummy-host2.example.com-access_log common
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
</VirtualHost>
Use the htpasswd tool that ships with Apache to add a user for the Nagios server:
#htpasswd -c /usr/local/nagios/etc/htpasswd.users MJ
To add more users, omit -c, because htpasswd.users already exists:
#htpasswd /usr/local/nagios/etc/htpasswd.users username
Check that use_authentication is set to 1 in /usr/local/nagios/etc/cgi.cfg:
# grep use_authentication /usr/local/nagios/etc/cgi.cfg
use_authentication=1
Restart Apache and open the correct URL in a browser. A user name and password prompt appears, and after logging in the Nagios main page is shown.
#/usr/local/apache22/bin/apachectl restart
2.2 Installing nagios-plugin
Unpack the archive and enter the nagios-plugin directory
#tar zxvf nagios-plugin-1.4.tar.gz
#cd nagios-plugin-1.4
#./configure --prefix=/usr/local/nagios/
#make all
#make install
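The nagios-plugin executables are installed in /usr/local/nagios/libexec
Note that building check_mysql requires passing --with-mysql=(MySQL installation path) to ./configure for nagios-plugin. I compiled nagios-plugin on 172.18.3.173, which has MySQL installed, then copied check_mysql and the MySQL runtime library (/usr/local/mysql/lib/mysql/libmysqlclient.so.15.0.0) to 172.18.3.141 with scp and linked the library as /usr/lib/libmysqlclient.so.15. After that, check_mysql ran normally.
2.3 Installing nrpe and nagios-plugin on the other machines
Unpack the archive and enter the nrpe source directory: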
#tar zxvf nrpe-2.7.1.tar.gz
#cd nrpe-2.7.1
#./configure --enable-ssl --enable-command-args
#make all
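This produces two executables in src, nrpe and check_nrpe. The first is the nrpe daemon itself; the second is a Nagios plugin.
check_nrpe must be placed in the libexec directory on the Nagios server.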
#scp src/check_nrpe 10.15.3.166:/usr/local/nagios/libexec
#mkdir -p /usr/local/nagios/{bin,etc}
#cp src/nrpe /usr/local/nagios/bin
The sample-config directory also contains a sample nrpe configuration file.
#cp sample-config/* /usr/local/nagios/etc/
nagios-plugin is installed the same way as in 2.2.
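II. Configuration
1. Basic nagios configuration
After Nagios is installed, /usr/local/nagios/etc contains sample configuration files such as nagios.cfg-sample. Remove the -sample suffix from each file: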
#cd /usr/local/nagios/etc
#cp nagios.cfg-sample nagios.cfg
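Do the same for the other files.
The main Nagios configuration file is /usr/local/nagios/etc/nagios.cfg.
Meaning of some parameters in nagios.cfg
Options ending in _file give the location of other configuration files and may appear more than once. For example: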
resource_file=/usr/local/nagios/etc/resource.cfg
resource.cfg defines macros such as $USER1$=/usr/local/nagios/libexec. When commands and monitoring tasks are defined later, $USER1$ refers to the nagios-plugin directory.
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/monitor.cfg
commands.cfg and monitor.cfg define the actual monitoring tasks and options.
In each cfg_file, every task is defined as an object.
commands.cfg contains the following definition:
define command {
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
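This defines a command named check_local_disk whose command line is $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
$USER1$ stands for the nagios-plugin directory, and the special macros $ARGn$ stand for the arguments passed to the executable. When Nagios runs check_local_disk, it actually executes /usr/local/nagios/libexec/check_disk. You can run the plugin by hand on the Linux command line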
#/usr/local/nagios/libexec/check_disk --help
to get its help text.
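The macro substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nagios implements it; the function name expand_macros is made up for this example, and the argument values are illustrative.

```python
import re

def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value from `macros`.

    Unknown macros are left untouched so that the gap stays visible.
    """
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)

# $USER1$ comes from resource.cfg; the $ARGn$ values are illustrative.
macros = {
    "USER1": "/usr/local/nagios/libexec",
    "ARG1": "10%",
    "ARG2": "5%",
    "ARG3": "/",
}
template = "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"
print(expand_macros(template, macros))
# prints: /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
```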
monitor.cfg contains the following definitions:
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
This defines a time period named 24x7. The entries that follow show that it covers 24 hours a day, Monday through Sunday.
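To make the meaning of a time period concrete, here is a rough Python sketch that tests whether a given moment falls inside one. It handles only a single HH:MM-HH:MM range per weekday, which is all the 24x7 example needs. The helper names and the 09:00-17:00 "workhours" value in the usage note are assumptions for illustration, not taken from any Nagios file.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

def parse_range(spec):
    """Turn 'HH:MM-HH:MM' into (start_minute, end_minute) of the day."""
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)
    start, end = spec.split("-")
    return to_minutes(start), to_minutes(end)

def in_timeperiod(period, moment):
    """Return True if `moment` lies inside the range given for its weekday."""
    spec = period.get(DAYS[moment.weekday()])  # weekday(): 0 is Monday
    if spec is None:
        return False  # no entry for this weekday means it is not covered
    start, end = parse_range(spec)
    minute_of_day = moment.hour * 60 + moment.minute
    return start <= minute_of_day < end

# The 24x7 period from the definition above.
period_24x7 = {day: "00:00-24:00" for day in DAYS}
print(in_timeperiod(period_24x7, datetime(2024, 1, 1, 23, 59)))
```

With a hypothetical period such as {"monday": "09:00-17:00"}, the same function returns False on Monday evening and on any other weekday.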
define contact{
contact_name MJ
alias MJ
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email [email protected]
}
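This defines a contact named MJ. Notifications for service and host problems are sent during the 24x7 time period. Service notifications are sent for the warning, unknown, critical, and recovery states, and the notification method is e-mail.
Note: in this example MJ is also the user name used for Apache page authentication. By default, a user has enough permission to view service status only when the name used to log in to Nagios matches a contact name.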
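The single-letter notification options can be spelled out with a small lookup table. The sketch below covers only the letters used in this contact definition; decode_options is a made-up helper name.

```python
# Letters used by service_notification_options in the contact above.
SERVICE_OPTIONS = {"w": "warning", "u": "unknown", "c": "critical", "r": "recovery"}
# Letters used by host_notification_options; note that 'u' means
# "unreachable" for hosts, not "unknown".
HOST_OPTIONS = {"d": "down", "u": "unreachable", "r": "recovery"}

def decode_options(value, table):
    """Expand a comma-separated option string such as 'w,u,c,r'."""
    return [table[flag.strip()] for flag in value.split(",")]

print(decode_options("w,u,c,r", SERVICE_OPTIONS))
print(decode_options("d,u,r", HOST_OPTIONS))
```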
define host{
name mj-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
check_period 24x7 ; By default, Linux hosts are checked round the clock
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
notification_interval 120 ; Resend notification every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups MJ-SYS ; Notifications get sent to the admins by default
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
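This defines a host template named mj-host, with check period 24x7 (defined earlier), notification period workhours, and so on.
register 0 marks it as a template only: other objects can inherit from it, but on its own it has no effect on what is monitored.
The check_command directive names the command run when the host is checked, here check-host-alive, which is defined in commands.cfg.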
define host{
use mj-host ; Name of host template to use
host_name MJ-FRONT
alias MJ-APACHE-TOMCAT
address 10.15.3.166
}
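This defines an actual host. use mj-host means the host inherits its settings from the mj-host template defined above, which in effect copies every directive of mj-host except name and register into this definition. In addition to the values set in mj-host, this host has its own directives: host_name MJ-FRONT (the name shown on the monitoring pages), alias, and address.
register is not inherited and defaults to 1.
A directive set in the child also overrides the same directive in the parent.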
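The inheritance rules just described (copy everything except name and register, let the child override, default register to 1) can be modelled roughly as follows. This is a sketch of the rules as stated here, not of Nagios internals; resolve is a hypothetical helper and the template is trimmed to a few directives.

```python
def resolve(obj, templates):
    """Merge an object with the template chain named by its 'use' directive."""
    merged = {}
    parent_name = obj.get("use")
    if parent_name is not None:
        parent = resolve(templates[parent_name], templates)
        # 'name' and 'register' belong to the template and are not inherited.
        merged.update({k: v for k, v in parent.items()
                       if k not in ("name", "register")})
    # The object's own directives override inherited ones.
    merged.update({k: v for k, v in obj.items() if k != "use"})
    # register defaults to 1 when the object does not set it.
    merged.setdefault("register", "1")
    return merged

templates = {
    "mj-host": {
        "name": "mj-host",
        "check_period": "24x7",
        "max_check_attempts": "10",
        "notification_period": "workhours",
        "register": "0",
    }
}
host = {
    "use": "mj-host",
    "host_name": "MJ-FRONT",
    "alias": "MJ-APACHE-TOMCAT",
    "address": "10.15.3.166",
}
resolved = resolve(host, templates)
print(resolved["check_period"], resolved["register"])  # prints: 24x7 1
```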
define service{
name mj-host ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define service{
name MJ-FRONT ; The name of this service template
use mj-host ; Inherit default values from the mj-host service template defined above
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 4 ; Re-check the service up to 4 times in order to determine its final (hard) state
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
contact_groups MJ-SYS ; Notifications get sent out to everyone in the 'MJ-SYS' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define service{
use MJ-FRONT ; Name of service template to use
host_name MJ-FRONT
service_description DISK
check_command check_local_disk!10%!5%!/
}
In the check_command value, check_local_disk is the command name defined in commands.cfg. The arguments are separated by !, and the command definition refers to them as $ARGn$.
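As a small illustration of how the ! separators map onto $ARGn$, the following Python sketch splits a check_command value into the command name and a numbered argument table. split_check_command is a made-up name for this example.

```python
def split_check_command(check_command):
    """Split 'name!arg1!arg2...' into the command name and an ARGn table."""
    name, *args = check_command.split("!")
    macros = {f"ARG{i}": value for i, value in enumerate(args, start=1)}
    return name, macros

name, macros = split_check_command("check_local_disk!10%!5%!/")
print(name)    # prints: check_local_disk
print(macros)  # prints: {'ARG1': '10%', 'ARG2': '5%', 'ARG3': '/'}
```

So for the DISK service above, -w receives 10%, -c receives 5%, and -p receives /.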
normal_check_interval 5
means the service is checked once every 5 time units. The length of a time unit is set by interval_length=60 in nagios.cfg and defaults to 60 seconds.
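The arithmetic is simple, but a tiny worked example may help when interval_length is changed. interval_seconds is a hypothetical helper name.

```python
def interval_seconds(check_interval, interval_length=60):
    """Convert a check interval given in time units into seconds."""
    return check_interval * interval_length

print(interval_seconds(5))  # normal_check_interval 5 -> prints: 300
print(interval_seconds(1))  # retry_check_interval 1  -> prints: 60
```

With the values above, a service is normally checked every 300 seconds and re-checked every 60 seconds while its state is still being confirmed.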
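Each service is tied to a host through host_name. Host checks are driven by the checks of the services on that host: if the http service on host 101 is due to be checked in 5 minutes, then when Nagios checks apache it first checks whether host 101 answers ping. If it does, the host detail page shows OK; otherwise the host is considered down. If no service is defined on a host, that host is not checked (unless configured otherwise).
After configuring Nagios, verify the configuration before starting it: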
#/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
(some output)...........
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
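If the pre-flight check is run from a script, the two totals can be pulled out of the output. The sketch below assumes the output contains the two "Total ..." lines exactly as shown above; preflight_totals is a made-up name, and capturing the output of the nagios command is left out.

```python
import re

def preflight_totals(output):
    """Extract the 'Total Warnings' and 'Total Errors' counts."""
    totals = {}
    for label in ("Warnings", "Errors"):
        match = re.search(rf"Total {label}:\s*(\d+)", output)
        if match is None:
            raise ValueError(f"'Total {label}' line not found")
        totals[label.lower()] = int(match.group(1))
    return totals

sample = """Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check"""
print(preflight_totals(sample))  # prints: {'warnings': 0, 'errors': 0}
```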
Once the check passes, start Nagios:
#service nagios start
2. nrpe configuration
#ssh 10.15.3.170
#cd /usr/local/nagios/
#vi etc/nrpe.cfg
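Check that the file contains
dont_blame_nrpe=1 (this parameter allows Nagios to pass command arguments to nrpe)
and that allowed_hosts includes the IP address of the Nagios server, or comment that line out.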
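The two settings above can be checked mechanically. The following Python sketch reads key=value directives from the text of nrpe.cfg and applies the rule as described here: arguments must be enabled, and allowed_hosts must either list the server or be commented out. The function names are assumptions, and 10.15.3.166 is used as the Nagios server address as elsewhere in this article.

```python
def read_directives(text):
    """Collect key=value pairs, skipping blank lines and # comments."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives[key.strip()] = value.strip()
    return directives

def nrpe_accepts(directives, server_ip):
    """Apply the two checks described above for one Nagios server address."""
    args_enabled = directives.get("dont_blame_nrpe") == "1"
    allowed = directives.get("allowed_hosts")
    # A commented-out (absent) allowed_hosts line is treated as no restriction.
    host_allowed = allowed is None or server_ip in [
        host.strip() for host in allowed.split(",")
    ]
    return args_enabled and host_allowed

cfg = "dont_blame_nrpe=1\nallowed_hosts=127.0.0.1,10.15.3.166\n"
print(nrpe_accepts(read_directives(cfg), "10.15.3.166"))  # prints: True
```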
The commands that nrpe accepts are defined at the end of the file. Comment out lines 191-196 and uncomment the last few lines:
189 # The following examples use hardcoded command arguments...
190
191 #command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
192 #command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
193 #command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
194 #command[check_disk2]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hdb1
195 #command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
196 #command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
197
198 # The following examples allow user-supplied arguments and can
199 # only be used if the NRPE daemon was compiled with support for
200 # command arguments *AND* the dont_blame_nrpe directive in this
201 # config file is set to '1'...
202
203 command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
204 command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
205 command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
206 command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
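After editing, it is easy to lose track of which command[...] entries are still active. A rough Python sketch that lists the uncommented ones is shown below. The sample text repeats a few lines from the listing above without the editor line numbers, and active_commands is a made-up name.

```python
import re

COMMAND_ENTRY = re.compile(r"^command\[([^\]]+)\]=(.+)$")

def active_commands(config_text):
    """Map each uncommented command[name] entry to its command line."""
    commands = {}
    for line in config_text.splitlines():
        match = COMMAND_ENTRY.match(line.strip())
        if match:  # lines starting with '#' never match the pattern
            commands[match.group(1)] = match.group(2)
    return commands

sample = """\
#command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
"""
for name, line in active_commands(sample).items():
    print(name, "->", line)
```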
Start nrpe
#/usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
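-n disables ssl; add this option when you run into ssl errors
-c specifies the nrpe configuration file
-d runs nrpe as a daemon.
Note: nrpe provides the check_nrpe plugin, which goes in the libexec directory on the Nagios host and is used like any other plugin.
For example, when Nagios runs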
check_nrpe -H 192.168.2.3 -c check_mysql
check_nrpe sends the check_mysql command to port 5666 (the default) on 192.168.2.3, where nrpe on that machine executes it and returns the result to Nagios.
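When check_nrpe fails, a first thing to rule out is whether the nrpe port is reachable from the Nagios server at all. A minimal Python sketch for that is shown below. It only tests that a TCP connection can be opened and says nothing about ssl or allowed_hosts; port_open is a hypothetical helper.

```python
import socket

def port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the example above, port_open("192.168.2.3") would be called from the Nagios server.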
Configure the Nagios server side:
#ssh 10.15.3.166
#cd /usr/local/nagios/etc
#vi commands.cfg
Add
define command{
command_name check_nrpe_load
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$ $ARG3$
}
#vi monitor.cfg
Add
define service{
use MJ-FRONT
host_name MJ-Admin
service_description LOAD
check_command check_nrpe_load!check_load!5,5,5!10,10,10
}
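Arguments are numbered twice on the way to the plugin: once by Nagios when it builds the check_nrpe command line, and again by nrpe for the values that follow -a. The Python sketch below traces both stages for the LOAD service. It is only a model of the substitution. The host address 10.15.3.170 is an assumption, since the address of MJ-Admin is not given here, and the helper names are made up.

```python
import re

def expand(template, macros):
    """Substitute $NAME$ tokens using `macros`; unknown tokens are kept."""
    return re.sub(r"\$([A-Z0-9_]+)\$",
                  lambda m: macros.get(m.group(1), m.group(0)), template)

def args_table(values):
    """Number a list of values as ARG1, ARG2, ..."""
    return {f"ARG{i}": v for i, v in enumerate(values, start=1)}

# Stage 1: on the Nagios server, from the service and command definitions.
name, *args = "check_nrpe_load!check_load!5,5,5!10,10,10".split("!")
server_macros = {
    "USER1": "/usr/local/nagios/libexec",
    "HOSTADDRESS": "10.15.3.170",  # assumed address of the monitored host
}
server_macros.update(args_table(args))
server_line = expand(
    "$USER1$/check_nrpe -H $HOSTADDRESS$ -n -c $ARG1$ -a $ARG2$ $ARG3$",
    server_macros,
)
print(server_line)

# Stage 2: on the monitored host, nrpe numbers the values after -a from ARG1 again.
remote_args = server_line.split(" -a ", 1)[1].split()
remote_line = expand(
    "/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$",
    args_table(remote_args),
)
print(remote_line)
# prints: /usr/local/nagios/libexec/check_load -w 5,5,5 -c 10,10,10
```

Note that $ARG2$ on the Nagios side becomes $ARG1$ on the nrpe side, which is why the warning threshold 5,5,5 ends up after -w.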
Restart Nagios:
#service nagios restart