Archive for the ‘Monitoring’ Category

Graphing Mysql Performance in Groundworks

The Problem

Just installed GroundWorks Community Edition 5.3 and this is a great product but when comparing the built-in graphing capabilities to Hyperics community editon it falls short. Adding monitoring to MySQL, Apache or Tomcat does not give you graphs out-of-the-box but this CAN and will be fixed…

FYI I’m assuming that groundworks is installed and working, and also that you have the ability to monitor cpu, memory and more using ssh to your mysql server. If you don’t have this setup then some steps in this article will NOT work.

The Fix

Quick rundown of fix

  • Create a service for Nagios under the configuration screen.
  • Add the new service to your MySQL server
  • Go to performance and setup a “Graph” template for the MySQL service
  • Wait
  • Watch graphs

Ok that was the quick run-down and that was what I just could NOT get to work the first-time around. I’ll go though how you can add multiple graph points and also how to debug the issue if like happened to me NOTHING is graphed or even that the graph is not created.

Setup check_mysql

First we need a user on the MySQL server that has STATUS only access

mysql> grant usage on *.* to ‘nagios’@’groundworks-host’ identified by ‘password’;

Lets create a service and a command that uses the check_mysql script that is supplied with Groundworks.

Command definition: $USER1$/check_mysql -H $HOSTADDRESS$ -u “$ARG1$” -p “$ARG2$” -P “$ARG3$”
Usage: check_mysql_status!ARG1!ARG2!ARG3
Command line: ?

The add your mysql server in the test box and click “test”

Test: Host:
/usr/local/groundwork/nagios/libexec/check_mysql -H 192.168.2.2 -u "nagios" -p "password" -P "3306"

Uptime: 1862427  Threads: 7  Questions: 195920  Slow queries: 2  Opens: 166  Flush tables: 1  Open tables: 159  Queries per second avg: 0.105

Command returned exit status 0

Nice that works! but that is the easy part. Now we need to look at the “performance” area and actually create the graph setup.

Creating the graph

Go to Performance – configure. You will see a long list of the default graphs and now we need to add our own.

The top one should be Current Load click on the copy button and create a new performance graph that will look like this:

Graph Label: MySQL Stats
Service: mysql_status-port-330[1-9]
Use Service as a Regular Expression ON
Host: *
Status Text Parsing Regular Expression: [\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+)
Use Status Text Parsing instead of Performance Data ON
RRD Name /usr/local/groundwork/rrd/$HOST$_$SERVICE$.rrd
RRD Create Command $RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr DS:Threads:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
RRD Update Command $RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$:$VALUE2$ 2>&1
Custom RRDtool Graph Command
Enable ON

I have MySQL servers listening to multiple ports so when I created the graph I made the service name a regular expression.

mysql_status-port-330[1-9]

This matches the three service names that I have created

mysql_status-port-3306
mysql_status-port-3307
mysql_status-port-3308

The name of the service IS VERY IMPORTANT when you create the graph. When the performance data is imported in the performance graph Groundworks does a search through the database to find the service data using the service name. Since I setup the service with multiple ports for the same mysql_status service it adds the port-3306… to each of the services and that is the name stored in the database. Here is how I setup the multiple ports

In the service check of the hosts mysql_status I added three instances at the bottom of that screen.

Instance Name Suffix Status Arguments
Active
Active
Active

Ok back to the graph setup…

After you have saved the new performance graph there are two directories that you need to worry about.

/usr/local/groundworks/rrd
and
/usr/local/groundworks/nagios/eventhandler

rrd

In the rrd directory you will get a file that is named something like mysqlservername.cadechristian.com_mysql_status-port-3306.rrd, this file is created using the RRD Create Command from the performance template. That will will also be updated using the RRD Update Command.

Lets talk about the RRD Create command

$RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr DS:Threads:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1

The important part is

–step (how often to plot on the graph) 300 is a good production number but for testing I like 60

DS:Threads:GAUGE:1800:U:U – First datasource
DS:QPS:GAUGE:1800:U:U – Second Datasource

2>&1 put errors to standard out log.

The datasources are found through the regex that I setup that parses the output from the mysql command ( Each datasource is one regex group ):

[\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+)

The (\d+) is how you create a group with the () around what you are matching. So in this regex we get the first group with the number of threads and the second group with the Queries per second.

Now it is important that we also have 2 values in the RRD update command.

$RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$:$VALUE2$ 2>&1

If you need to change the RRD create command you need to MANUALLY DELETE the RRD file in the /usr/local/groundworks/rrd directory!!!

The rrd file is created in a default setup after 1-2 min and the performance data will be parsed and visible in the graph maybe after 5min.

Debugging

The “event” scripts are located in /usr/local/groundworks/nagios/eventhandlers and inside of the scripts there is a variable called $debug that you should set to 3 ( default = 0 ) if you want to get a more descriptive output in the logs.

The  logfile /usr/local/groundworks/nagios/eventhandlers/service_perfdata.log has very light debug information but it shows when the service is pulled for a name host

1244226293 den2mysql002.openlogic.com mysql_status-port-3306 Uptime: 1864121  Threads: 7  Questions: 196033  Slow queries: 2  Opens: 166  Flush tables: 1  Open tables: 159  Queries per second avg: 0.105
1244226296 den2mysql002.openlogic.com tcp_ssh SSH OK – OpenSSH_4.3 (protocol 2.0)
1244226297 olex-perfdb icmp_ping PING OK – Packet loss = 0%, RTA = 0.13 ms
1244226293 mysqlserver mysql_status-port-3306 Uptime: 1864121  Threads: 7  Questions: 196033  Slow queries: 2  Opens: 166  Flush tables: 1  Open tables: 159  Queries per second avg: 0.105 1244226296 mysqlserver tcp_ssh SSH OK - OpenSSH_4.3 (protocol 2.0) 1244226297 mysqlserver icmp_ping PING OK - Packet loss = 0%, RTA = 0.13 ms

This is ok information BUT it will not tell us if something did NOT work. So lets add $debug = 3 to the process_service_perf(_db AND _db_file).pl file and wait for the process_service_perf.log file to be created in the eventhandlers directory

And here is a good entry from the debug log:

Fri Jun  5 14:43:14 MDT 2009
Host: mysqlserver
Svcdesc: mysql_status-port-3306
Lastcheck: 1244234573
Statustext: Uptime: 1872401  Threads: 7  Questions: 196614  Slow queries: 2  Opens: 166  Flush tables: 1  Open tables: 159  Queries per second avg: 0.105

Perfdata:
No exact service name mysql_status-port-3306. Query database for service pattern matches.mysql_status-port-3306 matches database service pattern mysql_status-port-330[1-9]. Using this entry.

Match in status text for regular expression (?-xism:[\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+))

SQL = SELECT hs.host_service_id FROM host_service as hs, datatype as dt WHERE (hs.host=’mysqlserver’ AND hs.service=’mysql_status-port-3306′ AND dt.type=’RRD’ AND dt.location=’/usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd’ AND hs.datatype_id=dt.datatype_id)

Table host_service, host=mysqlserver, service=mysql_status-port-3306 already has an exisitng entry for location /usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd. New entry not added.
/usr/local/groundwork/common/bin/rrdtool update /usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd 1244234573:7:0.105 2>&1

Return:
Label Threads found in rrd create string: $RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
Label QPS found in rrd create string: $RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr:GAUGE:1800:U:U:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1

Posting data to Foundation
performancedatalabel=Threads
performancevalue=7
performancedatalabel=QPS
performancevalue=0.105
Elapsed Execution time = 4.50323104858398 seconds

You see that it finds the service name using the regex for the service, then we find two values that are stuffed into the two labels that we have (Threads, QPS)

One of the biggest issues I had was with the collected performance data. After I found that I could use regex everything started working much better.

Conclusion

Using Groundworks to monitor and get graphs is NOW easy but it was a pain to get to this point. I turning point was finding the eventhandler debug log. I hope this will be helpful to someone out there…

check_apache.pl and perl module issue

The problem

[nagios@den2lweb002 libexec]$ ./check_apache.pl
Can’t locate LWP/UserAgent.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at ./check_apache.pl line 32.
BEGIN failed–compilation aborted at ./check_apache.pl line 32.

[nagios@myserver libexec]$ ./check_apache.pl -H localhost

Can’t locate LWP/UserAgent.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.7/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.6/x86_64-linux-thread-multi /usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at ./check_apache.pl line 32.

BEGIN failed–compilation aborted at ./check_apache.pl line 32.

The fix

Lets find the missing LWP/UserAgent.pm module. After searching around on Google I could not find that module alone but I found the Bundle::LWP, so I started this command after a quick su to root…

[root@myserver libexec]# perl -MCPAN -e ‘install Bundle::LWP’

And after about 2 minutes the bundle was installed

Then I dropped back to the nagios user and tested the check_apache.pl command again…

[nagios@myserver libexec]$ ./check_apache.pl -H localhost

HTTPD ok: average response time 184 milliseconds

We are back in business!

Versions used

  • Linux Red Hat Enterprise Linux Server release 5.3
  • Perl v5.8.8 built for x86_64-linux-thread-multi
  • Ground Works 5.3 community edition

Problems running check_memory.pl with missing Groundworks/Nagios

This is the error I’m getting

Can’t locate Nagios/Plugin.pm in @INC …

Looks like the script can’t find the Plugin module form Nagios for Perl so we have to install this using this command (as root):
# perl -MCPAN -e ‘install Nagios::Plugin’ (as root)
Update: If this command fails try to delete ~/.cpan directory and retry the command
Now you should be able to login as the nagios user and run the check_memory.pl command…

$ ./check_mem.pl
CHECK_MEM OK – 4664M free | free=4890664960b;;
BAM!!
Here is the link for the check_memory.pl script