Just installed GroundWorks Community Edition 5.3 and this is a great product but when comparing the built-in graphing capabilities to Hyperics community editon it falls short. Adding monitoring to MySQL, Apache or Tomcat does not give you graphs out-of-the-box but this CAN and will be fixed…
FYI I’m assuming that groundworks is installed and working, and also that you have the ability to monitor cpu, memory and more using ssh to your mysql server. If you don’t have this setup then some steps in this article will NOT work.
Quick rundown of fix
- Create a service for Nagios under the configuration screen.
- Add the new service to your MySQL server
- Go to performance and setup a “Graph” template for the MySQL service
- Watch graphs
Ok that was the quick run-down and that was what I just could NOT get to work the first-time around. I’ll go though how you can add multiple graph points and also how to debug the issue if like happened to me NOTHING is graphed or even that the graph is not created.
First we need a user on the MySQL server that has STATUS only access
mysql> grant usage on *.* to ‘nagios’@’groundworks-host’ identified by ‘password’;
Lets create a service and a command that uses the check_mysql script that is supplied with Groundworks.
||$USER1$/check_mysql -H $HOSTADDRESS$ -u “$ARG1$” -p “$ARG2$” -P “$ARG3$”
The add your mysql server in the test box and click “test”
||/usr/local/groundwork/nagios/libexec/check_mysql -H 192.168.2.2 -u "nagios" -p "password" -P "3306"
Uptime: 1862427 Threads: 7 Questions: 195920 Slow queries: 2 Opens: 166 Flush tables: 1 Open tables: 159 Queries per second avg: 0.105
Command returned exit status 0
Nice that works! but that is the easy part. Now we need to look at the “performance” area and actually create the graph setup.
Creating the graph
Go to Performance – configure. You will see a long list of the default graphs and now we need to add our own.
The top one should be Current Load click on the copy button and create a new performance graph that will look like this:
|Use Service as a Regular Expression
|Status Text Parsing Regular Expression:
||[\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+)
|Use Status Text Parsing instead of Performance Data
|RRD Create Command
||$RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr DS:Threads:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
|RRD Update Command
||$RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$:$VALUE2$ 2>&1
|Custom RRDtool Graph Command
I have MySQL servers listening to multiple ports so when I created the graph I made the service name a regular expression.
This matches the three service names that I have created
The name of the service IS VERY IMPORTANT when you create the graph. When the performance data is imported in the performance graph Groundworks does a search through the database to find the service data using the service name. Since I setup the service with multiple ports for the same mysql_status service it adds the port-3306… to each of the services and that is the name stored in the database. Here is how I setup the multiple ports
In the service check of the hosts mysql_status I added three instances at the bottom of that screen.
||Instance Name Suffix
Ok back to the graph setup…
After you have saved the new performance graph there are two directories that you need to worry about.
In the rrd directory you will get a file that is named something like mysqlservername.cadechristian.com_mysql_status-port-3306.rrd, this file is created using the RRD Create Command from the performance template. That will will also be updated using the RRD Update Command.
Lets talk about the RRD Create command
$RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr DS:Threads:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
The important part is
–step (how often to plot on the graph) 300 is a good production number but for testing I like 60
DS:Threads:GAUGE:1800:U:U – First datasource
DS:QPS:GAUGE:1800:U:U – Second Datasource
2>&1 put errors to standard out log.
The datasources are found through the regex that I setup that parses the output from the mysql command ( Each datasource is one regex group ):
[\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+)
The (\d+) is how you create a group with the () around what you are matching. So in this regex we get the first group with the number of threads and the second group with the Queries per second.
Now it is important that we also have 2 values in the RRD update command.
$RRDTOOL$ update $RRDNAME$ $LASTCHECK$:$VALUE1$:$VALUE2$ 2>&1
If you need to change the RRD create command you need to MANUALLY DELETE the RRD file in the /usr/local/groundworks/rrd directory!!!
The rrd file is created in a default setup after 1-2 min and the performance data will be parsed and visible in the graph maybe after 5min.
The “event” scripts are located in /usr/local/groundworks/nagios/eventhandlers and inside of the scripts there is a variable called $debug that you should set to 3 ( default = 0 ) if you want to get a more descriptive output in the logs.
The logfile /usr/local/groundworks/nagios/eventhandlers/service_perfdata.log has very light debug information but it shows when the service is pulled for a name host
1244226293 den2mysql002.openlogic.com mysql_status-port-3306 Uptime: 1864121 Threads: 7 Questions: 196033 Slow queries: 2 Opens: 166 Flush tables: 1 Open tables: 159 Queries per second avg: 0.105
1244226296 den2mysql002.openlogic.com tcp_ssh SSH OK – OpenSSH_4.3 (protocol 2.0)
1244226297 olex-perfdb icmp_ping PING OK – Packet loss = 0%, RTA = 0.13 ms
1244226293 mysqlserver mysql_status-port-3306 Uptime: 1864121 Threads: 7 Questions: 196033 Slow queries: 2 Opens: 166 Flush tables: 1 Open tables: 159 Queries per second avg: 0.105
1244226296 mysqlserver tcp_ssh SSH OK - OpenSSH_4.3 (protocol 2.0)
1244226297 mysqlserver icmp_ping PING OK - Packet loss = 0%, RTA = 0.13 ms
This is ok information BUT it will not tell us if something did NOT work. So lets add $debug = 3 to the process_service_perf(_db AND _db_file).pl file and wait for the process_service_perf.log file to be created in the eventhandlers directory
And here is a good entry from the debug log:
Fri Jun 5 14:43:14 MDT 2009
Statustext: Uptime: 1872401 Threads: 7 Questions: 196614 Slow queries: 2 Opens: 166 Flush tables: 1 Open tables: 159 Queries per second avg: 0.105
No exact service name mysql_status-port-3306. Query database for service pattern matches.mysql_status-port-3306 matches database service pattern mysql_status-port-330[1-9]. Using this entry.
Match in status text for regular expression (?-xism:[\w\d\s:]+Threads: (\d+)[\w\d\s:]+avg: ([\d\.]+))
SQL = SELECT hs.host_service_id FROM host_service as hs, datatype as dt WHERE (hs.host=’mysqlserver’ AND hs.service=’mysql_status-port-3306′ AND dt.type=’RRD’ AND dt.location=’/usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd’ AND hs.datatype_id=dt.datatype_id)
Table host_service, host=mysqlserver, service=mysql_status-port-3306 already has an exisitng entry for location /usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd. New entry not added.
/usr/local/groundwork/common/bin/rrdtool update /usr/local/groundwork/rrd/mysqlserver_mysql_status-port-3306.rrd 1244234573:7:0.105 2>&1
Label Threads found in rrd create string: $RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr:GAUGE:1800:U:U DS:QPS:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
Label QPS found in rrd create string: $RRDTOOL$ create $RRDNAME$ –step 60 –start n-1yr:GAUGE:1800:U:U:GAUGE:1800:U:U RRA:AVERAGE:0.5:1:8640 RRA:AVERAGE:0.5:12:9480 2>&1
Posting data to Foundation
Elapsed Execution time = 4.50323104858398 seconds
You see that it finds the service name using the regex for the service, then we find two values that are stuffed into the two labels that we have (Threads, QPS)
One of the biggest issues I had was with the collected performance data. After I found that I could use regex everything started working much better.
Using Groundworks to monitor and get graphs is NOW easy but it was a pain to get to this point. I turning point was finding the eventhandler debug log. I hope this will be helpful to someone out there…