Introduction

You have a raid and want to monitor it under FreeBSD. That may or may not be a problem. I'll try to summarise all the information I have gathered. If you know that something here is incorrect or outdated, please contact me. In general, monitoring the state of a raid can be problematic if the hardware does not expose the needed information, or exposes it only via notifications (it sends a message such as "raid status changed" through the driver, which you can try to grep out of syslog, but you cannot actively poll the state).

Status of this document

This document was initially written on the 2nd of August 2007. It was migrated to www.nico.schottelius.org on the 12th of May 2009.

You can check the git history to see when it was last updated.

List of raid systems and how to monitor them

FreeBSD gmirror software raid

As you might expect, monitoring this raid is pretty easy. We achieved that with the following two scripts:

ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfs_gmirror.sh 
#!/bin/sh
#==============================================================================
# Copyright (c) 2007, Netstream AG
# Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org>
# Created: 2007-04-23
# Description: Display state of all gmirror devices
# Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh
#==============================================================================

gmirror list | \
awk -F: 'BEGIN { print "gmirror devices";
print "---------------";
}
/^Geom name:/ {
name=$2
}
/^State:/ {
print name ":" $2
}'

And the one that is called by cron:

ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfrib_gmirror.sh  
#!/bin/sh
#==============================================================================
# Copyright (c) 2007, Netstream AG
# Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org>
# Created: 2007-04-23
# Description: Report broken devices.
# Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh
#==============================================================================

check="$(dirname "$0")/cfs_gmirror.sh"

# Skip first two lines: header
"$check" | awk -F": " 'BEGIN { getline; getline } $2 !~ /COMPLETE/ { print $1 ":" $2 }'
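To see how the two scripts fit together, here is a minimal sketch that runs the same two awk programs against canned input. The sample text is an abbreviated, made-up excerpt in the style of "gmirror list" output (real output contains many more fields), and the function names are only for illustration. Note that the indented per-consumer "State:" lines are ignored, because the pattern is anchored with ^.

```shell
#!/bin/sh
# Hypothetical, abbreviated sample in the style of `gmirror list` output.
sample_gmirror_list() {
cat <<'EOF'
Geom name: gm0
State: COMPLETE
Consumers:
1. Name: ad4
   State: ACTIVE
Geom name: gm1
State: DEGRADED
Consumers:
1. Name: ad6
   State: ACTIVE
EOF
}

# Same awk program as cfs_gmirror.sh: one "name: state" line per mirror.
gmirror_states() {
    awk -F: 'BEGIN { print "gmirror devices"; print "---------------" }
        /^Geom name:/ { name = $2 }
        /^State:/     { print name ":" $2 }'
}

# Same filter as cfrib_gmirror.sh: skip the two header lines, then
# print every mirror whose state is not COMPLETE.
gmirror_broken() {
    awk -F": " 'BEGIN { getline; getline } $2 !~ /COMPLETE/ { print $1 ":" $2 }'
}

sample_gmirror_list | gmirror_states | gmirror_broken
```

With this sample only gm1 is reported. When every mirror is COMPLETE the pipeline prints nothing, so cron has nothing to mail.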

LSI / Symbios Megaraid (amr driver)


There are two possibilities to monitor amr-based devices:

  • with megarc
  • with amrstat


The utility "amrstat" is available in ports as sysutils/amrstat and is FOSS. Calling it shows all the needed information:

ddna044# amrstat 
Logical volume 0: optimal (136.73 GB, RAID0)
Logical volume 1: optimal (136.73 GB, RAID0)
Physical drive 1:1 online
Physical drive 1:2 online
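For cron use, that output can be reduced to the lines that look unhealthy. The following is only a sketch: it assumes that the fourth whitespace-separated field always carries the state, as in the sample above, and it treats "optimal" and "online" as the only healthy values because those are the only ones shown here. The function name amrstat_problems is made up.

```shell
#!/bin/sh
# Print every amrstat line whose state differs from the healthy values
# seen in the sample output ("optimal" for volumes, "online" for drives).
amrstat_problems() {
    awk '/^Logical volume/ && $4 != "optimal" { print }
         /^Physical drive/ && $4 != "online"  { print }'
}

# Intended use from cron (prints nothing while everything is healthy):
#   amrstat | amrstat_problems
```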


The utility "megarc" is available in ports (sysutils/megarc); it is a closed source binary provided by LSI. I've found two easy-to-use scripts for this controller, written by Scott Mitchell, at http://lists.freebsd.org/pipermail/freebsd-questions/2006-June/125470.html:

#!/bin/sh -f
#
# Check status of RAID volumes on amr(4) controllers using the LSI MegaRC
# utility. If any logical drive has a status other than OPTIMAL, or any
# physical disk has a status other than ONLINE, display the full status
# for the adapter. If more than one adapter exists, add additional unit
# numbers to $adapters.
#
# $Id$
#

adapters="0"

for adapter in $adapters; do
status=`/usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\
/usr/bin/sed '1,$s/^M//' |\
/usr/bin/sed '1,/Information Of Logical Drive/d'` ||\
echo "Failed to get RAID status for AMR adapter ${adapter}"

echo "${status}" |\
/usr/bin/egrep '^ Logical Drive : .*: Status: .*$' |\
/usr/bin/egrep -qv 'OPTIMAL$'
drives=$?

echo "${status}" |\
/usr/bin/egrep '^ [0-9]+' |\
/usr/bin/egrep -qv 'ONLINE$'
disks=$?

if [ ${drives} -ne 1 -o ${disks} -ne 1 ]; then
echo ""
echo "AMR RAID status (adapter ${adapter}):"
echo "${status}"
fi
done

Warning: The above script may not work when copied and pasted, as reported by Per olof Ljungmark:

I proceeded to test the scripts but the first one gives you an error due
to what Scott Mitchell wrote in his original mail:
"BTW, the '^M' in the amr-check-status script is a real Control-M
character, and there are embedded tabs in a couple of the egrep patterns,
in case those get lost in transit."


Don't know if ^M will show in a browser but the 16th. line should read:
/usr/bin/sed '1,$s/^M//' |\
otherwise you will get a sed error.
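One way to sidestep the literal Control-M problem is to strip carriage returns with tr, which understands the \r escape, so nothing unprintable has to survive copy and paste. This is a suggested substitution, not part of Scott Mitchell's script, and the function name strip_cr is made up. Unlike the sed expression, which removes the first carriage return on each line, tr removes all of them.

```shell
#!/bin/sh
# Remove all carriage returns (Control-M) from stdin.
strip_cr() {
    tr -d '\r'
}

# In the check script, the line
#   /usr/bin/sed '1,$s/^M//' |\
# could then be replaced by
#   /usr/bin/tr -d '\r' |\
```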

And the other one:


#!/bin/sh -f
#
# Display status of RAID volumes on amr(4) controllers using the LSI MegaRC
# utility. If more than one adapter exists, add additional unit numbers to
# $adapters.
#
# $Id$
#

# If there is a global system configuration file, suck it in.
#
if [ -r /etc/defaults/periodic.conf ]; then
. /etc/defaults/periodic.conf
source_periodic_confs
fi

adapters="0"

rc=0
case "${daily_amr_status_enable:-YES}" in
[Nn][Oo])
;;
*)
for adapter in $adapters; do
echo ""
echo "AMR RAID status (adapter ${adapter}):"
/usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\
sed '1,/Information Of Logical Drive/d' || rc=$?
done
;;
esac

exit "$rc"
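The script reads daily_amr_status_enable through the periodic(8) configuration, so the report can be switched off without editing the script. A sketch, assuming the script has been installed in a periodic daily directory such as /usr/local/etc/periodic/daily (the original mail is the authority on where it was meant to live). The line below would go into /etc/periodic.conf:

```shell
# /etc/periodic.conf
# Only NO (in any letter case) disables the report; the script treats
# an unset variable as YES.
daily_amr_status_enable="NO"
```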

For more information on supported devices have a look at amr(4).

mpt


mpt-based devices can be monitored under Linux with the kernel module "mptctl" and the FOSS tool "mpt-status". There currently seems to be no equivalent support under FreeBSD. For more information about mpt, have a look at mpt(4).

ciss

Known tools:

  • camcontrol
  • hpacucli


This driver is used for most HP / Compaq controllers and is (as far as I know) found in almost all modern SAS/SATA systems sold by HP. As described in http://www.unixadmintalk.com/f41/monitoring-raid-arrays-51889/, you can monitor it via camcontrol:

# camcontrol inquiry da0
pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
pass0: 135.168MB/s transfers

(I have not tested this myself; I just found it on the net.) On http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html I also found the relevant strings to look for:

During normal operation of the raid:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device

After removing one of the raid member disks:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME inte> Fixed Direct Access SCSI-0 device

After re-inserting the raid member disk:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME reco> Fixed Direct Access SCSI-0 device

And about 45 minutes later:
# camcontrol inquiry da0 -D
pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
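Based on the strings above, a check could extract whatever follows "VOLUME" inside the angle brackets and complain unless it is "OK". This is an untested sketch that assumes the inquiry line always has the shape shown in these examples; the function names are made up.

```shell
#!/bin/sh
# Print the volume state, i.e. the text between "VOLUME " and ">".
ciss_volume_state() {
    sed -n 's/^pass[0-9]*: <.* VOLUME \([^>]*\)>.*/\1/p'
}

# Succeed silently when the state is OK, otherwise print it and fail.
ciss_check() {
    state=$(ciss_volume_state)
    if [ "$state" != "OK" ]; then
        echo "RAID volume state: ${state:-unknown}"
        return 1
    fi
}

# Intended use:
#   camcontrol inquiry da0 -D | ciss_check
```

Run from cron, this stays silent during normal operation and prints the "inte" or "reco" state otherwise.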

You could also use hpacucli, which can be found at http://people.freebsd.org/~jcagle/. I have no experience with it. If you have, please send me a report or monitoring scripts so I can include them here (the hint was sent by Jaimie Sirovich).

3ware raid: twa/twe

Install and configure sysutils/3dm. This installs a daemon that provides a web interface and can also notify you via e-mail if something happens. This is perhaps the easiest way of monitoring raid in FreeBSD. The other way to monitor 3ware raids is via tw_cli.

ataraid

This is a software raid driver for many different cards. Have a look at ataraid(4). Somebody in ##freebsd (irc.freenode.org) pasted the URL http://www.monkeybrains.net/~rudy/example/raid_status.html, which contains a script that monitors gmirror, 3ware (via tw_cli) and also ataraid (ar0) via atacontrol. For archival purposes, the script is mirrored below:

#!/bin/sh

# raid_status - check the state of the RAID.

# This script works for various types of RAID devices. (Currently, 3Ware, gmirror, BSD 'ar0' raids)
# WARNING: Install the proper CLI program for your 3ware card, if you use 3ware.

# Set up a cronjob like this:
# */16 * * * * /home/rudy/bin/raid_status CRON

### Copyright (c) 2006, Rudy Rucker All rights reserved.
### Redistribution and use of script, with or without modification, is
### permitted provided that the following condition is met:
### Redistributions of source code must retain the above copyright
### notice, this list of conditions and the following disclaimer.
### THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
### ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
### IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
### ARE DISCLAIMED.

# ----------- Change Log ------------
# Mon Oct 11 15:20:37 PDT 2004 - rudy
# Original script.
# Tue Feb 7 01:28:07 PST 2006 - rudy
# Added 9500 and 9550 support
# Fri Jun 9 10:38:33 PDT 2006 - rudy
# works for 'ar' and 'tw' mirrored arrays
# Tue Sep 12 10:23:13 PDT 2006 - rudy
# Added gmirror and realized that not all 3ware's are the same...

MODE=$1

TWCLI="/usr/local/bin/tw_cli"
GMIRROR="/sbin/gmirror"
ATACONTROL="/sbin/atacontrol"

AWK="/usr/bin/awk"
GREP="/usr/bin/grep"
MAIL="/usr/bin/mail"

EMAIL="noc@example.com"

# if this is not a 3ware card, check the atacontrol
if [ -c /dev/twed0 ] && [ -x $TWCLI ]; then
# 3ware card ... 8000 series
STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;
VALID='OK'
ESTATUS_CMD="$TWCLI info c0 u0";
# double check the 3ware output in case it returned nada...
# Umm... this is the only raid card where I have witnessed this bug
if [ "X$STATUS" = "X" ]; then
sleep 1;
STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;
fi
elif [ -c /dev/da0 ] && [ -x $TWCLI ]; then
# Note, there are plenty of other device names that use da0... this script is
# not for those... works with:
# 3ware 9550SX, 9500S
STATUS=`$TWCLI info c0 | $GREP "^u0" | $AWK '{print $3}'`;
VALID='OK'
ESTATUS_CMD="$TWCLI info c0 u0"
elif [ -c /dev/mirror/gm0 ] && [ -x $GMIRROR ]; then
# gmirror /dev/mirror/gm0
STATUS=`$GMIRROR status gm0 | $GREP "^mirror" | $AWK {'print $2'}`;
VALID='COMPLETE'
ESTATUS_CMD="$GMIRROR list";
elif [ -c /dev/ar0 ] && [ -x $ATACONTROL ]; then
# Motherboard promise and others
STATUS=`$ATACONTROL status ar0 | $GREP "status" | $AWK -F 'status: ' '{print $2}'`;
VALID='READY'
ESTATUS_CMD="/sbin/atacontrol status ar0"
else
echo "Unknown Raid type.... ";
if [ -x $TWCLI ]; then
echo " + found $TWCLI";
else
echo " - can't exec $TWCLI";
fi
if [ -x $ATACONTROL ]; then
echo " + found $ATACONTROL";
else
echo " - can't exec $ATACONTROL";
fi
if [ -x $GMIRROR ]; then
echo " + found $GMIRROR";
else
echo " - can't exec $GMIRROR";
fi
exit;
fi

# Okay, we checked the raid status and know what the return code should be.
if [ "$STATUS" = "$VALID" ]; then
if [ "$MODE" = "CRON" ]; then
exit;
fi
echo "OK condition";
$ESTATUS_CMD
exit;
fi

# ERROR! Either print to TTY or send an email, based on MODE (which is arg[1])
if [ "$MODE" = "CRON" ]; then
$ESTATUS_CMD | $MAIL -s "[ERROR] Raid array on $HOST returned $STATUS" $EMAIL
else
echo "ERROR condition"
$ESTATUS_CMD
fi
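One thing to watch in the mirrored script: the mail subject uses $HOST, which csh-style shells export but which, as far as I know, neither /bin/sh nor cron's environment sets, so the subject may end up without a host name. A hedged workaround (my suggestion, not part of Rudy's script) is to give HOST a default near the top of the script:

```shell
#!/bin/sh
# Fall back to the node name when HOST is unset or empty.
HOST=${HOST:-$(uname -n)}
echo "Raid status mail would be sent for host: $HOST"
```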

Adaptec: aac

Jaimie Sirovich reported that you can monitor some Adaptec cards with aaccli. More information and examples are currently missing.

Areca: arcmsr

The Areca controller can be monitored either directly on the raid controller (8 and 16 port versions), which has its own NIC and RJ45 port, or via the closed source web server (the same one that runs on the controller). It can be downloaded from areca.com. Configuring it just means clicking around in the web interface.

asr

asr-based controllers are reported to be monitorable via asr-utils (confirmation needed).