Modern hard drives have an internal mechanism called S.M.A.R.T. through which it is possible to know when a hard disk is about to fail. Wouldn’t it be nice if the server emailed you before such a failure?

Overview

Programs like “mdadm” (for software RAID management) and the “Palimpsest Disk Utility” (used on the Ubuntu LiveCD) use the S.M.A.R.T. information to tell you when a disk is about to fail or has failed. However, on a headless server (no GUI) there is no service that will warn you of the pending doom before it is too late. Moreover, how would you even know about it without manually logging into the server?

This script, when run once a day with cron, checks whether the bad sector count of any of the system’s hard drives has reached a limit that is deliberately set lower than the “disk is bad” threshold, and if so, emails a warning to the machine’s administrator.

Prerequisites and assumptions

You have already set up email support for the server using the “How To Setup Email Alerts on Linux” guide. You’re using a Debian-based system. You’re not using a *hardware RAID controller. You will see me use VIM as the editor; this is just because I’m used to it, and you may use any other editor you like.

*Because a hardware RAID controller may well block the system’s access to this information.

Setup

Install the “smartmontools” package, which reads the S.M.A.R.T. information from the hard drive controller and presents it to us.
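On a Debian-based system the package can be installed with apt-get (shown as a comment in the sketch below, since it needs root). The rest of the sketch illustrates what the monitor script later extracts: it feeds one made-up line, laid out like `smartctl -A` output but not read from a real drive, through the same `tr`/`cut` pipeline that the script’s smartc_func uses. The names sample_line and raw_value are hypothetical.

```shell
#!/bin/bash
# Install step (run separately, as root):
#   sudo apt-get install smartmontools
# Afterwards, `sudo smartctl -A /dev/sda` prints the attribute table for /dev/sda.

# Illustrative line in the layout of `smartctl -A` output; the values are made up.
# Columns: ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
sample_line='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3'

# Squeeze runs of spaces into one, then split on single spaces.
# The line starts with a space, so field 1 is empty and RAW_VALUE lands in field 11.
raw_value=$(printf '%s\n' "$sample_line" | tr -s ' ' | cut -d' ' -f11)
echo "Reallocated sectors: $raw_value"
```

Because `cut` counts fields by position, this relies on the exact column layout above. If your drive’s output looks different, `awk '$2 == "Reallocated_Sector_Ct" {print $10}'` is a more forgiving alternative, since awk ignores leading whitespace when splitting.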

Create the monitor script:
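For example, with VIM and a hypothetical location of /usr/local/sbin/smart_monitor.sh (any path works, as long as you use the same one in the later steps that make the script executable and schedule it):

```shell
# Hypothetical script path; adjust to taste.
sudo vim /usr/local/sbin/smart_monitor.sh
```

Since the script uses bash’s `[[ ... ]]` test, its first line should be `#!/bin/bash` rather than `#!/bin/sh`; on Debian, /bin/sh is dash, which does not support `[[`.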

Make this its content:

    # The email function, email_admin_func, is defined in this Functions section too.
    smartc_func() {
        # Print the raw Reallocated_Sector_Ct value (sectors remapped so far) for /dev/$1
        /usr/sbin/smartctl -A /dev/$1 | grep Reallocated_Sector_Ct | tr -s ' ' | cut -d' ' -f11
    }

    ########End of Functions########

    ########Set working parameter########
    temp_email_file=/tmp/smart_monitor.txt
    allowed_threshold=5 # Set the number of bad sectors you're willing to live with; 5 is recommended.

    ########Engine########
    for i in sda sdb ; do # Add or remove disk names from this list as appropriate for your setup.
        if [[ $(smartc_func $i) -ge $allowed_threshold ]] ; then
            echo Emailing the Administrator
            email_admin_func "One of the HDs on $(hostname) has reached the upper threshold limit!!!\nThe threshold was set to: $allowed_threshold and the $i disk status was: $(smartc_func $i)"
        fi
    done

The key points to note are:

- Email function: set the appropriate information, such as the machine name and the administrator’s email address.
- Allowed threshold: set this parameter to what you feel is appropriate. I have used 5 because the limit set for the “server grade” hard drives I’ve used was 10 (I’ve found the threshold for “consumer grade” drives to be as high as 140).
- Devices: set the devices that you want to monitor by adjusting the list of disk names in the “for” loop. Currently two disks (sda and sdb) are included, so adjust it for your setup. You may include all of your disks, or just some if you need to *exclude a disk for some reason.

*In my original setup the first disk was a flash drive, so reading its information (if that is possible at all) isn’t of much use.
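The email function is the one part of the script that depends on how mail was configured in the “How To Setup Email Alerts on Linux” guide. Below is a minimal sketch of email_admin_func, assuming a `mail` command is available (on Debian it is provided by packages such as bsd-mailx or mailutils). The variables admin_email and machine_name, and the address itself, are hypothetical placeholders; temp_email_file repeats the script’s own setting so the sketch stands alone.

```shell
#!/bin/bash
# Hypothetical sketch of email_admin_func; replace the placeholder values with your own.
admin_email="admin@example.com"          # hypothetical administrator address
machine_name=$(hostname)                 # machine name used in the subject line
temp_email_file=/tmp/smart_monitor.txt   # same path the monitor script sets

email_admin_func() {
    # Write the message to the temp file; %b turns the "\n" in the
    # Engine's message into a real line break.
    printf '%b\n' "$1" > "$temp_email_file"
    # Send the file's contents with the mail command (assumed to be installed).
    mail -s "S.M.A.R.T. warning on $machine_name" "$admin_email" < "$temp_email_file"
}
```

Placed in the Functions section of the monitor script, it is called by the Engine loop with the warning text as its only argument; running `email_admin_func "test message"` by hand is a quick way to confirm mail delivery works.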

Make the script executable:
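Assuming the script was saved at the hypothetical path /usr/local/sbin/smart_monitor.sh, that could be done like this, optionally followed by one manual run to catch errors before cron takes over:

```shell
# Hypothetical script path; use wherever you saved the script.
sudo chmod +x /usr/local/sbin/smart_monitor.sh
# Optional manual run (as root, because smartctl needs root to query the disks).
# It only prints and emails when a disk is at or above the threshold.
sudo /usr/local/sbin/smart_monitor.sh
```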

The setup is done.

Schedule the script to be run automatically

We want the script to run automatically, so we will create a new cron job for it. As stated in the “How To Setup Email Alerts on Linux” guide, the upshot of doing so is that if the script itself encounters an error, cron will automatically inform us via email as soon as it happens.

Open the cron job scheduler:
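Assuming the job should run as root (smartctl needs root privileges to read the disks), one way to open root’s crontab on a Debian-based system is:

```shell
# Opens root's crontab in the default editor
sudo crontab -e
```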

Add this to its content:

This will set the script to be run every morning at 7AM.
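A crontab entry matching that schedule might look like the following. The script path /usr/local/sbin/smart_monitor.sh and the address are hypothetical; the MAILTO line is optional, since cron otherwise mails any output of the job to the crontab’s owner (root here).

```shell
# Optional: where cron sends the job's output (hypothetical address)
MAILTO=admin@example.com
# Fields: minute hour day-of-month month day-of-week command -> 07:00 every day
0 7 * * * /usr/local/sbin/smart_monitor.sh
```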