If you run any kind of data server at home, like a NAS or homelab, keeping an eye on your hard drive health is imperative.
smartmontools can provide an indicator of health using the disks’ SMART values. What is SMART? SMART stands for Self-Monitoring, Analysis, and Reporting Technology. Most modern hard drives, SSDs, and eMMC drives are equipped with this onboard diagnostic, which records and reports various details about your storage media.
smartmontools can also run scheduled health tests, and it has a daemon that can monitor your disks’ health and notify you via email should anything happen. Any sysadmin worth their salt cares about their drives, and we are worth our salt, right?
When it comes to installing smartmontools, we have a few options: you can build it from source or use the pre-compiled package if it’s available on your distro. As always, perform a system update before installing anything to ensure a stable, secure system.
To build from source, use wget to get the latest version (v7.3 at the time of this writing) from SourceForge.
wget https://phoenixnap.dl.sourceforge.net/project/smartmontools/smartmontools/7.3/smartmontools-7.3.tar.gz
tar zxvf smartmontools-7.3.tar.gz
# install a C++ compiler like g++ if you get an error during compiling
# e.g. sudo apt install g++
cd smartmontools-7.3/
./configure
make
sudo make install
From Package Manager
Use the right command for your distro.
# for Debian/Ubuntu
sudo apt install smartmontools
# for Arch
sudo pacman -S smartmontools
# for Fedora/RHEL
sudo yum install smartmontools
smartmontools comes with two tools: smartctl, which is the main CLI you use to probe your disks, and smartd, a daemon which monitors your disks in the background. Let’s take a look at what we can do with the first one.
Before you can get any useful data from smartctl, you need to specify which drive to inspect. Running lsblk lists all the block devices attached to the system, whether they are mounted or not.
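As a rough sketch of how that device list can be narrowed down to paths you can hand to smartctl, the helper below keeps only whole disks from lsblk output. The function name disk_paths is made up for illustration, and the NAME,TYPE column choice is just one reasonable option.

```shell
#!/bin/sh
# disk_paths: hypothetical helper, not part of smartmontools or util-linux.
# Reads "NAME TYPE" pairs on stdin and prints /dev/NAME for whole disks only,
# skipping partitions, loop devices, and optical drives.
disk_paths() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# -d hides partitions, -n drops the header row, -o picks the columns.
if command -v lsblk >/dev/null 2>&1; then
    lsblk -dn -o NAME,TYPE | disk_paths || true
fi
```

Each path it prints (for example /dev/sda) can be used as the device argument in the smartctl commands that follow.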
Here’s some example output you could see from
Once you have the name of the drive you wish to inspect, you can run one of the following commands to get more information about it. I’ll provide a table with the switches here; note that this is no substitute for smartmontools’ own docs, which can be brought up at any time using man smartctl or using this link:
|Switch||Description|
|-h, --help||Prints a brief overview of the commands and their syntax.|
|--scan||Scans all devices and prints their names, types, and protocols.|
|-d TYPE||smartctl will guess the device type, but if you need to specify it manually you can do so with this switch, e.g. ata, scsi, or nvme (see the man page for the full list).|
|-i, --info||Prints various information about the device.|
|-H, --health||Reports SMART health status. On “fail” status the drive is already dead or will die very soon.|
|-a, --all||Prints all SMART info about the device. I recommend piping the output to less, more, or pager.|
|-s (on, off)||Enables/disables SMART on the device.|
|-o (on, off)||ATA only. Enables/disables the SMART automatic offline test (checks the disk every 4 hours for defects).|
|-x, --xall||Prints all SMART and non-SMART information about the device.|
|-S (on, off)||Enables/disables SMART autosave of device vendor attributes. Only needs to be run once; the setting persists even if the disk is powered off.|
The best way to learn is to jump in! Let’s first run smartctl with the --scan switch so that we can get the type of drives we have. Some example output might look like this:
From here, you can probe whatever disk you want for more information. Here are some example commands you can use to scope out your drives.
# scans all drives and displays their type
smartctl --scan
# -i gives a brief overview of the drive
sudo smartctl -i /dev/sda
# -a will print out a large amount of information and stats; piped to pager (or more, or less)
sudo smartctl -d ata -a -i /dev/sda | pager
# enable SMART and offline testing on device
sudo smartctl -s on -o on -S on /dev/sda
# check SMART health
sudo smartctl -d scsi -H /dev/sdc
The -H switch will produce some output like this:
If your device did not receive an “OK”, your hard drive might be scrap or in the process of becoming a paperweight, in which case it is time to migrate/rescue your data or Office Space your hard drive in a field somewhere.
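If you would rather script the health check than read the output by eye, smartctl also reports its findings through its exit status, which is a bitmask documented in the exit status section of man smartctl. Here is a minimal sketch of decoding it; the function name explain_smartctl_status and the wording of the messages are my own, so treat it as an illustration rather than anything shipped with smartmontools.

```shell
#!/bin/sh
# explain_smartctl_status: hypothetical helper that decodes the exit status
# bitmask returned by smartctl (bit values as listed in man smartctl).
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "command line did not parse"
    [ $((status & 2)) -ne 0 ]   && echo "device open failed"
    [ $((status & 4)) -ne 0 ]   && echo "a SMART command to the disk failed"
    [ $((status & 8)) -ne 0 ]   && echo "SMART status: DISK FAILING"
    [ $((status & 16)) -ne 0 ]  && echo "prefail attribute at or below threshold"
    [ $((status & 32)) -ne 0 ]  && echo "attribute was at or below threshold in the past"
    [ $((status & 64)) -ne 0 ]  && echo "error log contains errors"
    [ $((status & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Needs root and a real drive, so the calls are left commented out:
#   sudo smartctl -H /dev/sda >/dev/null
#   explain_smartctl_status $?
```

A status with bit 3 set (value 8) is the one that matches the failing health check described above.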
CLI tools are all well and good, but the real treat is smartd. By default, smartd will enable SMART on all devices and poll attached ATA and SCSI devices every 30 minutes, logging SMART errors or changes in preconfigured attributes via syslog. It can also be configured to send you email alerts, but more on that later.
smartd is configurable via /etc/smartd.conf, which is heavily commented with examples. The daemon itself also takes command-line switches, and I’ll again provide a brief summary of the important ones next:
|Switch||Description|
|-q WHEN||Tells smartd when to quit. See the man page for full details. One useful option is onecheck, which registers attached devices, checks their status exactly once, and returns the result as text output.|
|-d, --debug||Prints verbose status information and does not fork into the background. In this mode, CTRL+C will not exit the readout but will instead reload the smartd config; to exit this mode use CTRL+\.|
|-h||Prints help and a command summary.|
|-i N, --interval=N||Sets how often smartd checks the disks, in seconds. The default is 1800 seconds (30 minutes).|
Now let’s start smartd using systemd, but feel free to substitute your own init manager. These are all self-explanatory.
sudo systemctl start smartd.service
sudo systemctl stop smartd.service
sudo systemctl restart smartd.service
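Starting the service by hand does not make it come back after a reboot, and the unit name is not the same everywhere: some distros call it smartd.service, others smartmontools.service, with one sometimes being an alias of the other. The sketch below picks whichever real unit is present. The helper name pick_smartd_unit is invented for this example, and it assumes a systemd version that marks aliases with the state "alias" in list-unit-files output.

```shell
#!/bin/sh
# pick_smartd_unit: hypothetical helper. Reads `systemctl list-unit-files`
# output (UNIT STATE ...) on stdin and prints the first smartd unit that is
# a real unit file rather than an alias.
pick_smartd_unit() {
    awk '($1 == "smartd.service" || $1 == "smartmontools.service") \
         && $2 != "alias" { print $1; exit }'
}

# These change system state, so they are left commented out:
#   unit=$(systemctl list-unit-files --type=service --no-legend | pick_smartd_unit)
#   sudo systemctl enable --now "$unit"   # start now and on every boot
#   journalctl -u "$unit" -n 20           # show recent smartd log lines
```

If the helper prints nothing, smartd is probably not installed as a systemd service on that machine.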
Using the info from the table we can use these starter commands to get some useful monitoring information in real-time.
# register and check all devices once
sudo smartd -q onecheck
# run in the foreground in debug mode, polling every device every 15 seconds
# exit with Ctrl+\
sudo smartd -d -i 15
# running smartd on its own runs the daemon in the background and logs to syslog
sudo smartd
Now that the process is running, we can set up /etc/smartd.conf. But first, let’s sidestep and quickly set up the email notification requirements before we hit up the config file.
Setting up Email Notification
smartd cannot send mail on its own. By default it will attempt to use whatever mail system is installed on the server, typically sendmail, which, if not configured, will just deliver these alerts to /var/spool/mail/username, which is not super useful. In order to get these alerts off our server, we’ll need an MTA (mail transfer agent) and an SMTP server. In our case, to avoid having to set up an entire SMTP server just to get hard drive alerts, we can just use the packages msmtp and msmtp-mta and use Gmail as an SMTP relay. In other words, we will use a Gmail account to forward the smartd alerts. Note that you can also accomplish this with other MTAs such as exim4, but those require extra configuration.
Setting up Gmail
If you don’t use Gmail, feel free to skip this section and move on to the section on installing msmtp.
To set up Gmail for this, all you need to do is visit this link, which, after you log in, will allow you to create an app-specific password. It will then show you the app password; copy it down somewhere, because it won’t be shown again.
msmtp is a very simple, light SMTP mail client. The first thing we need to do is install it:
sudo apt-get install msmtp
Once that installs, you must create and edit the /etc/msmtprc file. I’ll use vim here, but you can use nano or any other text editor of your choice:
sudo vim /etc/msmtprc
Quick Vim tips! Press a to enter INSERT mode, then hit ESC to exit INSERT mode, and then type :wq! to save and exit.
Once there, enter the following in that file. If you want to use something other than Gmail, all you need to do is enter your SMTP provider’s details instead where noted. For more help, check out msmtp’s ArchWiki entry, linked here.
# /etc/msmtprc
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
# the log file can be whatever destination you choose
logfile ~/.msmtp.log

# Gmail configuration
account gmail
host smtp.gmail.com
port 587
from email@example.com
user yourusername
password app-specific-password

# An alternative service, enter your SMTP details here
account yourmail
host smtp.yourmail.net
from firstname.lastname@example.org
port 333
user yourusername
password app-specific-password

# default account: gmail, or yourmail
account default : gmail
NOTE: This file will store your app password in plain text.
After the file is created, it must be given the correct file permissions; otherwise msmtp will not send mail.
# show the groups of the msmtp user (errors out if the user does not exist)
groups msmtp
sudo chown msmtp:msmtp /etc/msmtprc
sudo chmod 600 /etc/msmtprc
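Since a wrong mode on this file is an easy thing to miss, here is a small sketch of a check you could run after the chmod. The function name check_private_file is hypothetical, and it relies on the GNU coreutils form of stat (stat -c), which is an assumption about your system.

```shell
#!/bin/sh
# check_private_file: hypothetical helper that confirms a file is mode 600.
# Uses GNU stat; on BSD or macOS the stat flags differ.
check_private_file() {
    file=$1
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $file is mode 600"
    else
        echo "warning: $file is mode $mode, expected 600"
        return 1
    fi
}

# Example, left commented out because the file may need root to inspect:
#   sudo sh -c '. ./check.sh; check_private_file /etc/msmtprc'
```

The function returns non-zero when the mode is anything other than 600, so it can also be used in a script that refuses to continue with a world-readable password file.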
Now that we have msmtp set up, we need to make the system’s mail commands use it. We can create the /etc/mail.rc file, which tells the mail command to use msmtp as its sendmail program; we can install msmtp-mta, which symlinks sendmail to msmtp; or we can simply type a command in the terminal. I’ll present all three options below:
# create and edit mail.rc
sudo vim /etc/mail.rc
# enter the following in /etc/mail.rc or on its own in the terminal
set sendmail="/usr/bin/msmtp -t"
# install msmtp-mta
sudo apt install msmtp-mta
Just because I’m a stickler, I did all three methods. Redundant, I know, but I really wanted to make sure I could get emails out. Once you have done that, you can send yourself a test mail from the command line with the following:
echo -e "Subject: This is a Test" | sendmail email@example.com -F servers-hostname #can be plain hostname
If it shows up in your inbox you are good to go, if not, troubleshooting you must go.
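If the one-liner does not arrive, it can help to send a message with complete headers straight through msmtp, so you know whether the problem is msmtp itself or the sendmail wiring. This is a minimal sketch; build_test_mail is a made-up function name, and the address and hostname in the example are placeholders.

```shell
#!/bin/sh
# build_test_mail: hypothetical helper that prints a minimal message with
# To and Subject headers, a blank separator line, and a one-line body.
# $1 = recipient address, $2 = name of the sending host
build_test_mail() {
    to=$1
    host=$2
    printf 'To: %s\nSubject: smartd mail test from %s\n\nTest message sent from %s.\n' \
        "$to" "$host" "$host"
}

# -t tells msmtp to read the recipient from the To: header.
# Left commented out because it sends real mail:
#   build_test_mail email@example.com "$(hostname)" | msmtp -t
```

If this works but the sendmail command does not, the issue is most likely in the mail.rc or msmtp-mta step rather than in /etc/msmtprc.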
Now that we have the mail system set up, let’s take a look at /etc/smartd.conf, which is a heavily commented file with tons of examples for scheduling tests and setting up various attributes to monitor. Open /etc/smartd.conf with your preferred editor, then look for line #22.
sudo vim /etc/smartd.conf
You should see something like the commented-out DEVICESCAN line shown below. It can be safely commented out and replaced with the directive that follows it, which will, among other things, test the email functionality whenever the smartmontools service restarts. You can trigger it manually by using:
sudo systemctl restart smartmontools
#/etc/smartd.conf
#...
#DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -a -m firstname.lastname@example.org -M test
#...
If you did it all correctly, you should get an email for every hard drive installed on the system; in my case I received 5 test emails from msmtp. Now is as good a time as any to cover some more basic directives we can use inside this file. Starting at line #114, there is a very comprehensive list, of which I’ll summarize some key ones here:
|Directive||Description|
|-a||Equivalent to running -H -f -t -l error -l selftest -C 197 -U 198.|
|-H||Monitor SMART health and report failures.|
|-f||Monitor for failure of any ‘Usage’ attributes.|
|-t||Equivalent to -p and -u.|
|-p||Report changes in ‘Prefailure’ attributes.|
|-u||Report changes in ‘Usage’ attributes.|
|-l TYPE||Monitor a SMART log. TYPE is either error or selftest.|
|-C ID||Report Current Pending Sector counts.|
|-U ID||Report Offline Uncorrectable counts.|
|-m ADD||Send a warning email to the address ADD with the results of -H, -l, and -f.|
|-M TYPE||See the man page for the available options. Using -M test sends a test email on service start.|
|-d TYPE||Specifies the type of device, just like smartctl.|
|-W (D,I,C)||Monitor temperature: Difference, Informational limit, and Critical limit.|
|-s (L/../../7/02)||Schedule tests using a REGEXP, see man.|
Now that we know our email works, we can go ahead and comment out that test directive and set up a real-world option using the above table and man smartd. Let’s say that, instead of monitoring every disk, I just want to watch one specific drive and run a long self-test on it every Sunday between 2 and 3am, or that I only want to be alerted of SMART health failures on another drive.
# monitors /dev/sda and performs a long test every Sunday between 2-3am
/dev/sda -d scsi -a -s L/../../7/02
# only reports SMART health failures to an email address
/dev/sdc -H -C 0 -U 0 -m email@example.com
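The -s pattern is the least obvious part of these directives. Per man smartd.conf it has the form T/MM/DD/d/HH, where T is the test type (L for long, S for short), d is the day of the week from 1 (Monday) to 7 (Sunday), HH is the hour, and a dot matches anything. Here is a small sketch that builds such a pattern for the common "one test type, one weekday, one hour" case; the function name smartd_schedule is invented for this example.

```shell
#!/bin/sh
# smartd_schedule: hypothetical helper that builds a -s pattern of the form
# T/MM/DD/d/HH, matching every month and every day of the month.
# $1 = test type (L or S), $2 = weekday 1-7 or "." for any, $3 = hour 0-23
smartd_schedule() {
    test_type=$1
    weekday=$2
    hour=${3#0}        # drop one leading zero so 08 is not parsed as octal
    hour=${hour:-0}    # "0" and "00" both mean midnight
    printf '%s/../../%s/%02d\n' "$test_type" "$weekday" "$hour"
}

smartd_schedule L 7 2     # long test, Sundays, 02:00
smartd_schedule S . 08    # short test, every day, 08:00

# To preview what smartd will actually schedule from your config
# (see -q showtests in man smartd), left commented out as it needs root:
#   sudo smartd -q showtests
```

The first call produces the same pattern used in the /dev/sda line above.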
There are tons of examples in the conf file, so I recommend giving it a thorough read, as this is only a very small example of what can be done.
Congrats on a successful smartmontools installation! You’ve now leveled up your sysadmin game, and you can now shoot lightning out of your fingertips! Not really, but being able to get an email notification is pretty hardcore if I do say so myself. It sure beats the hell out of manually checking your drives or doing what I do and just not checking at all until there’s a failure. Happy sysadmin-ing, and may all your hard drives be blessed with many read/write cycles and no bad sectors, amen.
A note on security
Now, in order to be good admins, we shouldn’t store passwords in plain-text files on our servers, so you have to weigh the benefit of having email alerts against that risk, or implement something like pass or a keyring so that you can encrypt that password.