If you’re managing a data center with Linux machines, one critical task is checking the health of your SSDs regularly. Even though SSDs outlast traditional hard drives, they still have a limited lifespan. You definitely want to avoid an unexpected drive failure.
So, how do you check your SSD’s health? In the Linux world, you have options. There’s a GUI tool called GNOME Disks, but I suggest using a command-line tool instead. Many Linux servers don’t have a GUI, and using the command line allows you to connect to your remote server securely and run checks via the terminal.
The tool you’ll want is smartctl, which provides a quick look at your SSD’s health. Keep in mind that the reliability of this information varies by SSD make and model. Not all SSDs fully support S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) data, so you might not get the complete picture of write cycles.
Let’s install and run smartctl.
Installation
I’ll show you how to do this on Ubuntu, but you can adjust the command based on your distribution. You’ll find the necessary package in standard repositories. Install smartmontools with this command:
sudo apt install smartmontools
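This may also install additional packages such as libgsasl7 and postfix. They’re ordinary dependencies and recommended packages (the mail components are typically there so the smartd daemon can send email alerts), so no surprises there. Once it’s installed, you’re ready to check your drives.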
Usage
First, gather information about your SSD with:
sudo smartctl -i /dev/sdX
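Just replace “sdX” with your drive’s identifier. This command prints identifying details such as the model, serial number, firmware version, and capacity, and tells you whether S.M.A.R.T. support is available and enabled.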
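If you’re not sure which identifier belongs to your SSD, lsblk can help. The sketch below assumes the kernel reports the rotational flag correctly (ROTA is 0 for non-rotational devices); virtual disks and some RAID controllers may not. The function name filter_ssds is made up for illustration.

```shell
# Hypothetical helper: read "NAME ROTA MODEL" lines on stdin and print the
# device path of every entry whose rotational flag (second column) is 0.
filter_ssds() {
    awk '$2 == 0 { print "/dev/" $1 }'
}

# Usage (whole disks only with -d, no header row with -n):
#   lsblk -d -n -o NAME,ROTA,MODEL | filter_ssds
```

Loop devices and other non-rotational block devices can show up in this list too, so confirm the model with smartctl -i before testing a drive.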
Next, run a short test to check the drive’s health:
sudo smartctl -t short -a /dev/sdX
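The -t option starts the test in the background and returns right away, while -a prints the drive’s current S.M.A.R.T. data. The result of the new test only shows up once the test has finished. I recommend running short and long tests at least once a month, and ideally once a week. For a more thorough check, use: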
sudo smartctl -t long -a /dev/sdX
You’ll see the results of the S.M.A.R.T. overall health self-assessment. Ideally, it should say “PASSED.” If it doesn’t, you’ve got a problem.
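If you want to check that verdict from a script, something like the following could work. It’s a minimal sketch: the function name health_ok is hypothetical, and it assumes the wording smartctl normally uses (“test result: PASSED” for ATA and NVMe drives, “SMART Health Status: OK” for SCSI/SAS drives).

```shell
# Hypothetical helper: read `smartctl -H` output on stdin and succeed
# (exit status 0) only if the drive reports a healthy overall status.
health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Usage:
#   sudo smartctl -H /dev/sdX | health_ok || echo "Drive needs attention"
```

The -H option prints only the overall health assessment, which keeps the output short enough to match reliably.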
The short test checks three main things:
- Electrical Properties: Tests the drive’s electronics.
- Mechanical Properties: Assesses servos and positioning mechanisms. This part applies to spinning hard drives; SSDs have no moving parts.
- Read/Verify: Reads specific areas to verify data.
The short test takes about two minutes; the long test runs for 20 to 60 minutes, depending on your hardware (smartctl prints an estimated completion time when it starts a test). To view the results once the test has finished, run:
sudo smartctl -a /dev/sdX
This prints out both the test results and other info to verify your SSD’s health.
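If you only care about the self-test outcome, smartctl -l selftest prints just the self-test log. The sketch below checks the most recent entry. It assumes the ATA log layout, where the newest test is listed as “# 1”; NVMe drives format this log differently. The function name latest_selftest_ok is hypothetical.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and
# succeed only if the newest log entry ("# 1") completed without error.
latest_selftest_ok() {
    grep -E '^# *1 ' | head -n 1 | grep -q 'Completed without error'
}

# Usage:
#   sudo smartctl -l selftest /dev/sdX | latest_selftest_ok \
#       || echo "Latest self-test did not complete cleanly"
```

An empty log (no test ever run) also counts as a failure here, which is usually what you want for monitoring.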
Look for two important values in the output:
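- Power_On_Hours: Indicates how many hours the drive has been powered on. Most modern SSDs last a long time, but older drives might be closer to the end of their rated life.
- Wear_Leveling_Count: On drives that report it, the normalized value reflects the drive’s remaining endurance as a percentage, starting at 100 and decreasing over time.
Check the VALUE and WORST columns. A new or lightly used drive will usually show a Wear_Leveling_Count of 99 or 100. The number drops as the flash wears, and the closer it gets to the figure in the THRESH column, the sooner you should plan a replacement.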
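To pull those numbers out for logging or alerting, a small awk filter is enough. This sketch assumes the usual ten-column ATA attribute table printed by smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). NVMe drives report a different format, and the function names are made up for illustration.

```shell
# Hypothetical helpers: read `smartctl -A` output on stdin.
# attr_value prints the normalized VALUE column (4th field) of one attribute.
attr_value() {
    awk -v name="$1" '$2 == name { print $4 }'
}

# attr_raw prints the start of the RAW_VALUE column (10th field).
attr_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage:
#   sudo smartctl -A /dev/sdX | attr_value Wear_Leveling_Count
#   sudo smartctl -A /dev/sdX | attr_raw Power_On_Hours
```

If the attribute isn’t reported by your drive, both helpers print nothing, so an empty result is worth treating as “unknown” rather than “healthy.”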
Remember, not all manufacturers report the same data. For example, I have older Intel and Kingston SSDs, and neither reports detailed Wear_Leveling_Count data. Your best bet is to run both the short and long tests to assess health accurately.
Keep These Caveats in Mind
Two things to watch out for with smartctl:
First, it’s easy to misinterpret the data reported. Knowing your drive’s make and model helps you research any quirks.
Second, always utilize the testing tools. Just running the command smartctl -A /dev/sdX doesn’t give you the same insights as performing the tests. Regularly run both tests to keep tabs on your SSDs.
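One way to make the tests routine is a small wrapper that cron can call. This is only a sketch: the function name start_selftest, the SMARTCTL override variable, the script path, and the schedule in the comments are all assumptions for illustration.

```shell
# Hypothetical helper: start a short or long self-test on a device.
# SMARTCTL may be set to override the binary; it defaults to smartctl on PATH.
start_selftest() {
    test_type=$1
    device=$2
    case "$test_type" in
        short|long) ;;
        *)
            echo "usage: start_selftest short|long DEVICE" >&2
            return 2
            ;;
    esac
    "${SMARTCTL:-smartctl}" -t "$test_type" "$device"
}

# Example root crontab entries (script path and schedule are illustrative),
# assuming the function is saved in a script that calls it with "$@":
#   0 2 * * 0  /usr/local/sbin/selftest.sh short /dev/sdX   # weekly
#   0 3 1 * *  /usr/local/sbin/selftest.sh long /dev/sdX    # monthly
```

The smartd daemon that ships with smartmontools can also schedule self-tests on its own, so it’s worth checking whether that fits your setup better than cron.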
This approach will help you maintain optimal drive health, allowing your data center to run smoothly.