BashFAQ/094

I want to get an alert when my disk is full (parsing df output).

Sadly, parsing the output of df really is the most reliable way to determine how full a disk is, on most operating systems. However, please note that this is a "least bad" answer, not a "best" answer. Parsing any command-line reporting tool's output in a program is never pretty. The purpose of this FAQ is to try to describe all the problems this approach is known to encounter, and work around them.

The first, biggest problem with df is that it doesn't work the same way on all operating systems. Unix is divided largely into two families -- System V and BSD. On BSD-like systems (including Linux, in this case), df gives a human-readable report:

 ~$ df
 Filesystem           1K-blocks      Used Available Use% Mounted on
 /dev/sda2              8230432   3894324   3918020  50% /
 tmpfs                   253952         8    253944   1% /lib/init/rw
 udev                     10240        44     10196   1% /dev
 tmpfs                   253952         0    253952   0% /dev/shm

However, on System-V-like systems, the output is completely different:

 $ df
 /net/appl/clin   (svr1:/dsk/2/clin/pa1.1-hpux10HP-UXB.10.20):  1301728 blocks            -1 i-nodes
 /net/appl/tool-share (svr2:/dsk/4/dsk3/tool/share): 51100992 blocks       4340921 i-nodes
 /net/appl/netscape (svr2:/dsk/4/dsk3/netscape/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/gcc-3.3 (svr2:/dsk/4/dsk3/gcc-3.3/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/gcc-3.2 (svr2:/dsk/4/dsk3/gcc-3.2/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/tool   (svr2:/dsk/4/dsk3/tool/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/home/wooledg    (/home/wooledg       ):   658340 blocks     87407 i-nodes
 /net/home            (auto.home           ):        0 blocks         0 i-nodes
 /net/hosts           (-hosts              ):        0 blocks         0 i-nodes
 /net/appl            (auto.appl           ):        0 blocks         0 i-nodes
 /net/vol             (auto.vol            ):        0 blocks         0 i-nodes
 /nfs                 (-hosts              ):        0 blocks         0 i-nodes
 /home                (/dev/vg00/lvol5     ):   658340 blocks     87407 i-nodes
 /opt                 (/dev/vg00/lvol6     ):   623196 blocks     83075 i-nodes
 /tmp                 (/dev/vg00/lvol4     ):    86636 blocks     11404 i-nodes
 /usr/local           (/dev/vg00/lvol9     ):   328290 blocks     41392 i-nodes
 /usr                 (/dev/vg00/lvol7     ):   601750 blocks     80228 i-nodes
 /var                 (/dev/vg00/lvol8     ):   110696 blocks     14447 i-nodes
 /stand               (/dev/vg00/lvol1     ):   110554 blocks     13420 i-nodes
 /                    (/dev/vg00/lvol3     ):   190990 blocks     25456 i-nodes

So, your first obstacle will be recognizing that you may need to use a different command depending on which OS you're on (e.g. bdf on HP-UX); and that there may be some OSes where it's simply not possible to do this with a shell script at all.
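
As a rough illustration of that first obstacle, a script might choose its reporting command from the OS name before doing anything else. This is only a sketch: `pick_df_command` is a hypothetical helper, the HP-UX entry comes from the example above, and the fallback to plain df is an assumption that will not hold on every system.

```bash
#!/usr/bin/env bash
# pick_df_command [OS_NAME]
# Print the disk-usage command to try on the given OS (default: this one).
# Hypothetical helper; extend the case statement for the systems you use.
pick_df_command() {
    local os=${1:-$(uname -s)}
    case $os in
        HP-UX) echo bdf ;;   # HP-UX: bdf instead of df
        *)     echo df ;;    # assumption: df is BSD-like everywhere else
    esac
}

pick_df_command    # prints the choice for the current system
```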

For the rest of this article, we'll assume that you've got a system with a BSD-like df command.

The next problem is that the output format of df is not consistent across platforms. Some platforms use 6 columns of output. Some use 7. Some platforms (like Linux) use 1-kilobyte blocks by default when reporting the actual space used or available; others, like OpenBSD or IRIX, use 512-byte blocks by default, and need a -k switch to use kilobytes.
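
Whichever unit df reports, a script has to normalize it before comparing numbers from different machines. The sketch below shows only the arithmetic. `blocks_to_kib` is a hypothetical helper name, and the block size has to come from you (from the header line or the man page), since the function cannot detect it.

```bash
#!/usr/bin/env bash
# blocks_to_kib BLOCKS BLOCK_SIZE
# Convert a block count into 1024-byte units, given the block size in bytes.
# Integer division, so the result is rounded down.
blocks_to_kib() {
    local blocks=$1 block_size=$2
    echo "$(( blocks * block_size / 1024 ))"
}

blocks_to_kib 253278 512     # 512-byte blocks  -> prints 126639
blocks_to_kib 8230432 1024   # 1024-byte blocks -> prints 8230432
```

Where df accepts -k, passing it avoids the conversion altogether.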

Worse, often a line of output will be split into multiple lines on the screen. For example (Linux):

 Filesystem           1K-blocks      Used Available Use% Mounted on
 ...
 svr2:/dsk/4/dsk3/tool/i686Linux2.4.27-4-686
                       35194552   7856256  25550496  24% /net/appl/tool

If the device name is sufficiently long (very common with network-mounted file systems), df may split the output onto two lines in an attempt to preserve the columns for human readability. Or it may not... see, for example, OpenBSD 4.3:

 ~$ df
 Filesystem  512-blocks      Used     Avail Capacity  Mounted on
 /dev/wd0a       253278    166702     73914    69%    /
 /dev/wd0d      8121774   6904178    811508    89%    /usr
 /dev/wd0e      8121774   6077068   1638618    79%    /var
 /dev/wd0f       507230        12    481858     0%    /tmp
 /dev/wd0g      8121774   5653600   2062086    73%    /home
 /dev/wd0h    125253320 116469168   2521486    98%    /export

 ~$ sudo mount 192.168.2.5:/var/cache/apt/archives /mnt
 ~$ df
 Filesystem                          512-blocks      Used     Avail Capacity  Mounted on
 /dev/wd0a                               253278    166702     73914    69%    /
 /dev/wd0d                              8121774   6904178    811508    89%    /usr
 /dev/wd0e                              8121774   6077806   1637880    79%    /var
 /dev/wd0f                               507230        12    481858     0%    /tmp
 /dev/wd0g                              8121774   5653600   2062086    73%    /home
 /dev/wd0h                            125253320 116469168   2521486    98%    /export
 192.168.2.5:/var/cache/apt/archives    1960616   1638464    222560    88%    /mnt

Most versions of df give you a -P switch which is intended to standardize the output... sort of. Older versions of OpenBSD still split lines of output even when -P is supplied, but Linux will generally force the output for each file system onto a single line.

Therefore, if you want to write something robust, you can't assume the output for a given file system will be on a single line. We'll get back to that later.

You can't assume the columns line up vertically, either:

 ~$ df -P
 Filesystem         1024-blocks      Used Available Capacity Mounted on
 /dev/hda1               180639     93143     77859      55% /
 tmpfs                   318572         4    318568       1% /dev/shm
 /dev/hda5                90297      4131     81349       5% /tmp
 /dev/hda2              5763648    699476   4771388      13% /usr
 /dev/hda3              1829190    334184   1397412      20% /var
 /dev/sdc1            2147341696 349228656 1798113040      17% /data3
 /dev/sde1            2147341696 2147312400     29296     100% /data4
 /dev/sdf1            1264642176 1264614164     28012     100% /data5
 /dev/sdd1            1267823104 1009684668 258138436      80% /hfo
 /dev/sda1            2147341696 2147311888     29808     100% /data1
 /dev/sdg1            1953520032 624438272 1329081760      32% /mnt
 /dev/sdb1            1267823104 657866300 609956804      52% /data2
 imadev:/home/wooledg   3686400   3336736    329184      92% /net/home/wooledg
 svr2:/dsk/4/dsk3/tool/i686Linux2.4.27-4-686  35194552   7856256  25550496      24% /net/appl/tool
 svr2:/dsk/4/dsk3/tool/share  35194552   7856256  25550496      24% /net/appl/tool-share
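
Since the columns cannot be trusted to line up, any parsing has to split on whitespace rather than on character positions. A minimal sketch, assuming the whole record sits on one line and the device name contains no spaces (`capacity_field` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# capacity_field LINE
# Print the fifth whitespace-separated field (Capacity) of one df -P record.
capacity_field() {
    local fs blocks used avail capacity mount
    # read splits on runs of blanks; "mount" receives the rest of the line.
    read -r fs blocks used avail capacity mount <<EOF
$1
EOF
    echo "$capacity"
}

capacity_field '/dev/sdc1            2147341696 349228656 1798113040      17% /data3'
# prints 17%
capacity_field 'svr2:/dsk/4/dsk3/tool/share  35194552   7856256  25550496      24% /net/appl/tool-share'
# prints 24%
```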

So, what can you actually do?

- Use the -P switch. Even if it doesn't make everything 100% consistent, it generally doesn't hurt. According to the source code of df.c in GNU coreutils, the -P switch does ensure that the output for each file system will be on a single line (but that only covers GNU df, as found on Linux).
- Set your locale to C. You don't need non-English column headers complicating the picture.
- Consider using "stat --file-system --format=" (short form: "stat -f -c"), if it's available. If portability is not an issue in your case, check the man page of the "stat" command. On many systems you'll be able to print the block size, the total number of blocks on the disk, and the number of free blocks, all in a user-specified format.
- Explicitly select a file system. Don't use df -P | grep /dev/hda2 if you want the results for a specific file system. Give df a directory name or a device name as an argument so you only get that file system's output in the first place.
  ```
  ~$ df -P /
  Filesystem         1024-blocks      Used Available Capacity Mounted on
  /dev/sda2              8230432   3894360   3917984      50% /
  ```
- Count words of output without respecting newlines. This is the workaround for lines being split unpredictably. For example, using a Bash array:
  ```
  ~$ read -d '' -ra df < <(LC_ALL=C df -P /); echo "${df[11]}"
  50%
  ```
As you can see, we simply slurped the entire output into a single array and then took the 12th word (array indices count from 0). We don't care whether the output got split or not, because that doesn't change the number of words.
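
For the stat suggestion in the list above, here is a sketch that assumes GNU coreutils stat, where -f reports on the file system and -c takes a format with %b (total blocks), %f (free blocks) and %a (blocks available to unprivileged users). Other stat implementations use these options differently, so check your man page first. `percent_used` is a hypothetical helper, and rounding up is a choice made here so that a nearly full disk is never under-reported.

```bash
#!/usr/bin/env bash
# percent_used TOTAL FREE AVAIL
# Percentage of the space usable by ordinary users that is in use, rounded up.
percent_used() {
    local total=$1 free=$2 avail=$3
    local used=$(( total - free ))
    local usable=$(( used + avail ))
    if [ "$usable" -le 0 ]; then
        echo 0               # pseudo file systems can report zero blocks
        return
    fi
    echo "$(( (used * 100 + usable - 1) / usable ))"
}

# GNU stat only (assumption). Quietly skip if stat fails or prints anything
# other than plain numbers.
if counts=$(stat -f -c '%b %f %a' / 2>/dev/null); then
    case $counts in
        ''|*[!0-9[:space:]]*) ;;        # unexpected output: do nothing
        *) percent_used $counts ;;      # unquoted on purpose: split into 3 args
    esac
fi
```

Fed the numbers from the df -P / example above (8230432 blocks in total, 3894360 used and therefore 4336072 free, 3917984 available), `percent_used 8230432 4336072 3917984` prints 50, which agrees with the Capacity column there.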

Removing the % sign, comparing the number to a specified threshold, scheduling an automatic way to run the script, etc. are left as exercises for you.
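
If you would like a starting point for those exercises, here is one possible sketch built on the word-counting approach. The function names, the 90 percent threshold and the message wording are all made up for illustration. It also assumes that the device name contains no whitespace, so that Capacity really is the 12th word.

```bash
#!/usr/bin/env bash
# capacity_from_df
# Read `df -P` output for ONE file system on stdin and print the Capacity
# value without its % sign. Fails if the 12th word is not a number.
capacity_from_df() {
    local -a words
    local pct
    # Slurp every word, ignoring line breaks. read reports failure when it
    # hits end of input without finding a NUL delimiter; that is expected here.
    read -d '' -ra words || true
    pct=${words[11]-}
    pct=${pct%\%}                      # strip the trailing % sign
    [[ $pct =~ ^[0-9]+$ ]] || return 1
    printf '%s\n' "$pct"
}

# disk_alert DIR THRESHOLD
# Return 1 (and warn on stderr) if DIR's file system is at or above THRESHOLD
# percent, 2 if df's output could not be parsed, 0 otherwise.
disk_alert() {
    local dir=$1 threshold=$2 pct
    pct=$(LC_ALL=C df -P -- "$dir" | capacity_from_df) || return 2
    if (( pct >= threshold )); then
        printf 'WARNING: %s is %d%% full\n' "$dir" "$pct" >&2
        return 1
    fi
}

disk_alert / 90 || echo "disk_alert returned $?" >&2
```

To run the check periodically, call the script from cron or whatever scheduler your system provides, and replace the printf with the notification method you prefer.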