
BASH Frequently Asked Questions

Note: This version is the same as the BashFAQ page, but with the full text of the FAQs. It is much slower to load, but it is easier to use if you want to download the whole FAQ; notably, the printable version is handy if you need a local copy.

These are answers to frequently asked questions on channel #bash on the freenode IRC network. These answers are contributed by the regular members of the channel (originally heiner, and then others including greycat and r00t), and by users like you. If you find something inaccurate or simply misspelled, please feel free to correct it!

All the information here is presented without any warranty or guarantee of accuracy. Use it at your own risk. When in doubt, please consult the man pages or the GNU info pages as the authoritative references.

BASH is a BourneShell compatible shell, which adds many new features to its ancestor. Most of them are available in the KornShell, too. The answers given in this FAQ may be slanted toward Bash, or they may be slanted toward the lowest common denominator Bourne shell, depending on who wrote the answer. In most cases, an effort is made to provide both a portable (Bourne) and an efficient (Bash, where appropriate) answer. If a question is not strictly shell specific, but rather related to Unix, it may be in the UnixFaq.

This FAQ assumes a certain level of familiarity with basic shell script syntax. If you're completely new to Bash or to the Bourne family of shells, you may wish to start with the (incomplete) BashGuide.

If you can't find the answer you're looking for here, try BashPitfalls. If you want to help, you can add new questions with answers here, or try to answer one of the BashOpenQuestions.

Chet Ramey's official Bash FAQ contains many technical questions not covered here.

Contents

  1. How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
    1. My text files are broken! They lack their final newlines!
    2. How to keep other commands from "eating" the input
  2. How can I store the return value/output of a command in a variable?
  3. How can I find the latest (newest, earliest, oldest) file in a directory?
  4. How can I check whether a directory is empty or not? How do I check for any *.mpg files, or count how many there are?
  5. How can I use array variables?
    1. Intro
    2. Loading values into an array
      1. Loading lines from a file or stream
      2. Reading NUL-delimited streams
      3. Appending to an existing array
    3. Retrieving values from an array
      1. Retrieving with modifications
    4. Using @ as a pseudo-array
  6. See Also
  7. How can I use variable variables (indirect variables, pointers, references) or associative arrays?
    1. Associative Arrays
      1. Associative array hacks in older shells
    2. Indirection
      1. Think before using indirection
      2. Evaluating indirect/reference variables
      3. Assigning indirect/reference variables
    3. See Also
  8. Is there a function to return the length of a string?
  9. How can I recursively search all files for a string?
  10. What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ...
      1. Eliminate unnecessary commands
      2. Your command may already support unbuffered output
      3. unbuffer
      4. stdbuf
      5. less
      6. coproc
      7. Further reading
  11. How can I recreate a directory hierarchy structure, without the files?
  12. How can I print the n'th line of a file?
    1. See Also
  13. How do I invoke a shell command from a non-shell application?
  14. How can I concatenate two variables? How do I append a string to a variable?
  15. How can I redirect the output of multiple commands at once?
  16. How can I run a command on all files with the extension .gz?
  17. How can I use a logical AND/OR/NOT in a shell pattern (glob)?
  18. How can I group expressions in an if statement, e.g. if (A AND B) OR C?
  19. How can I use numbers with leading zeros in a loop, e.g. 01, 02?
  20. How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?
  21. How can I find and deal with file names containing newlines, spaces or both?
  22. How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?
    1. Variables
    2. Streams
    3. Files
  23. How can I calculate with floating point numbers instead of just integers?
  24. I want to launch an interactive shell that has special aliases and functions, not the ones in the user's ~/.bashrc.
  25. I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
    1. Workarounds
  26. How can I access positional parameters after $9?
  27. How can I randomize (shuffle) the order of lines in a file? (Or select a random line from a file, or select a random file from a directory.)
  28. How can two unrelated processes communicate?
    1. A file
    2. A directory as a lock
    3. Signals
    4. Named Pipes
  29. How do I determine the location of my script? I want to read some config files from the same place.
  30. How can I display the target of a symbolic link?
  31. How can I rename all my *.foo files to *.bar, or convert spaces to underscores, or convert upper-case file names to lower case?
  32. What is the difference between test, [ and [[ ?
    1. Theory
  33. How can I redirect the output of 'time' to a variable or file?
  34. How can I find a process ID for a process given its name?
    1. greycat rant: daemon management
  35. Can I do a spinner in Bash?
  36. How can I handle command-line arguments (options) to my script easily?
    1. Manual loop
    2. getopts
    3. Silly repeated brute-force scanning
    4. Complex nonstandard add-on utilities
  37. How can I get all lines that are: in both of two files (set intersection) or in only one of two files (set subtraction).
  38. How can I print text in various colors?
    1. Discussion
  39. How do Unix file permissions work?
  40. What are all the dot-files that bash reads?
  41. How do I use dialog to get input from the user?
  42. How do I determine whether a variable contains a substring?
  43. How can I find out if a process is still running?
  44. Why does my crontab job fail? 0 0 * * * some command > /var/log/mylog.`date +%Y%m%d`
  45. How do I create a progress bar? How do I see a progress indicator when copying/moving files?
    1. When copying/moving files
  46. How can I ensure that only one instance of a script is running at a time (mutual exclusion)?
    1. Discussion
  47. I want to check to see whether a word is in a list (or an element is a member of a set).
  48. Bulk comparison
  49. How can I redirect stderr to a pipe?
  50. Eval command and security issues
    1. Examples of bad use of eval
    2. Examples of good use of eval
    3. Alternatives to eval
    4. Robust eval usage
  51. How can I view periodic updates/appends to a file? (ex: growing log file)
  52. I'm trying to put a command in a variable, but the complex cases always fail!
    1. I'm trying to save a command so I can run it later without having to repeat it each time
    2. I'm constructing a command based on information that is only known at run time
    3. I want to generalize a task, in case the low-level tool changes later
    4. I want a log of my script's actions
  53. I want history-search just like in tcsh. How can I bind it to the up and down keys?
  54. How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?
  55. I have a fancy prompt with colors, and now bash doesn't seem to know how wide my terminal is. Lines wrap around incorrectly.
  56. How can I tell whether a variable contains a valid number?
    1. Hand parsing
    2. Using the parsing done by [ and printf (or "using eq")
    3. Using the integer type
  57. Tell me all about 2>&1 -- what's the difference between 2>&1 >foo and >foo 2>&1, and when do I use which?
    1. If you're still confused...
  58. How can I untar (or unzip) multiple tarballs at once?
  59. How can I group entries (in a file by common prefixes)?
  60. Can bash handle binary data?
  61. I saw this command somewhere: :(){ :|:& } (fork bomb). How does it work?
  62. I'm trying to write a script that will change directory (or set a variable), but after the script finishes, I'm back where I started (or my variable isn't set)!
  63. Is there a list of which features were added to specific releases (versions) of Bash?
  64. How do I create a temporary file in a secure manner?
  65. My ssh client hangs when I try to logout after running a remote background job!
  66. Why is it so hard to get an answer to the question that I asked in #bash?
  67. Is there a "PAUSE" command in bash like there is in MSDOS batch scripts? To prompt the user to press any key to continue?
  68. I want to check if [[ $var == foo || $var == bar || $var == more ]] without repeating $var n times.
  69. How can I trim leading/trailing white space from one of my variables?
  70. How do I run a command, and have it abort (timeout) after N seconds?
  71. I want to automate an ssh (or scp, or sftp) connection, but I don't know how to send the password....
  72. How do I convert Unix (epoch) times to human-readable values?
  73. How do I convert an ASCII character to its decimal (or hexadecimal) value and back?
    1. More complete examples (with UTF-8 support)
      1. Note about Ext Ascii and UTF-8 encoding
  74. How can I ensure my environment is configured for cron, batch, and at jobs?
  75. How can I use parameter expansion? How can I get substrings? How can I get a file without its extension, or get just a file's extension?
    1. Examples of Filename Manipulation
    2. Bash 4
    3. Parameter Expansion on Arrays
    4. Portability
  76. How do I get the effects of those nifty Bash Parameter Expansions in older shells?
  77. How do I use 'find'? I can't understand the man page at all!
  78. How do I get the sum of all the numbers in a column?
    1. BASH Alternatives
  79. How do I log history or "secure" bash against history removal?
  80. I want to set a user's password using the Unix passwd command, but how do I script that? It doesn't read standard input!
    1. Construct your own hashed password and write it to some file
    2. Fool the computer into thinking you are a human
    3. Find some magic system-specific tool
    4. Don't rely on /dev/tty for security
  81. How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines?
    1. foo AND bar on the same line
    2. foo OR bar on the same line
    3. foo AND bar in the same file, not necessarily on the same line
  82. How can I make an alias that takes an argument?
  83. How can I determine whether a command exists anywhere in my PATH?
  84. Why is $(...) preferred over `...` (backticks)?
    1. Important differences
    2. Other advantages
    3. See also:
  85. How do I determine whether a variable is already defined? Or a function?
  86. How do I return a string (or large number, or negative number) from a function? "return" only lets me give a number from 0 to 255.
  87. How to write several times to a fifo without having to reopen it?
  88. How to ignore aliases or functions when running a command?
  89. How can I get a file's permissions (or other metadata) without parsing ls -l output?
  90. How can I avoid losing any history lines?
    1. Archiving History Files
  91. I'm reading a file line by line and running ssh or ffmpeg, but everything after the first line is eaten!
  92. How do I prepend a text to a file (the opposite of >>)?
  93. I'm trying to get the number of columns or lines of my terminal but the variables COLUMNS / LINES are always empty
  94. How do I write a CGI script that accepts parameters?
    1. The Dangerous Way
    2. A Safer Way
    3. Associative Arrays
  95. How can I set the contents of my terminal's title bar?
  96. I want to get an alert when my disk is full (parsing df output).
  97. I'm getting "Argument list too long". How can I process a large list in chunks?
  98. ssh eats my word boundaries! I can't do ssh remotehost make CFLAGS="-g -O"!
  99. How do I determine whether a symlink is dangling (broken)?
  100. How to add localization support to your bash scripts
    1. First, some variables you must understand
    2. Marking strings as translatable
    3. Generating and/or merging PO files
    4. Translate the strings
    5. Install MO files
    6. Test!
    7. References
  101. How can I get the newest (or oldest) file from a directory?
  102. How do I do string manipulations in bash?
    1. Parameter expansion syntax
    2. Length of a string
    3. Checking for substrings
    4. Substituting part of a string
    5. Removing part of a string
    6. Extracting parts of strings
    7. Splitting a string into fields
    8. Joining fields together
    9. Default or alternate values
    10. See Also
  103. Common utility functions (warn, die)
  104. How to get the difference between two dates
  105. How do I check whether my file was modified in a certain month or date range?
  106. Why doesn't foo=bar echo "$foo" print bar?
  107. Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?
  108. I want to tee my stdout to a log file from inside the script. And maybe stderr too.
  109. How do I add a timestamp to every line of a stream?
  110. How do I wait for several spawned processes?
  111. How can I tell whether my script was sourced (dotted in) or executed?

1. How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?

Don't try to use "for". Use a while loop and the read command:

    while read -r line
    do
        echo "$line"
    done < "$file"

The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines). Without this option, any backslashes in the input will be discarded. You should always use the -r option with read.
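
To illustrate (a minimal demonstration; the input string here is made up):

    # Bash
    read line <<< 'one\two'       # without -r, the backslash is removed
    echo "$line"                  # prints: onetwo

    read -r line <<< 'one\two'    # with -r, the backslash is preserved
    echo "$line"                  # prints: one\two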

line is a variable name, chosen by you. You can use any valid shell variable name there.

The redirection < "$file" tells the while loop to read from the file whose name is in the variable file. If you would prefer to use a literal pathname instead of a variable, you may do that as well. If your input source is the script's standard input, then you don't need any redirection at all.

If your input source is the contents of a variable/parameter, BASH can iterate over its lines using a "here string":

    while read -r line; do
        echo "$line"
    done <<< "$var"

The same can be done in any Bourne-type shell by using a "here document" (although read -r is POSIX, not Bourne):

    while read -r line; do
        echo "$line"
    done <<EOF
$var
EOF

If you want to skip lines that start with # (comments), you can simply test for them inside the loop:

    # Bash
    while read -r line
    do
        [[ $line = \#* ]] && continue
        echo "$line"
    done < "$file"

If you want to operate on individual fields within each line, you may supply additional variables to read:

    # Input file has 3 columns separated by white space.
    while read -r first_name last_name phone; do
      ...
    done < "$file"

If the field delimiters are not whitespace, you can set IFS (internal field separator):

    while IFS=: read -r user pass uid gid gecos home shell; do
      ...
    done < /etc/passwd

For tab-delimited files, use IFS=$'\t'.
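
For example, here is a minimal sketch for a hypothetical tab-separated file (the file name and field names are made up):

    # Bash
    while IFS=$'\t' read -r name dept phone; do
        printf '%s works in %s\n' "$name" "$dept"
    done < staff.tsv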

You do not necessarily need to know how many fields each line of input contains. If you supply more variables than there are fields, the extra variables will be empty. If you supply fewer, the last variable gets "all the rest" of the fields after the preceding ones are satisfied. For example,

    read -r first last junk <<< 'Bob Smith 123 Main Street Elk Grove Iowa 123-555-6789'

    # first will contain "Bob", and last will contain "Smith".
    # junk holds everything else.

Some people use the throwaway variable _ as a "junk variable" to ignore fields. It (or indeed any variable) can also be used more than once in a single read command, if we don't care what goes into it:

    read -r _ _ first middle last _ <<< "$record"

    # We skip the first two fields, then read the next three.
    # Remember, the final _ can absorb any number of fields.
    # It doesn't need to be repeated there.

The read command modifies each line read; by default it removes all leading and trailing whitespace characters (spaces and tabs, or any whitespace characters present in IFS). If that is not desired, the IFS variable has to be cleared:

    # Exact lines, no trimming
    while IFS= read -r line
    do
        printf '%s\n' "$line"
    done < "$file"

One may also read from a command instead of a regular file:

    some command | while read -r line; do
       other commands
    done

This method is especially useful for processing the output of find with a block of commands:

    find . -type f -print0 | while IFS= read -r -d '' file; do
        mv "$file" "${file// /_}"
    done

This reads one filename at a time from the find command and renames the file, replacing spaces with underscores.

Note the usage of -print0 in the find command, which uses NUL bytes as filename delimiters; and -d '' in the read command to instruct it to read all text into the file variable until it finds a NUL byte. By default, find and read delimit their input with newlines; however, since filenames can potentially contain newlines themselves, this default behaviour will split up those filenames at the newlines and cause the loop body to fail. Additionally it is necessary to set IFS to an empty string, because otherwise read would still strip leading and trailing whitespace. See FAQ #20 for more details.

Using a pipe to send find's output into a while loop places the loop in a SubShell and may therefore cause problems later on if the commands inside the body of the loop attempt to set variables which need to be used after the loop; in that case, see FAQ 24, or use a ProcessSubstitution like:

    while read -r line; do
        other commands
    done < <(some command)

If you want to read lines from a file into an array, see FAQ 5.

1.1. My text files are broken! They lack their final newlines!

If there is data after the last newline in the file (or, to put it differently, if the last line is not terminated by a newline character), then read will read that data but return false, leaving the broken partial line in the read variable(s). You can process this after the loop:

    # Emulate cat
    while IFS= read -r line
    do
        printf '%s\n' "$line"
    done < "$file"
    [ -n "$line" ] && printf %s "$line"

Or:

    # This does not work:
    printf 'line 1\ntruncated line 2' | while read -r line; do echo $line; done

    # This does not work either:
    printf 'line 1\ntruncated line 2' | while read -r line; do echo "$line"; done; [[ $line ]] && echo -n "$line"

    # This works:
    printf 'line 1\ntruncated line 2' | (while read -r line; do echo "$line"; done; [[ $line ]] && echo "$line")

For a discussion of why the second example above does not work as expected, see FAQ #24.

1.2. How to keep other commands from "eating" the input

Some commands greedily eat up all available data on standard input. The examples above do not take precautions against such programs. For example,

    while read -r line
    do
        cat > ignoredfile
        echo "$line"
    done < "$file"

will only print the contents of the first line, with the remaining contents going to "ignoredfile", as cat slurps up all available input.

One workaround is to use a numeric FileDescriptor rather than standard input:

    # Bash
    while read -r -u9 line
    do
        cat > ignoredfile
        echo "$line"
    done 9< "$file"

Or:

    # Bourne
    exec 9< "$file"
    while read line <&9
    do
      ...
    done
    exec 9<&-

In this example, cat now reads from the script's original standard input (waiting for the user to type something, if that is a terminal) and writes it to the file ignoredfile at each iteration, instead of eating up the loop's input.

You might need this, for example, with mencoder which will accept user input if there is any, but will continue silently if there isn't. Other commands that act this way include ssh and ffmpeg. Additional workarounds for this are discussed in FAQ #89.


CategoryShell

2. How can I store the return value/output of a command in a variable?

Well, that depends on whether you want to store the command's output (either stdout, or stdout + stderr) or its exit status (0 to 255, with 0 typically meaning "success").

If you want to capture the output, you use command substitution:

    output=$(command)      # stdout only; stderr remains uncaptured
    output=$(command 2>&1) # both stdout and stderr will be captured

If you want the exit status, you use the special parameter $? after running the command:

    command
    status=$?

If you want both:

    output=$(command)
    status=$?

The assignment to output has no effect on command's exit status, which is still in $?.

If you don't actually want to store the exit status, but simply want to take an action upon success or failure, just use if:

    if command; then
        echo "it succeeded"
    else
        echo "it failed"
    fi

Or if you want to capture stdout as well as taking action on success/failure, without explicitly storing or checking $?:

    if output=$(command); then
        echo "it succeeded"
    ...

What if you want the exit status of one command from a pipeline? If you want the last command's status, no problem -- it's in $? just like before. If you want some other command's status, use the PIPESTATUS array (BASH only). Say you want the exit status of grep in the following:

    grep foo somelogfile | head -5
    status=${PIPESTATUS[0]}

Bash 3.0 added a pipefail option as well, which can be used if you simply want to take action upon failure of the grep:

    set -o pipefail
    if ! grep foo somelogfile | head -5; then
        echo "uh oh"
    fi

Now, some trickier stuff. Let's say you want only the stderr, but not stdout. Well, then first you have to decide where you do want stdout to go:

    output=$(command 2>&1 >/dev/null)  # Save stderr, discard stdout.
    output=$(command 2>&1 >/dev/tty)   # Save stderr, send stdout to the terminal.
    output=$(command 3>&2 2>&1 1>&3-)  # Save stderr, send stdout to script's stderr.

It's possible, although considerably harder, to let stdout "fall through" to wherever it would've gone if there hadn't been any redirection. This involves "saving" the current value of stdout, so that it can be used inside the command substitution:

    exec 3>&1                    # Save the place that stdout (1) points to.
    output=$(command 2>&1 1>&3)  # Run command.  stderr is captured.
    exec 3>&-                    # Close FD #3.

    # Or this alternative, which captures stderr, letting stdout through:
    { output=$(command 2>&1 1>&3-) ;} 3>&1

In the last example above, note that 1>&3- duplicates FD 3 and stores a copy in FD 1, and then closes FD 3. It could also be written 1>&3 3>&-.

What you cannot do is capture stdout in one variable, and stderr in another, using only FD redirections. You must use a temporary file (or a named pipe) to achieve that one.
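
For reference, here is a minimal sketch of the temporary-file approach (it assumes mktemp is available, which is common but not required by POSIX):

    # Bash
    errfile=$(mktemp) || exit
    var_out=$(command 2>"$errfile")   # capture stdout; stderr goes to the file
    var_err=$(<"$errfile")            # read stderr back from the file
    rm -f "$errfile"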

Well, you can use a horrible hack like:

   result=$( { stdout=$(cmd) ; } 2>&1; echo "this line is the separator"; echo "$stdout")
   var_out=${result#*this line is the separator$'\n'}
   var_err=${result%$'\n'this line is the separator*}

Obviously, this is not robust, because either the standard output or the standard error of the command could contain whatever separator string you employ.

And if you also want the exit code of your cmd (this is a modified version that also copes with the case where cmd writes nothing to stdout):

   cmd() { curl -s -v http://www.google.fr; }

   result=$( { stdout=$(cmd); returncode=$?; } 2>&1; echo -n "this is the separator"; echo "$stdout"; exit $returncode)
   returncode=$?

   var_out=${result#*this is the separator}
   var_err=${result%this is the separator*}

Note: the original question read, "How can I store the return value of a command in a variable?" This was, verbatim, an actual question asked in #bash, ambiguity and all.


CategoryShell

3. How can I find the latest (newest, earliest, oldest) file in a directory?

The tempting solution is to use ls to output sorted filenames and take the first result. As usual, the ls approach cannot be made robust and should never be used in scripts due in part to the possibility of arbitrary characters (including newlines) present in filenames. Therefore, we need some other way to compare file metadata.

The most common requirement is to get the most or least recently modified files in a directory. Bash and all ksh variants can compare modification times (mtime) using the -nt and -ot operators of the conditional expression compound command:

# Bash/ksh
unset -v latest
for file in "$dir"/*; do
  [[ $file -nt $latest ]] && latest=$file
done

Or to find the oldest:

# Bash/ksh
unset -v oldest
for file in "$dir"/*; do
  [[ -z $oldest || $file -ot $oldest ]] && oldest=$file
done

Keep in mind that mtime on directories is that of the most recently added, removed, or renamed file in that directory. Also note that -nt and -ot are not specified by POSIX test; however, many shells such as dash include them anyway. No Bourne-like shell has analogous operators for comparing by atime or ctime, so one would need external utilities for that; however, it's nearly impossible to either produce output which can be safely parsed, or handle said output in a shell, without using nonstandard features on both ends.

If the sorting criteria are different from "oldest or newest file by mtime", then GNU find and GNU sort may be used together to produce a sorted list of filenames + timestamps, delimited by NUL characters. This will of course operate recursively (GNU find's -maxdepth operator can prevent that). Here are a few possibilities, which can be modified as necessary to use atime or ctime, or to sort in reverse order:

# Bash + GNU find + GNU sort (To the precision possible on the given OS, but returns only one result)
IFS= read -r -d '' latest \
  < <(find "$dir" -type f -printf '%T@ %p\0' | sort -znr)
latest=${latest#* }   # remove timestamp + space

# GNU find + Bash w/ arrays (To the nearest 1s, using an undocumented "find -printf" format (%Ts).)
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(find "$dir" -type f -printf '%p\0%Ts\0')
latest=${latest[-1]}

# GNU stat + Bash /w arrays (non-recursive w/o globstar, to the nearest 1s)
while IFS= read -rd '' 'latest[$(read -rd "" y; echo $y)]'
    do :
done < <(stat '--printf=%n\0%Y\0' "$dir"/*)
latest=${latest[-1]}

One disadvantage of these approaches is that the entire list is sorted, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be faster. However, depending on the size of the job, the algorithmic disadvantage of sorting may be negligible compared to the overhead of using a shell.

# Bash + GNU find
unset -v latest time
while IFS= read -r -d '' line; do
  t=${line%% *} t=${t%.*}   # truncate fractional seconds
  ((t > time)) && { latest=${line#* } time=$t; }
done < <(find "$dir" -type f -printf '%T@ %p\0')

Lastly, here's a more verbose variant for use in a library or .bashrc which can either return a result or assign directly to a variable:

latest() {
    if [[ $FUNCNAME == ${FUNCNAME[1]} ]]; then
        unset -v x latest
        printf ${2:+'-v' "$2"} '%s' "$1"
        return
    fi

    if (($# > 2)); then
        echo $'Usage: latest <glob> <varname>\nError: Takes at most 2 arguments. Glob defaults to *'
        return 1
    fi >&2

    if ! shopt -q nullglob; then
        trap 'shopt -u nullglob; trap - RETURN' RETURN
        shopt -s nullglob
    fi

    local x latest

    for x in ${1-*}; do
        [[ -d $x || $x -nt $latest ]] && latest=$x
    done

    latest "$latest" ${2+"$2"}
}

Readers who are asking this question in order to rotate their log files may wish to look into logrotate(1) instead, if their operating system provides it.


CategoryShell

4. How can I check whether a directory is empty or not? How do I check for any *.mpg files, or count how many there are?

In Bash, you can do this safely and easily with the nullglob and dotglob options (which change the behaviour of globbing), and an array:

    # Bash
    shopt -s nullglob dotglob
    files=(*)
    (( ${#files[*]} )) || echo directory is empty
    shopt -u nullglob dotglob

Of course, you can use any glob you like instead of *. E.g. *.mpg or /my/music/*.mpg works fine.

Bear in mind that you need read permission on the directory, or it will always appear empty.

Some people dislike nullglob because having unmatched globs vanish altogether confuses programs like ls. Mistyping ls *.zip as ls *.zpi may cause every file to be displayed (for such cases consider setting failglob). Setting nullglob in a SubShell avoids accidentally changing its setting in the rest of the shell, at the price of an extra fork(). If you'd like to avoid having to set and unset shell options, you can pour it all into a SubShell:

    # Bash
    if (shopt -s nullglob dotglob; f=(*); ((! ${#f[@]}))); then
        echo "The current directory is empty."
    fi

The other disadvantage of this approach (besides the extra fork()) is that the array is lost when the subshell exits. If you planned to use those filenames later, then they have to be retrieved all over again.

Both of these examples expand a glob and store the resulting filenames into an array, and then check whether the number of elements in the array is 0. If you actually want to see how many files there are, just print the array's size instead of checking whether it's 0:

    # Bash
    shopt -s nullglob dotglob
    files=(*)
    echo "The current directory contains ${#files[@]} things."

You can also avoid the nullglob if you're OK with putting a non-existing filename in the array should no files match (instead of an empty array):

    # Bash
    shopt -s dotglob
    files=(*)
    if [[ -e ${files[0]} || -L ${files[0]} ]]; then
        echo "The current directory is not empty.  It contains:"
        printf '%s\n' "${files[@]}"
    fi

Without nullglob, if there are no files in the directory, the glob will be added as the only element in the array. Since * is a valid filename, we can't simply check whether the array contains a literal *. So instead, we check whether the thing in the array exists as a file. The -L test is required because -e fails if the first file is a dangling symlink.

If your script needs to run with various non-Bash shell implementations, you can try using an external program like python, perl, or find; or you can try one of these:

    # POSIX
    # Clobbers the positional parameters, so make sure you don't need them.
    set -- *
    if test -e "$1" || test -L "$1"; then
        echo "directory is non-empty"
    fi

At this stage, the positional parameters have been loaded with the contents of the directory, and can be used for processing.

In the Bourne shell, it's even worse, because there is no test -e or test -L:

    # Bourne
    # (Of course, the system must have printf(1).)
    if test "`printf '%s %s %s' .* *`" = '. .. *' && test ! -f '*'
    then
        echo "directory is empty"
    fi

Of course, that fails if * exists as something other than a plain file (such as a directory or FIFO). The absence of a -e test really hurts.

Never try to parse ls output. Even ls -A solutions can break (e.g. on HP-UX, if you are root, ls -A does the exact opposite of what it does if you're not root -- and no, I can't make up something that incredibly stupid).

In fact, one may wish to avoid the direct question altogether. Usually people want to know whether a directory is empty because they want to do something involving the files therein, etc. Look to the larger question. For example, one of these find-based examples may be an appropriate solution:

   # Bourne
   find "$somedir" -type f -exec echo Found unexpected file {} \;
   find "$somedir" -maxdepth 0 -empty -exec echo {} is empty. \;  # GNU/BSD
   find "$somedir" -type d -empty -exec cp /my/configfile {} \;   # GNU/BSD

Most commonly, all that's really needed is something like this:

    # Bourne
    for f in ./*.mpg; do
        test -f "$f" || continue
        mympgviewer "$f"
    done

In other words, the person asking the question may have thought an explicit empty-directory test was needed to avoid an error message like mympgviewer: ./*.mpg: No such file or directory when in fact no such test is required.

Support for a nullglob-like feature is inconsistent. In ksh93 it can be done on a per-pattern basis by prefixing with ~(N):

    # ksh93
    for f in ~(N)*; do
        ....
    done


CategoryShell

5. How can I use array variables?

This answer assumes you have a basic understanding of what arrays are in the first place. If you're new to this kind of programming, you may wish to start with the guide's explanation. This page is more detailed and thorough.

5.1. Intro

BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:

# Bash
host=(mickey minnie goofy)
n=${#host[*]}
for ((i=0;i<n;i++)); do
    echo "host number $i is ${host[i]}"
done

The indexing always begins with 0, unless you specifically choose otherwise. The awkward expression ${#host[*]} or ${#host[@]} returns the number of elements for the array host. (We'll go into more detail on syntax below.)

Ksh93, Zsh and Bash 4.0 have Associative Arrays as well. These are not available in Bourne, ash, ksh88 or older bash shells and are not specified by POSIX.

POSIX and Bourne shells are not guaranteed to have arrays at all.

BASH and Korn shell arrays are sparse. Elements may be added and deleted out of sequence.

# Bash/ksh
arr[0]=0
arr[1]=1
arr[2]=2
arr[42]="what was the question?"
unset 'arr[2]'
echo "${arr[*]}"
# prints 0 1 what was the question?

You should try to write your code in such a way that it can handle sparse arrays, unless you know in advance that an array will never have holes.
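
For example, iterating over the array's indices (Bash 3.0 or higher, ksh93) instead of counting from 0 to n-1 keeps a loop correct even when the array has holes:

# Bash 3.0+/ksh93
for i in "${!arr[@]}"; do
    echo "element $i is ${arr[i]}"
done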

5.2. Loading values into an array

Assigning one element at a time is simple, and portable:

# Bash/ksh
arr[0]=0
arr[42]='the answer'

It's possible to assign multiple values to an array at once, but the syntax differs across shells.

# Bash/ksh93
array=(zero one two three four)

# Korn
set -A array -- zero one two three four

When initializing in this way, the first index will be 0.

You can also initialize an array using a glob (see also NullGlob):

# Bash/ksh93
oggs=(*.ogg)

# Korn
set -A oggs -- *.ogg

or using a substitution of any kind:

# Bash
words=($sentence)
letters=({a..z})    # Bash 3.0 or higher

# Korn
set -A words -- $sentence

When the arrname=(...) syntax is used, any unquoted substitutions inside the parentheses undergo WordSplitting and glob expansion according to the regular shell rules. In the first example above, if any of the words in $sentence contain glob characters, filename expansion may occur.

set -f and set +f may be used to disable and re-enable glob expansion, respectively, so that words like * will not be expanded into filenames. In some scripts, set -f may be in effect already, and therefore running set +f may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of set -o is easy, because it's not.)

5.2.1. Loading lines from a file or stream

In bash 4, the mapfile command (also known as readarray) accomplishes this:

# Bash 4
mapfile -t lines < myfile

# or
mapfile -t lines < <(some command)

See ProcessSubstitution and FAQ #24 for more details on the <() syntax.

mapfile handles blank lines (it inserts them as empty array elements), and it also handles missing final newlines from the input stream. Both those things become problematic when reading data in other ways, as we shall see momentarily.

mapfile does have one serious drawback: it can only handle newlines as line terminators. It can't, for example, handle NUL-delimited files from find -print0.
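
As an aside, if your bash is new enough (4.4 or later), mapfile accepts a -d option specifying an alternative delimiter, which removes this limitation. A minimal sketch:

# Bash 4.4+
mapfile -t -d '' files < <(find . -name '*.ugly' -print0)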

When mapfile is not available, we have to work very hard to try to duplicate it. There are a great number of ways to almost get it right, but fail in subtle ways.

These examples will duplicate most of mapfile's basic functionality:

# Bash 2.04+, Ksh93
unset lines i
while IFS= read -r; do lines[i++]=$REPLY; done < <(your command)   # or < file
[[ $REPLY ]] && lines[i++]=$REPLY

# Ksh88
unset lines; i=0
while IFS= read -r; do lines[i]=$REPLY; i=$((i+1)); done < file
[ "$REPLY" ] && lines[i]=$REPLY i=$((i+1))

Now let's look at some simpler cases that fail, so you can see why we used such a complicated solution.

Some people might start out like this:

# These examples only work with certain kinds of input files.

# Bash
set -f; IFS=$'\n' lines=($(< myfile)); unset IFS; set +f

# Ksh
set -f; IFS='
'; set -A lines -- $(< myfile); unset IFS; set +f

That's a literal newline (and nothing else) between the single quotes in the Korn shell example.

We use IFS (setting it to a newline) because we want each line of input to become an array element, not each word.

However, relying on IFS WordSplitting causes issues if you have repeated whitespace delimiters, because they will be consolidated. E.g., a file with blank lines will have repeated newline characters. If you wanted the blank lines to be stored as empty array elements, IFS's behavior will backfire on you; the blank lines will disappear. There is no clean workaround for this other than to scrap the whole approach.

# bash
# \v is a vertical tab and rarely/never used, so we can use it to mark empty lines
# additionally bash won't collapse multiple \v .
# now empty lines are preserved in array as empty elements
set -f; IFS=$'\n\v' eval lines='( $(sed -re 's/^$/\v/' myfile) )'; set +f

A second approach would be to read the elements one by one, using a loop. This one does not work (with normal input; ironically, it works with some degenerate inputs):

# Does not work!
unset arr i
while IFS= read -r 'arr[i++]'; do :; done < file

Why doesn't it work? It puts a blank element at the end of the array, because the read -r arr[i++] is executed one extra time after the end of file. However, we'll revisit this approach later.

This one gets us much closer:

# Bash
unset arr i
while read -r; do arr[i++]=$REPLY; done < yourfile

# or
while read -r; do arr[i++]=$REPLY; done < <(your command)

The square brackets create a math context. Inside them, i++ works as a C programmer would expect. (That shortcut works in ksh93, but not in ksh88.)

This approach handles blank lines, but it fails if your file or stream is missing its final newline. So we need to handle that case specially:

# Bash
unset arr i
while read -r; do arr[i++]=$REPLY; done < <(your command)
# Append unterminated data line if there was one.
[[ $REPLY ]] && arr[i++]=$REPLY

This is the "final solution" we gave earlier, handling both blank lines inside the file, and an unterminated final line.

Our second try above (the read -r 'arr[i++]' one) works great if there's an unterminated line (since the array element is populated with the partial data before the exit status of read is checked). Unfortunately, it puts an empty element on the end of the array if the data stream is correctly terminated. So to fix that one, we need to remove the empty element after the loop:

# Bash
unset arr i
while IFS= read -r 'arr[i++]'; do :; done < <(your command)
# Remove trailing empty element, if any.
if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi

This is also a working solution. Whether you prefer to read too many and then have to remove one, or read too few and then have to add one, is a personal choice.

NOTE: it is necessary to quote the 'arr[i++]' passed to read, so that the square brackets aren't interpreted as globs. This is also true for other non-keyword builtins that take a subscripted variable name, such as let and unset.

5.2.2. Reading NUL-delimited streams

If you are trying to deal with records that might have embedded newlines, you will be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll need to use the -d argument to read as well:

# Bash
unset arr i
while IFS= read -rd '' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)
if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi

# or
while read -rd ''; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)
[[ $REPLY ]] && arr[i++]=$REPLY

read -d '' tells Bash to keep reading until a NUL byte; normally it reads until a newline. There is no equivalent in Korn shell as far as we're aware.

5.2.3. Appending to an existing array

If you wish to append data to an existing array, there are several approaches. The most flexible is to keep a separate index variable:

# Bash/ksh93
arr[i++]="new item"

If you don't want to keep an index variable, but you happen to know that your array is not sparse, then you can use the highest existing index:

# Bash/ksh
# This will FAIL if the array has holes (is sparse).
arr[${#arr[*]}]="new item"

If you don't know whether your array is sparse or not, but you don't mind re-indexing the entire array (and also being very slow), then you can use:

# Bash
arr=("${arr[@]}" "new item")

# Ksh
set -A arr -- "${arr[@]}" "new item"

If you're in bash 3.1 or higher, then you can use the += operator:

# Bash 3.1
arr+=("new item")

NOTE: the parentheses are required, just as when assigning to an array. (Otherwise you will end up appending the string to ${arr[0]}, which is what $arr is a synonym for.)
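
A quick demonstration of the difference (illustrative values only):

# Bash 3.1
arr=(a b)
arr+=(c)     # appends a new element;       arr is now: a b c
arr+=d       # string-appends to ${arr[0]}; arr is now: ad b c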

For examples of using arrays to hold complex shell commands, see FAQ #50 and FAQ #40.

5.3. Retrieving values from an array

${#arr[*]} or ${#arr[@]} gives the number of elements in an array:

# Bash
shopt -s nullglob
oggs=(*.ogg)
echo "There are ${#oggs[*]} Ogg files."

* is reported to be quicker than @ when testing on Bash 3. * and @ both seem to work at the same speed when testing on Bash 4.1.

Single elements are retrieved by index:

echo "${foo[0]} - ${bar[j+1]}"

The square brackets are a math context. Arithmetic can be done there, and parameter expansions are done even without $.

Using array elements en masse is one of the key features of shell arrays. In exactly the same way that "$@" is expanded for positional parameters, "${arr[@]}" is expanded to a list of words, one array element per word. For example,

# Korn/Bash
for x in "${arr[@]}"; do
  echo "next element is '$x'"
done

This works even if the elements contain whitespace. You always end up with the same number of words as you have array elements.

If one simply wants to dump the full array, one element per line, this is the simplest approach:

# Bash/ksh
printf "%s\n" "${arr[@]}"

For slightly more complex array-dumping, "${arr[*]}" will cause the elements to be concatenated together, with the first character of IFS (or a space if IFS isn't set) between them. As it happens, "$*" is expanded the same way for positional parameters.

# Bash
arr=(x y z)
IFS=/; echo "${arr[*]}"; unset IFS
# prints x/y/z

Unfortunately, you can't put multiple characters in between array elements using that syntax. You would have to do something like this instead:

# Bash/ksh
arr=(x y z)
tmp=$(printf "%s<=>" "${arr[@]}")
echo "${tmp%<=>}"    # Remove the extra <=> from the end.
# prints x<=>y<=>z

BASH 3.0 added the ability to retrieve the list of index values in an array, rather than just iterating over the elements:

# Bash 3.0 or higher
arr=(0 1 2 3) arr[42]='what was the question?'
unset 'arr[2]'
echo "${!arr[@]}"
# prints 0 1 3 42

Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of structs in a language with no struct):

# Bash 3.0 or higher
unset file title artist i
for f in ./*.mp3; do
  file[i]=$f
  title[i]=$(mp3info -p %t "$f")
  artist[i++]=$(mp3info -p %a "$f")
done

# Later, iterate over every song.
# This works even if the arrays are sparse, just so long as they all have
# the SAME holes.
for i in "${!file[@]}"; do
  echo "${file[i]} is ${title[i]} by ${artist[i]}"
done

5.3.1. Retrieving with modifications

Bash's Parameter Expansions may be performed on array elements en masse:

# Bash
arr=(abc def ghi jkl)
echo "${arr[@]#?}"          # prints bc ef hi kl
echo "${arr[@]/[aeiou]/}"   # prints bc df gh jkl

Parameter Expansion can also be used to extract elements from an array. Some people call this slicing:

# Bash
echo "${arr[@]:1:3}"        # three elements starting at #1 (second element)
echo "${arr[@]:(-2)}"       # last two elements
echo "${@:(-1)}"            # last positional parameter
echo "${@:(-2):1}"          # second-to-last positional parameter

5.4. Using @ as a pseudo-array

As we see above, the @ array (the array of positional parameters) can be used almost like a regularly named array. This is the only array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:

# POSIX
set -- *.mp3
if [ -e "$1" ]; then
  echo "there are $# MP3 files"
else
  echo "there are 0 MP3 files"
fi

# POSIX
...
# Add an option to our dynamically generated list of options
set -- "$@" -f "$somefile"
...
foocommand "$@"

(Compare to FAQ #50's dynamically generated commands using named arrays.)

6. See Also


CategoryShell

7. How can I use variable variables (indirect variables, pointers, references) or associative arrays?

This is a complex page, because it's a complex topic. It's been divided into roughly three parts: associative arrays, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.

7.1. Associative Arrays

We introduce associative arrays first, because in the majority of cases where people are trying to use indirect variable assignments/evaluations, they ought to be using associative arrays instead. For instance, we frequently see people asking how they can have a bunch of related variables like IPaddr_hostname1, IPaddr_hostname2 and so on. A more appropriate way to store this data would be in an associative array named IPaddr which is indexed by the hostname.

To map from one string to another, you need arrays indexed by a string instead of a number. These exist in AWK as "associative arrays", in Perl as "hashes", and in Tcl simply as "arrays". They also exist in ksh93, where you'd use them like this:

  •  # ksh93
     typeset -A homedir             # Declare ksh93 associative array
     homedir[jim]=/home/jim
     homedir[silvia]=/home/silvia
     homedir[alex]=/home/alex
    
     for user in "${!homedir[@]}"   # Enumerate all indices (user names)
     do
         echo "Home directory of user $user is ${homedir[$user]}"
     done

BASH supports them from version 4 and up:

  •  # Bash 4 and up
     declare -A homedir
     homedir[jim]=/home/jim
     # or
     homedir=( [jim]=/home/jim
               [silvia]=/home/silvia
               [alex]=/home/alex )
     ...

Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to simplify it.

There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.

Suppose we have several subservient hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:

  •  source "$conf"
     for host in "${!commands[@]}"; do
         ssh "$host" "${commands[$host]}"
     done
    
     # Where "$conf" is a file like this:
     declare -A commands
     commands=( [host1]="mvn clean install && cd webapp && mvn jetty:run"
                [host2]="..."
     )

This is the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. The difficulty here is that we really want each element of the associative array to be a list or another array of command strings. But the shell simply doesn't permit that kind of data structure.

So, often it pays to step back and think in terms of shells rather than other programming languages. Aren't we just running a script on a remote host? Then why don't we just store the configuration sets as scripts? Then it's simple:

  •  # A series of conf files named for the hosts we need to run our commands on:
     for conf in /etc/myapp/*; do
         host=${conf##*/}
         ssh "$host" bash < "$conf"
     done
    
     # /etc/myapp/hostname is just a script:
     mvn clean install &&
     cd webapp &&
     mvn jetty:run

Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel:

  •  parallel ssh {/} bash "<" {} ::: /etc/myapp/*

7.1.1. Associative array hacks in older shells

Before you think of using eval to mimic associative arrays in an older shell (probably by creating a set of variable names like homedir_alex), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages:

  1. It's really hard to read, to keep track of, and to maintain.
  2. The variable names must match the RegularExpression ^[a-zA-Z_][a-zA-Z_0-9]* -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named hong-hu. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named homedir_hong-hu is doomed from the start.

  3. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for constants, known at the time you write the program. (Bash's printf %q helps, but nothing analogous is available in POSIX shells.)

  4. If the program handles unsanitized user input, it can be VERY dangerous!

Read BashGuide/Arrays or BashFAQ/005 for a more in-depth description and examples of how to use arrays in Bash.

If you need an associative array but your shell doesn't support them, please consider using AWK instead.
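
For instance, here is a minimal sketch of the homedir example done with an awk associative array (the input file name and format are made up for illustration):

  •  # Bourne + awk
     # userdirs.txt contains lines like:  jim /home/jim
     awk '{ homedir[$1] = $2 }
          END { for (user in homedir) print "Home directory of user", user, "is", homedir[user] }' userdirs.txt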

7.2. Indirection

7.2.1. Think before using indirection

Putting variable names or any other bash syntax inside parameters is generally a bad idea. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs, security issues, etc. Even when you know you "got it right", because you "know and understand exactly what you're doing", bugs happen to all of us and it pays to respect separation practices to minimize the extent of damage they can cause.

Aside from that, it also makes your code non-obvious and non-transparent.

Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about Bash Arrays or haven't fully considered other Bash features such as functions.

7.2.2. Evaluating indirect/reference variables

BASH allows you to expand a parameter indirectly -- that is, one variable may contain the name of another variable:

  •  # Bash
     realvariable=contents
     ref=realvariable
     echo "${!ref}"   # prints the contents of the real variable

KornShell (ksh93) has a completely different, more powerful syntax -- the nameref command (also known as typeset -n):

  •  # ksh93
     realvariable=contents
     nameref ref=realvariable
     echo "$ref"      # prints the contents of the real variable

Unfortunately, for shells other than Bash and ksh93, there is no syntax for evaluating a referenced variable. You would have to use eval, which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe.
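
For completeness, the eval form looks like this; it is only safe if you are certain that ref contains nothing but a valid variable name:

  •  # Bourne -- dangerous unless $ref is fully under your control
     realvariable=contents
     ref=realvariable
     eval "value=\$$ref"   # expands to: value=$realvariable
     echo "$value"         # prints the contents of the real variable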

It's difficult to imagine a practical use for this that wouldn't be just as easily performed by using an associative array. But people ask it all the time (it is genuinely a frequently asked question).

ksh93's nameref allows us to work with references to arrays, as well as regular scalar variables. For example,

  •  # ksh93
     myfunc() {
       nameref ref=$1
       echo "array $1 has ${#ref[*]} elements"
     }
     realarray=(...)
     myfunc realarray

We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells (short of using eval, which is extremely difficult to do securely). Bash can almost do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a use at your own risk hack.

  •  # Bash -- trick #1.  Seems to work in bash 2 and up.
     realarray=(...) ref=realarray; index=2
     tmp="$ref[$index]"
     echo "${!tmp}"            # gives array element [2]
    
     # Bash -- trick #2.  Seems to work in bash 3 and up.
     # Does NOT work in bash 2.05b.
     tmp="$ref[@]"
     printf "<%s> " "${!tmp}"; echo    # Iterate whole array.

It is not possible to retrieve array indices directly using the Bash ${!var} indirect expansion.

7.2.3. Assigning indirect/reference variables

Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function, and you want it to put its output in a variable of the caller's choice instead of the function's choice. (Reusability of shell functions is dubious at best, so this is something that should not happen often.)

Assigning a value "through" a reference (or pointer, or indirect variable, or whatever you want to call it -- I'm going to use "ref" from now on) is more widely possible, but the means of doing so are extremely shell-specific.

Before we begin, we must point out that you must control the value of the ref. That is, you should only use a ref whose value you assign within a program, or from trusted input. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We'll show an example of this at the end.

In ksh93, we can just use nameref again:

  •  # ksh93
     nameref ref=realvariable
     ref="contents"
     # realvariable now contains the string "contents"

In Bash, we can use read and Bash's here string syntax:

  •  # Bash
     ref=realvariable
     IFS= read -r $ref <<< "contents"
     # realvariable now contains the string "contents"

However, this only works if there are no newlines in the content. If you need to assign multiline values, keep reading.

A similar trick works for Bash array variables too:

  •  # Bash
     aref=realarray
     read -r -a $aref <<< "words go into array elements"
     echo "${realarray[1]}"   # prints "go"

(Again, newlines in the input will break this trick. IFS is used to delimit words, so you may or may not need to set that.)

Another trick is to use Bash's printf -v (only available in recent versions):

  •  # Bash 3.1 or higher
     ref=realvariable
     printf -v $ref %s "contents"

The printf -v trick is handy if your contents aren't a constant string, but rather, something dynamically generated. You can use all of printf's formatting capabilities. This trick also permits any string content, including embedded newlines (but not NUL bytes - no force in the universe can put NUL bytes into shell strings usefully). This is the best trick to use if you're in bash 3.1 or higher.

Yet another trick is Korn shell's typeset or Bash's declare. These are roughly equivalent to each other. Both of them cause a variable to become locally scoped to a function, if used inside a function; but if used outside a function, they can operate on global variables.

  •  # Korn shell (all versions):
     typeset $ref="contents"
    
     # Bash:
     declare $ref="contents"

Bash 4.2 adds declare -g which can put variables in the global context, even from inside a function.
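
A minimal sketch of how that can combine with indirect assignment (the function and variable names here are invented for illustration):

  •  # Bash 4.2
     setvar() {
         # without -g, declare would create a variable local to the function
         declare -g "$1=$2"
     }
     setvar realvariable "contents"
     echo "$realvariable"      # prints "contents"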

The advantage of using typeset or declare over eval is that the right hand side of the assignment is not parsed by the shell. If you used eval here, you would have to sanitize/escape the entire right hand side first. This trick also preserves the contents exactly, including newlines, so this is the best trick to use if you're in bash older than 3.1 (or ksh88) and don't need to worry about accidentally changing your variable's scope (i.e., you're not using it inside a function).

However, with bash, you must still be careful about what is on the left-hand side of the assignment. Inside square brackets, expansions are still performed; thus, with a tainted ref, declare can be just as dangerous as eval:

  •  # Bash:
     ref='x[$(touch evilfile; echo 0)]'
     ls -l evilfile   # No such file or directory
     declare "$ref=value"
     ls -l evilfile   # It exists now!

This problem also exists with typeset in mksh and pdksh, but apparently not ksh93. This is why the value of ref must be under your control at all times.

If you aren't using Bash or Korn shell, you can do assignments to referenced variables using HereDocument syntax:

  •  # Bourne
     ref=realvariable
     read $ref <<EOF
     contents
     EOF

(Alas, read means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.)

Remember that, when using a here document, if the sentinel word (EOF in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.
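
A small sketch of that difference, using an invented variable named greeting:

  •  # Bourne
     greeting="hello"
     read var1 <<EOF
     $greeting
     EOF
     read var2 <<'EOF'
     $greeting
     EOF
     echo "$var1"      # hello      (unquoted sentinel: $greeting was expanded)
     echo "$var2"      # $greeting  (quoted sentinel: taken literally)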

Finally, some people just cannot resist throwing eval into the picture:

  •  # Bourne
     ref=myVar
     eval "$ref=\$value"

This expands to the statement that is executed:

  •  myVar=$value

The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the = must be escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.

The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use eval at your own risk.
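
As a sketch of what "escaping carefully" means in a slightly more complex case (value1 and value2 are invented names), every expansion on the right-hand side gets a backslash so that it is only evaluated by eval itself:

  •  # Bourne
     ref=myVar
     eval "$ref=\"\$value1 - \$value2\""
     # The statement executed by eval is:  myVar="$value1 - $value2"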

7.3. See Also


CategoryShell

8. Is there a function to return the length of a string?

The fastest way, not requiring external programs (but not usable in Bourne shells):

# POSIX
${#varname}

or for Bourne shells:

# Bourne
expr "$varname" : '.*'

(expr prints the number of characters matching the pattern .*, which is the length of the string.)

or:

# Bourne, with GNU expr(1)
expr length "$varname"

(BSD/GNU expr only)

This second version is not specified by POSIX, so it is not portable across all platforms. The first version has a different problem: if $varname expands to "length", it will fail with BSD/GNU expr, which treat "length" as a keyword rather than as a string.

A portable way is:

expr \( "X$varname" : ".*" \) - 1

One may also use awk:

# Bourne
awk -v x="$varname" 'BEGIN {print length(x)}'

Though that one will fail for values of $varname that contain backslash characters, so you may prefer:

# Bourne with POSIX awk
awk  'BEGIN {print length(ARGV[1]);exit}' "$varname"


Similar needs:

# Korn/Bash
${#arrayname[@]}

Returns the number of elements in an array.

# Korn/Bash
${#arrayname[i]}

Returns the length of the array's element i.
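
A quick sketch with made-up values, tying these together:

# Bash
str="hello world"
echo "${#str}"           # 11
arr=(foo bar quux)
echo "${#arr[@]}"        # 3  (number of elements)
echo "${#arr[2]}"        # 4  (length of element 2, "quux")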


CategoryShell

9. How can I recursively search all files for a string?

90% of the time, all you need is one of these:

# Recurse and print matching lines (GNU grep):
grep -r -- "$search" .

# Recurse and print only the filenames (GNU grep):
grep -r -l -- "$search" .

You can use find if your grep lacks a -r option, or if you want to avoid traversing symbolic links:

find . -type f -exec grep -l -- "$search" {} \;

The {} characters will be replaced with the current file name.

This command is slower than it needs to be, because find will call grep with only one file name, resulting in many grep invocations (one per file). Since grep accepts multiple file names on the command line, find can be instructed to call it with several file names at once:

find . -type f -exec grep -l -- "$search" {} +

The trailing '+' character instructs find to call grep with as many file names as possible, saving processes and resulting in faster execution. This example works for POSIX find, e.g. with Solaris, as well as very recent GNU find.

Traditional Unix has a helper program called xargs for the same purpose:

find . -type f | xargs grep -l -- "$search"

However, if your filenames contain spaces or other metacharacters, you'll need to use the BSD/GNU -print0 option:

find . -type f -print0 | xargs -0 grep -l -- "$search"

The -print0 / -0 options ensure that any file name can be processed, even one containing blanks, TAB characters, or newlines.


CategoryShell

10. What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ...

Most standard Unix commands buffer their output when used non-interactively. This means that they don't write each character (or even each line) immediately, but instead collect a larger number of characters (often 4 kilobytes) before printing anything at all. In the case above, the grep command buffers its output, and therefore awk only gets its input in large chunks.

Buffering greatly increases the efficiency of I/O operations, and it's usually done in a way that doesn't visibly affect the user. A simple tail -f from an interactive terminal session works just fine, but when a command is part of a complicated pipeline, the command might not recognize that the final output is needed in (near) real time. Fortunately, there are several techniques available for controlling I/O buffering behavior.

The most important thing to understand about buffering is that it's the writer who's doing it, not the reader.

10.0.1. Eliminate unnecessary commands

In the question, we have the pipeline tail -f logfile | grep 'foo bar' | awk ... (with the actual AWK command being unspecified). There is no problem if we simply run tail -f logfile, because tail -f never buffers its output. Nor is there a problem if we run tail -f logfile | grep 'foo bar' interactively, because grep does not buffer its output if its standard output is a terminal. However, if the output of grep is being piped into something else (such as an AWK command), it starts buffering to improve efficiency.

In this particular example, the grep is actually redundant. We can remove it, and have AWK perform the filtering in addition to whatever else it's doing:

tail -f logfile | awk '/foo bar/ ...'

In other cases, this sort of consolidation may not be possible. But you should always look for the simplest solution first.

10.0.2. Your command may already support unbuffered output

Some programs provide special command line options specifically for this sort of problem:

grep (e.g. GNU version 2.5.1): --line-buffered

sed (e.g. GNU version 4.0.6): -u, --unbuffered

awk (some GNU versions): -W interactive, or use the fflush() function

tcpdump, tethereal: -l

Each command that writes to a pipe would have to be told to disable buffering, in order for the entire pipeline to run in (near) real time. The last command in the pipeline, if it's writing to a terminal, will not typically need any special consideration.
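
For example, using GNU grep's --line-buffered option, the pipeline from the question might be written like this (a sketch; the awk program is left unspecified, as in the question):

tail -f logfile | grep --line-buffered 'foo bar' | awk ...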

10.0.3. unbuffer

The expect package has an unbuffer program which effectively tricks other programs into always behaving as if they were being used interactively (which may often disable buffering). Here's a simple example:

tail -f logfile | unbuffer grep 'foo bar' | awk ...

expect and unbuffer are not standard POSIX tools, but they may already be installed on your system.

10.0.4. stdbuf

Recent versions of GNU coreutils (from 7.5 onwards) come with a nice utility called stdbuf, which can be used among other things to "unbuffer" the standard output of a command. Here's the basic usage for our example:

tail -f logfile | stdbuf -oL grep 'foo bar' | awk ...

In the above code, "-oL" makes stdout line buffered; you can even use "-o0" to entirely disable buffering. The man and info pages have all the details.

stdbuf is not a standard POSIX tool, but it may already be installed on your system (if you're using a recent Linux distribution, it will probably be present).

10.0.5. less

If you simply wanted to highlight the search term, rather than filter out non-matching lines, you can use the less program instead of a filtered tail -f:

$ less program.log
  • Inside less, start a search with the '/' command (similar to searching in vi). Or start less with the -p pattern option.

  • This should highlight any instances of the search term.
  • Now put less into "follow" mode, which by default is bound to shift+f.

  • You should get an unfiltered tail of the specified file, with the search term highlighted.

"follow" mode is stopped with an interrupt, which is probably control+c on your system. The '/' command accepts regular expressions, so you could do things like highlight the entire line on which a term appears. For details, consult man less.

10.0.6. coproc

If you're using ksh or Bash 4.0+, whatever you're really trying to do with tail -f might benefit from using coproc and fflush() to create a coprocess. Note well that coproc does not itself address buffering issues (in fact it's prone to buffering problems -- hence the reference to fflush). coproc is only mentioned here because whenever someone is trying to continuously monitor and react to a still-growing file (or pipe), they might be trying to do something which would benefit from coprocesses.

10.0.7. Further reading


CategoryShell

11. How can I recreate a directory hierarchy structure, without the files?

With the cpio program:

 cd "$srcdir" &&
 find . -type d -print | cpio -pdumv "$dstdir"

or with the pax program:

 cd "$srcdir" &&
 find . -type d -print | pax -rwdv  "$dstdir"

or with zsh's special globbing:

 zsh -ec '
 cd -- "$srcdir"
 dirs=(**/*(ND))
 cd -- "$dstdir"
 mkdir -p -- $dirs'

or with GNU tar, and more verbose syntax:

 cd "$srcdir" &&
 find . -type d -print | tar c --files-from - --no-recursion |
   tar x --directory "$dstdir"

This creates a list of directory names with find, non-recursively adds just the directories to an archive, and pipes it to a second tar instance to extract it at the target location. As you can see, tar is the least suited to this task, but people just adore it, so it has to be included here to appease the tar fanboy crowd. (Note: you can't even do this at all with a typical Unix tar. Also note: there is no such thing as "standard tar", as both tar and cpio were intentionally omitted from POSIX in favor of pax.)

All but the zsh solution above will fail if directory names contain newline characters. On many modern BSD/GNU systems, at least, they can be trivially modified to cope with that, by using find -print0 and one of pax -0 or cpio -0 or tar --null (check your system documentation to see which of these commands you have, and which extensions are available).
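
For example, with GNU find and GNU cpio the first solution might be adapted as follows (a sketch; check whether your tools support these options):

 cd "$srcdir" &&
 find . -type d -print0 | cpio -0 -pdumv "$dstdir"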

If you want to create stub files instead of full-sized files, with GNU find(1), the following is likely to be the simplest solution. The find command recreates the regular files using "dummy" files (empty files with the same timestamps):

 cd "$srcdir" &&
 # use one of the above commands first, to make the directories, then:
 find . -type f -exec touch -r {} "$dstdir"/{} \;

Be aware, though, that according to POSIX, the behaviour of find is unspecified when {} is not standing alone in an argument. Because of this, the following solution is more portable (and probably faster...) than the previous:

 dstdir=whatever; export dstdir
 find . -type f -exec sh -c 'for i; do touch -r "$i" "$dstdir"/"$i"; done' _ {} +

If your find can't handle -exec + then you can use \; instead of + at the end of the command. See UsingFind for explanations.


CategoryShell

12. How can I print the n'th line of a file?

One dirty (but not quick) way is:

    sed -n ${n}p "$file"

But this reads the entire file even when only the n'th line is desired; that can be avoided by printing line $n with the "p" command and then using the "q" command to exit the script:

    sed -n "$n{p;q;}" "$file"

Another method is to grab the last line from a listing of the first n lines:

   head -n "$n" "$file" | tail -n 1

Another approach, using AWK:

   awk "NR==$n{print;exit}" file

If more than one line is needed, it's easy to adapt any of the previous methods:

   x=3 y=4
   sed -n "$x,${y}p;${y}q;" "$file"                # Print lines $x to $y; quit after $y.
   head -n $y "$file" | tail -n $((y - x + 1))     # Same
   head -n $y "$file" | tail -n +$x                # If your tail supports it
   awk "NR>=$x{print} NR==$y{exit}" "$file"        # Same

In Bash 4, a pure-bash solution can be achieved succinctly using the mapfile builtin. More than one line can be read into the array MAPFILE by adjusting the argument to mapfile's -n option:

   mapfile -ts $((n-1)) -n 1 < "$file"
   echo "${MAPFILE[0]}"

mapfile can also be used similarly to head while avoiding buffering issues in the event that the input is a pipe:

   { mapfile -n $n; head -n 1; } <"$file"

12.1. See Also


CategoryShell

13. How do I invoke a shell command from a non-shell application?

You can use the shell's -c option to run the shell with the sole purpose of executing a short bit of script:

    sh -c 'echo "Hi!  This is a short script."'

This is usually pretty useless without a means of passing data to it. The best way to pass bits of data to your shell is to pass them as positional arguments:

    sh -c 'echo "Hi! This short script was run with the arguments: $@"' -- "foo" "bar"

Notice the -- before the actual positional parameters. The first argument you pass to the shell process (that isn't the argument to the -c option) will be placed in $0. Positional parameters start at $1, so we put a little placeholder in $0. This can be anything you like; in the example, we use the generic --.

This technique is used often in shell scripting, when trying to have a non-shell CLI utility execute some bash code, such as with find(1):

    find /foo -name '*.bar' -exec bash -c 'mv "$1" "${1%.bar}.jpg"' -- {} \;

Here, we ask find to run the bash command for every *.bar file it finds, passing it to the bash process as the first positional parameter. The bash process runs the mv command after doing some Parameter Expansion on the first positional parameter in order to rename our file's extension from bar to jpg.

Alternatively, if your non-shell application allows you to set environment variables, you can do that, and then read them using normal variables of the same name.
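
For instance, a minimal sketch, assuming the calling application has exported a variable named MYFILE into the environment:

    # MYFILE is set by the calling (non-shell) application
    sh -c 'mv -- "$MYFILE" "${MYFILE%.bar}.jpg"'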

Similarly, suppose a program (e.g. a file manager) lets you define an external command that an argument will be appended to, but you need that argument somewhere in the middle. In that case:

    #!/bin/sh
    sh -c 'command foo "$1" bar' -- "$@"


CategoryShell

14. How can I concatenate two variables? How do I append a string to a variable?

There is no (explicit) concatenation operator for strings (either literal or variable dereferences) in the shell; you just write them adjacent to each other:

    var=$var1$var2

If the right-hand side contains whitespace characters, it needs to be quoted:

    var="$var1 - $var2"

If you're appending a string that doesn't "look like" part of a variable name, you just smoosh it all together:

    var=$var1/.-

Otherwise, braces or quotes may be used to disambiguate the right-hand side:

    var=${var1}xyzzy
    # Without braces, var1xyzzy would be interpreted as a variable name

    var="$var1"xyzzy
    # Alternative syntax

CommandSubstitution can be used as well. The following line builds a log file name containing the current date and stores it in the variable logname, resulting in names like log.2004-07-26:

    logname="log.$(date +%Y-%m-%d)"

There's no difference when the variable name is reused, either. A variable's value (the string it holds) may be reassigned at will:

    string="$string more data here"

Bash 3.1 has a new += operator that you may see from time to time:

    string+=" more data here"     # EXTREMELY non-portable!

It's generally best to use the portable syntax.


CategoryShell

15. How can I redirect the output of multiple commands at once?

Redirecting the standard output of a single command is as easy as:

    date > file

To redirect standard error:

    date 2> file

To redirect both:

    date > file 2>&1

or, a fancier way:

    # Bash only.  Equivalent to date > file 2>&1 but non-portable.
    date &> file

Redirecting an entire loop:

    for i in $list; do
        echo "Now processing $i"
        # more stuff here...
    done > file 2>&1

However, this can become tedious if the output of many programs should be redirected. If all output of a script should go into a file (e.g. a log file), the exec command can be used:

    # redirect both standard output and standard error to "log.txt"
    exec > log.txt 2>&1
    # all output including stderr now goes into "log.txt"

Otherwise, command grouping helps:

    {
        date
        # some other commands
        echo done
    } > messages.log 2>&1

In this example, the output of all commands within the curly braces is redirected to the file messages.log.

More discussion

In-depth: Illustrated Tutorial


CategoryShell

16. How can I run a command on all files with the extension .gz?

Often a command already accepts several files as arguments, e.g.

    zcat -- *.gz

On some systems, you would use gzcat instead of zcat. If neither is available, or if you don't care to play guessing games, just use gzip -dc instead.

The -- prevents a filename beginning with a hyphen from causing unexpected results.

If an explicit loop is desired, or if your command does not accept multiple filename arguments in one invocation, the for loop can be used:

    # Bourne
    for file in ./*.gz
    do
        echo "$file"
        # do something with "$file"
    done

To do it recursively, use find:

    # Bourne
    find . -name '*.gz' -type f -exec do-something {} \;

If you need to process the files inside your shell for some reason, then read the find results in a loop:

    # Bash
    while IFS= read -r file; do
        echo "Now processing $file"
        # do something fancy with "$file"
    done < <(find . -name '*.gz' -print)

This example uses ProcessSubstitution (see also FAQ #24), although a pipe may also be suitable in many cases. However, it does not correctly handle filenames that contain newlines. To handle arbitrary filenames, see FAQ #20.


CategoryShell

17. How can I use a logical AND/OR/NOT in a shell pattern (glob)?

"Globs" are simple patterns that can be used to match filenames or strings. They're generally not very powerful. If you need more power, there are a few options available.

If you want to operate on all the files that match glob A or glob B, just put them both on the same command line:

rm -- *.bak *.old

If you want to use a logical OR in just part of a glob (larger than a single character -- for which square-bracketed character classes suffice), in Bash, you can use BraceExpansion:

rm -- *.{bak,old}

If you need something still more general/powerful, in KornShell or BASH you can use extended globs. In Bash, you'll need the extglob option to be set. It can be checked with:

shopt extglob

and set with:

shopt -s extglob

To warm up, we'll move all files starting with foo AND not ending with .d to directory foo_thursday.d:

mv foo!(*.d) foo_thursday.d

A more complex example -- delete all files containing Pink_Floyd AND not containing The_Final_Cut:

rm !(!(*Pink_Floyd*)|*The_Final_Cut*)

By the way: these kinds of patterns can be used with the KornShell, too. They don't have to be enabled there; they are the default patterns.


CategoryShell

18. How can I group expressions in an if statement, e.g. if (A AND B) OR C?

The portable (POSIX or Bourne) way is to use multiple test (or [) commands:

    # Bourne
    if test A && test B || test C; then
    ...

The grouping is implicit in this case, because AND (&&) has a higher precedence than OR (||). If we need explicit grouping, then we can use curly braces:

    # Bourne(?)
    if test A && { test B || test C; }; then
    ...

What we should not do is try to use the -a or -o operators of the test command, because the results are undefined.

BASH and KornShell have different, more powerful comparison commands with slightly different (easier) quoting:

Examples:

    # Bash/ksh
    if (( (n>0 && n<10) || n == -1 ))
    then echo "0 < $n < 10, or n==-1"
    fi

or

    # Bash/ksh
    if [[ ( -f $localconfig && -f $globalconfig ) || -n $noconfig ]]
    then echo "configuration ok (or not used)"
    fi

Note that the distinction between numeric and string comparisons is strict. Consider the following example:

    n=3
    if [[ $n>0 && $n<10 ]]
    then echo "$n is between 0 and 10"
    else echo "ERROR: invalid number: $n"
    fi

The output will be "ERROR: ....", because in a string comparison "3" is bigger than "10": "3" sorts after "1", and the next character "0" is never even considered. Changing the square brackets to double parentheses (( ... )) makes the example work as expected; see the corrected version below.
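
For reference, here is the corrected version of the example, using arithmetic evaluation:

    # Bash/ksh
    n=3
    if (( n > 0 && n < 10 ))
    then echo "$n is between 0 and 10"
    else echo "ERROR: invalid number: $n"
    fi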


CategoryShell

19. How can I use numbers with leading zeros in a loop, e.g. 01, 02?

As always, there are different ways to solve the problem, each with its own advantages and disadvantages.

Bash version 4 allows zero-padding and ranges in its BraceExpansion:

    # Bash 4
    echo {01..10}

    for i in {01..10}; do ...

All of the other solutions on this page will assume Bash earlier than 4.0, or a non-Bash shell.

If there are not many numbers, BraceExpansion can be used:

    # Bash
    for i in 0{1,2,3,4,5,6,7,8,9} 10
    do
        echo $i
    done

In Bash 3, you can use ranges inside brace expansion (but not zero-padding). Thus, the same thing can be accomplished more concisely like this:

    # Bash 3
    for i in 0{1..9} 10
    do
        echo $i
    done

Another example, for output of 0000 to 0034:

    # Bash 3
    for i in {000{0..9},00{10..34}}
    do
        echo $i
    done

    # using the outer braces instead of just writing the two ranges next to each other
    # allows you to use the expansion, for instance, like this:
    wget 'http://foo.com/adir/thepages'{000{0..9},00{10..34}}'.html'

Some may prefer the following quick & dirty solution (producing "001" through "015"):

    # Bash 3
    for i in {1000..1015}
    do
      echo "${i:1}"    # or "${i#1}"
    done

This gets tedious for large sequences, but there are other ways, too. If you have the printf command (which is a Bash builtin, and is also POSIX standard), it can be used to format a number:

    # Bash
    for ((i=1; i<=10; i++))
    do
        printf "%02d " "$i"
    done

Also, since printf will implicitly loop if given more arguments than format specifiers, you can simplify this enormously:

   # Bash 3
   printf "%03d\n" {1..300}

If you don't know in advance what the starting and ending values are:

   # Bash 3
   # start and end are variables containing integers
   eval printf '"%03d\n"' {$start..$end}

The eval is needed here because you cannot have variables in a brace expansion -- only constants. The extra quotes are required by the eval so that our \n isn't changed to an n. Given how messy that eval solution is, please give serious thought to using the for loop instead.

The KornShell has the typeset command to specify the number of leading zeros:

    # Korn
    $ typeset -Z3 i=4
    $ echo $i
    004

If the command seq(1) is available (it's part of GNU sh-utils/coreutils), you can use it as follows:

    seq -w 1 10

or, for arbitrary numbers of leading zeros (here: 3):

    seq -f "%03g" 1 10

Combining printf with seq(1), you can do things like this:

   # POSIX shell, GNU utilities
   printf "%03d\n" $(seq 300)

(That may be helpful if you are not using Bash, but you have seq(1), and your version of seq(1) lacks printf-style format specifiers. That's a pretty odd set of restrictions, but I suppose it's theoretically possible. Since seq is a nonstandard external tool, it's good to keep your options open.)

Be warned however that using seq might be considered bad style; it's even mentioned in Don't Ever Do These.

Some BSD-derived systems have jot(1) instead of seq(1). In accordance with the glorious tradition of Unix, it has a completely incompatible syntax:

   # POSIX shell, OpenBSD et al.
   printf "%02d\n" $(jot 10 1)

   # Bourne shell, OpenBSD (at least)
   jot -w %02d 10 1

Finally, the following example works with any BourneShell derived shell (which also has expr and sed) to zero-pad each line to three bytes:

   # Bourne
   i=0
   while test $i -le 10
   do
       echo "00$i"
       i=`expr $i + 1`
   done |
       sed 's/.*\(...\)$/\1/g'

In this example, the number of '.' inside the parentheses in the sed command determines how many total bytes from the echo command (at the end of each line) will be kept and printed.

But if you're going to rely on an external Unix command, you might as well just do the whole thing in awk in the first place:

   # Bourne
   # count variable contains an integer
   awk -v count="$count" 'BEGIN {for (i=1;i<=count;i++) {printf("%03d\n",i)} }'

   # Bourne, with Solaris's decrepit and useless awk:
   awk "BEGIN {for (i=1;i<=$count;i++) {printf(\"%03d\\n\",i)} }"


Now, since the number one reason this question is asked is for downloading images in bulk, you can use the examples above with xargs(1) and wget(1) to fetch files:

   almost any example above | xargs -i% wget $LOCATION/%

The xargs -i% will read a line of input at a time, and replace the % at the end of the command with the input.

Or, a simpler example using a for loop:

   # Bash 3
   for i in {1..100}; do
      wget "$prefix$(printf %03d $i).jpg"
      sleep 5
   done

Or, avoiding the subshells (requires bash 3.1):

   # Bash 3.1
   for i in {1..100}; do
      printf -v n %03d $i
      wget "$prefix$n.jpg"
      sleep 5
   done


CategoryShell

20. How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?

Some Unix systems provide the split utility for this purpose:

    split --lines 10 --numeric-suffixes input.txt output-

For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:

    sed -n -e '1,10p' -e '10q'

This stops sed from printing each line (-n). Instead it only processes the lines in the range 1-10 ("1,10"), and prints them ("p"). The command will quit after reading line 10 ("10q").

We can now use this to print an arbitrary range of a file (specified by line number):

# POSIX shell
file=/etc/passwd
range=10
cur=1
last=$(wc -l < "$file") # count number of lines
chunk=1
while [ $cur -lt $last ]
do
    endofchunk=$(($cur + $range - 1))
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$(printf %04d $chunk)
    chunk=$(($chunk + 1))
    cur=$(($cur + $range))
done

The previous example uses POSIX arithmetic, which older Bourne shells do not have. In that case the following example should be used instead:

# legacy Bourne shell; assume no printf either
file=/etc/passwd
range=10
cur=1
last=`wc -l < "$file"` # count number of lines
chunk=1
while test $cur -lt $last
do
    endofchunk=`expr $cur + $range - 1`
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
    chunk=`expr $chunk + 1`
    cur=`expr $cur + $range`
done

Awk can also be used to produce a more or less equivalent result:

 awk -v range=10 '{print > FILENAME "." (int((NR -1)/ range)+1)}' file


CategoryShell

21. How can I find and deal with file names containing newlines, spaces or both?

First and foremost, to understand why you're having trouble, read Arguments to get a grasp on how the shell understands the statements you give it. It is vital that you grasp this matter well if you're going to be doing anything with the shell.

The preferred method to deal with arbitrary filenames is still to use find(1):

find ... -exec command {} \;

or, if you need to handle filenames en masse:

find ... -exec command {} +

xargs is rarely ever more useful than the above, but if you really insist, remember to use -0:

# Requires GNU/BSD find and xargs
find ... -print0 | xargs -0 command

# Never use xargs without -0 or similar extensions!

Use one of these unless you really can't.

Another way to deal with files with spaces in their names is to use the shell's filename expansion (globbing). This has the disadvantage of not working recursively (except with zsh's extensions or bash 4's globstar), but if you just need to process all the files in a single directory, it works fantastically well.

For example, this code renames all the *.mp3 files in the current directory to use underscores in their names instead of spaces:

# Bash/ksh
for file in ./*\ *.mp3; do
  mv "$file" "${file// /_}"
done

For more examples of renaming files, see FAQ #30.

Remember, you need to quote all your Parameter Expansions using double quotes. If you don't, the expansion will undergo WordSplitting (see also argument splitting and BashPitfalls). Also, always prefix globs with "./"; otherwise, if there's a file with "-" as the first character, the expansions might be misinterpreted as options.

Another way to handle filenames recursively involves using the -print0 option of find (a GNU/BSD extension), together with bash's -d option for read:

# Bash
unset a i
while IFS= read -r -d $'\0' file; do
  a[i++]="$file"        # or however you want to process each file
done < <(find /tmp -type f -print0)

The preceding example reads all the files under /tmp (recursively) into an array, even if they have newlines or other whitespace in their names, by forcing read to use the NUL byte (\0) as its line delimiter. Since NUL is not a valid byte in Unix filenames, this is the safest approach besides using find -exec. IFS= is required to avoid trimming leading/trailing whitespace, and -r is needed to avoid backslash processing. In fact, $'\0' is equivalent to '' so we could also write it like this:

# Bash
unset a i
while IFS= read -r -d '' file; do
  a[i++]="$file"
done < <(find /tmp -type f -print0)

So, why doesn't this work?

# DOES NOT WORK
unset a i
find /tmp -type f -print0 | while IFS= read -r -d '' file; do
  a[i++]="$file"
done

Because of the pipeline, the entire while loop is executed in a SubShell and therefore the array assignments will be lost after the loop terminates. (For more details about this, see FAQ #24.)


CategoryShell

22. How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?

There are a number of tools available for this. Which one to use depends on a lot of factors, the biggest of which is of course what we're editing.

First, we'll start with how to replace strings in:

1. Variables

If it's a variable, this can (and should) be done very simply with parameter expansion. Forking an external tool for string manipulation is extremely slow and unnecessary.

var='some string'; search=some; rep=another

# Bash
var=${var//"$search"/$rep}


# POSIX function

# usage: string_rep SEARCH REPL STRING
# replaces all instances of SEARCH with REPL in STRING
string_rep() {
  # initialize vars
  in=$3
  unset out

  # SEARCH must not be empty
  [ -n "$1" ] || return

  while true; do
    # break loop if SEARCH is no longer in "$in"
    case "$in" in
      *"$1"*) : ;;
      *) break;;
    esac

    # append everything in "$in", up to the first instance of SEARCH, and REP, to "$out"
    out=$out${in%%"$1"*}$2
    # remove everything up to and including the first instance of SEARCH from "$in"
    in=${in#*"$1"}
  done

  # append whatever is left in "$in" after the last instance of SEARCH to out, and print
  printf '%s%s\n' "$out" "$in"
}

var=$(string_rep "$search" "$rep" "$var")

# Note: POSIX does not have a way to localize variables. Most shells (even dash and 
# busybox), however, do. Feel free to localize the variables if your shell supports
# it. Even if it does not, if you call the function with var=$(string_rep ...), the
# function will be run in a subshell and any assignments it makes will not persist.

In the bash example, the quotes around "$search" prevent the contents of the variable from being treated as a shell pattern (also called a "glob"). Of course, if pattern matching is intended, do not include the quotes. If "$rep" were quoted, however, the quotes would be treated as literal.
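
To illustrate the difference the quotes make, here is a small sketch with made-up values:

# Bash
var='a*b' search='*'
echo "${var//"$search"/_}"    # a_b  -- quoted: the * is matched literally
echo "${var//$search/_}"      # _    -- unquoted: * is a glob pattern matching everything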

Parameter expansions like this are discussed in more detail in Faq #100.

2. Streams

If it's a file or a stream, things get a little bit trickier. The standard tools available for this are sed or AWK (for streams), and ed (for files).

Of course, you could do it in bash itself, by combining the previous method with Faq #1:

search=foo; rep=bar

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < <(some_command)

some_command | while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done

If you want to do more processing than just a simple search/replace, this may be the best option. Note that the last example runs the loop in a subshell. See Faq #24 for more information on that.

Another option would, of course, be sed:

# replaces all instances of "search" with "replace" in the output of "some_command"
some_command | sed 's/search/replace/g'

sed uses regular expressions. Unlike in the bash example, "search" and "replace" would have to be rigorously escaped in order to be treated as literal strings. This is very impractical, and attempting to do so will make your code extremely prone to bugs. Embedding shell variables in sed is never a good idea.

You may notice, however, that the bash loop above is very slow for large data sets. So what can we use instead that is fast and can still replace literal strings? Well, you could use AWK. The following function replaces all instances of STR with REP, reading from stdin and writing to stdout.

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

some_command | gsub_literal "$search" "$rep"


# condensed as a one-liner:
some_command | awk -v s="${search//\\/\\\\}" -v r="${rep//\\/\\\\}" 'BEGIN {l=length(s)} {o="";while (i=index($0, s)) {o=o substr($0,1,i-1) r; $0=substr($0,i+l)} print o $0}'

3. Files

Actually editing files gets even trickier. The only tool listed that actually edits a file is ed. The other methods can be used, but doing so involves a temp file and mv (or non-POSIX extensions such as sed -i).

ed is the standard UNIX command-based editor. Here are some commonly-used syntaxes for replacing the string olddomain.com by the string newdomain.com in a file named file. All four commands do the same thing, with varying degrees of portability and efficiency:

# Bash
ed -s file <<< $'g/olddomain\\.com/s//newdomain.com/g\nw\nq'

# Bourne (with printf)
printf '%s\n' 'g/olddomain\.com/s//newdomain.com/g' w q | ed -s file

printf 'g/olddomain\\.com/s//newdomain.com/g\nw\nq' | ed -s file

# Bourne (without printf)
ed -s file <<!
g/olddomain\\.com/s//newdomain.com/g
w
q
!

To replace a string in all files of the current directory:

for file in ./*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done

To do this recursively, the easy way would be to enable globstar in bash 4 (shopt -s globstar; it's a good idea to put this in your ~/.bashrc) and use:

for file in ./**/*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done

If you don't have bash 4, you can use find. Unfortunately, it's a bit tedious to feed ed stdin for each file hit:

find . -type f -exec bash -c 'printf "%s\n" "g/old/s//new/g" w q | ed -s "$1"' _ {} \;

sed is a Stream EDitor, not a file editor. Nevertheless, people everywhere tend to abuse it for trying to edit files. It doesn't edit files. GNU sed (and some BSD seds) have a -i option that makes a copy and replaces the original file with the copy. An expensive operation, but if you enjoy unportable code, I/O overhead and bad side effects (such as destroying symlinks), this would be an option:

sed -i    's/old/new/g' ./*  # GNU
sed -i '' 's/old/new/g' ./*  # BSD

# POSIX sed, uses a temp file and mv:

# remove all temp files on exit, in case sed fails and they weren't moved
trap 'rm -f "${temps[@]}"' EXIT

temps=()
for file in ./*; do
  if [[ -f $file ]]; then
    tmp=$(mktemp) || exit
    temps+=("$tmp")

    sed 's/old/new/g' "$file" > "$tmp" &&
    mv "$tmp" "$file"
  fi
done

Those of you who have perl 5 can accomplish the same thing using this code:

perl -pi -e 's/old/new/g' ./*

Moreover, perl can be used to pass variables into both search and replace strings with no unquoting or potential for conflict with sigil characters:

in="input (/string" out="output string" perl -pi -e $'$quoted_in=quotemeta($ENV{\'in\'}); s/$quoted_in/$ENV{\'out\'}/g' ./*

Recursively using find:

find . -type f -exec perl -pi -e 's/old/new/g' {} \;   # if your find doesn't have + yet
find . -type f -exec perl -pi -e 's/old/new/g' {} +    # if it does

If you want to delete lines instead of making substitutions:

# Deletes any line containing the perl regex foo
perl -ni -e 'print unless /foo/' ./*

To replace for example all "unsigned" with "unsigned long", if it is not "unsigned int" or "unsigned long" ...:

find . -type f -exec perl -i.bak -pne \
    's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' {} \;


All of the tools listed above use regular expressions, which means they have the same issue as the sed code earlier; trying to embed shell variables in them is a terrible idea, and treating an arbitrary value as a literal string is painful at best. This brings us back to our while read loop, or the awk function above.

The while read loop:

# overwrite a single file
tmp=$(mktemp) || exit
trap 'rm -f "$tmp"' EXIT

while IFS= read -r line; do
  printf '%s\n' "${line//"$search"/$rep}"
done < "$file" > "$tmp" && mv "$tmp" "$file"

Replaces all files in a directory:

trap 'rm -f "${temps[@]}"' EXIT

temps=()
for f in ./*; do
  if [[ -f $f ]]; then
    tmp=$(mktemp) || exit
    temps+=("$tmp")

    while IFS= read -r line; do
      printf '%s\n' "${line//"$search"/$rep}"
    done < "$f" > "$tmp" &&
    mv "$tmp" "$f"
  fi
done

The above glob could be changed to './**/*' in order to use globstar (mentioned above) to be recursive, or of course we could use find:

# this example uses GNU find's -print0. Using POSIX find -exec is left as an exercise to the reader
trap 'rm -f "${temps[@]}"' EXIT

temps=()
while IFS= read -rd '' f <&3; do
  tmp=$(mktemp) || exit
  temps+=("$tmp")

  while IFS= read -r line; do
    printf '%s\n' "${line//"$search"/$rep}"
  done < "$f" > "$tmp" &&
  mv "$tmp" "$f"
done 3< <(find . -type f -print0)

And of course, we can adapt the AWK function above. The following function replaces all instances of STR with REP in FILE, actually overwriting FILE:

# usage: gsub_literal_f STR REP FILE
# replaces all instances of STR with REP in FILE
gsub_literal_f() {
  local tmp
  # make sure FILE exists, is a regular file, and is readable and writable
  if ! [[ -f $3 && -r $3 && -w $3 ]]; then
    printf '%s does not exist or is not readable or writable\n' "$3" >&2
    return 1
  fi

  # STR cannot be empty
  [[ $1 ]] || return

  tmp=$(mktemp) || return
  trap 'rm -f "$tmp"' RETURN

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  ' "$3" > "$tmp" && mv "$tmp" "$3"
}

This function, of course, could be called on all of the files in a dir, or recursively.


Notes:

For more information on sed or awk, you can visit the ##sed and #awk channels on freenode, respectively.

mktemp(1), used in many of the examples above, is not completely portable. While it will work on most systems, more information on safely creating temp files can be found in Faq #62.


CategoryShell

23. How can I calculate with floating point numbers instead of just integers?

BASH's builtin arithmetic uses integers only:

$ echo $((10/3))
3

For most operations involving floating-point numbers, an external program must be used, e.g. bc, AWK or dc:

$ echo "scale=3; 10/3" | bc
3.333

The "scale=3" command notifies bc that three digits of precision after the decimal point are required.

Same example with dc (reversed polish calculator, lighter than bc):

$ echo "3 k 10 3 / p" | dc
3.333

k sets the precision to 3, and p prints the value of the top of the stack with a newline. The stack is not altered, though.

If you are trying to compare floating point numbers (less-than or greater-than), and you have GNU bc, you can do this:

# Bash
if (( $(bc <<< "1.4 < 2.5") )); then
  echo "1.4 is less than 2.5."
fi

However, x < y is not supported by all versions of bc:

# This would work with some versions, but not HP-UX 10.20.
imadev:~$ bc <<< '1 < 2'
syntax error on line 1,

If you want to be portable, you need something more subtle:

# POSIX
case $(echo "1.4 - 2.5" | bc) in
  -*) echo "1.4 is less than 2.5";;
esac

This example subtracts 2.5 from 1.4, and checks the sign of the result. If it is negative, the first number is less than the second. We aren't actually treating bc's output as a number; we're treating it as a string, and only looking at the first character.

Legacy (Bourne) version:

# Bourne
case "`echo "1.4 - 2.5" | bc`" in
  -*) echo "1.4 is less than 2.5";;
esac

AWK can be used for calculations, too:

$ awk 'BEGIN {printf "%.3f\n", 10 / 3}'
3.333

There is a subtle but important difference between the bc and the awk solution here: bc reads commands and expressions from standard input. awk on the other hand evaluates the expression as part of the program. Expressions on standard input are not evaluated, i.e. echo 10/3 | awk '{print $0}' will print 10/3 instead of the evaluated result of the expression.

Newer versions of zsh and the KornShell have built-in floating point arithmetic, together with mathematical functions like sin() or cos(). So many of these calculations can be done natively in ksh:

# ksh93
$ echo $((3.00000000000/7))
0.428571428571428571

Comparing two floating-point numbers for equality is actually an unwise thing to do; two calculations that should give the same result on paper may give ever-so-slightly-different floating-point numeric results due to rounding/truncation issues. If you wish to determine whether two floating-point numbers are "the same", you may either:

  • Round them both to a desired level of precision, and then compare the rounded results for equality; or
  • Subtract one from the other and compare the absolute value of the difference against an epsilon value of your choice.

One of the very few things that Bash actually can do with floating-point numbers is round them, using printf:

# Bash 3.1
# See if a and b are close to each other.
# Round each one to two decimal places and compare results as strings.
a=3.002 b=2.998
printf -v a1 %.2f $a
printf -v b1 %.2f $b
if [[ $a1 = "$b1" ]]; then echo "a and b are the same, roughly"; fi
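
The second approach (comparing the absolute difference against an epsilon) can be sketched with awk; the variable names and the epsilon value here are only examples:

# POSIX shell with awk
a=3.002 b=2.998 eps=0.01
if awk -v a="$a" -v b="$b" -v eps="$eps" 'BEGIN {d = a - b; if (d < 0) d = -d; exit !(d < eps)}'
then
  echo "a and b differ by less than $eps"
fi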

Caveat: Many problems that look like floating point arithmetic can in fact be solved using integers only, and thus do not require these tools -- e.g., problems dealing with rational numbers. For example, to check whether two numbers x and y are in a ratio of 4:3 or 16:9 you may use something along these lines:

# Bash
# Variables x and y are integers
if (( $x*9-$y*16==0 )) ; then
   echo "16:9."
elif (( $x*3-$y*4==0 )) ; then
   echo "4:3."
else
   echo "Neither 16:9 nor 4:3."
fi

A more elaborate test could tell if the ratio is closest to 4:3 or 16:9 without using floating point arithmetic. Note that this very simple example that apparently involves floating point numbers and division is solved with integers and no division. If possible, it's usually more efficient to convert your problem to integer arithmetic than to use floating point arithmetic.


CategoryShell

24. I want to launch an interactive shell that has special aliases and functions, not the ones in the user's ~/.bashrc.

Just specify a different start-up file:

bash --rcfile /my/custom/bashrc

Variant question: I have a script that sets up an environment, and I want to give the user control at the end of it.

Put exec bash at the end of it to launch an interactive shell. This shell will inherit the environment (which does not include aliases, but that's OK, because aliases suck). Of course, you must also make sure that your script runs in a terminal -- otherwise, you must create one, for example, by using exec xterm -e bash.
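
A minimal sketch of such a set-up script (the variable and directory are invented for illustration):

#!/bin/bash
# set up the environment...
export PROJECT_DIR=/some/project
cd "$PROJECT_DIR" || exit
# ...then hand control to an interactive shell that inherits it
exec bash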


CategoryShell

25. I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?

In most shells, each command of a pipeline is executed in a separate SubShell. Non-working example:

    # Works only in ksh88/ksh93, or bash 4.2 with lastpipe enabled
    # In other shells, this will print 0
    linecnt=0
    printf '%s\n' foo bar | while read -r line
    do
        linecnt=$((linecnt+1))
    done
    echo "total number of lines: $linecnt"

The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecnt created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the while loop is finished, the subshell copy is discarded, and the original variable linecnt of the parent (whose value hasn't changed) is used in the echo command.

Different shells exhibit different behaviors in this situation:

  • BourneShell creates a subshell when the input or output of anything (loops, case etc..) but a simple command is redirected, either by using a pipeline or by a redirection operator ('<', '>').

  • BASH creates a new process only if the loop is part of a pipeline.

  • KornShell creates it only if the loop is part of a pipeline, but not if the loop is the last part of it. The read example above actually works in ksh88 and ksh93! (but not mksh)

  • POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).

More broken stuff:

    # Bash 4
    # The problem also occurs without a loop
    printf '%s\n' foo bar | mapfile -t line  
    printf 'total number of lines: %s\n' "${#line[@]}" # prints 0

    f() {
        if [[ -t 0 ]]; then
            echo "$1"
        else
            read -r var
        fi
    };

    f 'hello' | f
    echo "$var" # prints nothing

Again, in both cases the pipeline causes read or some containing command to run in a subshell, so its effect is never witnessed in the parent process.

  • It should be stressed that this issue isn't specific to loops. It's a general property of all pipes, though the "while/read" loop might be considered the canonical example that crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their funstuff directly into a pipeline, and confusion ensues.

1. Workarounds

  • If the input is a file, a simple redirect will suffice:
        # POSIX
        while read -r line; do linecnt=$(($linecnt+1)); done < file
        echo $linecnt

    Unfortunately, this doesn't work with a Bourne shell; see sh(1) from the Heirloom Bourne Shell for a workaround.

  • Use command grouping and do everything in the subshell:

        # POSIX
        linecnt=0
        cat /etc/passwd | {
        while read -r line ; do
            linecnt=$((linecnt+1))
        done
        echo "total number of lines: $linecnt"
        }
    This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.
  • Use ProcessSubstitution (Bash only):

        # Bash
        while read -r line; do
            ((linecnt++))
        done < <(grep PATH /etc/profile)
        echo "total number of lines: $linecnt"
    This is essentially identical to the first workaround above. We still redirect a file, only this time the file happens to be a named pipe temporarily created by our process substitution to transport the output of grep.
  • Use a named pipe:

        # POSIX
        mkfifo mypipe
        grep PATH /etc/profile > mypipe &
        while read -r line;do
            linecnt=$(($linecnt+1))
        done < mypipe
        echo "total number of lines: $linecnt"
  • Use a coprocess (ksh, even pdksh, bash 4, oksh, mksh..):

        # ksh
        grep PATH /etc/profile |&
        while read -r -p line; do
            linecnt=$((linecnt+1))
        done
        echo "total number of lines: $linecnt"
  • Use a HereString (Bash only):

         read -ra words <<< 'hi ho hum'
         printf 'total number of words: %d' "${#words[@]}"

    The <<< operator is specific to bash (2.05b and later), however it is a very clean and handy way to specify a small string of literal input to a command.

  • With a POSIX shell, or for longer multi-line data, you can use a here document instead:
        # Bash
        declare -i linecnt
        while read -r; do
            ((linecnt++))
        done <<EOF
        hi
        ho
        hum
        EOF
        printf 'total number of lines: %d' "$linecnt"
  • Use lastpipe (Bash 4.2)
         # Bash 4.2
         set +m
         shopt -s lastpipe
    
         printf '%s\n' hi{,,,,,} | while read -r "lines[x++]"; do :; done
         printf 'total number of lines: %d' "${#lines[@]}"
    Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.

For more related examples of how to read input and break it into words, see FAQ #1.


CategoryShell

26. How can I access positional parameters after $9?

Use ${10} instead of $10. This works for BASH and KornShell, but not for older BourneShell implementations. Another way to access arbitrary positional parameters after $9 is to use for, e.g. to get the last parameter:

    # Bourne
    for last
    do
        : # nothing
    done

    echo "last argument is: $last"

To get an argument by number, we can use a counter:

    # Bourne
    n=12        # This is the number of the argument we are interested in
    i=1
    for arg
    do
        if test $i -eq $n
        then
            argn=$arg
            break
        fi
        i=`expr $i + 1`
    done
    echo "argument number $n is: $argn"

This has the advantage of not "consuming" the arguments. If consuming them is not a problem, the shift command discards the leading positional arguments:

    shift 11
    echo "the 12th argument is: $1"

In addition, bash and ksh93 treat the set of positional parameters as an array, and you may use parameter expansion syntax to address those elements in a variety of ways:

    # Bash, ksh93
    for x in "${@:(-2)}"    # iterate over the last 2 parameters
    for y in "${@:2}"       # iterate over all parameters starting at $2
                            # which may be useful if we don't want to shift

Although direct access to any positional argument is possible this way, it's seldom needed. The common alternative is to use getopts to process options (e.g. "-l", or "-o filename"), and then use either for or while to process all the remaining arguments in turn. An explanation of how to process command line arguments is available in FAQ #35, and another is found at http://www.shelldorado.com/goodcoding/cmdargs.html
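
As a sketch of that common pattern (the option letters -l and -o are just examples):

    # POSIX
    lflag= ofile=
    while getopts lo: opt
    do
        case $opt in
            l) lflag=1 ;;
            o) ofile=$OPTARG ;;
            *) echo "usage: $0 [-l] [-o file] [arg...]" >&2; exit 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    for arg
    do
        printf 'remaining argument: %s\n' "$arg"
    done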


CategoryShell

27. How can I randomize (shuffle) the order of lines in a file? (Or select a random line from a file, or select a random file from a directory.)

To randomize the lines of a file, here is one approach. This one involves generating a random number, which is prefixed to each line; then sorting the resulting lines, and removing the numbers.

    #bash
    randomize() {
        while IFS='' read -r l ; do printf '%s\t%s\n' "$RANDOM" "$l"; done |
        sort -n |
        cut -f2-
    }

RANDOM is supported by BASH and the KornShell, but is not defined by POSIX.

Here's the same idea (printing random numbers in front of a line, and sorting the lines on that column) using other programs:

    # Bourne
    awk '
        BEGIN { srand() }
        { print rand() "\t" $0 }
    ' |
    sort -n |    # Sort numerically on first (random number) column
    cut -f2-     # Remove sorting column

This is (possibly) faster than the previous solution, but will not work for very old AWK implementations (try "nawk", or "gawk", or /usr/xpg4/bin/awk if available). (Note that awk uses the epoch time as a seed for srand(), which might not be random enough for you.)

A generalized version of this question might be, How can I shuffle the elements of an array? If we don't want to use the rather clumsy approach of sorting lines, this is actually more complex than it appears. A naive approach would give us badly biased results. A more complex (and correct) algorithm looks like this:

    # Uses a global array variable.  Must be compact (not a sparse array).
    # Bash syntax.
    shuffle() {
       local i tmp size max rand

       # $RANDOM % (i+1) is biased because of the limited range of $RANDOM
       # Compensate by using a range which is a multiple of the array size.
       size=${#array[*]}
       max=$(( 32768 / size * size ))

       for ((i=size-1; i>0; i--)); do
          while (( (rand=$RANDOM) >= max )); do :; done
          rand=$(( rand % (i+1) ))
          tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
       done
    }

This function shuffles the elements of an array in-place using the Knuth-Fisher-Yates shuffle algorithm.
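
For example, the function above could be used like this (it operates on the global array named array):

    # Bash
    array=(one two three four five)
    shuffle
    printf '%s\n' "${array[@]}"    # prints the five words in a random order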

Another question we frequently see is, How can I print a random line from a file? The problem here is that you need to know in advance how many lines the file contains. Lacking that knowledge, you have to read the entire file through once just to count them -- or, you have to suck the entire file into memory. Let's explore both of these approaches.

   # Bash
   n=$(wc -l < "$file")        # Count number of lines.
   r=$((RANDOM % n + 1))       # Random number from 1..n. (See below)
   sed -n "$r{p;q;}" "$file"   # Print the r'th line.

   # POSIX with awk
   awk -v n="$(wc -l<"$file")" 'BEGIN{srand();l=int((rand()*n)+1)} NR==l{print;exit}' "$file"

(see this faq for more info about printing the n'th line.)

The next example sucks the entire file into memory. This approach saves time reopening the file, but obviously uses more memory. (Arguably: on systems with sufficient memory and an effective disk cache, you've read the file into memory by the earlier methods, unless there's insufficient memory to do so, in which case you shouldn't, QED.)

   # Bash
   unset lines i
   while IFS= read -r 'lines[i++]'; do :; done < "$file"   # See FAQ 5
   n=${#lines[@]}
   r=$((RANDOM % n))   # see below
   echo "${lines[r]}"

Note that we don't add 1 to the random number in this example, because the array of lines is indexed counting from 0.

Also, some people want to choose a random file from a directory (for a signature on an e-mail, or to choose a random song to play, or a random image to display, etc.). A similar technique can be used:

    # Bash
    files=(*.ogg)                  # Or *.gif, or *
    n=${#files[@]}                 # For aesthetics
    xmms -- "${files[RANDOM % n]}" # Choose a random element

Note that these last few examples use a simple modulus of the RANDOM variable, so the results are biased. If this is a problem for your application, then use the anti-biasing technique from the Knuth-Fisher-Yates example, above.
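
For completeness, here is a sketch of that anti-biasing technique applied to picking a single random index; the function name rand_index is ours, not a standard one:

    # Bash
    # Print a random index from 0 to n-1, avoiding modulo bias.
    rand_index() {
        local n=$1 max rand
        max=$(( 32768 / n * n ))
        while (( (rand=$RANDOM) >= max )); do :; done
        echo $(( rand % n ))
    }

    files=(*.ogg)
    xmms -- "${files[$(rand_index "${#files[@]}")]}"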

Other non-portable utilities:

  • GNU Coreutils shuf (in recent enough coreutils)

  • GNU sort -R

Speaking of GNU coreutils, as of version 6.9 GNU sort has the -R (aka --random-sort) flag. Oddly enough, it only works in the generic (C/POSIX) locale:

     LC_ALL=C sort -R file     # output the lines in file in random order
     LC_ALL=POSIX sort -R file # output the lines in file in random order
     LC_ALL=en_US sort -R file # effectively ignores the -R option

For more details, see info coreutils sort or an equivalent manual.

Behavior described appears reversed in current versions of mksh, ksh93, Bash, and Zsh. Still something to keep in mind for legacy. -ormaaj
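
If you do have GNU shuf, the tasks discussed above become one-liners; for example:

    # GNU coreutils
    shuf file              # shuffle the lines of a file
    shuf -n 1 file         # print one random line from a file
    shuf -n 1 -e *.ogg     # print one random file name from the current directory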


CategoryShell

How can two unrelated processes communicate?

Two unrelated processes cannot use the arguments, the environment or stdin/stdout to communicate; some form of inter-process communication (IPC) is required.

1. A file

Process A writes to a file, and Process B reads it. This method is not synchronized, and is therefore unsafe if B can read the file while A is still writing to it. A lock directory or a signal can help.

2. A directory as a lock

mkdir can be used to test for the existence of a dir and create it in one atomic operation; it thus can be used as a lock, although not a very efficient one.

Script A:

    until mkdir /tmp/dir;do # wait until we can create the dir
      sleep 1
    done
    echo foo > file         # write to the file; this section is critical
    rmdir /tmp/dir          # remove the lock

Script B:

    until mkdir /tmp/dir;do #wait until we can create the dir
      sleep 1
    done
    read var < file         # read from the file; this section is critical
    echo "$var"             # Script A cannot write in the file
    rmdir /tmp/dir          # remove the lock

See Faq #45 and mutex for more examples with a lock directory.
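
A slightly more careful variant releases the lock whenever the script exits, so an early exit does not leave the directory behind. This is only a sketch: it does not cover un-catchable deaths (e.g. kill -9), and it still does not deal with stale locks:

    # POSIX
    until mkdir /tmp/dir 2>/dev/null; do   # wait until we can create the dir
      sleep 1
    done
    trap 'rmdir /tmp/dir' EXIT             # release the lock when the script exits
    echo foo > file                        # critical section
    # no explicit rmdir needed; the EXIT trap removes the lock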

3. Signals

Signals are probably the simplest form of IPC:

ScriptA:

    trap 'flag=go' USR1 #set up the signal handler for the USR1 signal

    # echo $$ > /tmp/ScriptA.pid #if we want to save the pid in a file

    flag=""
    while [[ $flag != go ]]; do # wait for the green light from Script B
      sleep 1;
    done
    echo we received the signal

You must find or know the pid of the other script to send it a signal using kill:

     #send USR1 to all processes whose command line matches ScriptA
     pkill -USR1 -f ScriptA
     
     #if ScriptA saved its pid in a file
     kill -USR1 $(</tmp/ScriptA.pid)

     #if ScriptA is a child:
     ScriptA & pid=$!
     kill -USR1 $pid

The first two methods are not bulletproof and will cause trouble if you run more than one instance of ScriptA.

4. Named Pipes

Named pipes are a much richer form of IPC. They are described on their own page: NamedPipes.
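
As a tiny taste of what that looks like (NamedPipes has the real discussion), one process can block reading from a FIFO while another writes into it; the path /tmp/myfifo is just an example:

    # POSIX
    mkfifo /tmp/myfifo

    # Process B (the reader) blocks until a writer opens the FIFO:
    read var < /tmp/myfifo
    echo "$var"

    # Process A (the writer), from another shell:
    echo hello > /tmp/myfifo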


CategoryShell

How do I determine the location of my script? I want to read some config files from the same place.

This topic comes up frequently. This answer covers not only the expression used above ("configuration files"), but also several variant situations. If you've been directed here, please read this entire answer before dismissing it.

This is a complex question because there's no single right answer to it. Even worse: it's not possible to find the location reliably in 100% of all cases. All ways of finding a script's location depend on the name of the script, as seen in the predefined variable $0. But providing the script name in $0 is only a (very common) convention, not a requirement.

A commonly suggested but suspect answer is "in some shells, $0 is always an absolute path, even if you invoke the script using a relative path, or no path at all". But this isn't reliable across shells; some of them (including BASH) return the actual command typed in by the user instead of the fully qualified path. And this is just the tip of the iceberg!

Your script may not actually be on a locally accessible disk at all. Consider this:

  ssh remotehost bash < ./myscript

The shell running on remotehost is getting its commands from a pipe. There's no script anywhere on any disk that bash can see.

Moreover, even if your script is stored on a local disk and executed, it could move. Someone could mv the script to another location in between the time you type the command and the time your script checks $0. Or someone could have unlinked the script during that same time window, so that it doesn't actually have a link within a file system any more.

Even in the cases where the script is in a fixed location on a local disk, the $0 approach still has some major drawbacks. The most important is that the script name (as seen in $0) may not be relative to the current working directory, but relative to a directory from the program search path $PATH (this is often seen with KornShell). Or (and this is the most likely problem by far...) there might be multiple links to the script from multiple locations, one of them being a simple symlink from a common PATH directory like /usr/local/bin, which is how it's being invoked. Your script might be in /opt/foobar/bin/script but the naive approach of reading $0 won't tell you that -- it may say /usr/local/bin/script instead.

(For a more general discussion of the Unix file system and how symbolic links affect your ability to know where you are at any given moment, see this Plan 9 paper.)

Having said all that, if you still want to make a whole slew of naive assumptions, and all you want is the fully qualified version of $0, you can use something like this (BASH syntax):

  [[ $0 == /* ]] && echo "$0" || echo "${PWD}/${0#./}"

Or the BourneShell version:

  case $0 in /*) echo "$0";; *) echo "`pwd`/$0";; esac

Or a shell-independent variant (needs a readlink(1) supporting -f, though, so it's OS-dependent):

  readlink -f "$0"

In Bash version 4.1.7(1)-release on Linux, bash seems to always open the script on file descriptor 255, so you can do:

  mydir="$(dirname "$(readlink /proc/$$/fd/255)")"

If we want to account for the cases where the script's relative pathname (in $0) may be relative to a $PATH component instead of the current working directory (as mentioned above), we can try to search for the script in all the directories of $PATH.

The following script shows how this could be done:

    #!/bin/bash

    myname=$0
    if [[ -s "$myname" ]] && [[ -x "$myname" ]]; then
        # $myname is already a valid file name

        mypath=$myname
    else
        case "$myname" in
        /*) exit 1;;             # absolute path - do not search PATH
        *)
            # Search all directories from the PATH variable. Take
            # care to interpret leading and trailing ":" as meaning
            # the current directory; the same is true for "::" within
            # the PATH.

            # Replace leading : with . in PATH, store in p
            p=${PATH/#:/.:}
            # Replace trailing : with .
            p=${p/%:/:.}
            # Replace :: with :.:
            p=${p//::/:.:}
            # Temporary input field separator, see FAQ #1
            OFS=$IFS IFS=:
            # Split the path on colons and loop through each of them
            for dir in $p; do
                    [[ -f "$dir/$myname" ]] || continue # no file
                    [[ -x "$dir/$myname" ]] || continue # not executable
                    mypath=$dir/$myname
                    break           # only return first matching file
            done
            # Restore old input field separator
            IFS=$OFS
            ;;
        esac
    fi

    if [[ ! -f "$mypath" ]]; then
        echo >&2 "cannot find full path name: $myname"
        exit 1
    fi

    echo >&2 "path of this script: $mypath"

Note that $mypath is not necessarily an absolute path name. It still can contain relative paths like ../bin/myscript, because $PATH could contain those. If you want to get the directory only from that string, check FAQ 73.
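
For instance, a quick sketch of peeling off just the directory part (FAQ 73 covers the corner cases this glosses over):

  # Bash
  mydir=$(dirname -- "$mypath")   # external but simple
  mydir=${mypath%/*}              # parameter expansion; wrong if $mypath contains no slash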

Are you starting to see how ridiculously complex this problem is becoming? And this is still just the simplistic case where we've made a lot of assumptions about the script not moving and not being piped in!

Generally, storing data files in the same directory as their programs is a bad practice. The Unix file system layout assumes that files in one place (e.g. /bin) are executable programs, while files in another place (e.g. /etc) are data files. (Let's ignore legacy Unix systems with programs in /etc for the moment, shall we....)

Here are some common sense alternatives you should consider, instead of attempting to perform the impossible:

  • It really makes the most sense to keep your script's configuration in a single, static location such as /etc/foobar.conf.

  • If you need to define multiple configuration files, then you can have a directory (say, /var/lib/foobar/ or /usr/local/lib/foobar/), and read that directory's location from a fixed place such as /etc/foobar.conf.

  • If you don't even want that much to be hard-coded, you could pass the location of foobar.conf (or of your configuration directory itself) as a parameter to the script.

  • If you need the script to assume certain defaults in the absence of /etc/foobar.conf, you can put defaults in the script itself, or fall back to something like $HOME/.foobar.conf if /etc/foobar.conf is missing.

  • When you install the script on a target system, you could put the script's location into a variable in the script itself. The information is available at that point, and as long as the script doesn't move, it will always remain correct for each installed system.
  • In most cases, it makes more sense to abort gracefully if your configuration data can't be found by obvious means, rather than going through arcane processes and possibly coming up with wrong answers.
    • BASH_SOURCE is probably a much better idea than $0, gives better results and is better defined. This article should probably be rewritten with BASH_SOURCE in mind. --Lhunath
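
Following up on that last point, here is the commonly seen BASH_SOURCE idiom. It is still subject to the caveats above (symlinks, moved scripts, scripts read from a pipe), so treat it as a sketch rather than a guarantee:

  # Bash
  script_dir=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)
  echo "this script appears to live in: $script_dir"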


CategoryShell

How can I display the target of a symbolic link?

The nonstandard external command readlink(1) can be used to display the target of a symbolic link:

$ readlink /bin/sh
bash

If you don't have readlink, you can use Perl:

perl -e 'print readlink "/bin/sh", "\n"'

You can also use GNU find's -printf %l directive, which is especially useful if you need to resolve links in batches:

$ find /bin/ -type l -printf '%p points to %l\n'
/bin/sh points to bash
/bin/bunzip2 points to bzip2
...

If your system lacks both readlink and Perl, you can use a function like this one:

# Bash
readlink() {
    local path=$1 ll

    if [ -L "$path" ]; then
        ll=$(LC_ALL=C ls -l "$path" 2>/dev/null) &&
        printf '%s\n' "${ll#* -> }"
    else
        return 1
    fi
}

However, this can fail if a symbolic link contains " -> " in its name.


CategoryShell

How can I rename all my *.foo files to *.bar, or convert spaces to underscores, or convert upper-case file names to lower case?

Some GNU/Linux distributions have a rename(1) command, which you can use for the former; however, the syntax differs from one distribution to the next, so it's not a portable answer....

Consult your system's man pages if you want to learn how to use your rename command, if you have one at all. It's often perfectly good for one-shot interactive renames, just not in portable scripts. We don't include any rename examples here because it's too confusing -- there are two common versions of it and they're totally incompatible with each other.

You can do non-recursive mass renames portably with a loop and some Parameter Expansions, like this:

# POSIX
# Rename all *.foo to *.bar
for f in *.foo; do mv -- "$f" "${f%.foo}.bar"; done

# POSIX
# This removes the extension .zip from all the files.
for file in ./*.zip; do mv "$file" "${file%.zip}"; done

The "--" and "./*" are to protect from problematic filenames that begin with "-". You only need one or the other, not both, so pick your favorite.

Here are some similar examples, using Bash-specific parameter expansions:

# Bash
# Replace all spaces with underscores
for f in *\ *; do mv -- "$f" "${f// /_}"; done

# Bash
# Rename all "foo" to "bar"
for file in ./*foo*; do mv "$file" "${file//foo/bar}"; done

All the above examples invoke the external command mv(1) once for each file, so they may not be as efficient as some of the rename implementations.

If you want to rename files recursively, then it becomes much more challenging. This example renames *.foo to *.bar:

# Bash
# Also requires GNU or BSD find(1)
# Recursively change all *.foo files to *.bar

find . -type f -name '*.foo' -print0 | while IFS= read -r -d '' f; do
  mv -- "$f" "${f%.foo}.bar"
done

This example uses Bash 4's globstar instead of GNU find:

# Bash 4 which requires globstar to be enabled. NOT portable!
# Rename all "foo" files to "bar" recursively.
# "foo" must NOT appear in a directory name.

shopt -s globstar; for file in /path/to/**/*foo*; do mv -- "$file" "${file//foo/bar}"; done

To preview what the above command would do, put echo in front of mv and inspect the resulting output.
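
For example, a harmless dry run of the globstar rename above:

# Bash 4 -- prints the mv commands instead of running them
shopt -s globstar; for file in /path/to/**/*foo*; do echo mv -- "$file" "${file//foo/bar}"; done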

For more techniques on dealing with files with inconvenient characters in their names, see FAQ #20.

The trickiest part of recursive renames is ensuring that you do not change the directory component of a pathname, because something like this is doomed to failure:

mv "./FOO/BAR/FILE.TXT" "./foo/bar/file.txt"

Therefore, any recursive renaming command should only change the filename component of each pathname. If you need to rename the directories as well, those should be done separately. Furthermore, recursive directory renaming should either be done depth-first (changing only the last component of the directory name in each instance), or in several passes. Depth-first works better in the general case.

Here's an example script that uses depth-first recursion (changes spaces in names to underscores, but you just need to change the ren() function to do anything you want) to rename both files and directories (again, it's easy to modify to make it act only on files or only on directories, or to act only on files with a certain extension, to avoid or force overwriting files, etc.):

# Bash
ren() {
  local newname
  newname=${1// /_}
  [ "$1" != "$newname" ] && mv -- "$1" "$newname"
}

traverse() {
  local i
  cd -- "$1" || exit 1
  for i in *; do
    [ -d "$i" ] && traverse "$i"
    ren "$i"
  done
  cd .. || exit 1
}

# main program
shopt -s nullglob
traverse /path/to/startdir

Here is another way to recursively rename all directories and files with spaces in their names:

find . -depth -name "* *" -exec bash -c 'dir=${1%/*} base=${1##*/}; mv "$1" "$dir/${base// /_}"' _ {} \;

or, if your version of find accepts it, this is more efficient as it runs one bash for many files instead of one bash per file:

find . -depth -name "* *" -exec bash -c 'for f; do dir=${f%/*} base=${f##*/}; mv "$f" "$dir/${base// /_}"; done' _ {} +

To convert filenames to lower case, if you have the utility mmv(1) on your machine, you could simply do:

# convert all filenames to lowercase
mmv "*" "#l1"

Otherwise, you need something that can take a mixed-case filename as input and give back the lowercase version as output. In Bash 4 and higher, there is a parameter expansion that can do it:

# Bash 4
for f in *[[:upper:]]*; do mv -- "$f" "${f,,*}"; done

Otherwise, tr(1) may be helpful:

# tolower - convert file names to lower case
# POSIX
for file in "$@"
do
    [ -f "$file" ] || continue                # ignore non-existing names
    newname=$(echo "$file" | tr '[:upper:]' '[:lower:]')     # lower case
    [ "$file" = "$newname" ] && continue      # nothing to do
    [ -f "$newname" ] && continue             # don't overwrite existing files
    mv -- "$file" "$newname"
done

We use the fancy range notation, because tr can behave very strangely when using the A-Z range in some locales:

imadev:~$ echo Hello | tr A-Z a-z
hÉMMÓ

To make sure you aren't caught by surprise when using tr with ranges, either use the fancy range notations, or set your locale to C.

imadev:~$ echo Hello | LC_ALL=C tr A-Z a-z
hello
imadev:~$ echo Hello | tr '[:upper:]' '[:lower:]'
hello
# Either way is fine here.

This technique can also be used to replace all unwanted characters in a file name, e.g. with '_' (underscore). The script is the same as above, with only the "newname=..." line changed.

# renamefiles - rename files whose name contain unusual characters
# POSIX
for file in "$@"
do
    [ -f "$file" ] || continue            # ignore non-regular files, etc.
    newname=$(echo "$file" | sed 's/[^[:alnum:]_.]/_/g')
    [ "$file" = "$newname" ] && continue  # nothing to do
    [ -f "$newname" ] && continue         # do not overwrite existing files
    mv -- "$file" "$newname"
done

The character class in [] contains all the characters we want to keep (after the ^); modify it as needed. The [:alnum:] range stands for all the letters and digits of the current locale.

Here's an example that does the same thing, but this time using Parameter Expansion instead of sed:

# renamefiles (more efficient, less portable version)
# Bash
for file in "$@"; do
   [ -f "$file" ] || continue
   newname=${file//[^[:alnum:]_.]/_}
   [ "$file" = "$newname" ] && continue
   [ -f "$newname" ] && continue
   mv -- "$file" "$newname"
done

It should be noted that all these examples contain a race condition -- an existing file could be overwritten if it is created in between the [ -f "$newname" ... and mv "$file" ... commands. Solving this issue is beyond the scope of this page, however.

One final note about changing the case of filenames: when using GNU mv, on many file systems, attempting to rename a file to its lowercase or uppercase equivalent will fail. (This applies to Cygwin on DOS/Windows systems using FAT or NTFS file systems; to GNU mv on Mac OS X systems using HFS+ in case-insensitive mode; as well as to Linux systems which have mounted Windows/Mac file systems, and possibly many other setups.) GNU mv checks both names before attempting a rename, and due to the file system's case-insensitive mapping, it thinks that the destination "already exists":

mv README Readme    # fails with GNU mv on FAT file systems, etc.

The workaround for this is to rename the file twice: first to a temporary name which is completely different from the original name, then to the desired name.

mv README tempfilename &&
mv tempfilename Readme


CategoryShell

What is the difference between test, [ and [[ ?

[ ("test" command) and [[ ("new test" command) are used to evaluate expressions. [[ works only in Bash and Korn shell, and is more powerful; [ and test are available in POSIX shells. Here are some examples:

    if [ -z "$variable" ]
    then
        echo "variable is empty!"
    fi

    if [ ! -f "$filename" ]
    then
        echo "not a valid, existing file name: $filename"
    fi

and

    if [[ ! -e $file ]]
    then
        echo "directory entry does not exist: $file"
    fi

    if [[ $file0 -nt $file1 ]]
    then
        echo "file $file0 is newer than $file1"
    fi

To cut a long story short: test implements the old, portable syntax of the command. In almost all shells (the oldest Bourne shells are the exception), [ is a synonym for test (but requires a final argument of ]). Although all modern shells have built-in implementations of [, there usually still is an external executable of that name, e.g. /bin/[. POSIX defines a mandatory feature set for [, but almost every shell offers extensions to it. So, if you want portable code, you should be careful not to use any of those extensions.

[[ is a new improved version of it, and is a keyword, not a program. This makes it easier to use, as shown below. [[ is understood by KornShell and BASH (e.g. 2.03), but not by the older POSIX or BourneShell.

Although [ and [[ have much in common, and share many expression operators like "-f", "-s", "-n", "-z", there are some notable differences. Here is a comparison list:

Feature                      new test [[        old test [        Example

string comparison            >                  \> (*)            [[ a > b ]] || echo "a does not come before b"
                             <                  \< (*)            [[ az < za ]] && echo "az comes before za"
                             = (or ==)          =                 [[ a == a ]] && echo "a equals a"
                             !=                 !=                [[ a != b ]] && echo "a is not equal to b"

integer comparison           -gt                -gt               [[ 5 -gt 10 ]] || echo "5 is not bigger than 10"
                             -lt                -lt               [[ 8 -lt 9 ]] && echo "8 is less than 9"
                             -ge                -ge               [[ 3 -ge 3 ]] && echo "3 is greater than or equal to 3"
                             -le                -le               [[ 3 -le 8 ]] && echo "3 is less than or equal to 8"
                             -eq                -eq               [[ 5 -eq 05 ]] && echo "5 equals 05"
                             -ne                -ne               [[ 6 -ne 20 ]] && echo "6 is not equal to 20"

conditional evaluation       &&                 -a (**)           [[ -n $var && -f $var ]] && echo "$var is a file"
                             ||                 -o (**)           [[ -b $var || -c $var ]] && echo "$var is a device"

expression grouping          (...)              \( ... \) (**)    [[ $var = img* && ($var = *.png || $var = *.jpg) ]] &&
                                                                  echo "$var starts with img and ends with .jpg or .png"

Pattern matching             = (or ==)          (not available)   [[ $name = a* ]] || echo "name does not start with an 'a': $name"

RegularExpression matching   =~                 (not available)   [[ $(date) =~ ^Fri\ ...\ 13 ]] && echo "It's Friday the 13th!"

(*) This is an extension to the POSIX standard; some shells may have it, and some may not.

(**) The -a and -o operators, and ( ... ) grouping, are defined in POSIX but only for strictly limited cases. Use of these operators is discouraged; you should use multiple [ commands instead:

  • if [ "$a" = a ] && [ "$b" = b ]; then ...

  • if { [ "$a" = a ] || [ "$b" = b ] ; } && [ "$c" = c ]; then ...

Special primitives that [[ is defined to have, but [ may be lacking (depending on the implementation):

Description                            Primitive   Example

entry (file or directory) exists       -e          [[ -e $config ]] && echo "config file exists: $config"
file is newer/older than other file    -nt / -ot   [[ $file0 -nt $file1 ]] && echo "$file0 is newer than $file1"
two files are the same                 -ef         [[ $input -ef $output ]] && { echo "will not overwrite input file: $input"; exit 1; }
negation                               !           [[ ! -u $file ]] && echo "$file is not a setuid file"

But there are more subtle differences.

  • No WordSplitting or glob expansion will be done for [[ (and therefore many arguments need not be quoted):

     file="file name"
     [[ -f $file ]] && echo "$file is a file"

    will work even though $file is not quoted and contains whitespace. With [ the variable needs to be quoted:

     file="file name"
     [ -f "$file" ] && echo "$file is a file"

    This makes [[ easier to use and less error-prone.

  • Parentheses in [[ do not need to be escaped:

     [[ -f $file1 && ( -d $dir1 || -d $dir2) ]]
     [ -f "$file1" -a \( -d "$dir1" -o -d "$dir2" \) ]
  • As of bash 4.1, string comparisons using < or > respect the current locale when done in [[, but not in [ or test. In fact, [ and test have never used locale collating order even though past man pages said they did. Bash versions prior to 4.1 do not use locale collating order for [[ either.

As a rule of thumb, [[ is used for strings and files. If you want to compare numbers, use an ArithmeticExpression, e.g.

# Bash
i=0
while ((i<10)); do ...

When should the new test command [[ be used, and when the old one [? If portability to the BourneShell is a concern, the old syntax should be used. If on the other hand the script requires BASH or KornShell, the new syntax is much more flexible.

See the Tests and Conditionals chapter in the BashGuide.

1. Theory

The theory behind all of this is that [ is a simple command, whereas [[ is a compound command. [ receives its arguments as any other command would, but most compound commands introduce a special parsing context which is performed before any other processing. Typically this step looks for special reserved words or control operators specific to each compound command which split it into parts or affect control-flow. The Bash test expression's logical and/or operators can short-circuit because they are special in this way (as are e.g. ;;, elif, and else). Contrast with ArithmeticExpression, where all expansions are performed left-to-right in the usual way, with the resulting string being subject to interpretation as arithmetic.

  • The arithmetic compound command has no special operators. It has only one evaluation context - a single arithmetic expression. Arithmetic expressions have operators too, some of which affect control flow during the arithmetic evaluation step (which happens last).
        # Bash
        (( 1 + 1 == 2 ? 1 : $(echo "This doesn't do what you think..." >&2; echo 1) ))
  • Test expressions on the other hand do have "operators" as part of their syntax, which lie on the other end of the spectrum (evaluated first).
        # Bash
        [[ '1 + 1' -eq 2 && $(echo "...but this probably does what you expect." >&2) ]]
  • Old-style tests have no way of controlling evaluation because its arguments aren't special.
        # Bash
        [ $((1 + 1)) -eq 2 -o $(echo 'No short-circuit' >&2) ]
  • Different error handling is made possible by searching for special compound command tokens before performing expansions. [[ can detect the presence of expansions that don't result in a word yet still throw an error if none are specified. Ordinary commands can't.

        # Bash 
        ( set -- $(echo "Unquoted null expansions don't result in \"null\" parameters." >&2); echo $# )
        [[ -z $(:) ]] && echo '-z was supplied an arg and evaluated empty.'
        [ -z ] && echo "-z wasn't supplied an arg, and no errors are reported. There's no possible way Bash could enforce specifying an argument here."
        [[ -z ]] # This will cause an error that ordinary commands can't detect.
  • For the very same reason, because ['s operators are just "arguments", unlike [[, you can specify operators as parameters to an ordinary test command. This might be seen as a limitation of [[, but the downsides of doing so almost always outweigh the benefits.

        # ksh93
    
        args=('0' '-gt' '1')
     
        (( $(print '0 > 1') )) # Valid command, Exit status is 1 as expected.
        [ "${args[@]}" ]       # Also exit 1. 
        [[ ${args[@]} ]]       # Valid command, but is misleading. Exit status 0. set -x reveals the resulting command is [[ -n '0 -gt 1' ]]
  • Do keep in mind which operators belong to which shell constructs. Order of expansions can cause surprising results especially when mixing and nesting different evaluation contexts!
        # ksh93
        typeset -i x=0
        
        ( print "$(( ++x, ${ x+=1; print $x >&2;}1, x ))"      ) # Prints 1, 2
        ( print "$(( $((++x)), ${ x+=1; print $x >&2;}1, x ))" ) # Prints 2, 2 - because expansions are performed first.


CategoryShell

How can I redirect the output of 'time' to a variable or file?

Bash's time keyword uses special trickery, so that you can do things like

   time find ... | xargs ...

and get the execution time of the entire pipeline, rather than just the simple command at the start of the pipe. (This is different from the behavior of the external command time(1), for obvious reasons.)

Because of this, people who want to redirect time's output often encounter difficulty figuring out where all the file descriptors are going. It's not as hard as most people think, though -- the trick is to call time in a SubShell or block, and then capture stderr of the subshell or block (which will contain time's results). If you need to redirect the actual command's stdout or stderr, you do that inside the subshell/block. For example:

  • File redirection:
       bash -c "time ls" 2>time.output      # Explicit, but inefficient.
       ( time ls ) 2>time.output            # Slightly more efficient.
       { time ls; } 2>time.output           # Most efficient.
    
       # The general case:
       { time some command >stdout 2>stderr; } 2>time.output
  • CommandSubstitution:

       foo=$( bash -c "time ls" 2>&1 )       # Captures *everything*.
       foo=$( { time ls; } 2>&1 )            # More efficient version.
    
       # Keep stdout unmolested.
       exec 3>&1
       foo=$( { time bar 1>&3; } 2>&1 )      # Captures stderr and time.
       exec 3>&-
    
       # Keep both stdout and stderr unmolested.
       exec 3>&1 4>&2
       foo=$( { time bar 1>&3 2>&4; } 2>&1 )  # Captures time only.
       exec 3>&- 4>&-
    
       # same thing without exec
       { foo=$( { time bar 1>&3- 2>&4-; } 2>&1 ); } 3>&1 4>&2

A similar construct can be used to capture "core dump" messages, which are actually printed by the shell that launched a program, not by the program that just dumped core:

   ./coredump >log 2>&1           # Fails to capture the message
   { ./coredump; } >log 2>&1      # Captures the message


CategoryShell

How can I find a process ID for a process given its name?

Usually a process is referred to using its process ID (PID), and the ps(1) command can display the information for any process given its process ID, e.g.

    $ echo $$         # my process id
    21796
    $ ps -p 21796
      PID TTY          TIME CMD
    21796 pts/5    00:00:00 ksh

But frequently the process ID is not known; only the process name is. Some operating systems, e.g. Solaris, BSD, and some versions of Linux, have a dedicated command to search for a process by name, called pgrep(1):

    $ pgrep init
    1

Often there is an even more specialized program available to not just find the process ID of a process given its name, but also to send a signal to it:

    $ pkill myprocess

Some systems also provide pidof(1). It differs from pgrep in that multiple output process IDs are only space separated, not newline separated.

    $ pidof cron
    5392

If these programs are not available, a user can search the output of the ps command using grep.

The major problem when grepping the ps output is that grep may match its own ps entry (try: ps aux | grep init). To make matters worse, this does not happen every time; the technical name for this is a RaceCondition. To avoid this, there are several ways:

  • Using grep -v at the end
         ps aux | grep name | grep -v grep
    will throw away all lines containing "grep" from the output. Disadvantage: You always have the exit state of the grep -v, so you can't e.g. check if a specific process exists.
  • Using grep -v in the middle
         ps aux | grep -v grep | grep name
    This does exactly the same, except that the exit state of "grep name" is accessible and a representation for "name is a process in ps" or "name is not a process in ps". It still has the disadvantage of starting a new process (grep -v).
  • Using [] in grep
         ps aux | grep '[n]ame'

    This spawns only the needed grep-process. The trick is to use the []-character class (regular expressions). To put only one character in a character group normally makes no sense at all, because [c] will always match a "c". In this case, it's the same. grep [n]ame searches for "name". But as grep's own process list entry is what you executed ("grep [n]ame") and not "grep name", it will not match itself.
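
Putting that trick to use for the most common request (checking whether something that looks like the daemon is in the process list; see the next section for why this is still a shaky basis for real process management), with foobard standing in for your daemon's name:

    # Bourne
    if ps aux | grep '[f]oobard' >/dev/null 2>&1; then
        echo "foobard appears to be running"
    fi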

1. greycat rant: daemon management

All the stuff above is OK if you're at an interactive shell prompt, but it should not be used in a script. It's too unreliable.

Most of the time when someone asks a question like this, it's because they want to manage a long-running daemon using primitive shell scripting techniques. Common variants are "How can I get the PID of my foobard process.... so I can start one if it's not already running" or "How can I get the PID of my foobard process... because I want to prevent the foobard script from running if foobard is already active." Both of these questions will lead to seriously flawed production systems.

If what you really want is to restart your daemon whenever it dies, just do this:

while true; do
   mydaemon --in-the-foreground
done

where --in-the-foreground is whatever switch, if any, you must give to the daemon to PREVENT IT from automatically backgrounding itself. (Often, -d does this and has the additional benefit of running the daemon with increased verbosity.) Self-daemonizing programs may or may not be the target of a future greycat rant....

If that's too simplistic, look into daemontools or runit, which are programs for managing services.

If what you really want is to prevent multiple instances of your program from running, then the only sure way to do that is by using a lock. For details on doing this, see ProcessManagement or FAQ 45.

ProcessManagement also covers topics like "I want to divide my batch job into 5 'threads' and run them all in parallel." Please read it.


CategoryShell

Can I do a spinner in Bash?

Sure!

i=1
sp="/-\|"
echo -n ' '
while true
do
    printf "\b${sp:i++%${#sp}:1}"
done

Each time the loop iterates, it displays the next character in the sp string, wrapping around as it reaches the end. (i is the position of the current character to display and ${#sp} is the length of the sp string).

The \b string is replaced by a 'backspace' character. Alternatively, you could play with \r to go back to the beginning of the line.

If you want it to slow down, put a sleep command inside the loop (after the printf).

If you already have a loop which does a lot of work, you can call the following function at the beginning of each iteration to update the spinner:

sp="/-\|"
sc=0
spin() {
   printf "\b${sp:sc++:1}"
   ((sc==${#sp})) && sc=0
}
endspin() {
   printf "\r%s\n" "$@"
}

until work_done; do
   spin
   some_work ...
done
endspin

A similar technique can be used to build progress bars.
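
For example, here is a rough sketch of a text progress bar driven the same way; the progress function and its fixed width of 40 columns are just illustrative choices:

    # Bash
    progress() {            # usage: progress CURRENT TOTAL
        local cur=$1 total=$2 width=40 filled bar
        filled=$(( cur * width / total ))
        bar=$(printf '%*s' "$filled" '' | tr ' ' '#')
        printf '\r[%-*s] %3d%%' "$width" "$bar" $(( cur * 100 / total ))
    }

    for i in 1 2 3 4 5; do
        progress "$i" 5
        sleep 1             # stand-in for real work
    done
    printf '\n'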


CategoryShell

How can I handle command-line arguments (options) to my script easily?

Well, that depends a great deal on what you want to do with them. There are several approaches, each with its strengths and weaknesses.

1. Manual loop

This approach handles any arbitrary set of options, because you're writing the parser yourself. For 90% of programs, this is the simplest approach (because you rarely need fancy stuff).

This example will handle a combination of short and long options. Notice how both "--file" and "--file=FILE" are handled.

    #!/bin/sh
    # (POSIX shell syntax)

    # Reset all variables that might be set
    file=""
    verbose=0

    while :
    do
        case $1 in
            -h | --help | -\?)
                #  Call your Help() or usage() function here.
                exit 0      # This is not an error; the user asked for help. Don't do "exit 1"
                ;;
            -f | --file)
                file=$2     # You might want to check if you really got FILE
                shift 2
                ;;
            --file=*)
                file=${1#*=}        # Delete everything up till "="
                shift
                ;;
            -v | --verbose)
                # Each instance of -v adds 1 to verbosity
                verbose=$((verbose+1))
                shift
                ;;
            --) # End of all options
                shift
                break
                ;;
            -*)
                echo "WARN: Unknown option (ignored): $1" >&2
                shift
                ;;
            *)  # no more options. Stop while loop
                break
                ;;
        esac
    done

    # Suppose some options are required. Check that we got them.

    if [ ! "$file" ]; then
        echo "ERROR: option '--file FILE' not given. See --help" >&2
        exit 1
    fi

    # Rest of the program here.
    # If there are input files (for example) that follow the options, they
    # will remain in the "$@" positional parameters.

This parser does not handle separate options concatenated together (like -xvf being understood as -x -v -f). This could be added with effort, but this is left as an exercise for the reader.

Some Bash programmers like to write this at the beginning of their scripts to guard against unused variables:

    set -u     # or, set -o nounset

The use of this breaks the loop above, as "$1" may not be set upon entering the loop. There are two solutions to this issue:

  1. Stop using -u

  2. Replace case $1 in with case ${1+$1} in (as well as bandaging all the other code that set -u breaks).

2. getopts

Never use getopt(1). getopt cannot handle empty argument strings, or arguments with embedded whitespace. Please forget that it ever existed.

The POSIX shell (and others) offer getopts which is safe to use instead. Here is a simplistic getopts example:

    #!/bin/sh

    # A POSIX variable
    OPTIND=1         # Reset in case getopts has been used previously in the shell.

    # Initialize our own variables:
    output_file=""
    verbose=0

    while getopts "h?vf:" opt; do
        case "$opt" in
            h|\?)
                show_help
                exit 0
                ;;
            v)  verbose=1
                ;;
            f)  output_file=$OPTARG
                ;;
        esac
    done

    shift $((OPTIND-1))

    [ "$1" = "--" ] && shift

    echo "verbose=$verbose, output_file='$output_file', Leftovers: $@"

    # End of file

The disadvantage of getopts is that it can only handle short options (-h) without trickery. It handles -vf filename in the expected Unix way, automatically. getopts is a good candidate because it is portable and e.g. also works in dash.

There is a getopts tutorial which explains what all of the syntax and variables mean. In bash, there is also help getopts, which might be informative.

There is also still the disadvantage that options are coded in at least 2, probably 3 places - in the call to getopts, in the case statement that processes them and presumably in the help message that you are going to get around to writing one of these days. This is a classic opportunity for errors to creep in as the code is written and maintained - often not discovered till much, much later. This can be avoided by using callback functions, but this approach kind of defeats the purpose of using getopts at all.

Here is an example which claims to parse long options with getopts. The basic idea is quite simple: just put "-:" into the optstring. This trick requires a shell which permits the option-argument (i.e. the filename in "-f filename") to be concatenated to the option (as in "-ffilename"). The POSIX standard says there must be a space between them; bash and dash permit the "-ffilename" variant, but one should not rely on this leniency if attempting to write a portable script.

    #!/bin/bash
    # Uses bash extensions.  Not portable as written.

    optspec=":h-:"

    while getopts "$optspec" optchar
    do
        case "${optchar}" in
            -)
                case "${OPTARG}" in
                  loglevel)
                      eval val="\$${OPTIND}"; OPTIND=$(( $OPTIND + 1 ))
                      echo "Parsing option: '--${OPTARG}', value: '${val}'" >&2
                      ;;
                  loglevel=*)
                      val=${OPTARG#*=}
                      opt=${OPTARG%=$val}
                      echo "Parsing option: '--${opt}', value: '${val}'" >&2
                      ;;
                esac
                ;;
            h)
                echo "usage: $0 [--loglevel[=]<value>]" >&2
                exit 2
                ;;
        esac
    done

    # End of file

In practice, this example is so obfuscated that it may be preferable to add concatenated option support (like -vf filename) to a manual parsing loop instead, if that was the only reason for using getopts.

Here's an improved and more generalized version of above attempt to add support for long options when using getopts:

   1 #!/bin/bash
   2 # Uses bash extensions.  Not portable as written.
   3 
   4 declare -A longoptspec
   5 longoptspec=( [loglevel]=1 ) #use associative array to declare how many arguments a long option expects, in this case we declare that loglevel expects/has one argument, long options that aren't listed in this way will have zero arguments by default
   6 optspec=":h-:"
   7 while getopts "$optspec" opt; do
   8 while true; do
   9     case "${opt}" in
  10         -) #OPTARG is name-of-long-option or name-of-long-option=value
  11             if [[ "${OPTARG}" =~ .*=.* ]] #with this --key=value format only one argument is possible
  12             then
  13                 opt=${OPTARG/=*/}
  14                 OPTARG=${OPTARG#*=}
  15                 ((OPTIND--))
  16             else #with this --key value1 value2 format multiple arguments are possible
  17                 opt="$OPTARG"
  18                 OPTARG=(${@:OPTIND:$((longoptspec[$opt]))})
  19             fi
  20             ((OPTIND+=longoptspec[$opt]))
  21             continue #now that opt/OPTARG are set we can process them as if getopts would've given us long options
  22             ;;
  23         loglevel)
  24           loglevel=$OPTARG
  25             ;;
  26         h|help)
  27             echo "usage: $0 [--loglevel[=]<value>]" >&2
  28             exit 2
  29             ;;
  30     esac
  31 break; done
  32 done
  33 
  34 # End of file
  35 

With this version you can have long and short options side by side and you shouldn't need to modify the code from line 10 to 22. This solution can also handle multiple arguments for long options; just use ${OPTARG} or ${OPTARG[0]} for the first argument, ${OPTARG[1]} for the second argument, ${OPTARG[2]} for the third argument and so on. It has the same disadvantage as its predecessor: it is not portable and is specific to bash.

3. Silly repeated brute-force scanning

Another approach is to check options with if statements "on demand". A function like this one may be useful:

    #!/bin/bash

    HaveOpt ()
    {
        local needle=$1
        shift

        while [[ $1 == -* ]]
        do
            # By convention, "--" means end of options.
            case "$1" in
                --)      return 1 ;;
                $needle) return 0 ;;
            esac

            shift
        done

        return 1
    }

    HaveOpt --quick "$@" && echo "Option quick is set"

    # End of file

and it will work if the script is run as:

  • YES: ./script --quick
  • YES: ./script -other --quick

but it will stop at the first argument that has no "-" in front (or at --):

  • NO: ./script -bar foo --quick
  • NO: ./script -bar -- --quick

Of course, this approach (iterating over the argument list every time you want to check for one) is far less efficient than just iterating once and setting flag variables.

It also spreads the options throughout the program. The literal option --quick may appear a hundred lines down inside the main body of the program, nowhere near any other option name. This is a nightmare for maintenance.

4. Complex nonstandard add-on utilities

bhepple suggests the use of process-getopt (GPL licensed) and offers this example code:

PROG=$(basename $0)
VERSION='1.2'
USAGE="A tiny example using process-getopt(1)"

# call process-getopt functions to define some options:
source process-getopt

SLOT=""
SLOT_func()   { [ "${1:-""}" ] && SLOT="yes"; }      # callback for SLOT option
add_opt SLOT "boolean option" s "" slot

TOKEN=""
TOKEN_func()  { [ "${1:-""}" ] && TOKEN="$2"; }      # callback for TOKEN option
add_opt TOKEN "this option takes a value" t n token number

add_std_opts     # define the standard options --help etc:

TEMP=$(call_getopt "$@") || exit 1
eval set -- "$TEMP" # just as with getopt(1)

# remove the options from the command line
process_opts "$@" || shift "$?"

echo "SLOT=$SLOT"
echo "TOKEN=$TOKEN"
echo "args=$@"

Here, all information about each option is defined in one place making for much easier authoring and maintenance. A lot of the dirty work is handled automatically and standards are obeyed as in getopt(1) - because it calls getopt for you.

  • Actually, what the author forgot to say was that it's actually using getopts semantics, rather than getopt. I ran this test:

     ~/process-getopt-1.6$ set -- one 'rm -rf /' 'foo;bar' "'"
     ~/process-getopt-1.6$ call_getopt "$@"
      -- 'rm -rf /' 'foo;bar' ''\'''
  • It appears to be intelligent enough to handle null options, whitespace-containing options, and single-quote-containing options in a manner that makes the eval not blow up in your face. But this is not an endorsement of the process-getopt software overall; I don't know it well enough. -GreyCat

It's written and tested on Linux where getopt(1) supports long options. For portability, it tests the local getopt(1) at runtime and if it finds a non-GNU one (i.e. one that does not return 4 for getopt --test) it only processes short options. It does not use the bash builtin getopts(1) command. -bhepple


CategoryShell

How can I get all lines that are: in both of two files (set intersection) or in only one of two files (set subtraction)?

Use the comm(1) command:

# Bash
# Intersection of file1 and file2
# (i.e., only the lines that appear in both files)
comm -12 <(sort file1) <(sort file2)

# Subtraction of file1 from file2
# (i.e., only the lines unique to file2)
comm -13 <(sort file1) <(sort file2)

Read the comm man page for details. Those are process substitutions you see up there.

If for some reason you lack the core comm program, you can use these other methods. (Actually, you really should NOT use any of these. They were written by people who didn't know about comm yet. But people love slow, arcane alternatives!)

  1. An amazingly simple and fast implementation that took just 20 seconds to match a 30k line file against a 400k line file for me.
      # intersection of file1 and file2
      grep -xF -f file1 file2
    
      # subtraction of file1 from file2
      grep -vxF -f file1 file2
    • It has grep read one of the sets as a pattern list from a file (-f), and interpret the patterns as plain strings not regexps (-F), matching only whole lines (-x).
    • Note that the file specified with -f will be loaded into memory, so it doesn't scale for very large files.
    • It should work with any POSIX grep; on older systems you may need to use fgrep rather than grep -F.

  2. An implementation using sort and uniq:
      # intersection of file1 and file2
      sort file1 file2 | uniq -d   # assuming neither file1 nor file2 has repeated lines
    
      # file1-file2 (Subtraction)
      sort file1 file2 file2 | uniq -u
    
      # same way for file2 - file1, change last file2 to file1
      sort file1 file2 file1 | uniq -u
  3. Another implementation of subtraction:
      sort file1 file1 file2 | uniq -c |
      awk '{ if ($1 == 2) { $1 = ""; print; } }'
    • This may introduce an extra space at the start of the line; if that's a problem, just strip it away.
    • Also, this approach assumes that neither file1 nor file2 has any duplicates in it.
    • Finally, it sorts the output for you. If that's a problem, then you'll have to abandon this approach altogether. Perhaps you could use awk's associative arrays (or perl's hashes or tcl's arrays) instead.
  4. These are subtraction and intersection with awk, regardless of whether the input files are sorted or contain duplicates:
      # prints lines only in file1 but not in file2. Reverse the arguments to get the other way round
      awk 'NR==FNR{a[$0];next} !($0 in a)' file2 file1
    
      # prints lines that are in both files; order of arguments is not important
      awk 'NR==FNR{a[$0];next} $0 in a' file1 file2

    For an explanation of how these work, see http://awk.freeshell.org/ComparingTwoFiles.

See also: http://www.pixelbeat.org/cmdline.html#sets


CategoryShell

How can I print text in various colors?

Do not hard-code ANSI color escape sequences in your program! The tput command lets you interact with the terminal database in a sane way:

  # Bourne
  tput setaf 1; echo this is red
  tput setaf 2; echo this is green
  tput bold; echo "boldface (and still green)"
  tput sgr0; echo back to normal

Cygwin users: you need to install the ncurses package to get tput (see: Where did "tput" go in 1.7?)

tput reads the terminfo database which contains all the escape codes necessary for interacting with your terminal, as defined by the $TERM variable. For more details, see the terminfo(5) man page.

tput sgr0 resets the colors to their default settings. This also turns off boldface (tput bold), underline, etc.

If you want fancy colors in your prompt, consider using something manageable:

  # Bash
  red=$(tput setaf 1)
  green=$(tput setaf 2)
  blue=$(tput setaf 4)
  reset=$(tput sgr0)
  PS1='\[$red\]\u\[$reset\]@\[$green\]\h\[$reset\]:\[$blue\]\w\[$reset\]\$ '

Note that we do not hard-code ANSI color escape sequences. Instead, we store the output of the tput command into variables, which are then used when $PS1 is expanded. Storing the values means we don't have to fork a tput process multiple times every time the prompt is displayed; tput is only invoked 4 times during shell startup. The \[ and \] symbols allow bash to understand which parts of the prompt cause no cursor movement; without them, lines will wrap incorrectly.

See also http://wiki.bash-hackers.org/scripting/terminalcodes for an overview.

1. Discussion

This will be contentious, but I'm going to disagree and recommend you use hard-coded ANSI escape sequences because terminfo databases in the real world are too often broken.

tput setaf literally means "Set ANSI foreground" and shouldn't differ from a hard-coded ANSI escape sequence, except that the hard-coded sequence will actually work even with broken terminfo databases, so your colors will look correct in a VT with terminal type linux-16color or any terminal type, so long as it really is a terminal capable of 16 ANSI colors.

So do consider setting those variables to hard-coded ANSI sequences such as:

  # Bash
  white=$'\e[0;37m'
  • You assume the entire world of terminals that you will ever use always conforms to one single set of escape sequences. This is a very poor assumption. Maybe I'm showing my age, but in my first job after college, in 1993-1994, I worked with a wide variety of physical terminals (IBM 3151, Wyse 30, NCR something or other, etc.) all in the same work place. They all had different key mappings, different escape sequences, the works. If I were to hard-code a terminal escape sequence as you propose it would only work on ONE of those terminals, and then if I had to login from someone else's office, or from a server console, I'd be screwed. So, for personal use, if this makes you happy, I can't stop you. But the notion of writing a script that uses hard-coded escape sequences and then DISTRIBUTING that for other people should be discarded immediately. - GreyCat

    • I said it would be contentious, but there is an alternative view. A large number of people today will use Linux on their servers and their desktops and their profiles follow them around. The terminfo for linux-16color is broken. By doing it the "right" way, they will find their colors do not work correctly in a virtual terminal on one of the console tty's. Doing it the "wrong" way will result only in light red becoming bold if they use the real xterm or a close derivative. If terminfo can't get it right for something as common as linux-16color, it's hard to recommend relying on it. People should be aware that it doesn't work correctly, try it yourself, go through the first 16 colours on a Linux VT with linux-16color. I know ANSI only specified names not hues but setaf 7 is obviously not supposed to result in black text seeing as it is named white. I'd place money on a lot more people using Linux for their servers than any other UNIX-based OS and if they are using another UNIX-based or true UNIX they are probably aware of the nuances. A Linux newbie would be very surprised to find after following the "right way" her colors did not work properly on a VT. Of course the correct thing to do is to fix terminfo, but that isn't in my power, although I have reported the bug for linux-16color in particular, how many other bugs are there in it? The only completely accurate thing to do is to hard-code the sequences for all the terminals you will encounter yourself, which is what terminfo is supposed to avoid the necessity of doing. However, it is buggy in at least this case (and a very common case), so relying on it to do it properly is also suspect. I will add here I have much respect for Greycat, and he is a very knowledgeable expert in many areas of IT; I fully admit I do not have the same depth of knowledge as he does, but will YOU ever be working on a Wyse 30? To be completely clear, I'm suggesting that you should consider hard-coded colors for your own profile and uses; if you are intending to write a completely portable script for others to use on foreign systems then you should rely on terminfo/termcap even if it is buggy.

  • I've never heard of linux-16colors before. It's not an installed terminfo entry in Debian, at least not by default. If your vendor is shipping broken terminfo databases, file a bug report. Meanwhile, find a system where the entry you need is not broken, and copy it to your broken system(s) -- or write it yourself. That's what the rest of the world has always done. It's where the terminfo entries came from in the first place. Someone had to write them all.
  • http://wooledge.org/~greg/linux-console-colors.png

  • -- GreyCat


CategoryShell

How do Unix file permissions work?

See Permissions.


CategoryShell

What are all the dot-files that bash reads?

See DotFiles.


CategoryShell

How do I use dialog to get input from the user?

Here is an example:

  # POSIX
  foo=$(dialog --inputbox "text goes here" 8 40 2>&1 >/dev/tty)
  echo "The user typed '$foo'"

The redirection here is a bit tricky.

  1. The foo=$(command) is set up first, so the standard output of the command is being captured by bash.

  2. Inside the command, the 2>&1 causes standard error to be sent to where standard out is going -- in other words, stderr will now be captured.

  3. >/dev/tty sends standard output to the terminal, so the dialog box will be seen by the user. Standard error will still be captured, however.

Another common dialog(1)-related question is how to dynamically generate a dialog command that has items which must be quoted (either because they're empty strings, or because they contain internal white space). One can use eval for that purpose, but the cleanest way to achieve this goal is to use an array.

  # Bash
  unset m; i=0
  words=(apple banana cherry "dog droppings")
  for w in "${words[@]}"; do
    m[i++]=$w; m[i++]=""
  done
  dialog --menu "Which one?" 12 70 9 "${m[@]}"

In this example, the for loop that populates the m array could just as easily have been reading from a pipeline, a file, etc.

Recall that the construction "${m[@]}" expands to the entire contents of an array, but with each element implicitly quoted. It's analogous to the "$@" construct for handling positional parameters. For more details, see FAQ #50.

Newer versions of bash have a slightly prettier syntax for appending elements to an array:

  # Bash 3.1 and up
  ...
  for w in "${words[@]}"; do
    m+=("$w" "")
  done
  ...

Here's another example, using filenames:

  # Bash
  files=(*.mp3)       # These may contain spaces, apostrophes, etc.
  cmd=(dialog --menu "Select one:" 22 76 16)
  i=0 n=${#cmd[*]}
  for f in "${files[@]}"; do
      cmd[n++]=$((i++)); cmd[n++]="$f"
  done
  choice=$("${cmd[@]}" 2>&1 >/dev/tty)
  echo "Here's the file you chose:"
  ls -ld -- "${files[choice]}"

A separate but useful function of dialog is to track the progress of a process that produces output. Below is an example that uses dialog to track processes writing to a log file. In the dialog window, there is a tailbox where output is displayed, and a msgbox with a clickable OK button. Dismissing the dialog (or interrupting the script, which triggers the trap) removes the tempfile.

  # POSIX(?)
  # you cannot tail a nonexistent file, so always ensure it pre-exists!
  rm -f dialog-tail.log; echo Initialize log >> dialog-tail.log
  date >> dialog-tail.log
  tempfile=`tempfile 2>/dev/null` || tempfile=/tmp/test$$
  trap 'rm -f $tempfile; stty sane; exit 1' 1 2 3 15
  dialog --title "TAIL BOXES" \
        --begin 10 10 --tailboxbg dialog-tail.log 8 58 \
        --and-widget \
        --begin 3 10 --msgbox "Press OK " 5 30 \
        2>$tempfile &
  mypid=$!
  for i in 1 2 3;  do echo $i >> dialog-tail.log; sleep 1; done
  echo Done. >> dialog-tail.log
  wait $mypid
  rm -f $tempfile

For an example of creating a progress bar using dialog --gauge, see FAQ #44.


CategoryShell

How do I determine whether a variable contains a substring?

In BASH:

  # Bash
  if [[ $foo = *bar* ]]

The above works in virtually all versions of Bash. Bash version 3 (and up) also allows regular expressions:

  # Bash
  my_re='ab*c'
  if [[ $foo =~ $my_re ]]   # bash 3, matches abbbbcde, or ac, etc.

For more hints on string manipulations in Bash, see FAQ #100.

If you are programming in the BourneShell instead of Bash, there is a more portable (but less pretty) syntax:

  # Bourne
  case "$foo" in
    *bar*) .... ;;
  esac

case allows you to match variables against globbing-style patterns (including extended globs, if your shell offers them). If you need a portable way to match variables against regular expressions, use grep or egrep.

  # Bourne
  if echo "$foo" | grep bar >/dev/null 2>&1; then ...


CategoryShell

How can I find out if a process is still running?

The kill command is used to send signals to a running process. As a convenience, the signal number 0, which is not a real signal, can be used to find out whether a process is still running:

# Bourne
myprog &          # Start program in the background
daemonpid=$!      # ...and save its process id

while sleep 60
do
    if kill -0 $daemonpid       # Is the process still alive?
    then
        echo >&2 "OK - process is still running"
    else
        echo >&2 "ERROR - process $daemonpid is no longer running!"
        break
    fi
done

NOTE: Anything you do that relies on PIDs to identify a process is inherently flawed. If a process dies, the meaning of its PID is UNDEFINED. Another process started afterward may take the same PID as the dead process. That would make the previous example think that the process is still alive (its PID exists!) even though it is dead and gone. It is for this reason that nobody should try to manage processes other than the parent of that process. Read ProcessManagement.

This is one of those questions that usually masks a much deeper issue. It's rare that someone wants to know whether a process is still running simply to display a red or green light to an operator.

More often, there's some ulterior motive, such as the desire to ensure that some daemon which is known to crash frequently is still running. If this is the case, the best course of action is to fix the program or its configuration so that it stops crashing. If you can't do that, then just restart it when it dies:

# POSIX
while true
do
  myprog && break
  sleep 1
done

This piece of code will restart myprog if it terminates with an exit code other than 0 (indicating something went wrong). If the exit code is 0 (successfully shut down) the loop ends. (If your process is crashing but also returning exit status 0, then adjust the code accordingly.) Note that myprog must run in the foreground. If it automatically "daemonizes" itself, you are screwed.

For a much better discussion of these issues, see ProcessManagement or FAQ #33.


CategoryShell

Why does my crontab job fail? 0 0 * * * some command > /var/log/mylog.`date +%Y%m%d`

In many versions of crontab, the percent sign (%) is treated specially, and therefore must be escaped with backslashes:

0 0 * * * some command > /var/log/mylog.`date +\%Y\%m\%d`

See your system's manual (crontab(5) or crontab(1)) for details. Note: on systems which split the crontab manual into two parts, you may have to type man 5 crontab or man -s 5 crontab to read the part you need.


CategoryShell

How do I create a progress bar? How do I see a progress indicator when copying/moving files?

The easiest way to add a progress bar to your own script is to use dialog --gauge. Here is an example, which relies heavily on BASH features:

# Bash
# Process all of the *.zip files in the current directory.
files=(*.zip)
dialog --gauge "Working..." 20 75 < <(
   n=${#files[*]}; i=0
   for f in "${files[@]}"; do
      # process "$f" in some way (for testing, "sleep 1")
      echo $((100*(++i)/n))
   done
)

Here's an explanation of what it's doing:

  • An array named files is populated with all the files we want to process.

  • dialog is invoked, and its input is redirected from a ProcessSubstitution. (A pipe could also be used here; we'd simply have to reverse the dialog command and the loop.)

  • The processing loop iterates over the array.
  • Every time a file is processed, it increments a counter (i), and writes the percent complete to stdout.

For more examples of using dialog, see FAQ #40.

A simple progress bar can also be programmed without dialog. There are lots of different approaches, depending on what kind of presentation you're looking for.

One traditional approach is the spinner, which shows a whirling line segment to indicate "busy". This is not really a "progress meter", since there is no information presented about how close the program is to completion.
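
A minimal spinner sketch (not part of the original answer; the background sleep just stands in for real work):

# Bash -- illustrative only
sleep 10 &                            # stand-in for the real long-running job
spin='-\|/' i=0
while kill -0 $! 2>/dev/null; do      # loop while the background job is alive
    printf '\r%s' "${spin:i++%4:1}"   # overwrite the previous character
    sleep 0.2                         # fractional sleep needs GNU sleep or similar
done
printf '\rdone\n'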

The next step up is presenting a numeric value without scrolling the screen, by using a carriage return to move the cursor to the beginning of the line (on a graphical terminal, not a teletype...) and not writing a newline until the very end:

i=0
while ((i < 100)); do
  printf "\r%3d%% complete" $i
  ((i += RANDOM%5+2))
  # Of course, in real life, we'd be getting i from somewhere meaningful.
  sleep 1
done
echo

Of note here is the %3d in the printf format specifier. It's important to use a fixed-width field for displaying the numbers, especially if the numbers may count downward (first displaying 10 and then 9). Of course we're counting upwards here, but that may not always be the case in general. If a fixed-width field is not desired, then printing a bunch of spaces at the end may help remove any clutter from previous lines.
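
If you do take the trailing-spaces approach instead, a small sketch (not in the original) looks like this; the extra spaces ensure that a shorter message fully overwrites a longer one left over from the previous iteration:

printf '\r%d%% complete          ' "$i"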

If an actual "bar" is desired, rather than a number, then one may be drawn using ASCII characters:

bar="=================================================="
barlength=${#bar}
i=0
while ((i < 100)); do
  # Number of bar segments to draw.
  n=$((i*barlength / 100))
  printf "\r[%-${barlength}s]" "${bar:0:n}"
  ((i += RANDOM%5+2))
  # Of course, in real life, we'd be getting i from somewhere meaningful.
  sleep 1
done
echo

Naturally one may choose a bar of a different length, or one composed of a different set of characters. For example, you can have a colored progress bar:

files=(*)
width=${COLUMNS-$(tput cols)}
rev=$(tput rev)

n=${#files[*]}
i=0
printf "$(tput setab 0)%${width}s\r"
for f in "${files[@]}"; do
   # process "$f" in some way (for testing, "sleep 1")
   printf "$rev%$((width*++i/n))s\r" " "
done
tput sgr0
echo

1. When copying/moving files

You can't get a progress indicator with cp(1), but you can either:

  • build one yourself with tools such as pv or clpbar;

  • use some other tool, e.g. vcp.

You may want to use pv(1) since it's packaged for many systems. In that case, it's convenient if you create a function or script to wrap it.

For example:

pv "$1" > "$2/${1##*/}"

This lacks error checking and support for moving files.
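
A slightly fuller sketch (hypothetical; the names pvcp and pvmv are made up here) adds minimal checks and a crude "move" built on top of the copy:

# Bash -- illustrative wrapper functions, not a robust implementation
pvcp() {
    [ -f "$1" ] && [ -d "$2" ] || { echo "usage: pvcp file dir" >&2; return 1; }
    pv "$1" > "$2/${1##*/}"
}
pvmv() {
    pvcp "$1" "$2" && rm -- "$1"
}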

You can also use rsync:

rsync -avx --progress --stats "$1" "$2"

Please note that the "total" number of files can change each time rsync enters a directory and finds more or fewer files than it expected, but it still gives more information than cp does. Rsync's progress output is good for big transfers made up of many small files.


CategoryShell

How can I ensure that only one instance of a script is running at a time (mutual exclusion)?

We need some means of mutual exclusion. One way is to use a "lock": any number of processes can try to acquire the lock simultaneously, but only one of them will succeed.

How can we implement this using shell scripts? Some people suggest creating a lock file, and checking for its presence:

  •  # locking example -- WRONG
     lockfile=/tmp/myscript.lock
     if [ -f "$lockfile" ]
     then                      # lock is already held
         echo >&2 "cannot acquire lock, giving up: $lockfile"
         exit 0
     else                      # nobody owns the lock
         > "$lockfile"         # create the file
         #...continue script
     fi

This example does not work, because there is a time window between checking and creating the file. Assume two processes are running the code at the same time. Both check if the lockfile exists, and both get the result that it does not exist. Now both processes assume they have acquired the lock -- a disaster waiting to happen. We need an atomic check-and-create operation, and fortunately there is one: mkdir, the command to create a directory:

  •  # locking example -- CORRECT
     # Bourne
     lockdir=/tmp/myscript.lock
     if mkdir "$lockdir"
     then    # directory did not exist, but was created successfully
         echo >&2 "successfully acquired lock: $lockdir"
         # continue script
     else
         echo >&2 "cannot acquire lock, giving up on $lockdir"
         exit 0
     fi

Here, even when two processes call mkdir at the same time, at most one of them can succeed. This atomicity of the check-and-create operation is ensured at the operating system kernel level.

Instead of using mkdir, we could also have used ln -s, the command that creates a symbolic link; like mkdir, it fails if the name it is asked to create already exists.
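
A minimal sketch of that symlink variant (not in the original answer):

  •  # locking example using a symbolic link -- sketch
     # Bourne
     lockfile=/tmp/myscript.lock
     if ln -s "locked by pid $$" "$lockfile" 2>/dev/null
     then    # link did not exist, but was created successfully
         echo >&2 "successfully acquired lock: $lockfile"
         # continue script
     else
         echo >&2 "cannot acquire lock, giving up on $lockfile"
         exit 0
     fi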

Note that we cannot use mkdir -p to automatically create missing path components: mkdir -p does not return an error if the directory exists already, but that's the feature we rely upon to ensure mutual exclusion.

Now let's spice up this example by automatically removing the lock when the script finishes:

  •  # POSIX (maybe Bourne?)
     lockdir=/tmp/myscript.lock
     if mkdir "$lockdir"
     then
         echo >&2 "successfully acquired lock"
    
         # Remove lockdir when the script finishes, or when it receives a signal
         trap 'rm -rf "$lockdir"' 0    # remove directory when script finishes
    
         # Optionally create temporary files in this directory, because
         # they will be removed automatically:
         tmpfile=$lockdir/filelist
    
     else
         echo >&2 "cannot acquire lock, giving up on $lockdir"
         exit 0
     fi

This example is much better. There is still the problem that a stale lock could remain when the script is terminated with a signal not caught (or signal 9, SIGKILL), or could be created by a user (either accidentally or maliciously), but it's a good step towards reliable mutual exclusion. Charles Duffy has contributed an example that may remedy the "stale lock" problem.

If you're on Linux, you can also get the benefit of flock(1), which ties a file descriptor to a lock file. There are multiple ways to use it; one possibility to solve the multiple-instance problem is:

  •   exec 9>/path/to/lock/file
      if ! flock -n 9  ; then
         echo "another instance is running";
         exit 1
      fi
      # this now runs under the lock until 9 is closed (it will be closed automatically when the script ends)

flock can also be used to protect only a part of your script; see the man page for more information.
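
For instance, a common idiom (a sketch; the lock file path is only an example) wraps just the critical section in a redirected command group:

  •   {
          flock -n 9 || { echo "another instance holds the lock"; exit 1; }
          # ... only the commands inside this group run under the lock ...
      } 9>/path/to/lock/file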

1. Discussion

I believe using if (set -C; >$lockfile); then ... is equally safe if not safer. The Bash source uses open(filename, flags|O_EXCL, mode); which should be atomic on almost all platforms (with the exception of some versions of NFS where mkdir may not be atomic either). I haven't traced the path of the flags variable, which must contain O_CREAT, nor have I looked at any other shells. I wouldn't suggest using this until someone else can backup my claims. --Andy753421

  • Using set -C does not work with ksh88. Ksh88 does not use O_EXCL, when you set noclobber (-C). --jrw32982

    Are you sure mkdir has problems with being atomic on NFS? I thought that affected only open, but I'm not really sure. -- BeJonas 2008-07-24 01:22:59

For more discussion on these issues, see ProcessManagement.


CategoryShell

This example was contributed by Charles Duffy. It has been separated from the parent page because the code has several issues that make it dubious.

  • Are we sure this code's correct? There seems to be a discrepancy between the names LOCK_DEFAULT_NAME and DEFAULT_NAME; and it checks for processes in what looks to be a race condition; and it uses the Linux-specific /proc file system and the GNU-specific egrep -o to do so.... I don't trust it. It looks overly complex and fragile. And quite non-portable. -- GreyCat

     LOCK_DEFAULT_NAME=$0
     LOCK_HOSTNAME="$(hostname -f)"
    
     ## function to take the lock if free; will fail otherwise
     function grab-lock {
       local PROGRAMNAME="${1:-$DEFAULT_NAME}"
       local PID=${2:-$$}
       (
         umask 000;
         mkdir -p "/tmp/${PROGRAMNAME}-lock"
         mkdir "/tmp/${PROGRAMNAME}-lock/held" || return 1
         mkdir "/tmp/${PROGRAMNAME}-lock/held/${LOCK_HOSTNAME}--pid-${PID}" && return 0 || return 1
       ) 2>/dev/null
       return $?
     }
    
     ## function to nicely let go of the lock
     function release-lock {
       local PROGRAMNAME="${1:-$DEFAULT_NAME}"
       local PID=${2:-$$}
       (
         rmdir "/tmp/${PROGRAMNAME}-lock/held/${LOCK_HOSTNAME}--pid-${PID}" || true
         rmdir "/tmp/${PROGRAMNAME}-lock/held" && return 0 || return 1
       ) 2>/dev/null
       return $?
     }
    
     ## function to force anyone else off of the lock
     function break-lock {
       local PROGRAMNAME="${1:-$DEFAULT_NAME}"
       (
         [ -d "/tmp/${PROGRAMNAME}-lock/held" ] || return 0
         for DIR in "/tmp/${PROGRAMNAME}-lock/held/${LOCK_HOSTNAME}--pid-"* ; do
           OTHERPID="$(echo $DIR | egrep -o '[0-9]+$')"
           [ -d /proc/${OTHERPID} ] || rmdir $DIR
         done
         rmdir /tmp/${PROGRAMNAME}-lock/held && return 0 || return 1
       ) 2>/dev/null
       return $?
     }
    
     ## function to take the lock nicely, freeing it first if needed
     function get-lock {
       break-lock "$@" && grab-lock "$@"
     }

I want to check to see whether a word is in a list (or an element is a member of a set).

If your real question was How do I check whether one of my parameters was -v? then please see FAQ #35 instead. Otherwise, read on....

First of all, let's get the terminology straight. Bash has no notion of "lists" or "sets" or any such. Bash has strings and arrays. Strings are a "list" of characters, arrays are a "list" of strings.

NOTE: In the general case, a string cannot possibly contain a list of other strings because there is no reliable way to tell where each substring begins and ends.

Given a traditional array, the only proper way to do this is to loop over all elements in your array and compare each one against the element you are looking for. Say the value we are looking for is in bar and our list is in the array foo:

  •    # Bash
       for element in "${foo[@]}"; do
          [[ $element = $bar ]] && echo "Found $bar."
       done

If you need to perform this several times in your script, you might want to extract the logic into a function:

  •    # Bash
       isIn() {
           local pattern="$1" element
           shift
    
           for element
           do
               [[ $element = $pattern ]] && return 0
           done
    
           return 1
       }
    
       if isIn "jacob" "${names[@]}"
       then
           echo "Jacob is on the list."
       fi

Or, if you want your function to return the index at which the element was found:

  •    # Bash 3.0 or higher
       indexOf() {
           local pattern=$1
           local index list
           shift
    
           list=("$@")
           for index in "${!list[@]}"
           do
               [[ ${list[index]} = $pattern ]] && {
                   echo $index
                   return 0
               }
           done
    
           echo -1
           return 1
       }
    
       if index=$(indexOf "jacob" "${names[@]}")
       then
           echo "Jacob is the ${index}th on the list."
       else
           echo "Jacob is not on the list."
       fi

If your "list" is contained in a string, and for some half-witted reason you choose not to heed the warnings above, you can use the following code to search through "words" in a string. (The only real excuse for this would be that you're stuck in Bourne shell, which has no arrays.)

  •    # Bourne
       set -f
       for element in $foo; do
          if test x"$element" = x"$bar"; then
             echo "Found $bar."
          fi
       done
       set +f

Here, a "word" is defined as any substring that is delimited by whitespace (or more specifically, the characters currently in IFS). The set -f prevents glob expansion of the words in the list. Turning glob expansions back on (set +f) is optional.

If you're working in bash 4 or ksh93, you have access to associative arrays. These will allow you to restructure the problem -- instead of making a list of words that are allowed, you can make an associative array whose keys are the words you want to allow. Their values could be meaningful, or not -- depending on the nature of the problem.

  •    # Bash 4
       declare -A good
       for word in "goodword1" "goodword2" ...; do
         good["$word"]=1
       done
    
       # Check whether $foo is allowed:
       if ((${good[$foo]})); then ...

Here's a hack that you shouldn't use, but which is presented for the sake of completeness:

  •    # Bash
       if [[ " $foo " = *" $bar "* ]]; then
          echo "Found $bar."
       fi

(The problem here is that it assumes space can be used as a delimiter between words. Your elements might contain spaces, which would break this!)

That same hack, for Bourne shells:

  •    # Bourne
       case " $foo " in
          *" $bar "*) echo "Found $bar.";;
       esac

You can also use extended glob with printf to search for a word in an array. I haven't tested it enough, so it might break in some cases --sn18

  •    # Bash
       shopt -s extglob
       #convert array to glob
       printf -v glob '%q|' "${array[@]}"
       glob=${glob%|}
       [[ $word = @($glob) ]] && echo "Found $word"
  • It will break when an array element contains a | character. Hence, I moved it down here with the other hacks that work in a similar fashion and have a similar limitation. -- GreyCat

    • printf %q quotes a | character too, so it probably should not --sn18

GNU's grep has a \b feature which allegedly matches the edges of words. Using that, one may attempt to replicate the shorter approach used above, but it is fraught with peril:

  •    # Is 'foo' one of the positional parameters?
       egrep '\bfoo\b' <<<"$@" >/dev/null && echo yes
    
       # This is where it fails: is '-v' one of the positional parameters?
       egrep '\b-v\b' <<<"$@" >/dev/null && echo yes
       # Unfortunately, \b sees "v" as a separate word.
       # Nobody knows what the hell it's doing with the "-".
    
       # Is "someword" in the array 'array'?
       egrep '\bsomeword\b' <<<"${array[@]}"
       # Obviously, you can't use this if someword is '-v'!

Since this "feature" of GNU grep is both non-portable and poorly defined, we recommend not using it. It is simply mentioned here for the sake of completeness.

Bulk comparison

This method tries to compare the desired string to the entire contents of the array. It can potentially be very efficient, but it depends on a delimiter that must not be in the sought value or the array. Here we use $'\a', the BEL character, because it's extremely uncommon.

  •    # usage: if has "element" list of words; then ...; fi
       has() {
         local IFS=$'\a' t="$1"
         shift
         [[ $'\a'"$*"$'\a' == *$'\a'$t$'\a'* ]]
       }
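
A small usage sketch (not in the original; the array name is made up):

  •    # Bash
       fruits=(apple banana "dog droppings")
       if has "dog droppings" "${fruits[@]}"; then echo "it's on the list"; fi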


CategoryShell

How can I redirect stderr to a pipe?

A pipe can only carry standard output (stdout) of a program. To pipe standard error (stderr) through it, you need to redirect stderr to the same destination as stdout. Optionally you can close stdout or redirect it to /dev/null to only get stderr. Some sample code:

# Bourne
# Assume 'myprog' is a program that writes to both stdout and stderr.

# version 1: redirect stderr to the pipe while stdout survives (both come
# mixed)
myprog 2>&1 | grep ...

# version 2: redirect stderr to the pipe without getting stdout (it's
# redirected to /dev/null)
myprog 2>&1 >/dev/null | grep ...

# same idea, this time storing stdout in a file
myprog 2>&1 >file | grep ...

Another simple example of redirecting stdout and stderr:

# Bourne
{ command | stdout_reader; } 2>&1 | stderr_reader

For further explanation of how redirections and pipes interact, see FAQ #55.

This has an obvious application with programs like dialog, which draws (using ncurses) windows onto the screen (stdout), and returns results on stderr. One way to deal with this would be to redirect stderr to a temporary file. But this is not necessary -- see FAQ #40 for examples of using dialog specifically!

In the examples above (as well as FAQ #40), we either discarded stdout altogether, or sent it to a known device (/dev/tty for the user's terminal). One may also pipe stderr only but keep stdout intact (without a priori knowledge of where the script's output is going). This is a bit trickier.

# Bourne
# Redirect stderr to a pipe, keeping stdout unaffected.

exec 3>&1                       # Save current "value" of stdout.
myprog 2>&1 >&3 | grep ...      # Send stdout to FD 3.
exec 3>&-                       # Now close it for the remainder of the script.

# Thanks to http://www.tldp.org/LDP/abs/html/io-redirection.html

The same can be done without exec:

# POSIX
$ myfunc () { echo "I'm stdout"; echo "I'm stderr" >&2; }
$ { myfunc 2>&1 1>&3 3>&- | cat  > stderr.file 3>&-; } 3>&1
I'm stdout
$ cat stderr.file
I'm stderr

FD 3 is closed (3>&-) so that the commands do not inherit it. Note that bash allows you to duplicate and close in a single redirection: 1>&3-. You can check the difference on Linux by trying the following:

# Bash
{ bash <<< 'lsof -a -p $$ -d1,2,3'   ;} 3>&1
{ bash <<< 'lsof -a -p $$ -d1,2,3' 3>&-  ;} 3>&1

To show a dialog one-liner:

# Bourne
exec 3>&1
dialog --menu Title 0 0 0 FirstItem FirstDescription 2>&1 >&3 | sed 's/First/Only/'
exec 3>&-

This keeps the dialog window working properly, while the output of dialog (returned on stderr) is what gets altered by sed.

A similar effect can be achieved with ProcessSubstitution:

# Bash
perl -e 'print "stdout\n"; warn "stderr\n"' 2> >(tr '[:lower:]' '[:upper:]')

This will pipe standard error through the tr command.

See this redirection tutorial (with an example that redirects stdout to one pipe and stderr to another pipe).


CategoryShell

Eval command and security issues

The eval command is extremely powerful and extremely easy to abuse.

It causes your code to be parsed twice instead of once; this means that, for example, if your code has variable references in it, the shell's parser will evaluate the contents of that variable. If the variable contains a shell command, the shell might run that command, whether you wanted it to or not. This can lead to unexpected results, especially when variables can be read from untrusted sources (like users or user-created files).
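
A tiny illustration of that double parsing (not part of the original text):

# Bash
var='$HOME'
echo $var        # prints $HOME -- the contents of var are not parsed again
eval echo $var   # prints your home directory -- the second parse expands $HOME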

1. Examples of bad use of eval

"eval" is a common misspelling of "evil". The section of this FAQ dealing with spaces in file names used to include the following quote "helpful tool (which is probably not as safe as the \0 technique)", end quote.

    Syntax : nasty_find_all <path> <command> [maxdepth]

    # This code is evil and must never be used!
    export IFS=" "
    [ -z "$3" ] && set -- "$1" "$2" 1
    FILES=`find "$1" -maxdepth "$3" -type f -printf "\"%p\" "`
    #warning, evilness
    eval FILES=($FILES)
    for ((I=0; I < ${#FILES[@]}; I++))
    do
        eval "$2 \"${FILES[I]}\""
    done
    unset IFS

This script was supposed to recursively search for files and run a user-specified command on them, even if they had newlines and/or spaces in their names. The author thought that find -print0 | xargs -0 was unsuitable for some purposes such as multiple commands. It was followed by an instructional description of all the lines involved, which we'll skip.

To its defense, it worked:

$ ls -lR
.:
total 8
drwxr-xr-x  2 vidar users 4096 Nov 12 21:51 dir with spaces
-rwxr-xr-x  1 vidar users  248 Nov 12 21:50 nasty_find_all

./dir with spaces:
total 0
-rw-r--r--  1 vidar users 0 Nov 12 21:51 file?with newlines
$ ./nasty_find_all . echo 3
./nasty_find_all
./dir with spaces/file
with newlines
$

But consider this:

$ touch "\"); ls -l $'\x2F'; #"

You just created a file called  "); ls -l $'\x2F'; #

Now FILES will contain  ""); ls -l $'\x2F'; #. When we do eval FILES=($FILES), it becomes

FILES=(""); ls -l $'\x2F'; #"

Which becomes the two statements  FILES=("");  and  ls -l / . Congratulations, you just allowed execution of arbitrary commands.

$ touch "\"); ls -l $'\x2F'; #"
$ ./nasty_find_all . echo 3
total 1052
-rw-r--r--   1 root root 1018530 Apr  6  2005 System.map
drwxr-xr-x   2 root root    4096 Oct 26 22:05 bin
drwxr-xr-x   3 root root    4096 Oct 26 22:05 boot
drwxr-xr-x  17 root root   29500 Nov 12 20:52 dev
drwxr-xr-x  68 root root    4096 Nov 12 20:54 etc
drwxr-xr-x   9 root root    4096 Oct  5 11:37 home
drwxr-xr-x  10 root root    4096 Oct 26 22:05 lib
drwxr-xr-x   2 root root    4096 Nov  4 00:14 lost+found
drwxr-xr-x   6 root root    4096 Nov  4 18:22 mnt
drwxr-xr-x  11 root root    4096 Oct 26 22:05 opt
dr-xr-xr-x  82 root root       0 Nov  4 00:41 proc
drwx------  26 root root    4096 Oct 26 22:05 root
drwxr-xr-x   2 root root    4096 Nov  4 00:34 sbin
drwxr-xr-x   9 root root       0 Nov  4 00:41 sys
drwxrwxrwt   8 root root    4096 Nov 12 21:55 tmp
drwxr-xr-x  15 root root    4096 Oct 26 22:05 usr
drwxr-xr-x  13 root root    4096 Oct 26 22:05 var
./nasty_find_all
./dir with spaces/file
with newlines
./
$

It doesn't take much imagination to replace  ls -l  with  rm -rf  or worse.

One might think these circumstances are obscure, but one should not be tricked by this. All it takes is one malicious user -- or, perhaps more likely, a benign user who left the terminal unlocked when going to the bathroom, or who wrote a funny PHP uploading script that doesn't sanity-check file names, or who made the same mistake as oneself in allowing arbitrary code execution (now, instead of being limited to the www-user, an attacker can use nasty_find_all to traverse chroot jails and/or gain additional privileges), or who uses an IRC or IM client that's too liberal in the filenames it accepts for file transfers or conversation logs, etc.

2. Examples of good use of eval

The most common correct use of eval is reading variables from the output of a program which is specifically designed to be used this way. For example,

# On older systems, one must run this after resizing a window:
eval `resize`

# Less primitive: start ssh-agent and import the environment variables it prints.
# This is typically executed from a .xsession or .profile type of file.
# The variables produced by ssh-agent will be exported to all the processes in
# the user's session, so that an eventual ssh will inherit them.
eval `ssh-agent -s`

eval has other uses, especially when creating variables out of the blue (indirect variable references). Here is an example of one way to parse command line options that do not take parameters:

# POSIX
#
# Create option variables dynamically. Try calling:
#
#    sh -x example.sh --verbose --test --debug

for i in "$@"
do
    case "$i" in
       --test|--verbose|--debug)
            shift                   # Remove option from command line
            name=${i#--}            # Delete option prefix
            eval "$name='$name'"    # make *new* variable
            ;;
    esac
done

echo "verbose: $verbose"
echo "test: $test"
echo "debug: $debug"

So, why is this version acceptable? It's acceptable because we have restricted the eval command so that it will only be executed when the input is one of a finite set of known values. Therefore, it can't ever be abused by the user to cause arbitrary command execution -- any input with funny stuff in it wouldn't match one of the three predetermined possible inputs. This variant would not be acceptable:

# Dangerous code.  Do not use this!
for i in "$@"
do
    case "$i" in
       --test*|--verbose*|--debug*)
            shift                   # Remove option from command line
            name=${i#--}            # Delete option prefix
            eval "$name='$name'"    # make *new* variable
            ;;
    esac
done

All that's changed is that we attempted to make the previous "good" example (which doesn't do very much) useful in some way, by letting it take things like --test=foo. But look at what this enables:

$ ./foo --test='; ls -l /etc/passwd;x='
-rw-r--r-- 1 root root 943 2007-03-28 12:03 /etc/passwd

Once again: by permitting the eval command to be used on unfiltered user input, we've permitted arbitrary command execution.

3. Alternatives to eval

  • Could this not be done better with declare? eg:

     for i in "$@"
     do
        case "$i" in
           --test|--verbose|--debug)
                shift                   # Remove option from command line
                name=${i#--}            # Delete option prefix
                declare $name=Yes       # set default value
                ;;
           --test=*|--verbose=*|--debug=*)
                shift
                name=${i#--}
                value=${name#*=}        # value is whatever's after first word and =
                name=${name%%=*}        # restrict name to first word only (even if there's another = in the value)
                declare $name="$value"  # make *new* variable
                ;;
        esac
     done

    Note that --name (for a default value) and --name=value are the required formats.

    declare does seem to have some sort of parser magic in it, much like [[ does. Here's a test I performed with bash 3.1.17:

     griffon:~$ declare foo=x;date;x=Yes
     Sun Nov  4 09:36:08 EST 2007
     
     griffon:~$ name='foo=x;date;x'
     griffon:~$ declare $name=Yes
     griffon:~$ echo $foo
     x;date;x=Yes

    It appears that, at least in bash, declare is much safer than eval.

For a list of ways to reference or to populate variables indirectly without using eval, please see FAQ #6. (This section was written before #6 was, but I've left it here as a reference.)

4. Robust eval usage

Another approach is to encapsulate the dangerous code in a function. For example, instead of doing something like this:

    eval "${ArrayName}"'="${Value}"'

Now, the above example is reasonably OK, but it still has a vulnerability. Notice what happens if we do the following:

    ArrayName="echo rm -rf /tmp/dummyfolder/*; tvar"
    eval "${ArrayName}"'="${Value}"'

The way to prevent this type of security hole is to create a function that gives you a certain amount of security in its use and allows for cleaner code:

  # check_valid_var_name VariableName
  function check_valid_var_name {
    case "${1:?Missing Variable Name}" in
      [!a-zA-Z_]* | *[!a-zA-Z_0-9]* ) return 3;;
    esac
  }
  # set_variable VariableName [<Variable Value>]
  function set_variable {
    check_valid_var_name "${1:?Missing Variable Name}" || return $?
    eval "${1}"'="${2:-}"'
  }
  set_variable "laksdpaso" "dasädöas# #-c,c pos 9302 1´ " 
  set_variable "echo rm -rf /tmp/dummyfolder/*; tvar" "dasädöas# #-c,c pos 9302 1´ " 
  # return Error

Note: set_variable also has an advantage over using declare. Consider the following.

   VariableName="Name=hhh"
   declare "${VariableName}=Test Value"         # Valid code, unexpected behavior
   set_variable "${VariableName}" "Test Value"  # return Error

For reference, here are some other examples:

  # get_array_element VariableName ArrayName ArrayElement
  function get_array_element {
    check_valid_var_name "${1:?Missing Variable Name}" || return $?
    check_valid_var_name "${2:?Missing Array Name}" || return $?
    eval "${1}"'="${'"${2}"'["${3:?Missing Array Index}"]}"'
  }
  # set_array_element ArrayName ArrayElement [<Variable Value>]
  function set_array_element {
    check_valid_var_name "${1:?Missing Array Name}" || return $?
    eval "${1}"'["${2:?Missing Array Index}"]="${3:-}"'
  }
  # unset_array_element ArrayName ArrayElement
  function unset_array_element {
    unset "${1}[${2}]"
  }
  # get_array_element_cnt VarName ArrayName
  function get_array_element_cnt {
    check_valid_var_name "${1:?Missing Variable Name}" || return $?
    check_valid_var_name "${2:?Missing Array Name}" || return $?
    eval "${1}"'="${#'"${2}"'[@]}"'
  }
  # push_element ArrayName <New Element 1> [<New Element 2> ...]
  function push_element {
    check_valid_var_name "${1:?Missing Array Name}" || return $?
    local ArrayName="${1}"
    local LastElement
    eval 'LastElement="${#'"${ArrayName}"'[@]}"'
    while shift && [ $# -gt 0 ] ; do
      eval "${ArrayName}"'["${LastElement}"]="${1}"'
      let LastElement+=1
    done
  }
  # pop_element ArrayName <Destination Variable Name 1> [<Destination Variable Name 2> ...]
  function pop_element {
    check_valid_var_name "${1:?Missing Array Name}" || return $?
    local ArrayName="${1}"
    local LastElement
    eval 'LastElement="${#'"${ArrayName}"'[@]}"'
    while shift && [[ $# -gt 0 && ${LastElement} -gt 0 ]] ; do
      let LastElement-=1
      check_valid_var_name "${1:?Missing Variable Name}" || return $?
      eval "${1}"'="${'"${ArrayName}"'["${LastElement}"]}"'
      unset "${ArrayName}[${LastElement}]" 
    done
    [[ $# -eq 0 ]] || return 8
  }
  # shift_element ArrayName [<Destination Variable Name>]
  function shift_element {
    check_valid_var_name "${1:?Missing Array Name}" || return $?
    local ArrayName="${1}"
    local CurElement=0 LastElement
    eval 'LastElement="${#'"${ArrayName}"'[@]}"'
    while shift && [[ $# -gt 0 && ${LastElement} -gt ${CurElement} ]] ; do
      check_valid_var_name "${1:?Missing Variable Name}" || return $?
      eval "${1}"'="${'"${ArrayName}"'["${CurElement}"]}"'
      let CurElement+=1
    done
    eval "${ArrayName}"'=("${'"${ArrayName}"'[@]:${CurElement}}")'
    [[ $# -eq 0 ]] || return 8
  }
  # unshift_element ArrayName <New Element 1> [<New Element 2> ...]
  function unshift_element {
    check_valid_var_name "${1:?Missing Array Name}" || return $?
    [ $# -gt 1 ] || return 0
    eval "${1}"'=("${@:2}" "${'"${1}"'[@]}" )'
  }

 # 1000 x { declare "laksdpaso=dasädöas# #-c,c pos 9302 1´ "          } took 0m0.069s
 # 1000 x { set_variable laksdpaso "dasädöas# #-c,c pos 9302 1´ "     } took 0m0.141s
 # 1000 x { get_array_element TestVar TestArray 1                        } took 0m0.199s
 # 1000 x { set_array_element TestArray 1 "dfds  edfs fdf df"            } took 0m0.174s
 # 1000 x { set_array_element TestArray 0                                } took 0m0.167s
 # 1000 x { get_array_element_cnt TestVar TestArray                      } took 0m0.171s

 # all push,pops,shifts,unshifts done with a 2000 element array 
 # 1000 x { push_element TestArray "dsf sdf ss s"                        } took 0m0.274s
 # 1000 x { pop_element TestArray TestVar                                } took 0m0.380s
 # 1000 x { unshift_element TestArray "dsf sdf ss s"                     } took 0m9.027s
 # 1000 x { shift_element TestArray TestVar                              } took 0m5.583s

Note that shift_element and unshift_element have poor performance and as such should be avoided, especially on large arrays. The rest have acceptable performance and I use them regularly.


CategoryShell

How can I view periodic updates/appends to a file? (ex: growing log file)

tail -f will show you the growing log file. On some systems (e.g. OpenBSD), this will automatically track a rotated log file to the new file with the same name (which is usually what you want). To get the equivalent functionality on GNU systems, use tail -F instead.

This is helpful if you need to view only the updates to the file after your last view.

# Start by setting n=1
   tail -n $n testfile; n="+$(( $(wc -l < testfile) + 1 ))"

Every invocation of this shows the updates to the file since the point where we last stopped. If you know the line number you want to start from, set n to that.


CategoryShell

I'm trying to put a command in a variable, but the complex cases always fail!

Some people attempt to do things like this:

    # Non-working example
    args="-s 'The subject' $address"
    mail $args < $body

This fails because of WordSplitting and because the single quotes inside the variable are literal, not syntactic. When $args is expanded, it becomes four words. 'The is the second word, and subject' is the third word.

Read Arguments to get a better understanding of how the shell figures out what the arguments in your statement are.

So, how do we do this? That all depends on what this is!

There are at least three situations in which people try to shove commands, or command arguments, into variables and then run them. Each case needs to be handled separately.

1. I'm trying to save a command so I can run it later without having to repeat it each time

If you want to put a command in a container for later use, use a function. Variables hold data, functions hold code.

    pingMe() {
        ping -q -c1 "$HOSTNAME"
    }

    [...]
    if pingMe; then ..

2. I'm constructing a command based on information that is only known at run time

The root of the issue described above is that you need a way to maintain each argument as a separate word, even if that argument contains spaces. Quotes won't do it, but an array will.

Suppose your script wants to send email. You might have places where you want to include a subject, and others where you don't. The part of your script that sends the mail might check a variable named subject to determine whether you need to supply additional arguments to the mail command. A naive programmer may come up with something like this:

    # Don't do this.
    args=$recipient
    if [[ $subject ]]; then
        args+=" -s $subject"
    fi
    mail $args < $bodyfilename

As we have seen, this approach fails when the subject contains whitespace. It simply is not robust enough.

As such, if you really need to create a command dynamically, put each argument in a separate element of an array, like so:

    # Working example, bash 3.1 or higher
    args=("$recipient")
    if [[ $subject ]]; then
        args+=(-s "$subject")
    fi
    mail "${args[@]}" < "$bodyfilename"

(See FAQ #5 for more details on array syntax.)

Often, this question arises when someone is trying to use dialog to construct a menu on the fly. The dialog command can't be hard-coded, because its parameters are supplied based on data only available at run time (e.g. the number of menu entries). For an example of how to do this properly, see FAQ #40.

3. I want to generalize a task, in case the low-level tool changes later

You generally do NOT want to put command names or command options in variables. Variables should contain the data you are trying to pass to the command, like usernames, hostnames, ports, text, etc. They should NOT contain options that are specific to one certain command or tool. Those things belong in functions.

In the mail example, we've got hard-coded dependence on the syntax of the Unix mail command -- and in particular, versions of the mail command that permit the subject to be specified after the recipient, which may not always be the case. Someone maintaining the script may decide to fix the syntax so that the recipient appears last, which is the most correct form; or they may replace mail altogether due to internal company mail system changes, etc. Having several calls to mail scattered throughout the script complicates matters in this situation.

What you probably should be doing, is this:

    # POSIX

    # Send an email to someone.
    # Reads the body of the mail from standard input.
    #
    # sendto address [subject]
    #
    sendto() {
        # mail ${2:+-s "$2"} "$1"
        MailTool ${2:+--subject="$2"} --recipient="$1"
    }

    sendto "$address" "The Subject" < "$bodyfile"

Here, the parameter expansion checks if $2 (the optional subject) has expanded to anything. If it has, the expansion adds the -s "$2" to the mail command. If it hasn't, the expansion doesn't add the -s option at all.

The original implementation uses mail(1), a standard Unix command. Later, this is commented out and replaced by something called MailTool, which was made up on the spot for this example. But it should serve to illustrate the concept: the function's invocation is unchanged, even though the back-end tool changes.

4. I want a log of my script's actions

Another reason people attempt to stuff commands into variables is because they want their script to print each command before it runs it. If that's all you want, then simply use the set -x command, or invoke your script with #!/bin/bash -x or bash -x ./myscript. Note that you can turn it off and back on inside the script with set +x and set -x.
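
For example (a small sketch; the cp command and its variables are just placeholders), to trace only one section of a script:

    # POSIX
    set -x                     # start tracing
    cp -- "$src" "$dest"       # echoed (expanded) as "+ cp -- ..." before it runs
    set +x                     # stop tracing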

It's worth noting that you cannot put a pipeline command into an array variable and then execute it using the "${array[@]}" technique. The only way to store a pipeline in a variable would be to add (carefully!) a layer of quotes if necessary, store it in a string variable, and then use eval or sh to run the variable. This is not recommended, for security reasons. The same thing applies to commands involving redirection, if or while statements, and so on.

Some people get into trouble because they want to have their script print their commands including redirections before it runs them. set -x shows the command without redirections. People try to work around this by doing things like:

    # Non-working example
    command="mysql -u me -p somedbname < file"
    ((DEBUG)) && echo "$command"
    "$command"

(This is so common that I include it explicitly, even though it's repeating what I already wrote.)

Once again, this does not work. Not even using an array works here. The only thing that would work is rigorously escaping the command to be sure no metacharacters will cause serious security problems, and then using eval or sh to re-read the command. Please don't do that!

If your head is SO far up your ass that you still think you need to write out every command you're about to run before you run it, AND that you must include all redirections, then just do this:

    # Working example
    echo "mysql -u me -p somedbname < file"
    mysql -u me -p somedbname < file

Don't use a variable at all. Just copy and paste the command, wrap an extra layer of quotes around it (sometimes tricky), and stick an echo in front of it.

My personal recommendation would be just to use set -x and not worry about it.


CategoryShell

I want history-search just like in tcsh. How can I bind it to the up and down keys?

Just add the following to /etc/inputrc or your ~/.inputrc:

"\e[A":history-search-backward
"\e[B":history-search-forward

Then restart bash (either by logging out and back in, or by running exec bash).

Readline (the part of bash that handles terminal input) doesn't understand key names such as "up arrow". Instead, you must manually discern the escape sequence that the key sends on your particular terminal (usually by pressing Ctrl-V and then the key in question), and insert it into the .inputrc as shown above. \e denotes the Escape character in readline. The Ctrl-V trick shows Escape as ^[. You must recognize that the leading ^[ is an Escape character, and make the substitution yourself.

How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?

Carriage return characters (CRs) are used in line ending markers on some systems. There are three different kinds of line endings in common use:

  • Unix systems use Line Feeds (LFs) only.
  • MS-DOS and Windows systems use CR-LF pairs.
  • Old Macintosh systems use CRs only.

If you're running a script on a Unix system, the line endings need to be Unix ones (LFs only), or you will have problems. You can check the kind of line endings in use by running:

cat -e yourscript

If you see something like this:

command^M$
^M$
another command^M$

then you need to remove the CRs. There are a plethora of ways to do this.

To remove them from a file, ex is a good standard way to do it:

ex -sc $'%s/\r$//e|x' file

There are many more ways:

  • Some systems have a dos2unix command which can do this. Or recode, or fromdos.

  • You can also use col <input.txt > output.txt

  • In vim, you can use :set fileformat=unix to do it and save it with a ":w".

  • You can use Perl:
    •   perl -pi -e 's/\r\n/\n/' filename
    This has the advantage of overwriting the original file, so you don't have to mess with temporary files.


Another way to check it:

file yourscript

If the file contains CRs, the output will say so. Note: this is only true on GNU/Linux. On other operating systems, the result of file is unpredictable, except that it should contain the word "text" somewhere in the output if the result "kind of looks like a text file of some sort, maybe".

imadev:~$ printf 'DOS\r\nline endings\r\n' > foo
imadev:~$ file foo
foo:            commands text
arc3:~$ file foo
foo: ASCII text, with CRLF line terminators

And another way to fix it:

nano -w yourscript

Type Ctrl-O and before confirming, type Alt-D (DOS) or Alt-M (Mac) to change the format.

And another way to fix it:

dos2unix filename

I have a fancy prompt with colors, and now bash doesn't seem to know how wide my terminal is. Lines wrap around incorrectly.

You must put \[ and \] around any non-printing escape sequences in your prompt. Thus:

fancy_prompt() {
  local blue=$(tput setaf 4)
  local purple=$(tput setaf 5)
  local reset=$(tput sgr0)
  PS1="\[$blue\]\h:\[$purple\]\w\[$reset\]\\$ "
}

Without the \[ \], bash thinks that the bytes which constitute the escape sequences for the color codes actually take up space on the screen, so it won't be able to tell where the cursor really is.

If you still have problems, e.g. when going through your command history with the Up/Down arrows, make sure you have the checkwinsize option set:

shopt -s checkwinsize

Refer to the Wikipedia article for ANSI escape codes.

More generally, you should avoid writing terminal escape sequences directly in your prompt, because they are not necessarily portable across all the terminals you will use, now or in the future. Use tput to generate the correct sequences for your terminal (it will look things up in your terminfo or termcap database).

Since tput is an external command, you want to run it as few times as possible, which is why we suggest storing its results in variables, and using those to construct your prompt (rather than putting $(tput ...) in PS1 directly, which would execute tput every time the prompt is displayed). The code that constructs a prompt this way is much easier to read than the prompt itself, and it should work across a wide variety of terminals. (Some terminals may not have the features you are trying to use, such as colors, so the results will never be 100% portable in the complex cases. But you can get close.)


  • Personal note: I still prefer this answer:

    BLUE=$(tput setaf 4)
    PURPLE=$(tput setaf 5)
    RESET=$(tput sgr0)
    PS1='\[$BLUE\]\h:\[$PURPLE\]\w\[$RESET\]\$ '

    I understand that people like to avoid polluting the variable namespace; hence the function and the local part, which in turn forces the use of double quotes, which in turn forces the need to double up some but not all backslashes (and to know which ones -- oy!). I find that unnecessarily complicated. Granted, there's a tiny risk of collision if someone overrides BLUE or whatever, but on the other hand, the double-quote solution also carries the risk that a terminal will have backslashes in its escape sequences. And since the contents of the escape sequences are being parsed in the double-quote solution, but not in the single-quote solution, such a terminal could mess things up. Example of the difference:

     imadev:~$ FOO='\w'; PS1='$FOO\$ '
     \w$ FOO='\w'; PS1="$FOO\\$ "
     ~$ 

    Suppose our terminal uses \w in an escape sequence. A \w inside a variable that's referenced in a single-quoted PS1 is only expanded out to a literal \w when the prompt is printed, which is what we want. But in the double-quoted version, the \w is placed directly into the PS1 variable, and gets evaluated by bash when the prompt is printed. Now, I don't actually know of any terminals that use this notation -- it's entirely a theoretical objection. But then again, so is the objection to the use of variables like BLUE. And some people might actually want to echo "$BLUE" in their shells anyway. So, I'm not going to say the single-quote answer is better, but I'd like to see it retained here as an alternative. -- GreyCat

    • Fair enough. I initially just intended to change the BLACK= to a RESET= (since not everyone uses white on black), but then I thought it would be better if the prompt did not depend on variables being available. I obviously was not aware of the possibility of such terminal escape sequences, so I think mentioning the single-quote version first would be a better idea, along with a mention of what happens if those vars change.

      I guess one could also make the variables readonly to prevent accidentally changing them and mess up the prompt, though that'll probably have other drawbacks..? -- ~~~

How can I tell whether a variable contains a valid number?

First, you have to define what you mean by "number". The most common case when people ask this seems to be "a non-negative integer, with no leading + sign". Or in other words, a string of all digits. Other times, people want to validate a floating-point input, with optional sign and optional decimal point.

1. Hand parsing

If you're validating a simple "string of digits", you can do it with a glob:

# Bash
if [[ $foo != *[!0-9]* ]]; then
    echo "'$foo' is strictly numeric"
else
    echo "'$foo' has a non-digit somewhere in it"
fi

The same thing can be done in Korn and POSIX shells as well, using case:

# ksh, POSIX
case "$foo" in
    *[!0-9]*) echo "'$foo' has a non-digit somewhere in it" ;;
    *) echo "'$foo' is strictly numeric" ;;
esac

If you need to allow a leading negative sign, or if want a valid floating-point number or something else more complex, then there are a few possible ways. Standard globs aren't expressive enough to do this, but we can use extended globs:

# Bash -- extended globs must be enabled.
# Check whether the variable is all digits.
shopt -s extglob
[[ $var == +([0-9]) ]]

A more complex case:

# Bash
shopt -s extglob
[[ $foo = *[0-9]* && $foo = ?([+-])*([0-9])?(.*([0-9])) ]] &&
  echo "foo is a floating-point number"

The leading test of $foo is to ensure that it contains at least one digit. The extended glob, by itself, would match the empty string, or a lone + or -, which may not be desirable behavior.

Korn shell has extended globs enabled by default, but lacks [[, so we must use case to do the glob-matching:

# Korn
case $foo in
  *[0-9]*)
    case $foo in
        ?([+-])*([0-9])?(.*([0-9]))) echo "foo is a number";;
    esac;;
esac

Note that this uses the same extended glob as the Bash example before it; the third closing parenthesis at the end of it is actually part of the case syntax.

If your definition of "a valid number" is even more complex, or if you need a solution that works in legacy Bourne shells, you might prefer to use an external tool's regular expression syntax. Here is a portable version (explained in detail here), using egrep:

# Bourne
if echo "$foo" | egrep '^[-+]?([0-9]+\.?|[0-9]*\.[0-9]+)$' >/dev/null
then
    echo "'$foo' is a number"
else
    echo "'$foo' is not a number"
fi

Bash version 3 and above have regular expression support in the [[ command. Due to bugs and changes in the implementation of the =~ feature throughout bash 3.x, we do not recommend using it, but people do it anyway, so we have to maintain this example (and keep restoring this warning, too, when people delete it):

# Bash
# Put the RE in a var for backward compatibility with versions <3.2
regexp='^[-+]?[0-9]*(\.[0-9]*)?$' 
if [[ $foo = *[0-9]* && $foo =~ $regexp ]]; then
    echo "'$foo' looks rather like a number"
else
    echo "'$foo' doesn't look particularly numeric to me"
fi

2. Using the parsing done by [ and printf (or "using eq")

# fails with ksh
if [ "$foo" -eq "$foo" ] 2>/dev/null;then
 echo "$foo is an integer"
fi

[ parses the variable and interprets it as an integer because of the -eq. If the parsing succeeds, the test is trivially true; if it fails, [ prints an error message (which the 2>/dev/null hides) and sets a non-zero exit status. However, this method fails if the shell is ksh, because ksh evaluates the variable as an arithmetic expression.

You can use a similar trick with printf:

# POSIX
if printf "%f" "$foo" >/dev/null 2>&1; then
  echo "$foo is a float"
fi

You can use %d to parse an integer. Take care that the parsing might be (is supposed to be?) locale-dependent.
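
For example, the integer variant (a sketch following the same pattern as above):

# POSIX
if printf "%d" "$foo" >/dev/null 2>&1; then
  echo "$foo is an integer"
fi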

3. Using the integer type

If you just want to guarantee ahead of time that a variable contains an integer, without actually checking, you can give the variable the "integer" attribute.

# Bash
declare -i foo
foo=-10+1; echo "$foo"    # prints -9

foo="hello"; echo "$foo"
# the value of the variable "hello" is evaluated; if unset, foo is 0

foo="Some random string"  # results in an error.

Any value assigned to a variable with the integer attribute set is evaluated as an arithmetic expression just like inside $(( )). Bash will raise an error if you try to assign an invalid arithmetic expression.

In Bash and ksh93, if a variable which has been declared integer is used in a read command, the user's input is treated as an arithmetic expression, as with assignment. In particular, if the user types an identifier, the variable will be set to the value of the variable with that name, and read will give no other indication of a problem.

# Bash (and ksh93, if you replace declare with typeset)
$ declare -i foo
$ read foo
hello
$ echo $foo    # prints 0; 'hello' is unset, so is treated as 0 for arithmetic purposes
$ hello=5
$ read foo     # user types hello again
hello
$ echo $foo    # prints 5, the value of 'hello' as an arithmetic expression

Pretty useless if you want to read only integers.

In the older Korn shell (ksh88), if a variable is declared integer and used in a read command, and the user types an invalid integer, the shell complains, the read command returns an error status, and the value of the variable is unchanged.

# ksh88
$ typeset -i foo
$ foo=42
$ read foo
hello
ksh: hello: bad number
$ echo $?
1
$ echo $foo
42

Tell me all about 2>&1 -- what's the difference between 2>&1 >foo and >foo 2>&1, and when do I use which?

Bash processes all redirections from left to right, in order. And the order is significant. Moving them around within a command may change the results of that command.

If all you want is to send both standard output and standard error to the same file, use this:

# Bourne
foo >file 2>&1          # Sends both stdout and stderr to file.

Here's a simple demonstration of what's happening:

# POSIX
foo() {
  echo "This is stdout"
  echo "This is stderr" 1>&2
}
foo >/dev/null 2>&1             # produces no output
foo 2>&1 >/dev/null             # writes "This is stderr" on the screen

Why do the results differ? In the first case, >/dev/null is performed first, and therefore the standard output of the command is sent to /dev/null. Then, the 2>&1 is performed, which causes standard error to be sent to the same place that standard output is already going. So both of them are discarded.

In the second example, 2>&1 is performed first. This means standard error is sent to wherever standard output happens to be going -- in this case, the user's terminal. Then, standard output is sent to /dev/null and is therefore discarded. So when we run foo the second time, we see only its standard error, not its standard output.

The redirection chapter in the guide explains why we use a duplicate file descriptor rather than opening /dev/null twice. In the specific case of /dev/null it doesn't actually matter because all writes are discarded, but when we write to a log file, it matters very much indeed.
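
To illustrate the difference (a sketch, reusing the foo function defined above):

# Bourne
foo > log 2> log     # opens log twice; each FD gets its own offset, so the
                     # two streams can overwrite each other's output
foo > log 2>&1       # opens log once; both FDs share the same open file
                     # description, so output appears in the order written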

There are times when we really do want 2>&1 to appear first -- for one example of this, see FAQ #40.

There are other times when we may use 2>&1 without any other redirections. Consider:

# Bourne
find ... 2>&1 | grep "some error"

In this example, we want to search find's standard error (as well as its standard output) for the string "some error". The 2>&1 in the piped command forces standard error to go into the pipe along with standard output. (When pipes and redirections are mixed in this way, remember: the pipe is done first, before any redirections. So find's standard output is already set to point to the pipe before we process the 2>&1 redirection.)

If we wanted to read only standard error in the pipe, and discard standard output, we could do it like this:

# Bourne
find ... 2>&1 >/dev/null | grep "some error"

The redirections in that example are processed thus:

  1. First, the pipe is created. find's output is sent to it.

  2. Next, 2>&1 causes find's standard error to go to the pipe as well.

  3. Finally, >/dev/null causes find's standard output to be discarded, leaving only stderr going into the pipe.

A related question is FAQ #47, which discusses how to send stderr to a pipeline.

See Making sense of the copy descriptor operator for a more graphical explanation.

1. If you're still confused...

If you're still confused at this point, it's probably because you started out with a misconception about how FDs work, and you haven't been able to drop that misconception yet. Don't worry -- it's an extremely common misconception, and you're not alone. Let me try to explain....

Many people think that 2>&1 somehow "unites" or "ties together" or "marries" the two FDs, so that any change to one of them becomes a change to the other. This is not the case. And this is where the confusion comes from, for many people.

2>&1 only changes FD2 to point to "that which FD1 points to at the moment"; it does not actually make FD2 point to FD1 itself. Note that "2" and "1" have different meanings due to the way they are used: "2", which occurs before ">&" means the actual FD2, but "1", which occurs after ">&", means "that which FD1 currently points to", rather than FD1 itself. (If reversed, as in "1>&2", then 1 means FD1 itself, and 2 means "that which FD2 currently points to".)

Analogies may help. One analogy is to think of FDs as being like C pointers.

   int some_actual_integer;
   int *fd1, *fd2;

   fd1 = &some_actual_integer;  /* Analogous to 1>file */
   fd2 = fd1;                   /* Analogous to 2>&1 */
   fd1 = NULL;                  /* Analogous to 1>&- */

   /* At this point, fd2 is still pointing to the actual memory location.
      The fact that fd1 and fd2 both *used to* point to the same place is
      not relevant.  We can close or repoint one of them, without affecting
      the other. */

Another analogy is to think of them like hardlinks in a file system.

    touch some_real_file
    ln some_real_file fd1       # Make fd1 a link to our file
    ln fd1 fd2                  # Make fd2 another link to our file
    rm fd1                      # Remove the fd1 link, but fd2 is not
                                # affected

    # At this point we still have a file with two links: "some_real_file"
    # and "fd2".

Or like symbolic links -- but we have to be careful with this analogy.

    touch some_real_file
    ln -s some_real_file fd1    # Make fd1 a SYMlink to our file
    ln -s "$(readlink fd1)" fd2 # Make fd2 symlink to the same thing that
                                # fd1 is a symlink to.
    rm fd1                      # Remove fd1, but fd2 is untouched.

    # Requires the nonstandard "readlink" program.
    # Result is:

    lrwxrwxrwx 1 wooledg wooledg 14 Mar 25 09:19 fd2 -> some_real_file
    -rw-r--r-- 1 wooledg wooledg  0 Mar 25 09:19 some_real_file

    # If we had attempted to use "ln -s fd1 fd2" in this analogy, we would have
    # FAILED badly.  This isn't how FDs work; rather, it's how some people
    # THINK they work.  And it's wrong.

Other analogies include thinking of FDs as hoses. Think of files as barrels full of water (or empty, or half full). You can put a hose in a barrel in order to dump more water into it. You can put two hoses into the same barrel, and they can both dump water into the same barrel. You can then remove one of those hoses, and that doesn't cause the other hose to go away. It's still there.

How can I untar (or unzip) multiple tarballs at once?

As the tar command was originally designed to read from and write to tape devices (tar - Tape ARchiver), you can specify only filenames to put inside an archive (write to tape) or to extract out of an archive (read from tape).

There is an option to tell tar that the archive is not on some tape, but in a file: -f. This option takes exactly one argument: the filename of the file containing the archive. All other (following) filenames are taken to be archive members:

    tar -x -f backup.tar myfile.txt
    # OR (more common syntax IMHO)
    tar xf backup.tar myfile.txt

Now here's a common mistake -- imagine a directory containing the following archive-files you want to extract all at once:

    $ ls
    backup1.tar backup2.tar backup3.tar

Maybe you think of tar xf *.tar. Let's see:

    $ tar xf *.tar
    tar: backup2.tar: Not found in archive
    tar: backup3.tar: Not found in archive
    tar: Error exit delayed from previous errors

What happened? The shell replaced your *.tar by the matching filenames. You really wrote:

    tar xf backup1.tar backup2.tar backup3.tar

And as we saw earlier, it means: "extract the files backup2.tar and backup3.tar from the archive backup1.tar", which will of course only succeed when there are such filenames stored in the archive.

The solution is relatively easy: extract the contents of all archives one at a time. As we use a UNIX shell and we are lazy, we do that with a loop:

    for tarname in ./*.tar; do
      tar xf "$tarname"
    done

What happens? The for-loop will iterate through all filenames matching *.tar and call tar xf for each of them. That way you extract all archives one-by-one and you even do it automagically.

The second common archive type these days is ZIP. The command to extract contents from a ZIP file is unzip (who would have guessed that!). The problem here is the very same: unzip accepts only one argument naming the ZIP file. So, you solve it the very same way:

    for zipfile in ./*.zip; do
      unzip "$zipfile"
    done

Not enough? Ok. There's another option with unzip: it can take shell-like patterns to specify the ZIP-file names. And to avoid interpretation of those patterns by the shell, you need to quote them. unzip itself and not the shell will interpret *.zip in this case:

    unzip "*.zip"
    # OR, to make more clear what we do:
    unzip \*.zip

(This feature of unzip derives mainly from its origins as an MS-DOS program. MS-DOS's command interpreter does not perform glob expansions, so every MS-DOS program must be able to expand wildcards into a list of filenames. This feature was left in the Unix version, and as we just demonstrated, it can occasionally be useful.)

How can I group entries (in a file by common prefixes)?

As in, one wants to convert:

    foo: entry1
    bar: entry2
    foo: entry3
    baz: entry4

to

    foo: entry1 entry3
    bar: entry2
    baz: entry4

There are two simple general methods for this:

  1. sort the file, and then iterate over it, collecting entries until the prefix changes, and then print the collected entries with the previous prefix
  2. iterate over the file, collect entries for each prefix in an array indexed by the prefix

A basic implementation of method 1 in bash:

old=xxx ; stuff=
(sort file ; echo xxx) | while read -r prefix line ; do
        if [[ $prefix = "$old" ]] ; then
                stuff="$stuff $line"
        else
                [[ $old != xxx ]] && echo "$old $stuff"
                old=$prefix
                stuff=$line
        fi
done

And a basic implementation of method 2 in awk, using a true multi-dimensional array:

    {
      a[$1,++b[$1]] = $2;
    }

    END {
      for (i in b) {
        printf("%s", i);
        for (j=1; j<=b[i]; j++) {
          printf(" %s", a[i,j]);
        }
        print "";
      }
    }

Written out as a shell command:

    awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file
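
In bash 4 or later, method 2 can also be sketched with an associative array (the output order is unspecified, and this assumes the whole file fits comfortably in memory):

    # Bash 4+
    declare -A groups
    while read -r prefix line; do
      groups[$prefix]+=" $line"
    done < file
    for prefix in "${!groups[@]}"; do
      printf '%s%s\n' "$prefix" "${groups[$prefix]}"
    done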

Can bash handle binary data?

The answer is, basically, no....

While bash won't have as many problems with it as older shells, it still can't process arbitrary binary data, and more specifically, shell variables are not 100% binary clean, so you can't store binary files in them.

You can store uuencoded ASCII data within a variable such as

    var=$(uuencode /bin/ls ls)
    cd /somewhere/else
    uudecode <<<"$var"  # don't forget the quotes!
  • Note: there is a huge difference between GNU and Unix uuencode/uudecode. With Unix uudecode, you cannot specify the output file; it always uses the filename encoded in the ASCII data. I've fixed the previous example so that it works on Unix systems. If you make further changes, please don't use GNUisms. Thanks. --GreyCat

One instance where this would sometimes be handy is storing small temporary bitmaps while working with netpbm... here I resorted to adding an extra pnmnoraw to the pipe, creating (larger) ASCII files that bash has no problem storing.

If you are feeling adventurous, consider this experiment:

    # bindec.bash, attempt to decode binary data to ascii decimals
    IFS=
    while read -n1 x ;do
        case "$x" in
            '') echo empty ;;
            # insert the 256 lines generated by the following oneliner here:
            # for x in $(seq 0 255) ;do echo "        $'\\$(printf %o $x)') echo $x;;" ;done
        esac
    done

and then pipe binary data into it, maybe like so:

    for x in $(seq 0 255) ;do echo -ne "\\$(printf %o $x)" ;done | bash bindec.bash | nl | less

This suggests that the NUL (0) character is skipped entirely (we cannot even create it with the input generation above), which is enough to conveniently corrupt most binary files we try to process.

  • Yes, Bash is written in C, and uses C semantics for handling strings -- including the NUL byte as string terminator -- in its variables. You cannot store NUL in a Bash variable sanely. It simply was never intended to be used for this. - GreyCat

Note that this refers to storing them in variables... moving data between programs using pipes is always binary clean. Temporary files are also safe, as long as appropriate precautions are taken when creating them.

To cat a binary file using only bash builtins, when no external command is available (this trick once saved the day when /lib/libgcc_s.so.1 was renamed):

# simulate cat with just bash builtins, binary safe
IFS=
while read -d '' -r -n1 x ; do
    case "$x" in
        '') printf "\x00";;
        *) printf "%s" "$x";;
    esac
done
  • I'd rather just use cat. Also, is the -n1 really needed? -GreyCat

    • without -n1 you have to be careful to deal with the data after the last \0, something like [[ $x ]] && printf "%s" "$x" after the loop. I haven't tested this to know if it works or if it is enough. Also I don't know what happens if you read a big file without any \0 --pgas

I saw this command somewhere: :(){ :|:& } (fork bomb). How does it work?

First of all -- and this is important -- please do not run this command. I've actually omitted the trigger from the question above, and left only the part that sets up the function.

Here is that part, but written out in normal shell coding style, rather than rammed all together:

:() { 
 : | : &
}

What this does is create a function named : which calls itself recursively. Twice. In the background. Since the function keeps calling itself over and over (forking new processes), forever, this quickly consumes a lot of system resources. That's why it's called a "fork bomb".

If you still don't see how it works, here is an equivalent, which creates a function named bomb instead of :

bomb() {
 bomb | bomb &
}

A more verbose explanation:

Inside the function, a pipeline is created which forks two more instances of the function (each of which will be a whole process) in the background. Then the function exits. However, for every instance of the function which exits in this manner, two more have already been created. The result is a vast number of processes, extremely quickly.

Theoretically, anybody that has shell access to your computer can use such a technique to consume all the resources to which he/she has access. A chroot(2) won't help here. If the user's resources are unlimited then in a matter of seconds all the resources of your system (processes, virtual memory, open files, etc.) will be used, and it will probably deadlock itself. Any attempt made by the kernel to free more resources will just allow more instances of the function to be created.

As a result, the only way to protect yourself from such abuse is by limiting the maximum allowed use of resources for your users. Such resources are governed by the setrlimit(2) system call. The interface to this functionality in Bash and KornShell is the ulimit command. Your operating system may also have special configuration files to help manage these resources (for example, /etc/security/limits.conf in Debian, or /etc/login.conf in OpenBSD). Consult your documentation for details.
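
For example, to cap the number of processes your shell (and its children) may create (a sketch; -u is a bash/ksh extension, and the right number is a local policy decision):

# Bash/ksh
ulimit -u 200     # allow at most 200 processes for this user session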

I'm trying to write a script that will change directory (or set a variable), but after the script finishes, I'm back where I started (or my variable isn't set)!

Consider this:

   #!/bin/sh
   cd /tmp

If one executes this simple script, what happens? Bash forks, resulting in a parent (the interactive shell in which you typed the command) and a child (a new shell that reads and executes the script). The child runs, while the parent waits for it to finish. The child reads and executes the script, changes its current directory to /tmp, and then exits. The parent, which was waiting for the child, harvests the child's exit status (presumably 0 for success), and then carries on with the next command. Nowhere in this process has the parent's current working directory changed -- only the child's.

A child process can never affect any part of the parent's environment, which includes its variables, its current working directory, its open files, its resource limits, etc.

So, how does one go about changing the current working directory of the parent? You can still have the cd command in an external file, but you can't run it as a script. That would cause the forking explained earlier. Instead, you must source it with . (or the Bash-only synonym, source). Sourcing basically means you execute the commands in a file using the current shell, not in a forked shell (child shell):

   echo 'cd /tmp' > "$HOME/mycd"  # Create a file that contains the 'cd /tmp' command.
   . "$HOME/mycd"                 # Source that file, executing the 'cd /tmp' command in the current shell.
   pwd                            # Now, we're in /tmp

The same thing applies to setting variables. . ("dot in") the file that contains the commands; don't try to run it.
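
For example (a sketch; the file name here is only illustrative):

   echo 'greeting="hello world"' > "$HOME/myvars"   # a file that sets a variable
   . "$HOME/myvars"                                 # source it in the current shell
   echo "$greeting"                                 # the variable is visible here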

If the command you execute is a function, not a script, it will be executed in the current shell. Therefore, it's possible to define a function to do what we tried to do with an external file in the examples above, without needing to "dot in" or "source" anything. Define the following function and then call it simply by typing mycd:

   mycd() { cd /tmp; }

Put it in ~/.bashrc or similar if you want the function to be available automatically in every new shell you open.

Some people prefer to use aliases instead of functions. Functions are more powerful, more general, more flexible, and... some people just don't seem to like them.

   alias mycd='cd /tmp'  # Equivalent to the function shown above.

   alias cdlstmp='cd /tmp && ls tmp*'
   # will take you to /tmp and show you what files there are starting with "tmp"

   cdls() { cd "$1" && ls; }
   # cannot be done with an alias
   # usage 'cdls directory'

Is there a list of which features were added to specific releases (versions) of Bash?

Here are some links to official Bash documentation:

  • NEWS: a file tersely listing the notable changes between the current and previous versions

  • CHANGES: a "complete" bash change history (back to 2.0 only)

  • COMPAT: compatibility issues between bash3 and previous versions

A more extensive (though still partial) list than the one below can be found at http://wiki.bash-hackers.org/scripting/bashchanges

Here's a partial list of the changes, in a more compact format:

Feature                                          Added in version

\uXXXX and \UXXXXXXXX                            4.2-alpha
declare -g                                       4.2-alpha
test -v                                          4.2-alpha
printf %(fmt)T                                   4.2-alpha
array[-idx] and ${var:start:-len}                4.2-alpha
lastpipe (shopt)                                 4.2-alpha
read -N                                          4.1-alpha
{var}> or {var}< etc. (FD variable assignment)   4.1-alpha
syslog history (compile option)                  4.1-alpha
BASH_XTRACEFD                                    4.1-alpha
;& and ;;& fall-throughs for case                4.0-alpha
associative arrays                               4.0-alpha
&>> and |&                                       4.0-alpha
command_not_found_handle                         4.0-alpha
coproc                                           4.0-alpha
globstar                                         4.0-alpha
mapfile/readarray                                4.0-alpha
${var,[,]} and ${var^[^]}                        4.0-alpha
{009..012} (leading zeros in brace expansions)   4.0-alpha
{x..y..incr}                                     4.0-alpha
read -t 0                                        4.0-alpha
read -i                                          4.0-alpha
x+=string, array+=(string)                       3.1-alpha1
printf -v var                                    3.1-alpha1
{x..y}                                           3.0-alpha
${!array[@]}                                     3.0-alpha
[[ =~                                            3.0-alpha
<<<                                              2.05b-alpha1
i++                                              2.04-devel
for ((;;))                                       2.04-devel
/dev/fd/N, /dev/tcp/host/port, etc.              2.04-devel
a=(*.txt) file expansion                         2.03-alpha
extglob                                          2.02-alpha1
[[                                               2.02-alpha1
builtin printf                                   2.02-alpha1
$(< filename)                                    2.02-alpha1
** (exponentiation)                              2.02-alpha1
\xNNN                                            2.02-alpha1
(( ))                                            2.0-beta2

How do I create a temporary file in a secure manner?

Good question. To be filled in later. (Interim hints: tempfile is not portable. mktemp exists more widely, but it may require a -c switch to create the file in advance; or it may create the file by default and barf if -c is supplied. Some systems don't have either command (Solaris, POSIX). There does not appear to be any single command that simply works everywhere.)

The traditional answer has usually been something like this:

   # Do not use!  Race condition!
   tempfile=/tmp/myname.$$
   trap 'rm -f "$tempfile"; exit 1' 1 2 3 15
   rm -f "$tempfile"
   touch "$tempfile"

The problem with this is: if the file already exists (for example, as a symlink to /etc/passwd), then the script may write things in places they should not be written. Even if you remove the file immediately before using it, you still have a RaceCondition: someone could re-create a malicious symlink in the interval between your shell commands.


In some systems (like Linux):

  • the mktemp command is available, and you can use its -d option so that it creates temporary directories accessible only to you, with random characters in their names, making it almost impossible for an attacker to guess the directory names beforehand.

  • you can create filenames longer than 14 characters in /tmp.

  • Bash is available, so you can use its special features.

   # Sets $TMPDIR to "/tmp" only if it didn't have a value previously
   TMPDIR=${TMPDIR:-/tmp}
   # More about $TMPDIR can be found at http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html

   # Creates a particular temporary directory inside $TMPDIR
   temporary_dir=$(mktemp -d "$TMPDIR/XXXXXXXXXXXXXXXXXXXXXXXXXXXXX") || { echo "ERROR creating a temporary file" >&2; exit 1; }

   # Remove the temporary directory when the script finishes, or when it receives a signal
   trap 'rm -rf "$temporary_dir"' 0       # remove directory when script finishes
   trap 'exit 2' 1 2 3 15       # terminate script when receiving signal

And then you can create your particular files inside the temporary directory. If you want to make life more difficult for an adversary, and you are using Bash, you can also use random numbers in the names of your temporary files. This protects against an admittedly unlikely scenario: on a shared system, your program is paused for a long time, its process ID is known, it uses a temporary file, root runs tmpwatch (or similar) to delete temporary files that have not been used for a long time, and an adversary then creates a replica of your temporary directory. With random file names, the adversary may learn the name of your temporary directory, but not the names of the files inside it. For example:

   # Prepares the name of the future temporary file
   temporary_file="$temporary_dir/strings-$RANDOM-$RANDOM-$RANDOM"
   # Then you can use the temporary file, like in
   grep string file > "$temporary_file"


A different suggestion (remove if not universal): the RANDOM variable can be used to create a temporary directory whose name is unlikely to collide with an existing one:

   temp_dir=/tmp/$RANDOM
   mkdir "$temp_dir"

This will make a directory of the form: /tmp/20445/. To decrease the chance of collision with an existing directory, the RANDOM variable can be used a number of times:

   temp_dir=/tmp/$RANDOM-$RANDOM-$RANDOM
   mkdir "$temp_dir"

This will make a directory of the form: /tmp/24953-2875-2182/ . This avoids a race condition because the mkdir is atomic, as we see in FAQ #45.

  • Hmmm... this has potential, if you check the exit status of mkdir to be sure it actually created the directory. And set umask to something fairly restrictive as well. It could use some more peer review, though. -- GreyCat

    • Oh, also, you shouldn't assume you can create filenames longer than 14 characters in /tmp. There are still some systems out there with 14-character filename limits. --GreyCat

      • Also also, RANDOM is not available in the Bourne shell, so it still fails on Solaris. This is why most legacy Bourne shell scripts use $$ for that purpose. --GreyCat
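
Here is a minimal sketch applying the suggestions above (check mkdir's exit status, use restrictive permissions); the naming scheme is only illustrative, and RANDOM is still a Bash/ksh feature:

   # Bash
   umask 077
   temp_dir=/tmp/myscript.$$-$RANDOM-$RANDOM
   if ! mkdir "$temp_dir"; then
     echo "Could not create $temp_dir" >&2
     exit 1
   fi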

Another not-quite-serious suggestion is to include C code in the script that implements a mktemp(1) command based on the mktemp(3) library function, compile it, and use that in the script. But this has a couple problems:

  • The useless Solaris systems where we would need this probably don't have a C compiler either.
  • Chicken and egg problem: we need a temporary file name to hold the compiler's output.


Instead of RANDOM, awk can be used to generate a random number in a POSIX compatible way:

   temp_dir=/tmp/$(awk 'BEGIN { srand (); print rand() }')
   mkdir -m 700 "$temp_dir"

Note, however, that srand() seeds the random number generator with the current time in seconds since the epoch, which is fairly easy for an adversary to predict, making a denial-of-service attack possible.

My ssh client hangs when I try to logout after running a remote background job!

The following will not do what you expect:

   ssh me@remotehost 'sleep 120 &'
   # Client hangs for 120 seconds

This is a "feature" of OpenSSH. The client will not close the connection as long as the remote end's terminal still is still in use -- and in the case of sleep 120 &, stdout and stderr are still connected to the terminal.

The immediate answer to your question -- "How do I get the client to disconnect so I can get my shell back?" -- is to kill the ssh client. You can do this with the kill or pkill commands, of course; or by sending the INT signal (usually Ctrl-C) for a non-interactive ssh session (as above); or by pressing <Enter><~><.> (Enter, Tilde, Period) in the client's terminal window for an interactive remote shell.
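
For example, from another terminal (a sketch; adjust the pattern to match your actual ssh command line):

   pkill -f 'ssh me@remotehost'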

The long-term workaround for this is to ensure that all the file descriptors are redirected to a log file (or /dev/null) on the remote side:

   ssh me@remotehost 'sleep 120 >/dev/null 2>&1 &'
   # Client should return immediately

This also applies to restarting daemons on some legacy Unix systems.

   ssh root@hp-ux-box   # Interactive shell
   ...                  # Discover that the problem is stale NFS handles
   /sbin/init.d/nfs.client stop   # autofs is managed by this script and
   /sbin/init.d/nfs.client start  # killing it on HP-UX is OK (unlike Linux)
   exit
   # Client hangs -- use Enter ~ . to kill it.

Please note that allowing root to log in over SSH is a very bad security practice. If you must do this, then create a single script that runs all the commands you need, with no command-line options, and configure the sudoers file to grant a single user the right to run that script with no password required. This ensures that you know exactly which commands need to be run regularly, and that if the regular account is compromised, the damage that can be done is limited to what that one script allows.

The legacy Unix /sbin/init.d/nfs.client script runs daemons in the background but leaves their stdout and stderr attached to the terminal (and they don't fully self-daemonize). The solution is either to fix the Unix vendor's broken init script, or to kill the ssh client process after this happens. The author of this article uses the latter approach.

Why is it so hard to get an answer to the question that I asked in #bash?

Maybe nobody knows the answer (or the people who know the answer are busy). Maybe you haven't given enough detail about the problem, or you haven't presented the problem clearly. Maybe the question you asked is answered in this FAQ, or in BashPitfalls, or in the BashGuide.

This is a big one: don't just post a URL and say "here is my script, fix it!" Only post code as a last resort, if you have a small piece of code that you don't understand. Instead, you should state what you're trying to do.

Shell scripting is largely a collection of hacks and tricks that do not generalize very well. The optimal answer to one problem may be quite different from the optimal answer to a similar-looking problem, so it's extremely important that you tell us the exact problem you want to solve.

Moreover, if you've attempted to solve a problem yourself, there's a really high probability that you've gone about it using a technique that doesn't work (or, at least, doesn't work for that particular problem). Any code you already have is probably going to be thrown away. Posting your non-working code as a substitute for a description of the problem you want to solve is usually a waste of time, and is nearly always irritating.

See NetEtiquette for more general suggestions. Try to avoid the infamous XyProblem.

dilbert-20110402.gif

Also:

  • #bash aphorism 1: The questioner's first description of the problem/question will be misleading.

    • #bash corollary 1.1: The questioner's second description of the problem/question will also be misleading.

  • #bash aphorism 2: The questioner will keep changing the original question until it drives the helpers in the channel insane.

The aphorisms given here are intended to be humorous, but with a touch of realism underlying them. Several have been suggested over time, but only the ones shown above have remained largely untouched. Others include:

  • The data is never formatted in the way that makes it easiest to manipulate.
  • 30 to 40 percent of the conversations in #bash will be about aphorisms #1 and #2.
  • The questioner will never tell you what they are really doing the first time they ask.
  • The questioner's third description of the problem will clarify two previously misdescribed elements of the problem, but will add two new irrelevant issues that will be even more difficult to unravel from the actual problem.
  • Offtopicness will continue until someone asks a bash question that falls under bashphorisms 1 and/or 2, and greycat gets pissed off.
  • The questioner will not read and apply the answers he is given but will instead continue to practice b1 and b2.
  • The ignorant will continually mis-educate the other noobies.
  • When given a choice of two solutions, the newbie will always choose the more complicated, or less portable, solution.
  • When given a choice of solutions, the newbie will always choose the wrong one.
  • The newbie will always find a reason to say, "It doesn't work."
  • If you don't know to whom the bashphorism's referring, it's you.
  • All examples given by the questioner will be broken, misleading, wrong, and not representative of the actual question.
  • Everyone ignores greycat when he is right. When he is wrong, it is !b1.
  • The newbie doesn't actually know what he's asking. If he did, he wouldn't need to ask.
  • The more advanced you are, the more likely you are to be overcomplicating it.
  • The more beginner you are, the more likely you are to be overcomplicating it.
  • A newbie comes to #bash to get his script confirmed. He leaves disappointed.
  • The newbie will not accept the answer you give, no matter how right it is.
  • The newbie is a bloody loon.
  • The newbie will always have some excuse for doing it wrong.

Is there a "PAUSE" command in bash like there is in MSDOS batch scripts? To prompt the user to press any key to continue?

Use the following to wait until the user presses enter:

# Bash
read -p "Press [enter] to continue..."

# Bourne
echo "Press [enter] to continue..."
read junk

Or use the following to wait until the user presses any key to continue:

# Bash
read -rsn 1 -p "Press any key to continue..."

Sometimes you need to wait until the user presses a key, but you are already using standard input for something else (for example, your script is being fed data through a pipe). How do you tell read to read from the keyboard instead? Unix flexibility is helpful here: you can add "< /dev/tty"

# Bash
read -rsn 1 -p "Press any key to continue..." < /dev/tty

If you want to put a timeout on that, use the -t option to read:

# Bash
echo "WARNING: You are about to do something stupid."
echo -n "Press a key within 5 seconds to cancel."
if ! read -rsn 1 -t 5
then something_stupid
fi

If you just want to pause for a while, regardless of the user's input, use sleep:

echo "The script is tired.  Please wait a minute."
sleep 60

If you want a fancy countdown on your timed read:

# Bash
# This function won't handle multi-digit counts.
countdown() {
  local i 
  echo -n $1
  sleep 1
  for ((i=$1-1; i>=1; i--)); do
    printf "\b%d" $i
    sleep 1
  done
}

echo 'Warning!!'
echo -n 'Five seconds to cancel: '
countdown 5 & pid=$!
if ! read -s -n 1 -t 5; then
  echo; echo "boom"
else
  kill $pid; echo; echo "phew"
fi

(If you test that code in an interactive shell, you'll get "chatter" from the job control system when the child process is created, and when it's killed. But in a script, there won't be any such noise.)

I want to check if [[ $var == foo || $var == bar || $var == more ]] without repeating $var n times.

The portable solution uses case:

   # Bourne
   case "$var" in
      foo|bar|more) ... ;;
   esac

In Bash and ksh, Extended globs can also do this within a [[ command:

   # bash/ksh -- ksh does not need the shopt
   shopt -s extglob
   if [[ $var = @(foo|bar|more) ]]; then
      ...
   fi


CategoryShell

How can I trim leading/trailing white space from one of my variables?

There are a few ways to do this. Some involve special tricks that only work with whitespace. Others are more general, and can be used to strip leading zeroes, etc.

Here's one that only works for whitespace. It relies on the fact that read strips all leading and trailing whitespace when IFS isn't set:

   # POSIX, but fails if the variable contains newlines
   read -r var << EOF
   $var
   EOF

Bash can do something similar with a "here string":

   # Bash
   read  -rd '' x <<< "$x"

Giving read an empty delimiter string makes it use NUL as the delimiter, so it consumes the whole string in one read. (Remember: Bash variables are C strings and can never contain NUL anyway.) This is entirely safe for any text, including newlines.

Here's a solution using extglob together with parameter expansion:

   # Bash
   shopt -s extglob
   x=${x##+([[:space:]])} x=${x%%+([[:space:]])}

This also works in KornShell, without needing the explicit extglob setting:

   # ksh
   x=${x##+([[:space:]])} x=${x%%+([[:space:]])}

This solution isn't restricted to whitespace like the first few were. You can remove leading zeroes as well:

   # Bash
   shopt -s extglob
   x=${x##+(0)}

Another way to remove leading zeroes from a number in bash is to treat it as an integer, in a math context:

   # Bash
   x=$((10#$x))
   # However, this fails if x contains anything other than digits.

If you need to remove leading zeroes in a POSIX shell, you can use a loop:

   # POSIX
   while true; do
     case "$var" in
       0*) var=${var#0};;
       *)  break;;
     esac
   done

Or this trick (covered in more detail in FAQ #100):

   # POSIX
   zeroes=${var%%[!0]*}
   var=${var#$zeroes}

There are many, many other ways to do this, using sed for instance:

   # POSIX, suppress the trailing and leading whitespace on every line
   x=$(echo "$x" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

Solutions based on external programs like sed are better suited to trimming large files, rather than shell variables.
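
For example, to trim every line of a large file directly, instead of round-tripping it through a variable (a sketch):

   # POSIX
   sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' bigfile > bigfile.trimmed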

How do I run a command, and have it abort (timeout) after N seconds?

FIRST check whether the command you're running can be told to timeout directly. The methods described here are "hacky" workarounds to force a command to terminate after a certain time has elapsed. Configuring your command properly is always preferable to the alternatives below.

If the command has no native support for stopping after a specified time, then the best alternatives are some external commands called timeout and doalarm. Some Linux distributions offer the tct version of timeout as a package. There is also a GNU version of timeout, included in recent coreutils releases.

Beware: by default, some implementations of timeout issue a SIGKILL (kill -9), which is roughly the same as pulling out the power cord (leaving no chance for the program to commit its work, often resulting in corruption of its data). You should use a signal that allows the program to shut itself down instead (SIGTERM). See ProcessManagement for more information on SIGKILL.
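
For example, with GNU coreutils timeout you can name the signal explicitly, and optionally follow up with SIGKILL only if the program ignores the first signal (a sketch; check your implementation's options):

   # GNU coreutils timeout
   timeout -s TERM 10 somecommand        # send SIGTERM after 10 seconds
   timeout -k 5 -s TERM 10 somecommand   # ...then SIGKILL 5 seconds later if it is still running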

The primary difference between doalarm and timeout is that doalarm "execs" the program after setting up the alarm, which makes it wonderful in a WrapperScript; while timeout launches the program as a child and then hangs around (both processes exist simultaneously), which gives it the opportunity to send more than one signal if necessary.

If you don't have or don't want one of the above programs, you can use a perl one-liner to set an ALRM and then exec the program you want to run under a time limit. In any case, you must understand what your program does with SIGALRM; programs with periodic updates usually use ALRM for that purpose and update rather than dying when they receive that signal.

doalarm() { perl -e 'alarm shift; exec @ARGV' "$@"; }

doalarm ${NUMBER_OF_SECONDS_BEFORE_ALRMING} program arg arg ...

If you can't or won't use one of these programs (which really should have been included with the basic core Unix utilities 30 years ago!), then the best you can do is an ugly hack like:

   command & pid=$!
   { sleep 10; kill $pid; } &

This will, as you will soon discover, produce quite a mess regardless of whether the timeout condition kicked in or not, if it's run in an interactive shell. Cleaning it up is not something worth my time. Also, it can't be used with any command that requires a foreground terminal, like top.

It is possible to do something similar, but to keep command in the foreground:

   bash -c '(sleep 10; kill $$) & exec command'

kill $$ would kill the shell, except that exec causes the command to take over the shell's PID. It is necessary to use bash -c so that the calling shell isn't replaced; in bash 4, it is possible to use a subshell instead:

   ( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec command )

The shell-script "timeout" (not to be confused with the command 'timeout') uses the second approach above. It has the advantage of working immediately (no need for compiling a program), but has problems e.g. with programs reading standard input.

Just use timeout or doalarm instead. Really.

I want to automate an ssh (or scp, or sftp) connection, but I don't know how to send the password....

STOP!

First of all, if you actually were to embed your password in a script somewhere, it would be visible to the entire world (or at least, anyone who can read files on your system). This would defeat the entire purpose of having a password on your remote account.

If all you want is for the user to be prompted for a password by ssh, simply make sure your script is executed in a terminal and that your ssh command is executed in the foreground ("normally"). ssh will prompt the user for a password if the remote server requires one for authentication. Your script doesn't need to get involved.

Specifically, do not ask the user for their password yourself, store it in a variable, and then try to pass it along to ssh. That reduces your security enormously.

If you want to bypass the password authentication entirely, then you should use public key authentication instead. Read and understand the man page for ssh-keygen(1), or see SshKeys for a brief overview. This will tell you how to generate a public/private key pair (in either RSA or DSA format), and how to use these keys to authenticate to the remote system without sending a password at all.

Here is a brief summary of the procedure:

ssh-keygen -t rsa
ssh me@remote "cat >> ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub
ssh me@remote date     # should not prompt for a passWORD,
                       # but your key may have a passPHRASE

If your key has a passphrase on it, and you want to avoid typing it every time, look into ssh-agent(1). It's beyond the scope of this document, though. If your script has to run unattended, then you may need to remove the passphrase from the key. This reduces your security, because then anyone who grabs the key can log in to the remote server as you (it's equivalent to putting a password in a file). However, sometimes this is deemed an acceptable risk.

If you're being prompted for a password even with the public key inserted into the remote authorized_keys file, chances are you have a permissions problem on the remote system. See SshKeys for a discussion of such problems.

If that's not it, then make sure you didn't spell it authorised_keys. SSH uses the US spelling, authorized_keys.

If you really want to store a password in a variable and then pass it to a program, instead of using public keys, first have your head examined. Then, if you still want to use a password, use expect(1) (or the less classic but maybe more bash friendly empty(1)). But don't ask us for help with it.

expect also applies to the telnet or FTP variations of this question. However, anyone who's still running telnetd without a damned good reason needs to be fired and replaced.

How do I convert Unix (epoch) times to human-readable values?

The only sane way to handle time values within a program is to convert them into a linear scale. You can't store "January 17, 2005 at 5:37 PM" in a variable and expect to do anything with it....

Therefore, any competent program is going to use time stamps with semantics such as "the number of seconds since point X". These are called epoch timestamps. If the epoch is January 1, 1970 at midnight UTC, then it's also called a "Unix timestamp", because this is how Unix stores all times (such as file modification times).

Standard Unix, unfortunately, has no tools to work with Unix timestamps. (Ironic, eh?) GNU date, and later BSD date, has a %s extension to generate output in Unix timestamp format:

   # GNU/BSD date
   date +%s    # Prints the current time in Unix format, e.g. 1164128484

This is commonly used in scripts when one requires the interval between two events:

   # POSIX shell, with GNU/BSD date
   start=$(date +%s)
   ...
   end=$(date +%s)
   echo "Operation took $(($end - $start)) seconds."

Now, to convert those Unix timestamps back into human-readable values, one needs to use an external tool. One method is to trick GNU date using:

   # GNU date
   date -d "1970-01-01 UTC + 1164128484 seconds"
   # Prints "Tue Nov 21 12:01:24 EST 2006" in the US/Eastern time zone.

Reading info date (GNU coreutils:Date input formats) reveals that it accepts Unix timestamps prefixed with '@', so:

   # recent GNU date
   $ date -d "@1164128484"
   # Prints "Tue Nov 21 18:01:24 CET 2006" in the central European time zone

However, this feature only works with newer versions of GNU date -- coreutils 5.3.0 and above.

If you don't have GNU date available, you can use Perl:

   perl -le "print scalar localtime 1164128484"
   # Prints "Tue Nov 21 12:01:24 2006"

I used double quotes in these examples so that the time constant could be replaced with a variable reference. See the documentation for date(1) and Perl for details on changing the output format.

Newer versions of Tcl (such as 8.5) have very good support of date and clock functions. See the tclsh man page for usage details. For example:

   echo 'puts [clock format [clock scan "today"]]' | tclsh
   # Prints today's date (the format can be adjusted with parameters to "clock format").
   
   echo 'puts [clock format [clock scan "fortnight"]]' | tclsh
   # Prints the date two weeks from now.
   
   echo 'puts [clock format [clock scan "5 years + 6 months ago"]]' | tclsh
   # Five and a half years ago, compensating for leap days and daylight savings time.

A convenient way of calculating seconds elapsed since 'YYYY MM DD HH MM SS' is to use awk.

   echo "2008 02 27 18 50 23" | awk '{print systime() - mktime($0)}'
   # This uses systime() to return the current time in epoch format
   # It then performs mktime() on the input string to return the epoch time of the input string

To make this more human-readable, GNU awk (gawk) can be used. The format string is similar to date's, and is specified exactly at http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions

   echo "YYYY MM DD HH MM SS" | gawk '{print strftime("%M Minutes, %S Seconds",systime() - mktime($0))}'
   # The gawk-specific strftime() function converts the difference into a human readable format

How do I convert an ASCII character to its decimal (or hexadecimal) value and back?

If you have a known octal or hexadecimal value (at script-writing time), you can just use printf:

   # POSIX
   printf '\x27\047\n'

This prints two literal ' characters (27 is the hexadecimal ASCII value of the character, and 47 is the octal value) and a newline.

   ExpandedString=$'\x27\047\u0027\U00000027\n'
   echo -n "$ExpandedString"

Another approach: escapes in $'...' strings are expanded before the command is evaluated, so known values can be embedded directly in code. (The \u and \U escapes require bash 4.2 or later.)

If you need to convert characters (or numeric ASCII values) that are not known in advance (i.e., in variables), you can use something a little more complicated:

# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value

chr() {
  [ ${1} -lt 256 ] || return 1
  printf \\$(printf '%03o' $1)
}

# Another version doing the octal conversion with arithmetic
# faster as it avoids a subshell
chr () {
  [ ${1} -lt 256 ] || return 1
  printf \\$(($1/64*100+$1%64/8*10+$1%8))
}

# Another version using a temporary variable to avoid subshell.
# This one requires bash 3.1.
chr() {
  local tmp
  [ ${1} -lt 256 ] || return 1
  printf -v tmp '%03o' "$1"
  printf \\"$tmp"
}

ord() {
  LC_CTYPE=C printf '%d' "'$1"
}

# hex() - converts ASCII character to a hexadecimal value
# unhex() - converts a hexadecimal value to an ASCII character

hex() {
   LC_CTYPE=C printf '%x' "'$1"
}

unhex() {
   printf \\x"$1"
}

# examples:

chr $(ord A)    # -> A
ord $(chr 65)   # -> 65

The ord function above is quite tricky.

  • Tricky? Rather, it's using a feature that I can't find documented anywhere -- putting a single quote in front of an integer. Neat effect, but how on earth did you find out about it? Source diving? -- GreyCat

    • It validates The Single Unix Specification: "If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote." (see printf() to know more) -- mjf

1. More complete examples (with UTF-8 support)

  • The following example was submitted quite recently and needs to be cleaned up and validated.

1.1. Note about Ext Ascii and UTF-8 encoding

  • for values 0x00 - 0x7f Identical
  • for values 0x80 - 0xff conflict between UTF-8 & ExtAscii

  • for values 0x100 - 0xffff Only UTF-8 UTF-16 UTF-32
  • for values 0x100 - 0x7FFFFFFF Only UTF-8 UTF-32

    value        EAscii    UTF-8                         UTF-16          UTF-32
    0x20         "\x20"    "\x20"                        \u0020          \U00000020
    0x7f         "\x7f"    "\x7f"                        \u007f          \U0000007f
    0x80         "\x80"    "\xc2\x80"                    \u0080          \U00000080
    0xff         "\xff"    "\xc3\xbf"                    \u00ff          \U000000ff
    0x100        N/A       "\xc4\x80"                    \u0100          \U00000100
    0x1000       N/A       "\xe1\x80\x80"                \u1000          \U00001000
    0xffff       N/A       "\xef\xbf\xbf"                \uffff          \U0000ffff
    0x10000      N/A       "\xf0\x90\x80\x80"            \ud800\udc00    \U00010000
    0xfffff      N/A       "\xf3\xbf\xbf\xbf"            \udbbf\udfff    \U000fffff
    0x10000000   N/A       "\xfc\x90\x80\x80\x80\x80"    N/A             \U10000000
    0x7fffffff   N/A       "\xfd\xbf\xbf\xbf\xbf\xbf"    N/A             \U7fffffff
    0x80000000   N/A       N/A                           N/A             N/A
    0xffffffff   N/A       N/A                           N/A             N/A

  ###########################################################################
  ## ord family
  ###########################################################################
  # ord        <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_hex    <Return Variable Name> <Char to convert>
  # ord_oct    <Return Variable Name> <Char to convert>
  # ord_utf8   <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_eascii <Return Variable Name> <Char to convert> [Optional Format String]
  # ord_echo                      <Char to convert> [Optional Format String]
  # ord_hex_echo                  <Char to convert>
  # ord_oct_echo                  <Char to convert>
  # ord_utf8_echo                 <Char to convert> [Optional Format String]
  # ord_eascii_echo               <Char to convert> [Optional Format String]
  #
  # Description:
  # converts character using native encoding to its decimal value and stores
  # it in the Variable specified
  #
  #       ord
  #       ord_hex         output in hex
  #       ord_oct         output in octal
  #       ord_utf8        forces UTF8 decoding
  #       ord_eascii      forces eascii decoding
  #       ord_echo        prints to stdout
  function ord {
          printf -v "${1?Missing Dest Variable}" "${3:-%d}" "'${2?Missing Char}"
  }
  function ord_oct {
          ord "${@:1:2}" "0%c"
  }
  function ord_hex {
          ord "${@:1:2}" "0x%x"
  }
  function ord_utf8 {
          LC_CTYPE=C.UTF-8 ord "${@}"
  }
  function ord_eascii {
          LC_CTYPE=C ord "${@}"
  }
  function ord_echo {
          printf "${2:-%d}" "'${1?Missing Char}"
  }
  function ord_oct_echo {
          ord_echo "${1}" "0%o"
  }
  function ord_hex_echo {
          ord_echo "${1}" "0x%x"
  }
  function ord_utf8_echo {
          LC_CTYPE=C.UTF-8 ord_echo "${@}"
  }
  function ord_eascii_echo {
          LC_CTYPE=C ord_echo "${@}"
  }

  ###########################################################################
  ## chr family
  ###########################################################################
  # chr_utf8   <Return Variable Name> <Integer to convert>
  # chr_eascii <Return Variable Name> <Integer to convert>
  # chr        <Return Variable Name> <Integer to convert>
  # chr_oct    <Return Variable Name> <Octal number to convert>
  # chr_hex    <Return Variable Name> <Hex number to convert>
  # chr_utf8_echo                  <Integer to convert>
  # chr_eascii_echo                <Integer to convert>
  # chr_echo                       <Integer to convert>
  # chr_oct_echo                   <Octal number to convert>
  # chr_hex_echo                   <Hex number to convert>
  #
  # Description:
  # converts decimal value to character representation and stores
  # it in the Variable specified
  #
  #       chr                     Tries to guess output format
  #       chr_utf8                forces UTF8 encoding
  #       chr_eascii              forces eascii encoding
  #       chr_echo                prints to stdout
  #
  function chr_utf8_m {
    local val
    #
    # bash only supports \u \U since 4.2
    #

    # here is an example how to encode
    # manually
    # this will work since Bash 3.1 as it uses -v.
    #
    if [[ ${2:?Missing Ordinal Value} -le 0x7f ]]; then
      printf -v val "\\%03o" "${2}"
    elif [[ ${2} -le 0x7ff        ]]; then
      printf -v val "\\%03o\\%03o" \
        $((  (${2}>> 6)      |0xc0 )) \
        $(( ( ${2}     &0x3f)|0x80 ))
    elif [[ ${2} -le 0xffff       ]]; then
      printf -v val "\\%03o\\%03o\\%03o" \
        $(( ( ${2}>>12)      |0xe0 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2}     &0x3f)|0x80 ))
    elif [[ ${2} -le 0x1fffff     ]]; then
      printf -v val "\\%03o\\%03o\\%03o\\%03o"  \
        $(( ( ${2}>>18)      |0xf0 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2}     &0x3f)|0x80 ))
    elif [[ ${2} -le 0x3ffffff    ]]; then
      printf -v val "\\%03o\\%03o\\%03o\\%03o\\%03o"  \
        $(( ( ${2}>>24)      |0xf8 )) \
        $(( ((${2}>>18)&0x3f)|0x80 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2}     &0x3f)|0x80 ))
    elif [[ ${2} -le 0x7fffffff ]]; then
      printf -v val "\\%03o\\%03o\\%03o\\%03o\\%03o\\%03o"  \
        $(( ( ${2}>>30)      |0xfc )) \
        $(( ((${2}>>24)&0x3f)|0x80 )) \
        $(( ((${2}>>18)&0x3f)|0x80 )) \
        $(( ((${2}>>12)&0x3f)|0x80 )) \
        $(( ((${2}>> 6)&0x3f)|0x80 )) \
        $(( ( ${2}     &0x3f)|0x80 ))
    else
      printf -v "${1:?Missing Dest Variable}" ""
      return 1
    fi
    printf -v "${1:?Missing Dest Variable}" "${val}"
  }
  function chr_utf8 {
          local val
          [[ ${2?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

          if [[ ${2} -lt 0x100 && ${2} -ge 0x80 ]]; then

                  # bash 4.2 incorrectly encodes
                  # \U000000ff as \xff so encode manually
                  printf -v val "\\%03o\%03o" $(( (${2}>>6)|0xc0 )) $(( (${2}&0x3f)|0x80 ))
          else
                  printf -v val '\\U%08x' "${2}"
          fi
          printf -v ${1?Missing Dest Variable} ${val}
  }
  function chr_eascii {
          local val
          # Make sure value less than 0x100
          # otherwise we end up with
          # \xVVNNNNN
          # where \xVV = char && NNNNN is a number string
          # so chr "0x44321" => "D321"
          [[ ${2?Missing Ordinal Value} -lt 0x100 ]] || return 1
          printf -v val '\\x%02x' "${2}"
          printf -v ${1?Missing Dest Variable} ${val}
  }
  function chr {
          if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
                  chr_eascii "${@}"
          else
                  chr_utf8 "${@}"
          fi
  }
  function chr_dec {
          # strip leading 0s otherwise
          # interpreted as Octal
          chr "${1}" "${2#${2%%[!0]*}}" 
  }
  function chr_oct {
          chr "${1}" "0${2}"
  }
  function chr_hex {
          chr "${1}" "0x${2#0x}"
  }
  function chr_utf8_echo {
          local val
          [[ ${1?Missing Ordinal Value} -lt 0x80000000 ]] || return 1

          if [[ ${1} -lt 0x100 && ${1} -ge 0x80 ]]; then

                  # bash 4.2 incorrectly encodes
                  # \U000000ff as \xff so encode manually
                  printf -v val '\\%03o\\%03o' $(( (${1}>>6)|0xc0 )) $(( (${1}&0x3f)|0x80 ))
          else
                  printf -v val '\\U%08x' "${1}"
          fi
          printf "${val}"
  }
  function chr_eascii_echo {
          local val
          # Make sure value less than 0x100
          # otherwise we end up with
          # \xVVNNNNN
          # where \xVV = char && NNNNN is a number string
          # so chr "0x44321" => "D321"
          [[ ${1?Missing Ordinal Value} -lt 0x100 ]] || return 1
          printf -v val '\\x%x' "${1}"
          printf "${val}"
  }
  function chr_echo {
          if [ "${LC_CTYPE:-${LC_ALL:-}}" = "C" ]; then
                  chr_eascii_echo "${@}"
          else
                  chr_utf8_echo "${@}"
          fi
  }
  function chr_dec_echo {
          # strip leading 0s otherwise
          # interpreted as Octal
          chr_echo "${1#${1%%[!0]*}}" 
  }
  function chr_oct_echo {
          chr_echo "0${1}"
  }
  function chr_hex_echo {
          chr_echo "0x${1#0x}"
  }

  #
  # Simple Validation code
  #
  function test_echo_func {
    local Outcome _result
    _result="$( "${1}" "${2}" )"
    [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
    printf "# %-20s %-6s => "           "${1}" "${2}" "${_result}" "${3}"
    printf "[ "%16q" = "%-16q"%-5s ] "  "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
    printf "%s\n"                       "${Outcome}"


  }
  function test_value_func {
    local Outcome _result
    "${1}" _result "${2}"
    [ "${_result}" = "${3}" ] && Outcome="Pass" || Outcome="Fail"
    printf "# %-20s %-6s => "           "${1}" "${2}" "${_result}" "${3}"
    printf "[ "%16q" = "%-16q"%-5s ] "  "${_result}" "${3}" "(${3//[[:cntrl:]]/_})"
    printf "%s\n"                       "${Outcome}"
  }
  test_echo_func  chr_echo "$(ord_echo  "A")"  "A"
  test_echo_func  ord_echo "$(chr_echo "65")"  "65"
  test_echo_func  chr_echo "$(ord_echo  "ö")"  "ö"
  test_value_func chr      "$(ord_echo  "A")"  "A"
  test_value_func ord      "$(chr_echo "65")"  "65"
  test_value_func chr      "$(ord_echo  "ö")"  "ö"
  # chr_echo             65     => [                A = A               (A)   ] Pass
  # ord_echo             A      => [               65 = 65              (65)  ] Pass
  # chr_echo             246    => [      $'\303\266' = $'\303\266'     (ö)  ] Pass
  # chr                  65     => [                A = A               (A)   ] Pass
  # ord                  A      => [               65 = 65              (65)  ] Pass
  # chr                  246    => [      $'\303\266' = $'\303\266'     (ö)  ] Pass
  #


  test_echo_func  chr_echo     "65"     A
  test_echo_func  chr_echo     "065"    5
  test_echo_func  chr_dec_echo "065"    A
  test_echo_func  chr_oct_echo "65"     5
  test_echo_func  chr_hex_echo "65"     e
  test_value_func chr          "65"     A
  test_value_func chr          "065"    5
  test_value_func chr_dec      "065"    A
  test_value_func chr_oct      "65"     5
  test_value_func chr_hex      "65"     e
  # chr_echo             65     => [                A = A               (A)   ] Pass
  # chr_echo             065    => [                5 = 5               (5)   ] Pass
  # chr_dec_echo         065    => [                A = A               (A)   ] Pass
  # chr_oct_echo         65     => [                5 = 5               (5)   ] Pass
  # chr_hex_echo         65     => [                e = e               (e)   ] Pass
  # chr                  65     => [                A = A               (A)   ] Pass
  # chr                  065    => [                5 = 5               (5)   ] Pass
  # chr_dec              065    => [                A = A               (A)   ] Pass
  # chr_oct              65     => [                5 = 5               (5)   ] Pass
  # chr_hex              65     => [                e = e               (e)   ] Pass

  #test_value_func chr          0xff   $'\xff'
  test_value_func chr_eascii   0xff   $'\xff'
  test_value_func chr_utf8     0xff   $'\uff'      # Note this fails because bash encodes it incorrectly
  test_value_func chr_utf8     0xff   $'\303\277'
  test_value_func chr_utf8     0x100  $'\u100'
  test_value_func chr_utf8     0x1000 $'\u1000'
  test_value_func chr_utf8     0xffff $'\uffff'
  # chr_eascii           0xff   => [          $'\377' = $'\377'         (�)   ] Pass
  # chr_utf8             0xff   => [      $'\303\277' = $'\377'         (�)   ] Fail
  # chr_utf8             0xff   => [      $'\303\277' = $'\303\277'     (ÿ)  ] Pass
  # chr_utf8             0x100  => [      $'\304\200' = $'\304\200'     (Ā)  ] Pass
  # chr_utf8             0x1000 => [  $'\341\200\200' = $'\341\200\200' (က) ] Pass
  # chr_utf8             0xffff => [  $'\357\277\277' = $'\357\277\277' (���) ] Pass
  test_value_func ord_utf8     "A"           65
  test_value_func ord_utf8     "ä"          228
  test_value_func ord_utf8     $'\303\277'  255
  test_value_func ord_utf8     $'\u100'     256



  #########################################################
  # To help debug problems, try this:
  #########################################################
  printf "%q\n" $'\xff'                  # => $'\377'
  printf "%q\n" $'\uffff'                # => $'\357\277\277'
  printf "%q\n" "$(chr_utf8_echo 0x100)" # => $'\304\200'
  #
  # This can help a lot when diagnosing problems with read and/or
  # terminal (e.g. xterm) program output.
  # I use it a lot in error cases to create a human-readable error message,
  # e.g.:
  echo "Enter type to test, Enter to continue"
  while read -srN1 ; do
    ord asciiValue "${REPLY}"
    case "${asciiValue}" in
      10) echo "Goodbye" ; break ;;
      20|21|22) echo "Yay expected input" ;;
      *) printf ':( Unexpected Input 0x%02x %q "%s"\n' "${asciiValue}" "${REPLY}" "${REPLY//[[:cntrl:]]}" ;;
    esac
  done
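
  # The same trick is useful outside an interactive loop, e.g. to build a
  # readable error message while validating input in a script.  This is only
  # an illustrative sketch: it assumes the ord function from above, and the
  # helper name and "must start with a digit" rule are made up for the example.
  function require_leading_digit {
    local c="${1:0:1}" code
    ord code "${c}"
    if (( code < 48 || code > 57 )); then    # ASCII '0'..'9' are 48..57
      printf 'Unexpected input: 0x%02x %q "%s"\n' "${code}" "${c}" "${1//[[:cntrl:]]}" >&2
      return 1
    fi
  }
  # require_leading_digit $'\tfoo'   # complains: Unexpected input: 0x09 $'\t' "foo"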

  #########################################################
  # More exotic approach 1
  #########################################################
  # I used to use this before I figured out the LC_CTYPE=C approach
  # printf "EAsciiLookup=%q" "$(for (( x=0x0; x<0x100 ; x++)); do printf '%b' $(printf '\\x%02x' "$x"); done)"
  EAsciiLookup=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023'
  EAsciiLookup+=$'\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\'()*+,-'
  EAsciiLookup+=$'./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi'
  EAsciiLookup+=$'jklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210'
  EAsciiLookup+=$'\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227'
  EAsciiLookup+=$'\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246'
  EAsciiLookup+=$'\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265'
  EAsciiLookup+=$'\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304'
  EAsciiLookup+=$'\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323'
  EAsciiLookup+=$'\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342'
  EAsciiLookup+=$'\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361'
  EAsciiLookup+=$'\362\363\364\365\366\367\370\371\372\373\374\375\376\377'
  function ord_eascii2 {
    # Strip everything from the first occurrence of the character onwards;
    # the length of what remains, plus 1 (the table starts at \001), is the code.
    local idx="${EAsciiLookup%%"${2:0:1}"*}"
    eval ${1}'=$(( ${#idx} + 1 ))'
  }
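
  # Quick sanity check, reusing the test_value_func helper defined above
  # (the expected values assume the EAsciiLookup table is loaded exactly as shown):
  test_value_func ord_eascii2  "A"   65     # should report Pass
  test_value_func ord_eascii2  "~"   126    # should report Pass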

  #########################################################
  # More exotic approach 2
  #########################################################
  #printf "EAsciiLookup2=(\n    %s\n)" "$(for (( x=0x1; x<0x100 ; x++)); do printf '%-18s'  "$(printf '[_%q]="0x%02x"' "$(printf "%b" "$(printf '\\x%02x' "$x")")" $x )" ; [ "$(($x%6))" != "0" ] || echo -en "\n    " ; done)"
  typeset -A EAsciiLookup2
  EAsciiLookup2=(
    [_$'\001']="0x01" [_$'\002']="0x02" [_$'\003']="0x03" [_$'\004']="0x04"
    [_$'\005']="0x05" [_$'\006']="0x06" [_$'\a']="0x07"   [_$'\b']="0x08"
    [_$'\t']="0x09"   [_'']="0x0a"      [_$'\v']="0x0b"   [_$'\f']="0x0c"
    [_$'\r']="0x0d"   [_$'\016']="0x0e" [_$'\017']="0x0f" [_$'\020']="0x10"
    [_$'\021']="0x11" [_$'\022']="0x12" [_$'\023']="0x13" [_$'\024']="0x14"
    [_$'\025']="0x15" [_$'\026']="0x16" [_$'\027']="0x17" [_$'\030']="0x18"
    [_$'\031']="0x19" [_$'\032']="0x1a" [_$'\E']="0x1b"   [_$'\034']="0x1c"
    [_$'\035']="0x1d" [_$'\036']="0x1e" [_$'\037']="0x1f" [_\ ]="0x20"
    [_\!]="0x21"      [_\"]="0x22"      [_\#]="0x23"      [_\$]="0x24"
    [_%]="0x25"       [_\&]="0x26"      [_\']="0x27"      [_\(]="0x28"
    [_\)]="0x29"      [_\*]="0x2a"      [_+]="0x2b"       [_\,]="0x2c"
    [_-]="0x2d"       [_.]="0x2e"       [_/]="0x2f"       [_0]="0x30"
    [_1]="0x31"       [_2]="0x32"       [_3]="0x33"       [_4]="0x34"
    [_5]="0x35"       [_6]="0x36"       [_7]="0x37"       [_8]="0x38"
    [_9]="0x39"       [_:]="0x3a"       [_\;]="0x3b"      [_\<]="0x3c"
    [_=]="0x3d"       [_\>]="0x3e"      [_\?]="0x3f"      [_@]="0x40"
    [_A]="0x41"       [_B]="0x42"       [_C]="0x43"       [_D]="0x4