Handling_contents_of

Contents

How can I read a file line-by-line?
How can I recursively search all files for a string?
How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?
How can I replace a string with another string in all files?
How can I randomize (shuffle) the order of lines in a file?

How can I read a file line-by-line?

    while read line
    do
        echo "$line"
    done < "$file"

The read command modifies each line it reads: it removes leading and trailing whitespace characters (blanks, tab characters). If that is not desired, the IFS (internal field separator) variable has to be cleared:

    OIFS=$IFS; IFS=
    while read line
    do
        echo "$line"
    done < "$file"
    IFS=$OIFS

By default, the read command treats a backslash '\' character at the end of a line as a line continuation and joins that line with the next one. To disable this feature, KornShell, BASH, and other POSIX shells have read -r:

    OIFS=$IFS; IFS=
    while read -r line
    do
        echo "$line"
    done < "$file"
    IFS=$OIFS

Note that reading a file line by line this way is very slow for large files. Consider using e.g. AWK instead if you get performance problems.
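
The IFS and -r handling shown above can also be scoped to the read command itself, which avoids saving and restoring IFS. The following is a minimal sketch rather than a definitive version: the function name read_lines and the sample file are illustrative assumptions, and the extra `[ -n "$line" ]` test is an addition that also picks up a last line lacking a trailing newline.

```shell
# read_lines (hypothetical helper): print each line of a file unmodified.
read_lines() {
    # IFS= applies only to this read; -r keeps backslashes literal.
    # The || test catches a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        printf '%s\n' "$line"
    done < "$1"
}

# Sample input: leading blanks, a trailing backslash, no final newline.
sample=$(mktemp)
printf '  indented\nends with \\\nlast' > "$sample"
result=$(read_lines "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

printf is used instead of echo here because some echo implementations interpret backslashes in their arguments.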

Sometimes it's useful to read a file into an array, one array element per line. You can do that with the following example:

    O=$IFS IFS=$'\n' arr=($(< myfile)) IFS=$O

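This temporarily changes the internal field separator to a newline, so that word splitting of the command substitution yields one field per line. The array arr is populated with those fields, and IFS is then set back to its previous value. Note that empty lines are dropped, and that the fields are still subject to filename expansion (globbing).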
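
In BASH 4 and later, the mapfile builtin (also available as readarray) can fill the array without touching IFS. It performs no filename expansion and keeps empty lines. A minimal sketch, assuming bash 4 or newer; the sample file contents are purely illustrative:

```shell
# Requires bash 4 or later (mapfile is not available in ksh or POSIX sh).
myfile=$(mktemp)
printf 'first line\n*\n\nlast line\n' > "$myfile"

# -t strips the trailing newline from each element.
mapfile -t arr < "$myfile"
rm -f "$myfile"

printf 'read %d lines\n' "${#arr[@]}"
```

Here the "*" line is stored literally and the empty line becomes an empty array element.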

How can I recursively search all files for a string?

On most recent systems (GNU/Linux/BSD), you would use grep -r pattern . to search all files from the current directory (.) downward.

You can use find if your grep lacks -r:

    find . -type f -exec grep -l "$search" '{}' \;

The {} characters will be replaced with the current file name.

This command is slower than it needs to be, because find will call grep with only one file name, resulting in many grep invocations (one per file). Since grep accepts multiple file names on the command line, find can be instructed to call it with several file names at once:

    find . -type f -exec grep -l "$search" '{}' \+

The trailing '+' character instructs find to call grep with as many file names as possible, saving processes and resulting in faster execution. This example works for POSIX find, e.g. with Solaris.

With GNU find, a helper program called xargs can be used for the same purpose:

    find . -type f -print0 | xargs -0 grep -l "$search"

The -print0 / -0 options ensure that any file name can be processed, even ones containing blanks, TAB characters, or new-lines.
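
To see the effect of -print0 / -0, the pipeline can be tried on a scratch directory containing an awkward file name. This is only a sketch: the directory layout, file names, and search string are made up for illustration, and it assumes a find and xargs that support these options (GNU and BSD versions do).

```shell
# Build a scratch tree containing a file name with blanks (illustrative only).
dir=$(mktemp -d)
mkdir "$dir/sub dir"
printf 'needle here\n' > "$dir/sub dir/has space.txt"
printf 'nothing\n'     > "$dir/plain.txt"

search=needle
# -print0 and -0 separate names with NUL bytes, so blanks are harmless.
found=$(cd "$dir" && find . -type f -print0 | xargs -0 grep -l "$search")
rm -rf "$dir"

printf '%s\n' "$found"
```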

90% of the time, all you need is:

Have grep recurse and print the lines (GNU grep):

    grep -r "$search" .

Have grep recurse and print only the names (GNU grep):

    grep -r -l "$search" .

The find command can be used to run arbitrary commands on every file in a directory (including sub-directories). Replace grep with the command of your choice. The curly braces {} will be replaced with the current file name in the case above.

(Note that they must be escaped in some shells, but not in BASH.)

How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?

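Some Unix systems provide the split utility for this purpose (the long options shown here are specific to GNU split; POSIX split accepts -l 10 for the line count):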

    split --lines 10 --numeric-suffixes input.txt output-

For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:

    sed -n '1,10p'

This stops sed from printing each line (-n). Instead it only processes the lines in the range 1-10 ("1,10"), and prints them ("p"). sed still reads the input until the end, although we are only interested in lines 1 through 10. We can speed this up by making sed terminate immediately after printing line 10:

    sed -n -e '1,10p' -e '10q'

Now the command will quit after reading line 10 ("10q"). The -e arguments indicate a script (instead of a file name). The same can be written a little shorter:

    sed -n '1,10p;10q'
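
The print-and-quit idiom can be wrapped in a small function that takes the range as arguments. This is a sketch only; the name print_range and the sample data are assumptions made for illustration.

```shell
# print_range FILE FIRST LAST (hypothetical helper):
# print lines FIRST through LAST of FILE, then stop reading.
print_range() {
    sed -n -e "$2,$3p" -e "$3q" "$1"
}

sample=$(mktemp)
printf '%s\n' one two three four five six > "$sample"
range_out=$(print_range "$sample" 2 4)
rm -f "$sample"

printf '%s\n' "$range_out"
```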

We can now use this to print an arbitrary range of a file (specified by line number):

    file=/etc/passwd
    range=10
    firstline=1
    maxlines=$(wc -l < "$file") # count number of lines
    while ((firstline <= maxlines))
    do
        # a range of 10 starting at line 1 ends at line 10
        ((lastline = firstline + range - 1))
        sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file"
        ((firstline = lastline + 1))
    done

This example uses BASH and KornShell arithmetic expressions, which older Bourne shells do not have. For those shells, the following example should be used instead:

    file=/etc/passwd
    range=10
    firstline=1
    maxlines=`wc -l < "$file"` # count number of lines
    while [ $firstline -le $maxlines ]
    do
        # a range of 10 starting at line 1 ends at line 10
        lastline=`expr $firstline + $range - 1`
        sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file"
        firstline=`expr $lastline + 1`
    done

How can I replace a string with another string in all files?

sed is a good command to replace strings, e.g.

    sed 's/olddomain\.com/newdomain.com/g' input > output

To replace a string in all files of the current directory:

    for i in *; do
        sed 's/old/new/g' "$i" > atempfile && mv atempfile "$i"
    done
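
A fixed temporary file name such as atempfile can collide with an existing file of the same name. The following sketch uses mktemp instead, which is not specified by POSIX but is widely available; the function name replace_in_file and the sample data are assumptions for illustration only.

```shell
# replace_in_file OLD NEW FILE (hypothetical helper).
# OLD is a sed regular expression; neither string may contain '/'.
replace_in_file() {
    tmp=$(mktemp) || return
    # Only overwrite the original if sed succeeded.
    if sed "s/$1/$2/g" "$3" > "$tmp"
    then mv "$tmp" "$3"
    else rm -f "$tmp"; return 1
    fi
}

sample=$(mktemp)
printf 'old and old\n' > "$sample"
replace_in_file old new "$sample"
replaced=$(cat "$sample")
rm -f "$sample"

printf '%s\n' "$replaced"
```

As with the loop above, the rewritten file takes on the ownership and permissions of the temporary file, which may differ from those of the original.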

GNU sed 4.x has a non-standard -i flag which makes the temp file unnecessary (BSD sed has a similar flag, but there it requires an argument, e.g. -i ''):

    for i in *; do
        sed -i 's/old/new/g' "$i"
    done

Those of you who have perl 5 can accomplish the same thing using this code:

    perl -pi -e 's/old/new/g' *

Recursively:

    find . -type f -print0 | xargs -0 perl -pi -e 's/old/new/g'

Finally, here's a script that some people may find useful:

    :
    # chtext - change text in several files

    # neither string may contain '|' unquoted
    old='olddomain\.com'
    new='newdomain.com'

    # if no files were specified on the command line, use all files:
    [ $# -lt 1 ] && set -- *

    for file
    do
        [ -f "$file" ] || continue # do not process e.g. directories
        [ -r "$file" ] || continue # cannot read file - ignore it
        # Replace string, write output to temporary file. Terminate script in case of errors
        sed "s|$old|$new|g" "$file" > "$file"-new || exit
        # If the file has changed, overwrite original file. Otherwise remove copy
        if cmp "$file" "$file"-new >/dev/null 2>&1
        then rm "$file"-new              # file has not changed
        else mv "$file"-new "$file"      # file has changed: overwrite original file
        fi
    done

If the code above is put into a script file (e.g. chtext), the resulting script can be used to change text, e.g. in all HTML files of the current directory and all its subdirectories:

    find . -type f -name '*.html' -exec chtext {} \;

Many optimizations are possible:

use a sed separator character other than '|', e.g. ^A (ASCII 1)
some implementations of sed (e.g. GNU sed) have an "-i" option that can change a file in-place; no temporary file is necessary in that case
the find command above could use either xargs or the "-exec ... {} +" form of POSIX find to process several files per invocation

Note: set -- * in the code above is safe with respect to files whose names contain spaces. The expansion of * by set is the same as the expansion done by for, and filenames will be preserved properly as individual parameters, and not broken into words on whitespace.
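
The behaviour of set -- * can be checked in a scratch directory. A minimal sketch; the directory and file names are invented for the demonstration.

```shell
dir=$(mktemp -d)
touch "$dir/one file.txt" "$dir/two.txt"

oldpwd=$PWD
cd "$dir" || exit 1
set -- *                 # one positional parameter per file name
count=$#
first=$1
cd "$oldpwd" || exit 1
rm -rf "$dir"

printf '%d files, first is "%s"\n' "$count" "$first"
```

The name containing a blank arrives as a single parameter, so the count is 2 rather than 3.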

A more sophisticated example of chtext is here: http://www.shelldorado.com/scripts/cmds/chtext

How can I randomize (shuffle) the order of lines in a file?

    randomize(){
        while read l ; do echo "0$RANDOM $l" ; done |
        sort -n |
        cut -d" " -f2-
    }

Note: the leading 0 makes sure the pipeline doesn't break if the shell doesn't support $RANDOM, which is supported by BASH and KornShell, but is not required by POSIX and is not available in the Bourne shell.
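
Because the loop above uses a plain read, it strips leading whitespace and interprets backslashes in the input. The following variant is a sketch that keeps lines intact by using IFS= read -r and a TAB as the separator; the function name randomize_raw is an assumption, and the shell must provide $RANDOM (BASH or KornShell).

```shell
# randomize_raw (hypothetical name): shuffle stdin, keeping lines intact.
randomize_raw() {
    # IFS= and -r keep leading blanks and backslashes unchanged.
    while IFS= read -r l
    do printf '%s\t%s\n' "$RANDOM" "$l"
    done |
    sort -n |
    cut -f2-     # default cut delimiter is TAB
}

input=$(printf '  a\nb\\c\nd e\n')
shuffled=$(printf '%s\n' "$input" | randomize_raw)
printf '%s\n' "$shuffled"
```

Since $RANDOM only yields values from 0 to 32767, long inputs will contain duplicate keys, so the result is probably not a perfectly uniform shuffle.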

The same idea (printing random numbers in front of a line, and sorting the lines on that column) using other programs:

    awk '
        BEGIN { srand() }
        { print rand() "\t" $0 }
    ' |
    sort -n |    # Sort numerically on first (random number) column
    cut -f2-     # Remove sorting column

This is faster than the previous solution, but will not work for very old AWK implementations (try "nawk", or "gawk", if available).
