[EnglishFrontPage] [TitleIndex] [WordIndex

This is a .torrent file parser. It was written for fun/challenge purposes, and is not necessarily intended to be useful for any serious purpose.

Torrent files are "bencoded". Writing a bencode parser in bash poses several difficult challenges, and pushes against the limitations of the language.

The approach used here is to parse the data stream and store the contents as fields within an associative array. The structure of the fields is determined dynamically during the parsing, which is by necessity recursive (a dictionary can contain a list, which can contain a dictionary, which can contain a list, etc.). Once the associative array has been built, all the fields become available for any purpose, assuming the application can determine how to identify the field(s) of interest.

This approach would also work well for parsing an XML document, I think. I haven't actually tried it yet.

First, here is the output. You can see the hierarchical structure of the field-namespace contained within the .torrent file. The dots between namespace elements were chosen arbitrarily, and could be replaced by another delimiter. I chose the dot partly because the .torrent namespace is reminiscent of the window namespace in Tk. The way I've represented lists with numeric namespace components is a design choice I made.

I chose to sort the output lexicographically, to keep related elements together; the order of the fields shown here is almost the same as the order inside the .torrent file (the dictionaries must be sorted lexicographically, but my use of .n. for lists, without zero-padding, means the lists are printed out of sequence if they have more than 10 elements).

$ ben /stuff/tori2007-09-11.torrent 
.announce                http://tracker.toritraders.com/announce.php
.comment                 2007-09-11
.created by              BitsOnWheels 1.0.6
.creation date           1192017330
.encoding                UTF-8
.info.files.0.length     1083
.info.files.0.path.0     Tori Amos 2007-09-11.txt
.info.files.1.length     1276
.info.files.1.path.0     tori2007-09-11.ffp
.info.files.10.length    40970187
.info.files.10.path.0    Tori2007-09-11d1t09.flac
.info.files.11.length    2100576
.info.files.11.path.0    Tori2007-09-11d1t10.flac
.info.files.12.length    37919531
.info.files.12.path.0    Tori2007-09-11d1t11.flac
.info.files.13.length    39882119
.info.files.13.path.0    Tori2007-09-11d1t12.flac
.info.files.14.length    32640182
.info.files.14.path.0    Tori2007-09-11d1t13.flac
.info.files.15.length    21839105
.info.files.15.path.0    Tori2007-09-11d1t14.flac
.info.files.16.length    35343829
.info.files.16.path.0    Tori2007-09-11d2t01.flac
.info.files.17.length    10502680
.info.files.17.path.0    Tori2007-09-11d2t02.flac
.info.files.18.length    22704770
.info.files.18.path.0    Tori2007-09-11d2t03.flac
.info.files.19.length    27933648
.info.files.19.path.0    Tori2007-09-11d2t04.flac
.info.files.2.length     36546205
.info.files.2.path.0     Tori2007-09-11d1t01.flac
.info.files.20.length    37142439
.info.files.20.path.0    Tori2007-09-11d2t05.flac
.info.files.21.length    48347811
.info.files.21.path.0    Tori2007-09-11d2t06.flac
.info.files.22.length    42226577
.info.files.22.path.0    Tori2007-09-11d2t07.flac
.info.files.23.length    38228003
.info.files.23.path.0    Tori2007-09-11d2t08.flac
.info.files.3.length     44009544
.info.files.3.path.0     Tori2007-09-11d1t02.flac
.info.files.4.length     27210926
.info.files.4.path.0     Tori2007-09-11d1t03.flac
.info.files.5.length     28364865
.info.files.5.path.0     Tori2007-09-11d1t04.flac
.info.files.6.length     29017426
.info.files.6.path.0     Tori2007-09-11d1t05.flac
.info.files.7.length     31116854
.info.files.7.path.0     Tori2007-09-11d1t06.flac
.info.files.8.length     26280337
.info.files.8.path.0     Tori2007-09-11d1t07.flac
.info.files.9.length     29748219
.info.files.9.path.0     Tori2007-09-11d1t08.flac
.info.name               Tori2007-09-11
.info.name.utf8          Tori2007-09-11
.info.piece length       524288

And here is the script:

   1 #!/bin/bash
   2 export LC_ALL=C
   3 
   4 # Filename ($1)
   5 # Results go into global associative array ben
   6 declare -A ben
   7 benparse() {
   8   local data skip p max
   9   [[ -r $1 ]] || { echo "cannot read file '$1'"; return 1; }
  10   IFS= read -rd '' data < <(tr \\0 \\1 <"$1")
  11   max=${#data}
  12 
  13   # Begin parsing file at offset 0, namespace ""
  14   bp_parse 0 ""
  15 }
  16 
  17 # Starting offset ($1), starting namespace ($2)
  18 # Uses global AA ben, module variable data
  19 # Return parsed data in module var p
  20 # Return total char length of parsed data in module var skip
  21 bp_parse() {
  22   (($1 >= max)) && return
  23   case "${data:$1:1}" in
  24   d)
  25     # Data dictionary, terminated by "e".  Get pairs.
  26     local i=$1 j=$(($1 + 1)) key value
  27     while ((j < max)) && [[ ${data:j:1} != e ]]; do
  28       bp_parse $j "$2."
  29       key=$p
  30       ((j+=skip))
  31       bp_parse $j "$2.$key"
  32       value=$p
  33       ((j+=skip))
  34       [[ $value ]] && ben["$2.$key"]=$value
  35     done
  36     p=""        # We populate the AA ourselves, rather than passing data back
  37     skip=$((j-i+1))
  38     ;;
  39   i)
  40     # Integer, terminated by "e"
  41     local i=$1 j=$(($1 + 1))
  42     while [[ ${data:j:1} != e ]]; do
  43       ((j++))
  44     done
  45     p=${data:i+1:j-i-1}
  46     skip=$((j-i+1))
  47     ;;
  48   l)
  49     # List, concatenated elements, terminated by "e"
  50     local i=$1 j=$(($1 + 1)) k=0 value
  51     while [[ ${data:j:1} != e ]]; do
  52       bp_parse $j "$2.$k"
  53       [[ $p ]] && ben["$2.$k"]=$p
  54       ((k++, j+=skip))
  55     done
  56     p=""
  57     skip=$((j-i+1))
  58     ;;
  59   *)
  60     # String, length-prefixed (integer, colon).  Get the length first.
  61     local n n_len
  62     bp_getnum $1
  63     n_len=${#n}
  64     p=${data:$1+n_len+1:n}
  65     skip=$((n_len+1+n))
  66     ;;
  67   esac
  68 }
  69 
  70 # Find an integer in data, beginning at offset ($1)
  71 # Return value in upstream variable n
  72 bp_getnum() {
  73   local i=$1 j=$1
  74   while [[ ${data:j:1} = [[:digit:]-] ]]; do
  75     ((j++))
  76   done
  77   n=${data:i:j-i}
  78 }
  79 
  80 benparse "$1"
  81 # Dump the AA, indices in string-sorted order
  82 # Skip the big binary blob
  83 printf "%s\n" "${!ben[@]}" | sort | while IFS= read -r idx; do
  84   ((${#ben["$idx"]} > 80)) && continue
  85   printf "%-24.24s %s\n" "$idx" "${ben["$idx"]}"
  86 done

Here are some points of interest in the code:


2012-07-01 04:11