How can I group entries (in a file by common prefixes)?
As in, one wants to convert:
foo: entry1 bar: entry2 foo: entry3 baz: entry4
to
foo: entry1 entry3 bar: entry2 baz: entry4
There are two simple general methods for this:
- sort the file, and then iterate over it, collecting entries until the prefix changes, and then print the collected entries with the previous prefix
- iterate over the file, collect entries for each prefix in an array indexed by the prefix
A basic implementation of a in bash:
old=xxx ; stuff= (sort file ; echo xxx) | while read prefix line ; do if [[ $prefix = $old ]] ; then stuff="$stuff $line" else echo "$old: $stuff" old="$prefix" stuff= fi done
And a basic implementation of b in awk, using a true multi-dimensional array:
{ a[$1,++b[$1]] = $2; } END { for (i in b) { printf("%s", i); for (j=1; j<=b[i]; j++) { printf(" %s", a[i,j]); } print ""; } }
Written out as a shell command:
awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file