How can I group entries in a file by common prefixes?
As in, one wants to convert:
foo: entry1
bar: entry2
foo: entry3
baz: entry4
to
foo: entry1 entry3
bar: entry2
baz: entry4
There are two simple general methods for this:
- sort the file, and then iterate over it, collecting entries until the prefix changes, and then print the collected entries with the previous prefix
- iterate over the file, collect entries for each prefix in an array indexed by the prefix
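A basic implementation of the first method (sort, then collect) in bash. Note that the output comes out sorted by prefix, and that read leaves the colon attached to the prefix:
old=xxx ; stuff=
# xxx is a sentinel: it marks "no group yet" and flushes the last group,
# so this breaks if a real prefix in the file is literally "xxx"
(sort file ; echo xxx) | while read -r prefix line ; do
    if [[ $prefix = "$old" ]] ; then
        stuff="$stuff $line"
    else
        # prefix changed: print the finished group, then start the new one
        [[ $old = xxx ]] || echo "$old $stuff"
        old=$prefix
        stuff=$line
    fi
done
And a basic implementation of the second method (collect per prefix) in awk, using awk's multi-dimensional array syntax a[i,j], which POSIX awk emulates with a single array keyed by the subscripts joined with SUBSEP: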
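{
    # b[$1] counts the entries seen for this prefix; a[prefix,n] holds the nth one.
    # Only the second field is stored, so entries must be single words.
    a[$1,++b[$1]] = $2;
}
END {
    # for-in order is unspecified, so the groups may print in any order
    for (i in b) {
        printf("%s", i);
        for (j=1; j<=b[i]; j++) {
            printf(" %s", a[i,j]);
        }
        print "";
    }
}
Written out as a shell command: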
awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file
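The awk approach can also be adapted to print the groups in order of first appearance (as in the example at the top) and to keep multi-word entries whole. The following is only a sketch under stated assumptions: the function name group_by_prefix and the array names seen, order and out are made up for illustration, and the prefix is assumed to be the first whitespace-delimited field of each line.

```shell
# group_by_prefix: hypothetical helper name, not part of the FAQ text above.
# Reads "prefix: entry" lines on stdin and prints one line per prefix,
# in order of first appearance, keeping multi-word entries whole.
group_by_prefix() {
    awk '
    NF == 0 { next }                  # skip blank lines
    {
        prefix = $1
        entry = $0
        # Strip leading blanks, the first field, and the blanks after it
        sub(/^[ \t]*[^ \t]+[ \t]*/, "", entry)
        if (!(prefix in seen)) {      # first time this prefix shows up
            seen[prefix] = 1
            order[++n] = prefix       # remember first-seen order
            out[prefix] = prefix
        }
        if (entry != "")
            out[prefix] = out[prefix] " " entry
    }
    END {
        for (i = 1; i <= n; i++)
            print out[order[i]]
    }'
}
```

It would be used as: group_by_prefix < file. The whole file is still held in memory, as in the original awk version; the extra order array is what trades the unspecified for-in ordering for a predictable one.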