I had an interesting challenge today: we use an application (Pinnacle) that saves a collection of patient files as a tar archive. How can we tell which tar file contains which patients?
For each patient in the archive, there's a file called "Patient", so I initially started by getting these out and grepping them. Plain tar doesn't expand wildcards in member names when extracting (GNU tar can, with its --wildcards option), so the portable approach is to first build a list of the files (one per patient) with command substitution, then pass it to tar:
tar xfO 20131025_03_HD1.1.tar `tar tf 20131025_03_HD1.1.tar| grep 'Patient$'` | grep -i lastname
Note: you'll need to use gtar if you're on Solaris (on Solaris 10 boxes it's installed at /usr/sfw/bin/gtar) so you can extract the file's contents to STDOUT using xfO (that's a capital letter Oh, not a zero).
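If the same commands have to run on both Solaris and Linux boxes, one option is to pick the tar binary once and reuse it. The following is only a sketch: the function name pick_tar and the TAR variable are my own inventions, and the final fallback to plain tar only helps where that tar supports O for extracting to STDOUT.

```shell
# Hypothetical helper: print the first tar candidate that exists.
# /usr/sfw/bin/gtar is the Solaris 10 location mentioned above.
pick_tar() {
    for candidate in gtar /usr/sfw/bin/gtar tar; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1    # nothing usable found
}

# Usage: remember the choice, then call "$TAR" wherever gtar appears.
TAR=$(pick_tar)
```

After that, "$TAR" xfO archive.tar Institution behaves like the gtar commands in this post on any box where one of the candidates is installed.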
However, there's an easier way to manage this. At the start of each tar file, there's a file called Institution which stores header information about the patients contained in this archive file. The section we're interested in looks like this:
PatientLite ={
PatientID = 98765;
PatientPath = "Institution_123/Mount_0/Patient_98765";
MountPoint = "Mount_0";
FormattedDescription = "CLAUS&&Santa&&&&098765&&SL&&2013-10-23 11:20:03";
DirSize = 349.607;
};
PatientLite ={
PatientID = 12345;
PatientPath = "Institution_123/Mount_0/Patient_12345";
MountPoint = "Mount_0";
FormattedDescription = "CHRISTMAS&&Mary&&&&012345&&MAG&&2013-10-23 11:20:14";
DirSize = 262.177;
};
OK, so it shouldn't be too hard to get this info:
$ gtar xfO 20131025_03_HD1.1.tar Institution | awk -F'=' '$1 ~ /FormattedDescription/ {print $2}'
"CLAUS&&Santa&&&&098765&&SL&&2013-10-23 11:20:03";
"CHRISTMAS&&Mary&&&&012345&&MAG&&2013-10-23 11:20:14";
... clean up those ampersands, collapsing each run of them into a single space:
$ gtar xfO 20131025_03_HD1.1.tar Institution | awk -F'=' '$1 ~ /FormattedDescription/ {print $2}' | sed -e 's/&&*/ /g'
"CLAUS Santa 098765 SL 2013-10-23 11:20:03";
"CHRISTMAS Mary 012345 MAG 2013-10-23 11:20:14";
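The awk and sed steps can also be folded into a single awk call that strips the quotes and trailing semicolon at the same time. This is a sketch rather than what I actually ran: format_descriptions is a made-up name, and it assumes "&&" is the field separator inside FormattedDescription, which is only a guess from the sample above. Empty fields are kept as empty tab-separated columns. On Solaris the old /usr/bin/awk may not have gsub, so nawk or /usr/xpg4/bin/awk may be needed there.

```shell
# Hypothetical helper: reads Institution-style text on stdin and prints
# one tab-separated line per FormattedDescription entry.
format_descriptions() {
    awk -F'"' '/FormattedDescription/ {
        desc = $2                # the text between the double quotes
        gsub(/&&/, "\t", desc)   # assumed separator; empty fields survive
        print desc
    }'
}

# Usage (assuming gtar as above):
#   gtar xfO 20131025_03_HD1.1.tar Institution | format_descriptions
```

Tabs make the result easy to feed to cut or to paste into a spreadsheet, and the empty column stays in place so the fields line up across patients.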
Now let's write something to catalog a directory full of these tar files:
$ for TARFILE in *.tar; do echo "Filename: $TARFILE"; (gtar xfO "$TARFILE" Institution | awk -F'=' '$1 ~ /FormattedDescription/ {print $2}' | sed -e 's/&&*/ /g'); done
Filename: 20131025_03_HD1.1.tar
"EXAMPLE Fred 012345 SL 2013-10-23 11:20:03";
"EG Robert 123456 MAG 2013-10-23 11:20:14";
[etc]
Filename: 20131025_04_HD1.1.tar
"CITIZEN Jeanette 234567 SL 2013-10-23 11:20:03";
"ALIAS Dean 345678 MAG 2013-10-23 11:20:14";
[etc]
Filename: 20131025_05_HD1.1.tar
"MANCHU Fu 456789 SL 2013-10-23 11:20:03";
"KHAN Ghengis 567890 MAG 2013-10-23 11:20:14";
Hey presto!
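To answer the original question directly (which tar file holds a given patient?), the same idea can be turned around into a search. Again, this is a rough sketch under assumptions: find_patient and the TAR variable are hypothetical names, the Institution file is assumed to sit at the top level of each archive as in the commands above, and TAR must point at a tar that supports O.

```shell
#!/bin/sh
# Hypothetical helper: print each tar file in a directory whose Institution
# header mentions the given text (case-insensitive).
TAR=${TAR:-gtar}    # assumption: GNU tar is installed as gtar

find_patient() {
    pattern=$1
    dir=${2:-.}
    for tarfile in "$dir"/*.tar; do
        [ -f "$tarfile" ] || continue    # the glob matched nothing
        if "$TAR" xfO "$tarfile" Institution 2>/dev/null |
            grep 'FormattedDescription' |
            grep -i -- "$pattern" >/dev/null; then
            echo "$tarfile"
        fi
    done
}

# Usage: find_patient claus /path/to/archives
```

Only the small Institution file is read from each archive, so this should stay quick even when the archives themselves are large.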