Progress report for commands processing large files

How many times have you run

$ md5sum a-large-and-probably-important-file

and, during the thumb-twiddling wait, cursed yourself for not remembering to use the excellent pv (Pipe Viewer) program?

pv shows a nice progress bar along with the throughput and a useful ETA (estimated time remaining).

$ pv < debian-504-i386-CD-1.iso | md5sum
 252MB 0:00:02 [ 130MB/s] [============> ] 38% ETA 0:00:03

The pv program is only an apt-get install pv away!
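If you tend to forget, one option is to wrap pv in a small shell function so the progress bar comes for free. This is only a sketch assuming a POSIX shell and GNU md5sum; md5_with_progress is a made-up name, and the function falls back to plain md5sum when pv is not installed.

```shell
# md5_with_progress FILE
# Hypothetical helper (not a standard tool): checksum FILE, showing
# pv's progress bar when pv is installed.
md5_with_progress() {
  if command -v pv >/dev/null 2>&1; then
    # pv opens the file itself, so it knows the size and can show an ETA
    pv "$1" | md5sum
  else
    # pv not installed: same checksum, but no progress report
    md5sum < "$1"
  fi
}
```

Usage would be "$ md5_with_progress debian-504-i386-CD-1.iso". In both branches md5sum reads standard input, so the line it prints ends in "-" rather than the file name.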

Yeah, pv is very good if you remember to use it…

Can we get a progress report after md5sum has already started?

Yes, Linux to the rescue! Every running process gets a directory of virtual files in the /proc file system, named after its process id, also called its pid. For example, /proc/1 contains information about the ‘init’ process, which always has pid 1 because it is the first program started after the kernel is loaded.
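You can look around in /proc without any big file at all, since your own shell is a process too. In the sketch below, $$ is the shell's own pid; the exact output depends on your system (for instance, the Name line may say bash or dash).

```shell
# $$ expands to the pid of the shell running these commands
echo "this shell has pid $$"
# ...and that pid has its own directory of virtual files
ls -d /proc/$$
# The first line of the status file names the program, e.g. "Name:	bash"
head -n 1 /proc/$$/status
# pid 1 always exists; its comm file holds the name of the init program
cat /proc/1/comm
```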

Hmm… if only we could find the pid of the running md5sum and take a peek at some of those /proc files!

Start md5sum on a large file in one terminal, then open a second terminal and ask for its pid:

$ pidof md5sum
25603

Good, md5sum is still running and, in this example, has pid 25603. The directory /proc/25603/fdinfo contains one virtual file per open file descriptor of that process, each describing the state of that open file. To print the contents of all of them:

$ cat /proc/25603/fdinfo/*
pos: 0
flags: 02
pos: 0
flags: 02
pos: 0
flags: 02
pos: 12562432
flags: 0100000

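Note the last entry, “pos: 12562432”. The first three descriptors are md5sum's standard input, output and error; the last one is the file md5sum is crunching on, and pos is how many bytes into that file md5sum has read so far. Compare it with the file size to see how far along it is.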
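To see pos move without waiting for a big md5sum, you can open a small file in your own shell and read part of it. This sketch assumes Linux, a POSIX shell and GNU coreutils (mktemp, head -c, dd); file descriptor 3 is an arbitrary choice that mirrors the md5sum example. Because dd inherits the descriptor and shares its file offset, the shell's pos advances when dd reads one 4 KiB block.

```shell
# Create an 8 KiB scratch file to play with
demo=$(mktemp)
head -c 8192 /dev/zero > "$demo"

# Open it read-only as file descriptor 3 of this shell ($$)
exec 3< "$demo"
before=$(awk '/^pos:/ {print $2}' /proc/$$/fdinfo/3)

# dd reads one 4 KiB block through the shared descriptor, moving the offset
dd bs=4096 count=1 <&3 of=/dev/null 2>/dev/null
after=$(awk '/^pos:/ {print $2}' /proc/$$/fdinfo/3)

echo "pos before: $before, after: $after"

# Close the descriptor and remove the scratch file
exec 3<&-
rm -f "$demo"
```

Running it should print "pos before: 0, after: 4096".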

Finally, we can use the watch command to rerun cat every 2 seconds (watch's default interval):

$ watch "cat /proc/$(pidof md5sum)/fdinfo/*"

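watch runs until you stop it with Ctrl+C. The $(pidof md5sum) part is expanded only once, when watch starts, so it keeps following that same process. When md5sum finishes, its /proc files disappear and the pos lines vanish from the watch display.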
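watch only shows the raw byte offset. A small shell function can turn it into a percentage by comparing pos with the file size. This is a sketch, assuming Linux /proc, GNU stat (for -L -c %s) and a shell that supports local (bash or dash); progress is a made-up name, not an existing command, and it needs permission to read the target process's /proc entries (normally the same user).

```shell
# progress PID
# Hypothetical helper (not an existing command): for every regular file that
# process PID has open, print its read position, its size and the percentage.
progress() {
  local pid=$1 fd file size pos
  for fd in /proc/"$pid"/fd/*; do
    # Only regular files have a useful size; skip terminals, pipes and sockets
    [ -f "$fd" ] || continue
    file=$(readlink "$fd")
    size=$(stat -L -c %s "$fd" 2>/dev/null)
    [ -n "$size" ] && [ "$size" -gt 0 ] || continue
    # fdinfo/N describes the same descriptor as fd/N
    pos=$(awk '/^pos:/ {print $2}' "/proc/$pid/fdinfo/${fd##*/}" 2>/dev/null)
    [ -n "$pos" ] || continue
    echo "$file: $pos of $size bytes ($((pos * 100 / size))%)"
  done
}
```

With the md5sum from above (pid 25603 in this example), a loop such as

$ while [ -d /proc/25603 ]; do progress 25603; sleep 2; done

prints a line every 2 seconds and stops by itself once md5sum exits. A shell loop is used instead of watch because watch runs its command in a fresh shell, where a function defined in your interactive shell does not exist.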

Written by SirPing