
Visual file content comparisons with Terminal and MD5
Given a folder of files that are being copied from one place to another, it is sometimes difficult to tell whether the files were copied in their entirety, for example when a file transfer between computers is interrupted. A different file size is an obvious sign of trouble, but two files can also be exactly the same size and still differ in their contents. Hence the use of MD5. I've also found this to be one of the few times when the 'transparency' option of Terminal.app is completely indispensable.

In Terminal, cd into the source directory. If it's just a series of files being copied, then typing md5 * will print a checksum for every file in the directory. Now cd into the destination directory and repeat the command. A single-byte change in any of the files will produce a different checksum. If the checksums are identical, then the files are almost certainly identical.

Now comes the fun part with Terminal and transparency. Instead of having to manually check the numbers produced by MD5, try this. Open two Terminal windows, one in the source directory, and one in the target. After you run the MD5 commands, make one window partially transparent, and then slide it over the other until the file names line up. The checksums should also line up. Any discrepancy between the two will be very obvious to see.

Yes, you could always do this:
source_dir: md5 * > ~/MD5checksum1.txt
target_dir: md5 * > ~/MD5checksum2.txt
home_dir: diff MD5checksum1.txt MD5checksum2.txt
However, this visual method is far faster for the occasional time when this situation comes up. It's particularly useful when you need to compare files across great distances, instead of transferring them again 'just to be sure.'
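For anyone who would rather script the diff approach, here is a minimal sketch. The function names (md5_of, checksum_list, compare_dirs) are made up for illustration and are not part of the hint. It assumes either md5 (Mac OS X/BSD) or md5sum (GNU) is on the PATH, and, like md5 *, it only looks at the regular files at the top level of each directory.

```shell
#!/bin/sh
# Sketch only: the function names below are illustrative, not from the hint.

# Print the MD5 checksum of one file, using md5sum (GNU) if it is
# installed and falling back to md5 (Mac OS X/BSD) otherwise.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -c1-32
    else
        md5 -q "$1"
    fi
}

# Print one "checksum  name" line per regular file in directory $1.
checksum_list() {
    (
        cd "$1" || exit 1
        for f in *; do
            if [ -f "$f" ]; then
                printf '%s  %s\n' "$(md5_of "$f")" "$f"
            fi
        done
    )
}

# compare_dirs SOURCE TARGET
# Prints diff output for any mismatch; returns 0 only if both lists match.
compare_dirs() {
    list_a=$(mktemp) || return 2
    list_b=$(mktemp) || return 2
    checksum_list "$1" > "$list_a"
    checksum_list "$2" > "$list_b"
    diff "$list_a" "$list_b"
    status=$?
    rm -f "$list_a" "$list_b"
    return "$status"
}
```

Something like compare_dirs /path/to/source /path/to/target && echo "all checksums match" would then do the whole check in one go. Because the lists contain bare file names rather than full paths, diff only reports files whose checksums differ or that exist on one side only.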

Visual file content comparisons with Terminal and MD5
Authored by: jsuen on Nov 28, '05 07:42:17AM

It's an awful amount of work when you can compare the first or last MD5 nibble. That will give you a 1/16 chance of missing an error... two will give you 1/256, three 1/4096. You can easily eyeball the first four, in which case you're down to 1/65536 of missing an error.
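A rough sketch of that shortcut, for what it's worth: each hex digit has 16 possible values, so four of them give 16 x 16 x 16 x 16 = 65536 combinations. The function name short_sums is invented for this example, and it assumes md5 (Mac OS X/BSD) or md5sum (GNU) is installed.

```shell
#!/bin/sh
# short_sums DIR [N]   (illustrative name, not a standard tool)
# Print the first N hex digits (default 4) of each file's MD5 checksum,
# followed by the file name, for every regular file in DIR.
short_sums() {
    digits=${2:-4}
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            if command -v md5sum >/dev/null 2>&1; then
                sum=$(md5sum < "$f")
            else
                sum=$(md5 -q "$f")
            fi
            # Keep only the leading digits of the checksum.
            printf '%s  %s\n' "$(printf '%s' "$sum" | cut -c1-"$digits")" "${f##*/}"
        fi
    done
}
```

Running short_sums . in the source and target directories gives two short columns that are much easier to eyeball than the full 32-digit checksums.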



Visual file content comparisons with Terminal and MD5
Authored by: spiff on Nov 28, '05 09:20:22AM

As with a lot of hints, the write-up makes a quick trick look longer than it actually takes to do. This 'trick' takes me all of 2 seconds or so.

This may be overkill for a small set of files, but when you're comparing a list of 'em, say 30-40, and the list fits in a single terminal window then it works quite nicely. It is faster than eyeballing a select part of the checksum, be it the beginning, middle or end and A:B comparing them that way.



diff -r
Authored by: hayne on Nov 28, '05 09:32:03AM

The value of the MD5 method explained in this hint is when you have remotely logged into the source machine, since then the access to the file contents is done on the remote machine without any need to transfer bytes over the network.

If the source disk is mounted on the local machine, then you might as well just use 'diff -r' to compare the source and destination folders. That will give you a more direct look at the differences (if any). However it will of course result in the file contents being transferred over the network again (to do the comparison) unless the disk contents have been cached by the local OS.
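A minimal sketch of that, assuming both folders are reachable as local paths. The function name report_differences is made up for this example, and the -q option (report only which files differ, not how) is an extension found in GNU and BSD diff rather than something every diff is guaranteed to have.

```shell
#!/bin/sh
# report_differences SOURCE TARGET   (illustrative name)
# Recursively compare two folders. Prints one line per file that differs
# or exists on only one side; returns 0 only if the folders match.
report_differences() {
    diff -rq "$1" "$2"
}
```

Drop the -q to see the actual line-by-line differences for text files.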



`md5sum -c`
Authored by: lullabud on Nov 28, '05 10:00:39AM
Using the tool `md5sum`, which is installable via fink and likely other methods, you can compare against the output of a previous md5sum list. For example:
md5sum * > checksums
cp * /tmp/
cd /tmp
md5sum -c checksums
This will print OK for each file that was copied correctly and FAILED for any that was not. Add the --status option if you want no output at all and just an exit code.

Visual file content comparisons with Terminal and MD5
Authored by: voisine on Nov 28, '05 10:23:36AM

Alternately you could just run diff on the two directories. Them unix hackers are pretty clever eh?



Visual file content comparisons with Terminal and MD5
Authored by: auricgoldfinger on Nov 28, '05 12:06:05PM

And you can even create a patch for it :)

diff -Naur old_file_or_dir new_file_or_dir > patch



diff
Authored by: sjk on Nov 29, '05 12:50:18PM
I wish diff (and several other Unix utilities) were smarter with traditional Mac-style newline translation, e.g. using a built-in tr '\r' '\n' filter.

backslash hell
Authored by: sjk on Nov 29, '05 12:57:22PM
Add backslashes in front of the 'r' and 'n' characters, like:
tr '\r' '\n'
(if I'm lucky).

Using rsync for robust file transfers
Authored by: jvinocur on Nov 28, '05 04:53:58PM
I do most of my file transfers with rsync these days, because it is smart about comparing existing files for differences and seamlessly resuming interrupted transfers. Other than that, the behavior and interface both resemble scp:
    rsync -aP jeff@example.org:/tmp/portrait.jpg ~/Pictures/
where of course either the source or the destination (but not both) can be a path located on a remote host (transfer encrypted via ssh), you can omit the username if it's the same as on the local machine, and you can use wildcards if you quote carefully.

The -a flag is for archiving (makes an exact copy, preserving timestamps, permissions, etc.), and the -P flag gets you partial-download resumes and realtime progress stats on the download speed etc. See the man page for more details. Just don't get overwhelmed; rsync has a lot of other powerful features, but you don't need to understand them to use it as an easy tool for robust file transfers.


Visual file content comparisons with Terminal and MD5
Authored by: adrianm on Nov 29, '05 09:14:10AM
If you have the dev tools installed, opendiff will give you a direct visual comparison of the files in the directories.

Not good for very big (or binary) files (too slow) but handy as you can show only diffs/files removed/added and so on, and a double-click gets you the actual differences in the files.
