10.4: Find potential duplicate files via Spotlight metadata
Authored by: S Barman on Oct 13, '06 10:35:53PM
The previous rewrite of the hint uses md5sum, which is not a standard MacOS/Darwin command, so I rewrote the script to use /sbin/md5. Rather than calling mdls three times, the script now calls it once. I also fixed the mdfind conditions (one had "=" and I changed it to "=="). Finally, rather than using sed, which has a lot of processing overhead, I am using cut to do the same thing. Overall, these changes cut a bit more than one second off the command's execution time on my system.
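As a rough illustration of the cut approach mentioned above, here is a minimal sketch. The sample line is hypothetical; it only assumes the "digest, space, path" layout that md5 -r prints. It compares taking field 2 alone with taking field 2 through the end of the line, which matters when a path contains spaces.

```shell
#!/bin/bash
# Hypothetical line in the "digest path" layout printed by md5 -r.
line='d41d8cd98f00b204e9800998ecf8427e /Users/me/My Notes.txt'

# Field 2 only: cut stops at the next space, so the path is truncated.
second_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Field 2 through end of line: the whole path survives.
full_path=$(printf '%s\n' "$line" | cut -d ' ' -f 2-)

echo "field 2 only: $second_field"
echo "field 2-:     $full_path"
```

With a path such as "My Notes.txt", only the second form returns the complete file name.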

So, without further ado, here's my updated script:


#!/bin/bash
# dupecheck - identifies potential duplicates of a file using Spotlight metadata
# by Derick Fay, October 2006
# Extended to check md5sums by Craig Hughes, October 2006
# Making more MacOS/Darwin standard and added speedups and efficiencies by Scott Barman

# Errors should be written to stderr (file descriptor 2) and exit with a
# non-zero status. I also like shortening the parsing!
[ -z "$1" ] && { echo "usage: $0 filename" >&2; exit 1; }
SEARCHFILE="$1"

# Get the to-match MD5 sum
# /sbin/md5 is standard on MacOS/Darwin. The -q option just prints the MD5 value
MD5SUM=$(/sbin/md5 -q "$SEARCHFILE")

# extract metadata from the file to be checked
# Let's do it with one command and pull the pieces out of the command.
# I use "set" to replace the command line and just parse the command line!
set $(mdls -name kMDItemFSSize -name kMDItemFSName -name kMDItemKind "$1")
name=$5
size=$8
kind=${11}	# braces needed because the position is > 9 (more than one digit)

# Get possible matches
# do this by using $(..) to put the file names on the command line for md5
# which does not require xargs and another pipe!
echo "MD5-confirmed matches:"
mdfind -0 "kMDItemFSName == $name || (kMDItemFSSize == $size && kMDItemKind == $kind)" | xargs -0 /sbin/md5 -r | grep "$MD5SUM" | cut -d ' ' -f 2-

I love squeezing every last bit of efficiency out of scripts!! :-)

Scott
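
For readers unfamiliar with the "set" trick used in the script, here is a minimal sketch of how the positional parsing works. The sample text is a hypothetical stand-in for 10.4-style mdls output (the real layout may differ between OS versions), and parse_mdls is an illustrative helper that is not part of the script above.

```shell
#!/bin/bash
# Hypothetical sample of mdls output; an assumption, not captured from a real run.
sample_mdls_output='/tmp/report.txt -------------
kMDItemFSName = "report.txt"
kMDItemFSSize = 2048
kMDItemKind   = "Document"'

# parse_mdls: illustrative helper showing the positional-parameter technique.
parse_mdls() {
    # Unquoted expansion splits the text on whitespace; "set --" then makes
    # each resulting word a positional parameter ($1, $2, ...).
    set -- $1
    name=$5      # third word of the kMDItemFSName line
    size=$8      # third word of the kMDItemFSSize line
    kind=${11}   # braces required: $11 would be read as ${1} followed by "1"
}

parse_mdls "$sample_mdls_output"
echo "name=$name size=$size kind=$kind"
```

Because the split happens on every space, a multi-word value such as "Plain text document" or a file name containing spaces would shift the positions, so this technique only suits values that are single words.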


10.4: Find potential duplicate files via Spotlight metadata
Authored by: S Barman on Oct 14, '06 08:00:22AM
Oops... ignore the script comment about using $(..) in place of xargs. That approach did not work, which is why the pipeline still uses xargs. Other than that, the script still cuts a bit more than a second off the search on my PowerBook G4!
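
The post does not say why $(..) failed. One plausible reason, offered here only as an assumption, is word splitting: the shell splits command-substitution output on whitespace, while xargs -0 keeps each NUL-terminated name intact. The sketch below uses two hypothetical file names and an illustrative count_args helper to show the difference.

```shell
#!/bin/bash
# count_args: illustrative helper that prints how many arguments it received.
count_args() { echo "$#"; }

# Two hypothetical file names, one containing a space.
files=("My Notes.txt" "plain.txt")

# Command substitution: the output is split on whitespace,
# so the two names arrive as three separate words.
via_substitution=$(count_args $(printf '%s\n' "${files[@]}"))

# NUL-separated names through xargs -0: each name stays one argument.
via_xargs=$(printf '%s\0' "${files[@]}" | xargs -0 sh -c 'echo "$#"' sh)

echo "substitution: $via_substitution arguments"
echo "xargs -0:     $via_xargs arguments"
```

Command substitution also cannot carry NUL bytes, so the NUL separators that mdfind -0 emits would not survive inside $(..).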
