Handy find | xargs Construct
Authored by: MartySells on Sep 26, '04 04:18:36PM
The UNIX file command uses a "magic" set of data (/usr/share/file/magic) to identify file types. This is much smarter than relying on file extensions or Mac file types to identify file content. For instance, a JPEG may be named .JPG, .jpg, .jpeg or .JPEG.
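To make the idea of "magic" concrete, here is a deliberately simplified sketch that identifies a JPEG by its first bytes (FF D8 FF) rather than by its name. The helper looks_like_jpeg is hypothetical and checks only this one signature. The real file command matches against its whole magic database, so treat this as an illustration of the principle, not of how file is implemented.

```shell
#!/bin/sh
# looks_like_jpeg (hypothetical helper): succeeds if the file starts with
# the JPEG signature bytes FF D8 FF, whatever its name or extension.
looks_like_jpeg() {
    # od -An -tx1 -N3 dumps the first 3 bytes as hex; tr strips the spacing.
    sig=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    [ "$sig" = "ffd8ff" ]
}

# Demo: a JPEG-style header under two different names, plus a text file
# that merely carries a .jpg extension.
tmp=$(mktemp -d)
printf '\377\330\377\340' > "$tmp/photo.JPEG"
cp "$tmp/photo.JPEG" "$tmp/renamed.txt"
printf 'plain text\n' > "$tmp/notes.jpg"

for f in "$tmp"/*; do
    if looks_like_jpeg "$f"; then
        echo "JPEG:  ${f##*/}"
    else
        echo "other: ${f##*/}"
    fi
done
rm -rf "$tmp"
```

The content, not the extension, decides: renamed.txt is reported as a JPEG and notes.jpg is not.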

For example, the following command looks for MP3 files:
    find . -type f -print0 | xargs -0 file | grep MP3
The first argument to find, . (dot), is the starting directory. A similar command finds JPEGs:
    find . -type f -print0 | xargs -0 file | grep JPEG
The construct find FIND_EXPRESSIONS -print0 | xargs -0 COMMAND is useful with many commands besides file. Some examples:
    find fed/ -name '*.xls' -print0 | xargs -0 zip ./
will find all *.xls files below the fed/ directory and add them to a zip archive.
    find fed/ -type f -print0 | xargs -0 du -sk | sort -n | tail -5
will show the five largest files below fed/ (sizes in kilobytes).

Two key advantages of the find -print0 | xargs -0 approach are that the command you specify receives many filenames per invocation, and that it is "safe" with filenames containing special characters such as spaces and quotes.
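Both properties are easy to observe. The sketch below assumes a POSIX shell with mktemp, and uses printf and an inline sh -c script (both only stand-ins for a real command) to show that awkward names survive intact and arrive together in one invocation.

```shell
#!/bin/sh
# Scratch directory whose names contain a space, double quotes and an apostrophe.
tmp=$(mktemp -d)
touch "$tmp/743 block South Point.xls" "$tmp/it's.xls" "$tmp/\"quoted\".xls"

# Each name arrives as exactly one argument: printf brackets every argument.
names=$(find "$tmp" -type f -print0 | xargs -0 printf '[%s]\n')
echo "$names"

# All names arrive in one invocation: the inline sh script reports $#.
batch_report=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$# arguments in this batch"' sh)
echo "$batch_report"   # prints: 3 arguments in this batch

rm -rf "$tmp"
```

The trailing `sh` after the script fills `$0`, so every filename lands in `$1`, `$2`, and so on.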

A small example shows that calling zip once with all filenames is faster than invoking it once per filename:
    $ time for i in fed/* ; do zip -q "$i" ; done
    real    0m0.232s
    user    0m0.130s
    sys     0m0.090s
    $ time find fed/ -type f -print0 | xargs -0 zip -q
    real    0m0.160s
    user    0m0.100s
    sys     0m0.030s
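The gap in those timings comes from process start-up: the loop starts zip once per file, while xargs packs many names into each command line, up to a size limit. Timings vary from machine to machine, so this sketch counts starts instead. It uses an inline sh -c script as a stand-in for zip (an assumption made so it runs anywhere) and ten empty files in a scratch directory.

```shell
#!/bin/sh
# The inline sh script stands in for zip: each time it is started it
# appends one line to the log file passed as its $0.
tmp=$(mktemp -d)
mkdir "$tmp/fed"
for n in 1 2 3 4 5 6 7 8 9 10; do touch "$tmp/fed/file$n.xls"; done

# One start per filename, like the for loop above.
: > "$tmp/loop.log"
for f in "$tmp"/fed/*; do
    sh -c 'echo start >> "$0"' "$tmp/loop.log" "$f"
done

# One start for the whole batch, like find -print0 | xargs -0.
: > "$tmp/xargs.log"
find "$tmp/fed" -type f -print0 | xargs -0 sh -c 'echo start >> "$0"' "$tmp/xargs.log"

# $(( )) strips the leading blanks some wc implementations print.
loop_starts=$(( $(wc -l < "$tmp/loop.log") ))
xargs_starts=$(( $(wc -l < "$tmp/xargs.log") ))
echo "loop:  $loop_starts starts"
echo "xargs: $xargs_starts starts"
rm -rf "$tmp"
```

With only ten short names everything fits in one command line; with many thousands of files xargs would split them into several batches, which is still far fewer starts than one per file.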
And here is an example that does not use this technique, which fails on filenames containing spaces:
    $ md5 -r `find fed -type f -print`
    e8a731935dd19a18d7c2583ee14cd2b8 fed/269block.xls
    030b4bf1ddd17de9131d54b5ddd52b7d fed/288BLOCK.XLS
    5d57958ecb970e09b56006b0219bc9e1 fed/358BLK1.xls
    d14d3c649673ee1db249491da5ce6f0b fed/681block.xls
    md5: fed/743: No such file or directory
    md5: block: No such file or directory
    md5: South: No such file or directory
    md5: Point.xls: No such file or directory
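The failing command can be rewritten with the same construct so that each path reaches md5 as a single argument. The function name checksum_tree is hypothetical, and the md5sum fallback is an assumption for systems without the Mac OS X md5 tool. Note that md5sum separates the hash and the name with two spaces rather than one.

```shell
#!/bin/sh
# checksum_tree (hypothetical helper): checksum every file below $1
# without splitting names on spaces.
checksum_tree() {
    if command -v md5 >/dev/null 2>&1; then
        # Mac OS X / BSD: -r prints "hash filename".
        find "$1" -type f -print0 | xargs -0 md5 -r
    else
        # Assumed fallback where md5 is missing (e.g. Linux coreutils).
        find "$1" -type f -print0 | xargs -0 md5sum
    fi
}

# Demo with the filename that broke the backtick version.
tmp=$(mktemp -d)
mkdir "$tmp/fed"
printf 'data\n' > "$tmp/fed/743 block South Point.xls"
checksum_tree "$tmp/fed"
rm -rf "$tmp"
```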
Many Unix commands accept the NUL-terminated list of filenames produced by find -print0 as input. The syntax of xargs takes a bit of learning (RTFM), but it is a very powerful tool!

