one-line perl version
Authored by: SOX on Apr 09, '07 10:43:34AM
perl -e 'open FH, "mdfind -onlyin ~/ \"$ARGV[0]\"|"; while ($s = <FH>) { chomp($s); @x = grep { m/\Q$ARGV[0]\E/ } `mdimport -nfd2 "$s" > /tmp/crap 2>&1; cat /tmp/crap`; print "$s\n" if @x > 0 }' "centimeter measure"

The above one-line Perl code locates the phrase "centimeter measure" using the same approach as the Python script. You can make an alias out of it. It will goof up if your phrase contains an unescaped double quotation mark. Note that it overwrites a temporary file called /tmp/crap every time it runs. I had to create that temporary file because of the silly way mdimport behaves: it doesn't write its output to a standard stream that is easy to capture.
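The double-quote goof-up comes from handing the mdfind command to the shell as a single string. A minimal sketch, assuming Perl 5.8 or later, of a list-form pipe open, which passes each argument straight to the program with no shell in between. `run_lines` is a hypothetical helper name, and `printf` stands in for mdfind so the sketch runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command WITHOUT a shell and return its
# output lines with the trailing newlines removed.
sub run_lines {
    my @cmd = @_;
    # List-form '-|' open execs @cmd directly, so quotes, spaces and $
    # inside the arguments are never reinterpreted by a shell.
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    return @lines;
}

# A phrase with embedded double quotes and a literal $ survives intact.
my @out = run_lines('printf', '%s\n', 'say "hi" to $HOME');
print "$_\n" for @out;
```

In the one-liner, the mdfind call would then become something like `run_lines('mdfind', '-onlyin', $ENV{HOME}, $ARGV[0])`.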


one-line perl version
Authored by: SOX on Apr 09, '07 10:54:04AM
A couple of usage notes:

1) it's hard-coded to look in ~/ (your home directory). This could obviously be made an input parameter as well.

2) don't forget to quote your phrase.

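3) the script is effectively multi-process rather than multi-threaded: the piped open starts mdfind in its own process, so mdfind and the mdimport calls can run at the same time. This flourish is overkill in many cases, however, because mdimport is the slow step in the process.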

4) it hunts for the phrase in all of the metadata, not just the text content. It would be easy to modify it to restrict the search to the text content, but why would you want that?

5) if you want this to run lightning fast, just replace mdimport with cat, like this:

perl -e 'open FH, "mdfind -onlyin ~/ \"$ARGV[0]\"|"; while ($s = <FH>) { chomp($s); $x = `cat "$s"`; print "$s\n" if $x =~ m/\Q$ARGV[0]\E/ }' "centimeter measure"

this skips mdimport entirely and just does a raw text search of each file.
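Shelling out to cat once per file means each file name passes through the shell again, so names containing a double quote or `$` can still break it. A minimal sketch of the same raw text search done inside Perl itself; `file_contains` is a hypothetical helper, and, like the cat version, it only makes sense for plain-text files.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file at $path contains $phrase
# literally. Non-plain files (e.g. directories such as bundles) are skipped.
sub file_contains {
    my ($path, $phrase) = @_;
    return 0 unless -f $path;
    open(my $fh, '<:raw', $path) or return 0;
    # Slurp the whole file in one read by undefining the record separator.
    my $data = do { local $/; <$fh> };
    close($fh);
    $data = '' unless defined $data;
    # \Q...\E makes regex metacharacters in the phrase match literally.
    return $data =~ /\Q$phrase\E/ ? 1 : 0;
}

# Usage: the phrase first, then the file paths to check.
my ($phrase, @paths) = @ARGV;
if (defined $phrase) {
    for my $path (@paths) {
        print "$path\n" if file_contains($path, $phrase);
    }
}
```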


one-line perl version
Authored by: CBrachyrhynchos on Apr 09, '07 07:42:50PM

cat will choke on "files" that are really bundles (like Mellel documents) and on ODF files, which are zipped archives. Searching the raw bytes of PDF files is also a problem (it doesn't seem to work for me).
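One way to keep the fast cat path for plain text while sending the problem cases to mdimport is to sniff each path first. A rough sketch, assuming that bundles show up as directories and that ZIP-based (ODF) and PDF files can be recognized by their leading magic bytes (`PK\x03\x04` and `%PDF-`); `needs_importer` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a raw text search is unlikely to work
# and the path should go through mdimport instead.
sub needs_importer {
    my ($path) = @_;
    return 1 if -d $path;                  # bundles/packages are directories
    open(my $fh, '<:raw', $path) or return 1;
    my $n = read($fh, my $head, 5);        # first 5 bytes are enough here
    close($fh);
    return 0 unless $n;                    # empty file: nothing to sniff
    return 1 if $head =~ /\APK\x03\x04/;   # ZIP container (e.g. ODF)
    return 1 if $head =~ /\A%PDF-/;        # PDF
    return 0;
}
```

Paths for which it returns true would go to mdimport, and everything else to the cheap cat-style search.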

It's a cool one-liner, and my script does suffer from a bit of creeping featurism in its switches.



one-line perl version
Authored by: SOX on Apr 10, '07 10:42:34AM

Well, yeah, the cat version is only good for plain text. But mdimport is dog slow, so the cat version is worth it when you know the files are plain text. If one wanted to push things a bit, one could run the files through `strings` first to strip out all the binary crud, and hope to get lucky finding the phrase even in a PDF or Word .doc file. It's so much faster than mdimport that one could just use it as a pre-screen.
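The `strings` pre-screen can also be done in-process. A sketch that mimics the default behavior of `strings` (runs of at least four printable ASCII characters; including tab is an assumption here) and then looks for the phrase. The helper names are hypothetical, and, as with `strings` itself, text stored inside compressed streams will not be found.

```perl
use strict;
use warnings;

# Hypothetical helper: return runs of printable ASCII (space through ~,
# plus tab) that are at least $min characters long, like `strings`.
sub printable_runs {
    my ($data, $min) = @_;
    $min = 4 unless defined $min;
    return $data =~ /([\t\x20-\x7e]{$min,})/g;
}

# Hypothetical pre-screen: true if the phrase appears inside any run.
sub prescreen {
    my ($data, $phrase) = @_;
    for my $run (printable_runs($data)) {
        return 1 if index($run, $phrase) >= 0;
    }
    return 0;
}
```

Files that pass the pre-screen could then be confirmed with mdimport, which keeps the slow step to a handful of candidates.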

One feature that would be fun to add is a concept of "near" in addition to exact phrases.

@g = split /\s+/, $ARGV[0];
$h = join ".{0,20}", map { quotemeta } @g;
then match m/$h/
to find the words in the phrase's order while tolerating up to 20 intervening characters between neighboring words.
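Put together, the "near" idea could be sketched as follows: split the phrase on whitespace, quote each word with `quotemeta` so punctuation in it matches literally, and join the words with a lazy `.{0,20}?`. The 20-character window comes from the post; the helper names are hypothetical, and matching stays case-sensitive, as in the one-liners.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that finds the words of $phrase in
# order, allowing 0 to $gap arbitrary characters between neighbours.
sub near_regex {
    my ($phrase, $gap) = @_;
    $gap = 20 unless defined $gap;
    my @words = split ' ', $phrase;           # split on runs of whitespace
    my $pattern = join ".{0,$gap}?", map { quotemeta } @words;
    return qr/$pattern/s;                     # /s lets . cross newlines
}

# Hypothetical helper: 1 if $text contains the words of $phrase "near" each other.
sub near_match {
    my ($text, $phrase, $gap) = @_;
    return $text =~ near_regex($phrase, $gap) ? 1 : 0;
}
```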



one-line perl version
Authored by: SOX on Apr 10, '07 10:45:50AM

By the way, did you figure out what the heck is up with mdimport's output streams? It seems to be one of those crazy programs, like top, that can tell whether it's being run in a terminal, redirected to a file, or sent down a pipe, and then changes which stream it uses to write the data. For example, if you try to capture its output using backticks in Perl, it still writes to the terminal directly, but if you redirect it to a file, it does not write to the terminal.
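A plausible explanation (an assumption, not verified against mdimport) is that the debug dump goes to standard error: Perl's backticks capture only standard output, so stderr still lands on the terminal, while `&>` redirects both streams to the file. If that holds, adding `2>&1` inside the backticks should make the temporary file unnecessary. This sketch shows the difference with a stand-in command that writes one line to each stream.

```perl
use strict;
use warnings;

# Stand-in for mdimport: writes one line to stdout and one to stderr.
my $cmd = q{sh -c 'echo to-stdout; echo to-stderr 1>&2'};

# Backticks capture stdout only; the stderr line still reaches the terminal.
my $stdout_only = `$cmd`;

# Appending 2>&1 merges stderr into stdout, so both lines are captured.
my $both = `$cmd 2>&1`;

print "stdout only:\n$stdout_only";
print "both:\n$both";
```

For the one-liner, that would mean trying `mdimport -nfd2 "$s" 2>&1` inside the backticks instead of going through /tmp/crap.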


