one-line perl version
Authored by: CBrachyrhynchos on Apr 09, '07 07:42:50PM

cat will choke on "files" that are really bundles (like Mellel documents) and on ODF files, which are zipped archives. Searching the raw bytes of PDF files is also a problem. (It doesn't seem to work for me.)

It's a cool one-liner, and my script does suffer from a bit of creeping featurism in its switches.



one-line perl version
Authored by: SOX on Apr 10, '07 10:42:34AM

Well, yeah, the cat version is only good for plain text. But mdimport is dog slow, so cat is the better choice when you know the files are plain text. If one wanted to push things a bit, one could run the files through `strings` first to remove all the binary crud and hope to get lucky finding the phrase even if the file was in PDF or Word .doc format. It's so much faster than mdimport that one could just do it as a pre-screen.
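To make the pre-screen idea concrete, here is a rough sketch. It does not call the real `strings` tool; `printable_runs` is a hypothetical pure-Perl stand-in that keeps runs of four or more printable ASCII characters, and `prescreen` is likewise a made-up helper name. Compressed PDF streams will still hide their text from this, so a miss proves nothing.

```perl
use strict;
use warnings;

# Rough stand-in for strings(1): keep runs of 4+ printable ASCII characters
# (tabs included) and join them with newlines. Hypothetical helper.
sub printable_runs {
    my ($bytes) = @_;
    return join "\n", $bytes =~ /([\x20-\x7e\t]{4,})/g;
}

# Pre-screen: true if the exact phrase shows up in the printable text.
# A hit only marks the file as a candidate worth a slower, proper search.
sub prescreen {
    my ($bytes, $phrase) = @_;
    return index(printable_runs($bytes), $phrase) >= 0;
}

# Toy data standing in for a binary file's contents.
my $blob = "\x00\x01PDF junk\x02\xffhello world\x00";
print prescreen($blob, "hello world") ? "candidate\n" : "skip\n";
```

Files flagged as candidates could then be handed to mdimport, so the slow step only runs where it might pay off.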

One feature that would be fun to add is a concept of "near" in addition to exact phrases.

@g = split /\s+/, $ARGV[0];
$h = join ".{0,20}", @g;
then match m/$h/
to find the words in phrase order while tolerating up to 20 intervening characters between them
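A slightly fuller sketch of the "near" match, under a few assumptions of mine: `near_regex` is a hypothetical helper name, the gap defaults to 20 characters, the words are escaped with `quotemeta` so punctuation in the phrase is taken literally, and the `s` flag lets the gap span a line break.

```perl
use strict;
use warnings;

# Build a "near" pattern: the words of the phrase, in order, each pair
# separated by at most $gap arbitrary characters. Hypothetical helper.
sub near_regex {
    my ($phrase, $gap) = @_;
    $gap = 20 unless defined $gap;
    # Drop empty fields (leading whitespace) and escape regex metacharacters.
    my @words = map { quotemeta($_) } grep { length } split /\s+/, $phrase;
    my $pattern = join ".{0,$gap}", @words;
    return qr/$pattern/s;    # s: let "." match newlines too
}

my $re = near_regex("quick fox");
print "match\n"    if "The quick brown fox" =~ $re;
print "no match\n" unless "fox, then quick" =~ $re;
```

Word order still matters here; matching the words in any order would need a different pattern.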



one-line perl version
Authored by: SOX on Apr 10, '07 10:45:50AM

By the way, did you figure out what the heck is up with mdimport's output streams? It seems to be one of those crazy programs, like top, that can tell whether it's being run in a terminal, redirected to a file, or sent down a pipe, and then changes which stream it uses to write the data. For example, if you try to capture its output using backticks in Perl, it still writes to the terminal directly, but if you redirect it to a file, it does not write to the terminal.
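For reference, here is how a program can tell where its output is going, plus one thing worth trying with backticks. This is a general sketch, not a diagnosis: I have not verified which stream mdimport actually uses, `describe_stdout` is a hypothetical helper, and `ls` on a missing path just stands in for any command that writes to STDERR.

```perl
use strict;
use warnings;

# -t is Perl's built-in file test: true if the handle is attached to a terminal.
sub describe_stdout {
    return (-t STDOUT) ? "terminal" : "pipe or file";
}

# Report on STDERR so the message stays visible when STDOUT is redirected.
print STDERR "STDOUT is a ", describe_stdout(), "\n";

# Backticks capture STDOUT only. Adding 2>&1 asks the shell to fold STDERR
# into the captured text as well.
my $captured = `ls /nonexistent-path-for-demo 2>&1`;
print STDERR "captured ", length($captured), " bytes of error output\n";
```

If mdimport's text turns out to be on STDERR, the same `2>&1` inside the backticks should capture it; if it writes to the terminal device directly, it will not.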


