10.4: A Finder plug-in to access file metadata
Authored by: magnamous on Jun 29, '05 02:00:37AM
I have a feeling there are many academics and students in this situation .... but maybe not yet.
I'm sorry - I wasn't trying to suggest that your situation is irrelevant. I thought that your goal was to be able to content-search these files, so a flag as to whether they were text-searchable or not seemed irrelevant.
Some additional information as to my situation: Acrobat has "capture page", which is effectively an OCR of the image, which then becomes searchable text. Omnipage will also do this. But given that I have 2000 PDF files (journal articles) that are text-searchable, and another 1000 that are just images in PDF format, I'm wondering if there is an underlying metadata tag that indicates whether the document's first 100 words have been indexed by Spotlight. If there is, that would be great. I thought it might reside as a metadata tag, but maybe not.
I've tried to figure out a way to search for PDF files that have any searchable text content, but
  1. There isn't a "text-searchable" metadata tag for PDFs in OS X by default that I am aware of (read this to find out how to make your own customized metadata tag - make sure to read the whole page so you don't get yourself into trouble down the road)

  2. I don't know enough about Spotlight syntax to know how to create a completely wildcard query (that is, one which searches for any text content). If anybody can improve on my solution, please do!

I've created a bit of a kludge that might get you what you want, but it'll take a while.

  1. First, you have to create some smart folders. Create a new smart folder. Set it up to search the folder or drive where the PDFs are stored, and give it two search criteria: Kind and Contents. Select PDF in Kind and type the letter "a" (no quotes) in Contents. (Alternatively, you can choose "Other…" from the search criteria menu, search for rawquery (learn more about Spotlight query syntax here), and select it as the search criterion. Put this in the text field for that search: (kMDItemContentTypeTree == "com.adobe.pdf") && (kMDItemTextContent == "*a*"cd). Repeat this for the other smart folders, replacing "*a*"cd with "*b*"cd, "*c*"cd, and so on.)

  2. You have to create 25 more Smart Folders like the last one - each with a different letter of the alphabet. If you want to include PDFs that might only have numbers as text, make smart folders that search for individual numbers in the Contents field. Make sure to set each one to search the folder or drive where the PDFs are stored.

  3. Once you've done that, you should be able to open each Smart Folder, and it'll show you all of the PDF files that have that particular letter or number in their text contents. At this point, you can highlight all of the PDFs in the Smart Folder and hit Command-Option-I to get the Inspector window. In the Spotlight Comments section, type "Text-searchable" (no quotes). Repeat this for each of the smart folders that you created.

  4. Now, all of your PDFs that have text in them should have "Text-searchable" in the Spotlight Comments metadata tag. At this point, you should be able to trash all of those single-letter and single-number Smart Folders and replace them with only two: one which looks for "Text-searchable" in your giant folder of PDFs, and one that looks for files that do not have "Text-searchable" anywhere in them. If you prefer, you could also tag all of the PDFs in the not-text-searchable Smart Folder with the tag "Not-text-searchable" so that they're easier to find (then, of course, change the not-text-searchable Smart Folder's parameters to make use of that change).
Whew! Quite a lot of work. I hope I've explained it all correctly (and I hope it works!).
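
If typing 36 nearly identical raw queries by hand sounds tedious, the strings from step 1 can be generated instead. This is only a rough sketch: the function names (raw_query, all_queries, mdfind_command) are made up for illustration, and I'm assuming the mdfind command-line tool accepts the same raw query text as the Finder's raw query criterion, which I haven't verified for every query.

```python
import string

PDF_TYPE = "com.adobe.pdf"  # content type used in the raw query from step 1


def raw_query(char):
    """Build the raw query for PDFs whose text content contains `char`."""
    return (
        f'(kMDItemContentTypeTree == "{PDF_TYPE}") && '
        f'(kMDItemTextContent == "*{char}*"cd)'
    )


def all_queries(include_digits=True):
    """Map each letter (and optionally each digit) to its raw query."""
    chars = string.ascii_lowercase
    if include_digits:
        chars += string.digits
    return {c: raw_query(c) for c in chars}


def mdfind_command(char, folder):
    """Argument list for running the same query with mdfind (assumption)."""
    return ["mdfind", "-onlyin", folder, raw_query(char)]


if __name__ == "__main__":
    # Print one query per line, ready to paste into a raw query field.
    for char, query in all_queries().items():
        print(char, query)
```

Each printed line can be pasted into the raw query field of one smart folder. The mdfind_command list could be handed to subprocess.run on a Mac, but treat that part as untested.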

10.4: A Finder plug-in to access file metadata
Authored by: gmehl on Jun 29, '05 02:50:22PM

I'm impressed and very thankful. I'll give it a try right now and report back as to whether this works. Quite a workaround! I really appreciate your effort and creative spirit. I really wish I had some programming background.

Thanks again.

G

10.4: A Finder plug-in to access file metadata
Authored by: gmehl on Jun 29, '05 03:14:13PM

A follow up after having used this suggestion:

Given that most of my documents (if not all of them) have the letter a in them somewhere, I only made one smart folder with content "a" -- worked like a charm.

On your excellent suggestion of adding a comment, I found an Automator workflow that lets me add "searchable" to the Spotlight comments of multiple selected items. It can be found here: http://www.automatorworld.com/2005/05/03/add-spotlight-comments/

Thanks again.
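
For anyone who would rather script the tagging than use the Automator workflow mentioned above, here is a minimal sketch of one possible approach. It is not that workflow's method: it assumes the Finder's AppleScript "comment" property is what the Spotlight Comments field shows, and the helper names (applescript_string, set_comment_command) are hypothetical. It only builds the osascript command; running it is left to you.

```python
def applescript_string(text):
    """Quote `text` as an AppleScript string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def set_comment_command(path, comment="Text-searchable"):
    """Build an osascript command that sets the Finder comment of one file.

    Assumption: the Finder comment is the same field as Spotlight Comments.
    """
    script = (
        'tell application "Finder" to set comment of '
        f"(POSIX file {applescript_string(path)} as alias) to "
        f"{applescript_string(comment)}"
    )
    return ["osascript", "-e", script]
```

On a Mac, each command could be passed to subprocess.run for every PDF path returned by the search. Note that this replaces any existing comment on the file rather than appending to it.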
