10.4: Spotlight's rules for indexing plain text files
Tiger only hint. Spotlight is a nice service, but in my opinion its GUI offers very limited control, so I had to start digging into its internals to solve some of its problems. Here is a solution for one of them.

I've noticed that Spotlight doesn't index some plain text files in my home Documents folder. It turns out that, to be indexed, text files must meet rather specific conditions: they must have the extension .txt or .text, or they must have their file type set to TEXT. The simplest fix, of course, is to rename the files, but this is not always possible, so it's good to have a second method.
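
As a rough illustration, the two conditions above can be written as a small predicate. This is a minimal Python sketch of the rule as described in this hint, not of Spotlight's actual logic; the function name `meets_plain_text_rule`, its parameters, and the case-insensitive extension match are assumptions made for the example.

```python
import os

# Extensions named in the hint; matching them case-insensitively is an assumption.
INDEXED_EXTENSIONS = {".txt", ".text"}


def meets_plain_text_rule(filename: str, type_code: str = "") -> bool:
    """Return True if the file satisfies either condition described in the hint.

    filename  -- the file's name or path
    type_code -- the four-character file type, empty if none is set
    """
    extension = os.path.splitext(filename)[1].lower()
    return extension in INDEXED_EXTENSIONS or type_code == "TEXT"
```

Here `type_code` stands for the four-character file type, such as the value printed by GetFileInfo -t.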

I don't know if it's possible to change a file's type from the GUI, so I use the command line utility SetFile from the Developer Tools package for this. SetFile -t TEXT filename will do the job and will make your file indexable.

To check if a given file (not only text files) is already indexed, you'll need to use the mdimport command. In the Terminal, type mdimport -d1 filename_to_test. If it says something like this:
2005-05-03 22:27:53.872 mdimport[336] Import 'filename' type
 'dyn.ah62d4rv4gk8z2addrf3u' no mdimporter
then Spotlight doesn't recognize the format (no mdimporter is the key part) of filename. If it says something like this:
2005-05-03 22:29:46.764 mdimport[338] Import 'filename' type 
 'com.apple.traditional-mac-plain-text' using
 'file://localhost/System/Library/Spotlight/RichText.mdimporter/'
then it's all good and the file is already indexed. Enjoy!
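
If you check many files, the two kinds of output above can be told apart with a small script. What follows is a minimal Python sketch, assuming the output always looks like the samples shown here; the names `ImportReport` and `parse_mdimport_debug` are hypothetical, and the pattern has only been fitted to these samples.

```python
import re
from typing import NamedTuple, Optional


class ImportReport(NamedTuple):
    path: str
    content_type: str
    importer: Optional[str]  # None when the output says "no mdimporter"


# Whitespace between the fields may include line breaks, as in the wrapped samples.
_PATTERN = re.compile(
    r"Import '(?P<path>.*?)' type\s+'(?P<type>[^']+)'\s+"
    r"(?:no mdimporter|using\s+'(?P<importer>[^']+)')"
)


def parse_mdimport_debug(output: str) -> Optional[ImportReport]:
    """Extract path, reported type, and importer from `mdimport -d1` text."""
    match = _PATTERN.search(output)
    if match is None:
        return None  # the text did not look like the samples
    return ImportReport(match["path"], match["type"], match["importer"])
```

A report whose `importer` is None corresponds to the "no mdimporter" case; the `content_type` field is the type string that mdimport printed for the file.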

Authored by: Safar on May 13, '05 11:57:59AM

Does not work for me:
$ SetFile -t TEXT ~/test.php
$ GetFileInfo -t ~/test.php
"TEXT"
$ mdimport -d1 ~/test.php
2005-05-13 17:56:40.995 mdimport[2806] Import '/Users/mic/test.php' type 'public.php-script' no mdimporter



Authored by: atverd on May 13, '05 12:36:47PM

This is a different case. The file's type is actually recognized as php-script through its extension, and evidently the TEXT type code got overruled. The hint works only for files with unrecognized types, those reported as dyn.* by mdimport.



Authored by: atverd on May 13, '05 01:30:16PM

As I said before, this is a different kind of issue, but I think I know how to fix it. We have to tell Spotlight to treat scripts like plain text files. Edit the file /System/Library/Spotlight/RichText.mdimporter/Contents/Info.plist, find this section, and add public.php-script as shown below:

<dict>
<key>CFBundleTypeRole</key>
<string>MDImporter</string>
<key>LSItemContentTypes</key>
<array>
<string>public.rtf</string>
<string>public.html</string>
<string>public.xml</string>
<string>public.plain-text</string>
<string>com.apple.traditional-mac-plain-text</string>
<string>com.apple.rtfd</string>
<string>com.apple.webarchive</string>
<string>public.php-script</string>
</array>
</dict>

After saving the file, run mdimport -d1 on a PHP script again; this time it should be imported as plain text. Worked for me.
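
The same edit can be scripted instead of done by hand. Below is a minimal Python sketch using the standard plistlib module. It assumes the section shown above sits inside a top-level CFBundleDocumentTypes array; the helper name `add_content_type` is hypothetical, and the sample data is a trimmed stand-in rather than the real file's contents.

```python
import plistlib


def add_content_type(info: dict, uti: str, role: str = "MDImporter") -> bool:
    """Append uti to LSItemContentTypes of the first entry with the given role.

    Returns True if the list changed, False if uti was already listed.
    """
    for doc_type in info.get("CFBundleDocumentTypes", []):
        if doc_type.get("CFBundleTypeRole") != role:
            continue
        content_types = doc_type.setdefault("LSItemContentTypes", [])
        if uti in content_types:
            return False
        content_types.append(uti)
        return True
    raise ValueError(f"no document type with role {role!r} found")


# Stand-in for the importer's Info.plist, trimmed to the part that matters.
sample = plistlib.dumps({
    "CFBundleDocumentTypes": [{
        "CFBundleTypeRole": "MDImporter",
        "LSItemContentTypes": ["public.rtf", "public.plain-text"],
    }]
})

info = plistlib.loads(sample)
changed = add_content_type(info, "public.php-script")
updated = plistlib.dumps(info)  # bytes that could be written back to a file
```

To apply this to a real file, read it with plistlib.load and write the result back with plistlib.dump; keeping a copy of the original Info.plist first is a sensible precaution.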



Authored by: Safar on May 13, '05 02:23:08PM

THANK YOU ! This is great, and should definitely have a hint of its own. I saw lots of places on the web where people try to achieve this sort of behavior.

I did exactly what you said, and then ran mdimport -f ~/webserver/ to force Spotlight to reindex my webserver folder, where all my PHP files live.

Now I can search my PHP files by content.

I would just like to clarify how you can get the type for a given extension (e.g. 'public.php-script' for PHP files): run mdimport -d1 on a file with that extension and look at the output.



Authored by: jmichaelson on May 13, '05 07:40:43PM

I took a similar approach (I thought about posting this hint, but obviously I was beaten to the punch), but used the source code importer in /Library/Spotlight. All you need to do is add the file extension to the Info.plist entry under CFBundleDocumentTypes > 0 > LSItemContentTypes. Just use the New Sibling button and, under Value, type the extension (I added .cgi, .pl, and .pm entries). Now Spotlight sees and indexes all my Perl stuff. You can add any other plain text extension, which makes this much more effective than the original hint here.



Authored by: atverd on May 13, '05 08:25:23PM

I actually tried the source code importer first, but it looks like it's made specifically for C, C++, Objective-C, and Objective-C++, and I'm not sure how it will behave with other languages. I think the plain-text importer is safer.



Authored by: roger69 on May 13, '05 01:52:19PM

Anyone know if we can use this hint to allow indexing of Thunderbird mail files? They are just plain text.

I tried this on a Thunderbird mail folder file in my /Users/rjw/Library/Thunderbird/profiles/foo/Mail/Local Folders directory.

mdimport doesn't like it even after using SetFile. I don't know if that's because the mail file is just a name, without an extension, or what.

Any ideas?



Authored by: atverd on May 13, '05 02:20:58PM

Run mdimport -d1 on some mail file and give me the output, then try mdimport -f -d1 and give me the output again. That way I can tell you exactly why it doesn't work.



Authored by: roger69 on May 13, '05 07:47:43PM

Here's the output of both:

rjw$ mdimport -d1 Orders
2005-05-13 16:45:07.319 mdimport[15269] Import '/Users/rjw/Library/Thunderbird/Profiles/3pto0pgb.default/Mail/Local Folders/Orders' type 'dyn.ah62d4rv4gk8zkvnxnu' no mdimporter

rjw$ mdimport -f -d1 Orders
2005-05-13 16:45:14.407 mdimport[15271] Import '/Users/rjw/Library/Thunderbird/Profiles/3pto0pgb.default/Mail/Local Folders/Orders' type 'dyn.ah62d4rv4gk8zkvnxnu' no mdimporter

Er, I could be missing something but these appear identical to me.

?

Roger



Authored by: atverd on May 13, '05 08:20:41PM

OK, it should work. Check with GetFileInfo that the Orders file really has type TEXT, and then run "mdimport -d1 Orders".
But there is actually another problem: since Thunderbird saves multiple emails to a single file, the Spotlight search will be of limited use. It will show you that the file Inbox contains the word, but there is no way to show that the match is in message #321 out of 34182 messages in total. This is how Spotlight works, and even Mail.app has this problem; Apple had to use a dirty hack to make it work as people would expect. So I'd stay with Thunderbird's internal search.



Authored by: roger69 on May 13, '05 08:47:34PM

Nuts. That's a good point. I guess I'll have to wait for someone to hack Thunderbird. I have lots of personal mail saved there and the search just sucks. It would be nice to have them indexed.

Roger



Authored by: jmichaelson on May 14, '05 02:32:03AM

You don't need to wait. I use the autosave extension in Thunderbird that automatically saves incoming messages into individual .eml files. The methods discussed here could be used to index the .eml files as plain text, but the problem is that Thunderbird has a longstanding bug that prevents it from properly opening .eml files when they're double-clicked. So Spotlight would return the proper .eml file, but you'd have to use a text editor to open it.



Authored by: Tom Robinson on May 14, '05 03:41:11PM

Apple's 'dirty hack' was to save each mail message as a separate file. In iCal they still have a single file per calendar, but the cache folder gets a new file for every entry.



Authored by: sandrift on May 13, '05 02:35:12PM
You can change creator/type codes via A Better Finder Creators & Types, available at: http://www.publicspace.net/ABetterFinderCreatorsAndTypes/

Authored by: Guntis on Aug 06, '05 01:04:24PM

I use the PearLyrics program to fetch lyrics (from the Internet) for songs played in iTunes. What's funny is that Spotlight doesn't find the PearLyrics lyrics files, in txt format, in my Documents folder! Specifically, I enter "All I remember is a smile at the top of every working morning" (with the quotation marks) and Spotlight finds nothing, even though this text is copied and pasted from the lyrics file I see on my screen. So what good is Spotlight if it doesn't even index txt files (they *have* the .txt extension)? I tried saving this lyrics file once more to the same folder, hoping that this would get it indexed, but no, still no search results...

---
I'm not really a Windows user...I just play one at work.


