
Prevent locate's update from updating certain folders UNIX
NOTE: This is an advanced hint, so beginners should not do this.

Even if you never use the unix locate command, OS X still runs a job in the background that updates the database this command uses to search your hard drive quickly. It often turns out you don't want the locate command to search certain file systems or directories. The database is updated by the locate.updatedb command, which runs automatically as a cron job as part of your periodic maintenance. This hint is about how to tell locate.updatedb to avoid a filesystem, and it involves modestly editing a system file. My solution is to create a special group ID that tells updatedb not to index beyond a certain point; thus any folders or files below that initial folder with the special group ID are protected from being indexed.

MOTIVATION: I have a large 400GB FireWire drive on my computer that I use for backup images. It not only contains lots of files, but it also contains lots of hard-linked files (created for differential backups). Running a find command on this filesystem can take hours. And since these are multiple backup copies of my principal directories, I don't want them in locate's database anyhow. Read on for the step-by-step solution...

  1. Create a plain text file called mypatch containing the following lines exactly:
    42a43
    > set EXCLUDE_GROUP = 399         # prune any path with a directory whose group is this name
    63c64
    < find ${SRCHPATHS} \( ! -fstype local -o -fstype fdesc -o -fstype devfs \) -a \
    ---
    > find ${SRCHPATHS} \( ! -fstype local -o -fstype fdesc -o -fstype devfs -o -group ${EXCLUDE_GROUP} \) -a \
    
  2. Open a terminal window and type:
    sudo patch --backup /usr/libexec/locate.updatedb  mypatch
    
    Note that the --backup flag will cause patch to create a copy of the original, which you should keep in case you want to revert this. You can also use patch to revert with the -R flag.
  3. [optional] Use NetInfo Manager to create a new group. Call it whatever you wish, but give it a GID number of 399, as this is what the patch is expecting. I called mine noindx. The easiest way to do this reliably is to select an existing group, duplicate it, then edit the duplicate and save it.
  4. Now for any folder or file system you want locate.updatedb to avoid descending into, simply change its group ID to 399. locate.updatedb will then not index any files or folders that are located underneath that path. Note you do not have to change the group ID of all of the files you do not want indexed. You only need to change the group ID of the top-level folder containing them.
NOTE: You do not need to take this precaution for remote filesystems or other kinds of mounted devices, as locate.updatedb already avoids these in its indexing. It's only the large local disks you need to worry about.
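
As a rough illustration of why changing only the top-level folder's group is enough, the sketch below runs a simplified version of the patched find expression. It leaves out the -fstype tests and assumes the rest of the original line continues with -prune -o -print, which the patch above does not show. list_indexed is a made-up helper name and /Volumes/Backup is a placeholder.

```shell
#!/bin/sh
# Sketch only: mimics the group test that the patch adds to locate.updatedb.
# EXCLUDE_GROUP mirrors the value set in the patch (399).
EXCLUDE_GROUP=${EXCLUDE_GROUP:-399}

# Hypothetical helper: print the paths that a find with the extra
# "-group" prune would still report. Anything owned by EXCLUDE_GROUP is
# pruned, so find never descends below a directory with that group.
list_indexed() {
    find "$1" -group "$EXCLUDE_GROUP" -prune -o -print
}

# Typical use (placeholder volume name; substitute your own):
#   sudo chgrp 399 /Volumes/Backup
#   list_indexed /Volumes | grep Backup    # should print nothing
```

Because find accepts a numeric GID where no group of that name exists, this check should behave the same whether or not you created the optional group in step 3.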

CAVEAT: Editing system files is not something to undertake lightly. This is a fairly benign change, but don't do this if you are new to Unix. Keep the backup files (making your own is an even better idea). I had no choice, since updating the database ran for eight hours on my machine!
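
If you would like to see the backup and revert behavior from step 2 before touching the real file, something like the following rehearsal in a scratch directory may help. The file names and contents are invented for the demonstration, and it assumes a patch that accepts --backup and -R as used in this hint.

```shell
#!/bin/sh
# Rehearsal in a scratch directory; nothing under /usr/libexec is touched.
# The file names and contents below are made up for this demonstration.
work=$(mktemp -d)

printf 'one\ntwo\nthree\n' > "$work/target"
# A normal-format diff (same style as mypatch) that changes line 2.
printf '2c2\n< two\n---\n> TWO\n' > "$work/demo.patch"

# Apply the change; --backup saves the original as target.orig.
patch --backup "$work/target" "$work/demo.patch"
applied=$(sed -n 2p "$work/target")

# Reverse the same patch to get the original contents back.
patch -R "$work/target" "$work/demo.patch"
reverted=$(sed -n 2p "$work/target")

echo "after patch: $applied, after revert: $reverted"
```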

A simpler way
Authored by: astfgl on Jul 08, '04 11:20:02AM
You can also just chmod o-rwx any directories you want excluded; because the weekly job runs locate.updatedb as user nobody, it can't see into directories unless they are world-readable/executable. Again, this means you only need to change the top-level directory, not every directory under it.

This might cause a problem if you need all users to be able to see the files in the directory, I suppose.
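
A minimal sketch of that approach, assuming a made-up helper name (hide_from_others) and a placeholder path:

```shell
#!/bin/sh
# Hypothetical helper: remove all "other" permissions from a top-level
# directory and report whether the change took effect.
hide_from_others() {
    chmod o-rwx "$1" || return 1
    # Characters 8-10 of the ls mode string are the "other" bits.
    # (A directory with the sticky bit set would show "T" here instead.)
    [ "$(ls -ld "$1" | cut -c8-10)" = "---" ]
}

# Typical use (placeholder path; substitute your own):
#   hide_from_others /Volumes/Backup && echo "hidden from user nobody"
```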

Previously posted alternate method
Authored by: jecwobble on Jul 08, '04 11:39:24AM
Depending on your specific needs, this previous hint may also be of interest. I use it so that only my local computer (no external hard drives) is indexed.

Anticipating your comments
Authored by: SOX on Jul 08, '04 02:12:28PM

1) This hint was aimed at the case where I have an external volume or large shared filesystem that I am trying to keep from being indexed. I can't just make this inaccessible to all users besides the owner and group; that accessibility is why locate.updatedb is indexing it in the first place.

2) A previous hint had a comment that suggested using the --prune option in locate.updatedb instead of editing the system file. Unfortunately, there is no such option in Mac OS X. The man page mentions it, but it is incorrect. Just look at the code yourself if you doubt this.

3) This approach is general, as opposed to editing the command to explicitly exclude a particular named directory. If one only had a single specific case to worry about, then that would work. That is, a previous hint suggested adding the filter -regex "/Volumes" to avoid indexing external volumes.



Prevent locate's update from updating certain folders
Authored by: bluehz on Jul 12, '04 01:51:35AM

The GNU locate and also slocate (which is now the standard in the Linux world) both offer pruning options for defining items to not scan into the db. I recommend slocate - which is a secure version of locate allowing users to only search for files they have permissions to access. Installing slocate is a bit beyond the scope of this hint and really requires its own thread so I will post to the site and maybe it will show up in a few days as a hint.

The GNU version of locate only seems to come as part of the findutils source code - this includes find, xargs, and locate. You can install the complete pkg through Fink:

fink install findutils

or use the following script to install it manually if you don't have Fink - optionally only installing the locate/updatedb components. You will still need the Developer tools installed to compile the source code.

#!/bin/sh

mkdir ~/Desktop/findutils-build
cd ~/Desktop/findutils-build || exit 1

# download source files
curl -O ftp://alpha.gnu.org/gnu/findutils/findutils-4.1.20.tar.gz
curl "http://cvs.sourceforge.net/viewcvs.py/*checkout*/fink/dists/10.3/stable/main/finkinfo/utils/findutils.patch" -o findutils.patch
tar -zxvf findutils-4.1.20.tar.gz

# patch source files
patch -p1 -b -d findutils-4.1.20 < findutils.patch
cd findutils-4.1.20 || exit 1
sed 's/@PREFIX@/usr\/local/g' findutils.cron > findutils.cron.tmp
mv findutils.cron.tmp findutils.cron
chmod 755 findutils.cron

# build and install
./configure CFLAGS=-DHAVE_F_FSTYPENAME_IN_STATFS
make
sudo make install-strip

# If you only want to install locate and updatedb
# comment out the above "sudo make install-strip" line
# and uncomment the following lines
#
# mkdir build
# sudo make install-strip DESTDIR="$PWD/build"
# sudo cp -p build/usr/local/man/man1/updatedb.1 /usr/local/man/man1
# sudo cp -p build/usr/local/man/man1/locate.1 /usr/local/man/man1
# sudo cp -p build/usr/local/man/man5/locatedb.5 /usr/local/man/man5
# sudo cp -p build/usr/local/bin/locate /usr/local/bin
# sudo cp -p build/usr/local/bin/updatedb /usr/local/bin

# install the periodic script and updatedb.conf
sudo cp -p findutils.cron /etc/periodic/daily
# sudo does not apply to a shell redirection, so write the file through tee
sudo tee /etc/updatedb.conf > /dev/null <<EOF
# /etc/updatedb.conf:  updatedb configuration file

PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs /net"
EOF

echo "Information at http://www.gnu.org/software/findutils/manual/html_chapter/find_7.html#SEC73"


Prevent locate's update from updating certain folders
Authored by: bluehz on Jul 12, '04 02:09:57AM

Oops - forget about the part "optionally installing only locate/updatedb". I just realized that the complete findutils pkg is required for updatedb to work properly.


