
Search for text in multiple Word files via the Terminal UNIX
Have you ever wanted to search for a string of text across multiple Word files? Paste this in the Terminal:

$ find . -name '*.doc' -exec grep -li 'danny the dog' {} \;
It will print the names of the files containing the string danny the dog, along with the directory in which they were found. The search is case-insensitive thanks to the -i option on grep.

[robg adds: The find . command will search down from the currently active directory. So if you open a new Terminal window and just enter the above command, it will search your user's entire Home directory structure. If your Word files are elsewhere, you'll need to do a cd /path/to/Word/files first, then run the command. Tiger will apparently make this trick unnecessary, but until it's released, it's a handy shortcut (and if you do want case sensitivity, just leave off the i in the -li options string).]

Search for text in multiple Word files via the Terminal
Authored by: geohar on Feb 23, '05 08:58:25AM

Or add this alias to your tcsh aliases:

alias hgrep 'grep \!* -Ir .'

Use it like this:

hgrep foo

This finds all instances of foo in non-binary files from the current directory down in the hierarchy.
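
For readers on bash or zsh rather than tcsh, a rough equivalent can be sketched as a shell function. This is only an illustration of the alias above, assuming a grep that supports -I and -r (GNU grep and the BSD grep on Mac OS X both do); it is not part of the original alias.

```shell
# hgrep PATTERN: search non-binary files from the current directory down.
# -I skips binary files, -r recurses into subdirectories.
# Any extra grep options can be given before the pattern, e.g. hgrep -i foo
hgrep() {
    grep -Ir "$@" .
}
```

Put it in ~/.bashrc or ~/.zshrc and call it as hgrep foo, just like the alias.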

In other words, grep supports recursion, so there is no need for find:

grep -lir 'danny the dog' .

does much the same thing as the hint, except that it searches every file rather than only .doc files.



Even easier in zsh
Authored by: gidds on Feb 23, '05 09:09:22AM
Sorry if I've mentioned this before, but if you use the 'Z' shell (zsh) instead of csh or bash, then you don't need to use the 'find' command, as it has built-in recursive globbing (filename expansion) via '**'.  So you can get the same effect by simply typing
    grep -li 'danny the dog' **/*.doc
You can also restrict by file size, type, permissions — everything find can do and more.

zsh has lots of other great features, too — and it's free, open source, and supplied as part of Mac OS X.  (I'm surprised it's not more popular.)

---
Andy/


Even easier in zsh
Authored by: chtito on Feb 23, '05 02:49:54PM
Great! I've been using zsh for years (and fully agree it's a terrific shell) and was aware of the '**' feature, but it never occurred to me that it could be used to replace the rather clumsy find/grep combination. Thanks!

Let me also add that those who are versed in regular expressions should use egrep (extended regular expressions) in place of grep, as far as I understand it.

Search for text in multiple Word files via the Terminal
Authored by: dsf on Feb 23, '05 09:25:14AM
find doesn't have to operate from the current directory down. The first argument to find is the directory to start from, and "." is the current directory. There's no need to cd elsewhere; you just change the first argument, e.g.
find /Users/foo/stuff -name \*.doc -exec grep -li 'quux' {} \;
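
The same idea can be sketched as a small function, so the starting directory is just an argument. The name search_docs and the scratch files are made up for illustration, and the {} + terminator is a swap for the \; used above: it hands many file names to one grep instead of starting a grep per file. It is part of POSIX find, but very old versions may lack it.

```shell
# search_docs DIR PATTERN: list .doc files under DIR whose contents match PATTERN.
# (search_docs is a hypothetical helper name, not part of find or grep.)
search_docs() {
    # {} + passes many file names to a single grep instead of one grep per file.
    find "$1" -type f -name '*.doc' -exec grep -li -- "$2" {} +
}

# Example on a scratch directory; no cd required.
demo=$(mktemp -d)
printf 'quux lives here\n' > "$demo/report.doc"
printf 'nothing to see\n'  > "$demo/other.doc"
search_docs "$demo" 'quux'
```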


Search for text in multiple Word files via the Terminal
Authored by: clith on Feb 23, '05 10:49:16PM
Piping find into xargs doesn't handle spaces in file names well by default. A command I often execute and have aliased is:
find . -type f -print0 | xargs -0 egrep 'your pattern here'
The -print0 and the -0 argument to xargs cause the two tools to use zero-bytes to terminate file names instead of newlines, since xargs otherwise splits its input on spaces and tabs as well as newlines. This means it will handle folders and files with spaces, tabs and other strange characters in their names.
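
A minimal sketch of the difference, using a scratch directory whose folder and file names (made up for this example) both contain spaces. It assumes a find and xargs that support -print0 and -0, as the GNU and BSD versions do.

```shell
# Scratch directory with a space in both the folder and the file name.
demo=$(mktemp -d)
mkdir -p "$demo/my letters"
printf 'your pattern here\n' > "$demo/my letters/to the vet.doc"

# NUL-separated names survive the pipe intact, spaces and all.
safe=$(find "$demo" -type f -print0 | xargs -0 grep -l 'pattern')
echo "$safe"

# Without -print0/-0, xargs splits the name at each space and grep is
# handed pieces that are not real files (errors silenced; xargs exits non-zero).
unsafe=$(find "$demo" -type f | xargs grep -l 'pattern' 2>/dev/null) || true
echo "unsafe run found: ${unsafe:-nothing}"
```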

Search for text in multiple Word files via the Terminal
Authored by: miles_thatsme on Feb 23, '05 02:04:28PM

In response to the parenthetical comment, you don't need Tiger to do this now. You just need to open up the "Find" window in the Finder, select your home directory as the target of the search (likely already available in your selected places list), add "doc" to the extension criterion, then add the words in the content criterion. Nowhere near as fast as Tiger promises, but the find dialog window is largely keyboard-navigable, so you can save yourself a trip to Terminal.



I wish commenters would try what they advise
Authored by: Safar on Feb 23, '05 05:28:04PM

grep -r reads every file looking for the string, even non-.doc files, so it is very slow!
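
One way to keep the recursive grep but skip the other files is the --include option. This is a sketch only, assuming a grep that supports --include (GNU grep and the BSD grep on recent Mac OS X releases do); the scratch directory and file names are made up for illustration.

```shell
# Build a scratch tree with one matching .doc and one matching .txt.
demo=$(mktemp -d)
mkdir -p "$demo/letters"
printf 'Danny the Dog says hi\n' > "$demo/letters/a.doc"
printf 'danny the dog again\n'   > "$demo/notes.txt"

# -r recurses, -l lists file names only, -i ignores case;
# --include limits the search to names matching the glob.
matches=$(cd "$demo" && grep -rli --include='*.doc' 'danny the dog' .)
echo "$matches"
```

Only the .doc file is opened and reported; notes.txt is never read even though it contains the string.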

As for the Finder, it is unable to search inside Word documents (at least for the moment; I didn't know it was a Tiger feature).



Search for text in multiple Word files via the Terminal
Authored by: Glide on Feb 24, '05 02:14:31AM

Once you find what you're looking for you might then want to change the sought after text to something else. Say all instances of 'dog' to 'cat' as one example...

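find . -type f -name '*.doc' -print0 | xargs -0 -n1 perl -p -i.bak -e 's/dog/cat/g'

The perl code edits each file in place, and the .bak suffix on -i keeps a backup of the original file next to it (a bare -i makes no backup). The -print0 and -0 options keep file names with spaces intact. All instances of 'dog' will now be 'cat'. Very handy...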



A simple script
Authored by: SlewSys on Feb 24, '05 02:11:09PM
It is faster to use find(1) with xargs(1), but then the syntax is a little more complicated. Here is a script that reduces the syntax to a bare minimum for the most common usage, i.e., finding files containing strings. To use it, copy the text below and save it (e.g., with TextEdit) to a file named `pat' somewhere in your path (e.g., /usr/bin/pat or preferably someplace like /opt/bin/pat, if that's in your path). Examples of how the command is used are contained in the comments of the script itself. Don't forget to make the script executable, e.g., with the command line:

    $ chmod +x /whatever/path/pat
---- CUT HERE ----

#!/bin/bash
#
#    @(#)pat
#
# This script greps files matching a pattern under the current folder.
#
# EXAMPLES
#
# To display a list of files under the current directory
# containing the word "hello" ignoring case, use:
#
#       $ pat -li  hello
#
# To display, under the current directory, the actual lines in files
# with the `.txt' extension containing either "hello" or "world":
#
#       $ pat 'hello|world' '*.txt'
#
PATH=/bin:/usr/bin

USAGE="usage: pat [egrep-options] egrep-pattern [file-glob ... [-- find-args]]"

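# Collect the egrep options, up to and including the pattern.
egrep_arg=()
while (( $# > 0 )); do
    case "$1" in
    --) break ;;
    -*) egrep_arg+=("$1"); shift ;;
    *)  egrep_arg+=("$1"); shift
        break ;;
    esac
done

(( ${#egrep_arg[@]} > 0 )) || { echo "$USAGE" >&2; exit 2; }

# Build the find expression: regular files, optionally limited to the
# given file globs. Anything after "--" is passed to find untouched.
find_arg=(-type f)
name_arg=()
while (( $# > 0 )); do
    arg="$1"; shift
    [ "$arg" = "--" ] && break
    (( ${#name_arg[@]} > 0 )) && name_arg+=(-o)
    name_arg+=(-name "$arg")
done
(( ${#name_arg[@]} > 0 )) && find_arg+=('(' "${name_arg[@]}" ')')

# Arrays keep every argument intact, so no eval or extra quoting is needed.
find . "${find_arg[@]}" "$@" -print0 | xargs -0 -n30 egrep "${egrep_arg[@]}"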


A simple script
Authored by: cxd101 on Nov 15, '05 12:57:10PM

Thanks so much! I was looking for a 'UNIX' way to do this for some time.



Search for text in multiple Word files via the Terminal
Authored by: Jwink3101 on Feb 24, '05 04:01:55PM

This is very helpful. Too bad that you can't do it in the Finder or, better yet, in Word 2004.


