Quickly find any text string in any set of files UNIX
To quickly find any text string within any text file, try this from a terminal window:
grep -l  [text to find] [files to look in]
For example, grep -l 123abc *.html will list the name of any file in the current directory that ends in .html and contains the string 123abc.

(That's a lower-case-L following the GREP)
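Here is a minimal sketch of the example above, run in a scratch directory so nothing real is touched. The file names and the scratch directory are made up for illustration.

```shell
# Create a throwaway directory with two sample files (hypothetical names).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
cd "$workdir" || exit 1
printf 'order 123abc shipped\n' > orders.html
printf 'nothing to see here\n' > empty.html

# -l prints only the names of the files that contain a match.
matches=$(grep -l 123abc *.html)
echo "$matches"
```

Only orders.html is listed, because empty.html does not contain the string.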

Quite powerful, and fairly fast. Now, if you have some spare time, and want to see what it can really do, try this:
su root
cd /
grep -lr "text to find" *
This will tell the OS to find the "text to find" in every file in every directory, all the way down through the tree. The -r flag tells grep to recursively search directories.

Of course, OS X has something like 26,000 files, so this can take a very long time!
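Searching from / as root is rarely needed. A gentler sketch, assuming you already know roughly where the file lives, is to start the recursive search from that directory instead. The directory layout below is invented for the demonstration.

```shell
# Build a small directory tree to search (all names are hypothetical).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
mkdir -p "$workdir/site/docs/deep"
printf 'text to find\n' > "$workdir/site/docs/deep/notes.txt"
printf 'unrelated\n' > "$workdir/site/index.txt"

# Start from a chosen directory rather than from /.
# -r descends into subdirectories, -l prints file names only.
cd "$workdir/site" || exit 1
found=$(grep -lr "text to find" .)
echo "$found"
```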

Welcome to the world of RegEx's
Authored by: babbage on Apr 25, '01 06:09:02PM
If the weird name throws you, "grep" comes from the ed editor command g/re/p, short for
"globally search for a regular expression and print". If that doesn't help, it's probably
because you're wondering what a regular expression ("re" or "regex") is. Basically, it's a
pattern used to describe a string of characters, and if you want to know aaaaaaall about
them, I highly recommend reading Mastering Regular Expressions by Jeffrey Friedl,
published by Unix über-publisher O'Reilly & Associates.

Regexes (regices, regexen, ...the pluralization is a matter of debate) are an extremely
useful tool for any kind of text processing. Searching for patterns with grep is
most people's first exposure to them; as the article says, you can use them to search
for a literal pattern within any number of text files on your computer. The cool thing is
that it doesn't have to be a literal pattern, but can be as complex as you'd like.

The key to this is understanding that certain characters are "metacharacters", which have
special meaning for the regex-using program. (With grep, the plus, parentheses, and vertical
bar described here belong to the "extended" syntax, so use grep -E or egrep to try them.)
For example, a plus character (+) tells the program to match one or more instances of whatever
immediately precedes it, while parentheses serve to treat whatever is contained as a unit.
Thus, 'ha+' matches "ha", and it also matches "haa" and "haaaaaaaaaaa", but it cannot match
all of "hahaha", because the plus applies only to the "a". If you want to repeat the whole
word "ha", you can use '(ha)+' to match one or more instances of it, such as 'hahaha' and
'hahahahahahahahaha'. Using a vertical bar allows alternate matching, so '(ha|ho)+' matches
'hohoho', 'hahaha', and 'hahohahohohohaha'. Etc.
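A quick way to try these three patterns is to feed grep a handful of words. This is only a sketch: the word list is invented, -E selects extended regular expressions, and -x (match the whole line) is added here so that partial matches do not hide the differences.

```shell
# One sample word per line.
words='ha
haa
hahaha
hohoho
hahoha'

# -E: extended regexes, so + ( ) | are metacharacters.
# -x: the pattern must match the entire line.
plus=$(printf '%s\n' "$words" | grep -Ex 'ha+')
group=$(printf '%s\n' "$words" | grep -Ex '(ha)+')
either=$(printf '%s\n' "$words" | grep -Ex '(ha|ho)+')

echo "ha+      matches:" $plus
echo "(ha)+    matches:" $group
echo "(ha|ho)+ matches:" $either
```

Without -x, grep reports any line that contains a match anywhere, so 'ha+' would also report the line "hahaha".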

There are many of these metacharacters to keep in mind. Inside brackets ([]), a caret (^)
as the first character negates the set: the brackets then match any single character that is
not listed. Brackets always describe one character, never a phrase, so for Magritte
fans, '[^(a cigar)]' does not mean "not a cigar"; it matches any one character other than a
parenthesis, a space, or the letters a, c, g, i and r. The rest of the time, the caret tells
the program to match only at the beginning of a line, while a dollar sign ($) matches only at
the end. Therefore, '^everything$' matches the word "everything" only when it is on a line all
by itself. To find all lines that do not begin with "anything else", invert the match instead:
grep -v '^anything else'.
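To check how the caret behaves in each position, here is a small sketch using made-up sample lines. It shows the line anchors, a negated bracket set (which tests a single character), and grep's -v switch, which inverts a match and is the usual way to exclude lines that start with a whole phrase.

```shell
lines='everything
everything else
anything else goes
cigar
xyz'

# ^ and $ anchor the pattern to the start and end of the line.
alone=$(printf '%s\n' "$lines" | grep '^everything$')

# [^...] matches ONE character that is not in the list.
# Here: lines whose first character is not a, c, g, i or r.
firstchar=$(printf '%s\n' "$lines" | grep '^[^acgir]')

# -v prints the lines that do NOT match the pattern.
notphrase=$(printf '%s\n' "$lines" | grep -v '^anything else')

echo "alone:    " $alone
echo "firstchar:" $firstchar
echo "notphrase:" $notphrase
```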

The period (.) matches any character at all, and the asterisk (*) matches zero or more times.
Compare this to the plus, which matches one or more times -- a subtle but important
difference. A lot of regular expressions look for ".*", which is zero or more of anything
(that is, anything at all). This is useful when searching for two things that might or might
not have anything else (that you probably don't care about) between them: 'foo.*bar' will match
on 'foobar', 'foo bar' & 'foo boo a wop bop a lop bam boo bar'. Changing the previous example
to a plus, 'foo.+bar', requires that something come between foo and bar, but it doesn't matter
what, so 'foobar' doesn't match but the other two examples given do match.
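The difference between the star and the plus is easy to see by counting matches. In this sketch the fourth sample line is an extra added as a non-matching control, -c makes grep print the number of matching lines, and -E is used for the plus because it is part of the extended syntax.

```shell
samples='foobar
foo bar
foo boo a wop bop a lop bam boo bar
bar foo'

# .* allows zero or more characters between foo and bar.
star=$(printf '%s\n' "$samples" | grep -c 'foo.*bar')

# .+ requires at least one character between foo and bar.
plus=$(printf '%s\n' "$samples" | grep -Ec 'foo.+bar')

echo "foo.*bar matched $star lines"
echo "foo.+bar matched $plus lines"
```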

For details, try the man pages -- "man grep". There are a lot of different versions of the
program, so details may vary. All of this should be valid for OSX though.

Confusing? Maybe, but regular expressions aren't that bad when you get used to them, and
they can be a very useful tool to take advantage of if you know what you're doing. An example.

Let's say you have a website stored on your computer as a series of html documents.
As a cutting edge developer, you've seen the CSS light and want to delete all the <font>
tags wherever they're just saying e.g. face="sans-serif" &/or size="12", because the
stylesheet can now do that for you. On the other hand, it's possible that the patterns
'face="sans-serif"' or 'size="12"' could show up in normal text (though admittedly
that's unlikely). In fact, what you really want to know is wherever those patterns show up in
a font tag, but you don't care about anywhere else that they might appear. Here's one way to
find that pattern:

grep -Eir '<font[^>]*(face="sans-serif"|size="12")' *.htm *.html

This does a number of things. The -E turns on extended regular expressions, so that the
parentheses and vertical bar act as metacharacters. The -i tells grep to ignore case (otherwise
it's case sensitive, and won't match 'FONT' if you're looking for 'font' or 'Font'). The -r tells
it to recursively descend through any directories named on the command line; here the arguments
are all htm and html files in the current directory, so subdirectories are searched only if their
names match those patterns too. Everything in single quotes is the pattern we're matching.
We tell grep to match on any text that starts with "<font", followed by zero or more characters
that are not ">" (thus staying within the font tag), and then either the face or
size definition that we're interested in. The one glitch here is that line breaks can break
things, though there are various ways around that. Finding them is left as the proverbial
exercise for the reader. :)
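Here is a sketch of that search against three invented sample files. It assumes the intended pattern begins with <font[^>]* (a font tag opening followed by any characters other than ">") and uses -E so that the parentheses and vertical bar are treated as metacharacters.

```shell
# Sample pages (hypothetical names and contents).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
cd "$workdir" || exit 1
printf '<FONT face="sans-serif" size="12">Hello</FONT>\n' > index.html
printf '<p>Set size="12" in the dialog.</p>\n' > about.html
printf '<font color="red">Hi</font>\n' > contact.html

# Only an attribute inside a font tag should count as a hit.
hits=$(grep -Ei '<font[^>]*(face="sans-serif"|size="12")' *.html)
echo "$hits"
```

Only the line from index.html is reported: about.html mentions size="12" outside any font tag, and the font tag in contact.html carries neither attribute.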

The next question is, what do you want to do with this information you've come up with?
Presumably you want to edit those files in order to fix them, right? With that in mind, maybe
it would be useful to just make a list of matches. Grep normally outputs all the lines that
match the pattern, but if you just want the filenames, use the -l switch. If you want to save
the results into a file, redirect the output of the command accordingly. With those changes,
we now have:

grep -Eirl '<font[^>]*(face="sans-serif"|size="12")' *.htm *.html >font_files.txt
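As a small sketch of the list-and-save step, again with invented sample files and assuming the pattern begins with <font[^>]*:

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
cd "$workdir" || exit 1
printf '<font size="12">a</font>\n' > index.html
printf '<p>already clean</p>\n' > about.html

# -l keeps only the file names; > sends them to a file instead of the screen.
grep -Eil '<font[^>]*(face="sans-serif"|size="12")' *.html > font_files.txt

saved=$(cat font_files.txt)
echo "$saved"
```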

Great. But we can do better still. If you are comfortable with the vi editor, you can call vi
with that command directly. The trick is to wrap the command in backticks (`). This is a cool
little Unix trick that runs the contained command & returns the result for whatever you want
to do with it. Thus you can simply put this command:

vi `grep -Eirl '<font[^>]*(face="sans-serif"|size="12")' *.htm *.html`

The result of this command, as far as your tcsh shell is concerned, is something along the lines of:

vi index.html about.html contact.html music.html......

etc. The beautiful thing here is that if you quit vi & re-run the command later, it will be
able to effectively "pick up where you left off", since files you've already edited will
presumably no longer match the grep command.
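The backtick trick and the "pick up where you left off" effect can be tried without opening an editor. In this sketch echo stands in for vi, the sample files are invented, and the pattern is assumed to begin with <font[^>]*. Note the -l switch: the editor needs file names, not matching lines.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
cd "$workdir" || exit 1
printf '<font size="12">a</font>\n' > index.html
printf '<font face="sans-serif">b</font>\n' > about.htm
printf '<p>already clean</p>\n' > contact.html

pattern='<font[^>]*(face="sans-serif"|size="12")'

# Backticks run the inner command and substitute its output as arguments.
first_pass=`grep -Eil "$pattern" *.htm *.html`
echo vi $first_pass

# Pretend index.html has been edited and saved, then run the same command.
printf '<font>a</font>\n' > index.html
second_pass=`grep -Eil "$pattern" *.htm *.html`
echo vi $second_pass
```

One caveat: the shell splits the substituted output on whitespace, so file names containing spaces will not survive this trick intact.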

And if you want to get really ambitious, you can use these techniques in ways that
allow you to do all your editing directly from the command line, without having to go into an
interactive editor such as vi or emacs or whatever. If you make it this far in your experiments,
then the next step is to learn to filter the results of a match and process the filtered data
in some way, using tools such as sed, awk, and perl. Using these tools, you can find all
instances of the pattern in question, break it down however you like, substitute or shuffle the
parts around however you like, and then build it all back up again. This is fun stuff! By this
point, you're getting pretty heavily into Unix arcana, and the best book that I've seen about
these tricks is O'Reilly's Unix Power Tools, by various authors. If you really want to leverage
the power of the tools that all Unixes come with, including OSX, then this is a great place to
both start & end up. There's plenty of material in there to keep you busy for months & years...
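As a first taste of editing from the command line, here is a deliberately blunt sed sketch. It is not a full solution: it removes the two attributes wherever they appear on a line (not only inside font tags), it is case sensitive, and the file names are made up. It writes to a new file so the original can be compared before anything is replaced.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
cd "$workdir" || exit 1
printf '<font face="sans-serif" size="12" color="red">Hello</font>\n' > index.html

# s/old/new/g substitutes every occurrence on each line; here "new" is empty.
sed -e 's/ face="sans-serif"//g' -e 's/ size="12"//g' index.html > index.new.html

cleaned=$(cat index.new.html)
echo "$cleaned"
```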


Quickly find any text string in any set of files
Authored by: ankh on Nov 21, '03 11:23:47PM

Is there a command to list the contents of a given directory, including contents of the subdirectories?

I'm trying to respond to a programmer -- I'm not one -- to help figure out why something isn't working.

And my carpal tunnel is too bad to retype all the filenames I can see (sigh).

Quickly find any text string in any set of files
Authored by: jayd on Nov 22, '03 01:04:30AM
ls -R
lists all files in a directory and its subdirectories (R for recursive).
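To avoid retyping anything, the listing can be redirected into a file and then pasted or attached. A minimal sketch, with an invented project directory:

```shell
# Build a tiny directory tree to list (hypothetical names).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/grephint.XXXXXX")
mkdir -p "$workdir/project/src"
: > "$workdir/project/README"
: > "$workdir/project/src/main.c"

cd "$workdir/project" || exit 1

# Save the recursive listing outside the directory being listed,
# so the listing file does not show up in its own output.
ls -R > ../listing.txt
cat ../listing.txt
```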
