
xargs and working with spaces in filenames UNIX
When I'm working on a project I don't know much about, I run this before compiling, so that the file FILES lists only the source files:
find . -type f -print > FILES
Then, on most Unix platforms, I can do something like this with FILES:
xargs egrep whatever < FILES
On the Mac, however, this often breaks because of spaces in file names. I know about the -print0 option in find, so I could have two files, FILES and perhaps FILES0, which I could produce with:
find . -type f -print0 > FILES0
I could then follow that with this:
xargs -0 egrep whatever < FILES0
But I just figured out another solution...

I can do this in one step like this:
tr '\n' '\0' < FILES | xargs -0 egrep whatever
And, this could be plopped into a script like:
#!/bin/sh

tr '\n' '\0' | xargs -0 "$@"
and maybe call it xargs0. Then I could do:
xargs0 egrep whatever < FILES
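
A minimal sketch of the breakage and the workaround, run in a throwaway directory. The directory layout, the file names, and the variables demo, list, broken and matches are invented for illustration; the list is kept outside the tree so it does not list itself, and grep -E stands in for egrep.

```sh
#!/bin/sh
# Scratch tree (hypothetical names) with spaces in a directory and a file name.
demo=$(mktemp -d)
list=$(mktemp)
mkdir "$demo/my project"
echo 'whatever lives here' > "$demo/my project/main file.c"
echo 'nothing to see' > "$demo/plain.c"
cd "$demo" || exit 1

find . -type f -print > "$list"

# Plain xargs splits "./my project/main file.c" into three arguments,
# so grep never opens the real file and nothing matches.
broken=$(xargs grep -E -l whatever < "$list" 2>/dev/null || true)

# Turning each newline into a NUL lets xargs -0 keep every name whole.
matches=$(tr '\n' '\0' < "$list" | xargs -0 grep -E -l whatever)

echo "plain xargs:   [$broken]"
echo "tr + xargs -0: [$matches]"
```

The second line of output should name the file with spaces in its path, while the first stays empty.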

-print0 and -0 are the only safe choice
Authored by: haa on Jul 21, '09 07:57:06AM

Filenames can also contain newlines (LF, \n) as well as other special characters, so translating newlines to NULs isn't safe.

Unix filenames can contain any characters/bytes except NULs (this is why -print0 and -0 are safe). Space is not the only "weird" character out there.

When working with Unix find and xargs, using find ... -print0 and xargs -0 ... is the only safe choice. When working with shell scripts, remember to put double quotes around every variable expansion, as in "$value".

Many people have started writing "shell" scripts in Perl or another real programming language, where you have more control over your data, to avoid working around the many "interesting" "features" of the Unix shell.

It is very easy to make files with \n in filenames, e.g. start bash and type
echo foo > 'barENTER
baz'ENTER

into it.

By default, ls on OS X shows the newline as ? for some safety. Use ls -lb to see special characters as C-style escapes, or ls -lv to display them as-is (try putting, e.g., terminal escape sequences into file names for additional "fun").
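
A small sketch of the newline case, again in a scratch directory. The variables demo, list, from_list and from_print0 are made up for the example. grep -h prints only the matching line, so the output shows whether grep managed to open the file at all.

```sh
#!/bin/sh
demo=$(mktemp -d)
list=$(mktemp)
cd "$demo" || exit 1

# One file whose name contains a newline, as in the bar/baz example.
echo whatever > 'bar
baz'

find . -type f -print > "$list"

# Newline-separated list: the single name is split into "./bar" and "baz",
# neither of which exists, so grep finds nothing.
from_list=$(tr '\n' '\0' < "$list" | xargs -0 grep -h whatever 2>/dev/null || true)

# NUL-separated stream straight from find: the name arrives intact.
from_print0=$(find . -type f -print0 | xargs -0 grep -h whatever)

echo "via newline list: [$from_list]"
echo "via -print0:      [$from_print0]"
```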


-print0 and -0 are the only safe choice
Authored by: lar3ry on Jul 21, '09 09:14:58AM

On Unix, there are other "special" characters that cannot be present in file names. Forward slash (/) comes immediately to mind.

I think there are other special characters defined in POSIX that are not safe to have in file names, and the Mac may impose the restriction of colons as well.



-print0 and -0 are the only safe choice
Authored by: tim1724 on Jul 21, '09 03:23:29PM

lar3ry wrote:
"On Unix, there are other "special" characters that cannot be present in file names. Forward slash (/) comes immediately to mind.

I think there are other special characters defined in POSIX that are not safe to have in file names, and the Mac may impose the restriction of colons as well."

Nope. POSIX only forbids slashes and nulls in filenames. Slashes because they're used for separating components of a pathname, and nulls because system calls that accept pathnames use null-terminated strings.

A particular filesystem may have other rules, particularly for non-ASCII characters, but POSIX doesn't say anything about that. It's true that HFS+ doesn't allow colons in filenames, but it does allow slashes. So if you put a colon in a filename, it will be turned into a slash when stored on an HFS+ disk. But when viewed via POSIX-compliant programs, such as "ls", you'll see it as a colon.

Try it yourself. Run touch : and then do an ls and see the result. You have a file called ":". Now look at it in the Finder. You'll see a file named "/".

Carbon file manipulation APIs use colon-delimited paths, and will see it as a file named "/". Cocoa file manipulation APIs (and traditional BSD APIs) use slash-delimited paths, and will see it as ":". On disk it's stored as "/" if it's an HFS+ disk, or ":" if it's a UFS disk.

It will always show up as ":" in command-line programs, but will nearly always show up as "/" in GUI programs. (The exception is in programs where POSIX paths are shown, such as the "Where" field in the Finder's info windows, or in many of the developer tools. Also, in any Cocoa program which shows raw pathnames, instead of using NSFileManager's -displayNameAtPath: method.)

---
Tim Buchheim


xargs and working with spaces in filenames
Authored by: Nem on Jul 21, '09 09:03:25AM

-print0 & -0 with find/xargs is UNIX 101 (well, maybe 102).


Why aren't you piping your find to xargs?

find . -type f -print0 | xargs -0 egrep whatever


I do something similar pretty much every other day. ;-)

---
Nem W. Schlecht
http://geekmuse.net/



xargs and working with spaces in filenames
Authored by: rbrtrx on Jul 21, '09 10:05:28AM

And, if you really need to have a copy of it in a file, you can use tee in there too:

find . -type f -print0 | tee FILES0 | xargs -0 egrep whatever



xargs and working with spaces in filenames
Authored by: robleach on Jul 21, '09 01:19:09PM
You can also use the -exec or -execdir option to find:
find . -type f -execdir egrep whatever {} \;
Incidentally, is egrep the same as "grep -E"? Looks like it. Why does that bother me? ;-)

Rob

xargs and working with spaces in filenames
Authored by: CarlRJ on Jul 21, '09 02:08:42PM

Yes, egrep is the same as "grep -E". This is historical: grep was the original command, followed by fgrep ("fast grep", or "fixed-string grep" by another reading), which was very quick but doesn't understand regular expressions, and egrep ("extended grep"), which understood full regular expressions. Later on, these three programs were folded back into one (with 3 hard links to the same executable), and the "-E" and "-F" options were added.

Related trivia: the name "grep" comes from "g/re/p" (where "re" is short for Regular Expression), a much-used command in the ed, ex, and vi editors to find lines matching the given regular expression anywhere in the file (i.e. "G"lobally), and "P"rint the result. Someone decided this capability would make a nifty command line utility, instead of starting up an editor to search a file, thus "grep".

Oh, and you're generally better off using a "+" instead of a ";" at the end of that find command for anything that's going to handle a lot of files, since the "+" version will run a minimal number of long command lines, each with as many filenames as possible (thus starting up grep, or whatever, a relatively small number of times), while the ";" version will run (grep or whatever) one time for every file (thus potentially many MANY more processes). As a bonus, since grep will get more than one filename at a time, it'll print the filenames at the start of each matching line.
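
A quick sketch of the difference, assuming a scratch directory holding five small files. The file names and the variables demo, per_file and batched are invented for the demo, and grep -E stands in for egrep.

```sh
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 1 2 3 4 5; do
    echo whatever > "file $i.txt"
done

# Each run of the inner sh prints one line, so counting lines counts runs.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | grep -c run)
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | grep -c run)
echo "with ;  $per_file runs"
echo "with +  $batched run(s)"

# With "+", grep sees several names at once and labels each match with its file.
find . -type f -exec grep -E whatever {} +
```

With only five short names, everything fits on one command line, so the "+" form runs the command once where the ";" form runs it five times.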



xargs and working with spaces in filenames
Authored by: spfolly on Jul 21, '09 02:51:36PM

One of the points in the original post (and indeed the title!) - is working with *spaces* in filenames. Your example should have quotes around the {}



xargs and working with spaces in filenames
Authored by: CarlRJ on Jul 21, '09 03:50:41PM

Actually, no. Quotes on the command line are only to get around the shell's default command line processing. Whether or not you put quotes around {}, the argument that find gets will not have quotes. And it doesn't need them, as the command will not be reinterpreted by any more shells (where quoting might matter), it'll get fork'd and exec'd directly from find with no further processing of the arguments.
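
This is easy to check. A minimal sketch, assuming a scratch directory with one file whose name contains spaces (the file name and the variables demo, unquoted and quoted are made up for the test):

```sh
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
echo whatever > 'name with spaces.txt'

# The shell strips the quotes before find starts, so both commands hand
# find exactly the same argument list.
unquoted=$(find . -type f -exec grep -E -l whatever {} \;)
quoted=$(find . -type f -exec grep -E -l whatever '{}' \;)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Both lines should show the same file name, spaces included.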



xargs and working with spaces in filenames
Authored by: Helge33 on Jul 29, '09 08:26:37AM

Thanks for these hints! I am still fighting with the actual *use* of a generated filelist. No matter how I assign a list like:

flist=`\find . -name .DS_Store -print0 | xargs -0`

Every time I want to loop over this list, it breaks into words and not into filenames! (i.e. for f in $flist ....)

I would like to write a little script which copies recursively each .DS_Store File from $topfolder1 to $topfolder2 (my backup space) and this for each subdirectory.

For a guru this is probably a two-liner...? :-)

Thanks, Helge
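
One possible way around this, offered as a sketch rather than the definitive answer. The list breaks because xargs -0 with no command runs echo, which joins all the names with spaces, and the unquoted $flist is then split on whitespace again. Letting find hand the names directly to a small inline sh loop avoids storing the list in a variable at all. The two mktemp directories below are hypothetical stand-ins for $topfolder1 and $topfolder2, and the sketch assumes $topfolder2 is an absolute path.

```sh
#!/bin/sh
# Hypothetical stand-ins for the real source and backup folders.
topfolder1=$(mktemp -d)
topfolder2=$(mktemp -d)
mkdir -p "$topfolder1/My Documents/sub dir"
echo a > "$topfolder1/.DS_Store"
echo b > "$topfolder1/My Documents/sub dir/.DS_Store"

cd "$topfolder1" || exit 1

# find passes each batch of names to the inner sh as separate arguments,
# so no name is ever re-split on spaces or newlines.
find . -type f -name .DS_Store -exec sh -c '
    dest=$1; shift
    for f do
        # ${f%/*} is the directory part of the path relative to topfolder1.
        mkdir -p "$dest/${f%/*}" && cp "$f" "$dest/$f"
    done
' sh "$topfolder2" {} +
```

Running find from inside $topfolder1 keeps every path relative, which is what lets the same subdirectory structure be recreated under $topfolder2.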


