Remove multiple characters from filenames UNIX
Here's a method for 'squeezing' filenames that contain repeated runs of a character; the result is similar to what 'tr -s' does to text. For example, suppose you have a directory full of files named like this:
   LOG_FILE___STATUS1__0001.TXT
and would like to eliminate the multiple underscore characters so that the files are named this way instead:
   LOG_FILE_STATUS1_0001.TXT
A fairly quick way to do this is from the shell. Open a terminal and go to the directory that contains your files, then use:
find . -type f -name '*__*' | awk -F\? '{ s=$1 ; gsub \
( "_+","_",s ) ; print "echo n \| mv -i","\""$1"\"","\""s"\""}' \
| /bin/sh
NOTES:
  • The command is shown on three lines with continuation marks; if you have any trouble with it, copy and paste it one piece at a time onto one row.
  • I prefer to use 'find' to generate file lists instead of 'ls' because you have greater control over what gets matched. In this case, we want the names of files only, not subdirectories, etc.
  • The 'echo n | mv -i' section ensures that this command will safely fail when attempting to rename/overwrite an existing file with the same name.
  • All the ugliness with the escaped quotation marks (\""$1"\"",etc) and the '-F?' option is to handle filenames with spaces in them.
  • If you want to test the results first (without actually renaming the files), leave off the trailing '| /bin/sh'
  • sed experts: I tried to do this with sed but couldn't. You're welcome to prove me wrong.
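
On the sed question in the last note, one possible approach is to let sed compute the new name and leave the renaming to the shell. This is only a sketch, not the author's method: `squeeze_name` and `squeeze_tree` are hypothetical helper names, filenames are assumed to contain no newline characters, and only the filename part is squeezed so that directories with double underscores in their names are left alone. An explicit existence check stands in for 'echo n | mv -i'.

```shell
#!/bin/sh
# Hypothetical helper: collapse every run of underscores in one name.
squeeze_name() {
    printf '%s\n' "$1" | sed 's/__*/_/g'
}

# Hypothetical helper: rename regular files under directory $1 whose
# names contain "__". Assumes no newlines in filenames.
squeeze_tree() {
    find "$1" -type f -name '*__*' | while IFS= read -r path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        new="$dir/$(squeeze_name "$base")"
        if [ -e "$new" ]; then
            # Never overwrite an existing file.
            echo "skipping $path: $new already exists" >&2
        else
            mv "$path" "$new"
        fi
    done
}

# Example: squeeze_tree .
```

Because the new name is built from the basename only, a path such as ./old__logs/a__b.txt would become ./old__logs/a_b.txt rather than pointing into a directory that does not exist.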


...
Authored by: eisforian on Aug 07, '02 10:44:09AM
I've shortened it a little...
/bin/ls | perl -nle'$o=$_; s/_+/_/g; rename $o,$_;'
;-)

oops
Authored by: eisforian on Aug 07, '02 10:46:28AM
find . -type f -name '*__*', not /bin/ls

On the subject of perl...
Authored by: ret on Aug 08, '02 12:41:55AM
There's a far niftier program for this in the "Programming Perl" book. It lets you rename file(s) by applying a Perl expression to each name, which in this case would be something like this:
% rename 's/_+/_/g' *.jpg
or even something like this to scan all sub-directories:
% find . -type f -name '*__*' -print0 | xargs -0 rename 's/_+/_/g'
One of the beauties of this program is that the regexp can be a perl regexp, so you can do stuff like 'tr/A-Z/a-z/' to lower-case all file names and so on.
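#!/usr/bin/perl
# rename: apply a Perl expression to each filename and rename accordingly

($op = shift) || die "Usage: rename {perlexpr|regexpr} filename [...]\n";

# With no filenames on the command line, read them from standard input
if (!@ARGV){
  @ARGV = <STDIN>;
  chomp(@ARGV);
}

for (@ARGV){
  $was = $_;
  eval $op;       # the expression modifies $_ in place
  die $@ if $@;

  # Only report and rename when the expression changed the name
  if($was ne $_){
    print "$was -> $_\n";
    rename($was,$_);
  }
}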
Enjoy. I appreciate that this is not a good example for people wanting to learn perl, as it is very terse. cheers RET

Could such a script scrub duplicate chars in a text file?
Authored by: osxpounder on Aug 08, '02 05:59:27PM

Before I try and understand this magic spell you're sharing with us, I want to ask about an idea it's given me.

Often, when I read a man page, I want to write out the man page to a file so I can read it more conveniently, scroll back & forth, and see it in a nice white window with nice black type. So I type ...

man finger>manfinger.txt;open manfinger.txt

... for example, and the resultant "manfinger.txt" file opens for me in BBEdit Lite.

Is there an easy way to remove the duplicate characters that appear in this output file? They don't appear in the Terminal window when I run man, of course, but do appear in the output to file. Quick example:

NNAAMMEE
ffiinnggeerr - user information lookup program

Would a script like yours deal with this easily, or, is there an even easier way to quickly scrape out those extra characters?

thanks,

osxpounder



Could such a script scrub duplicate chars in a text file?
Authored by: pkishor on Aug 09, '02 08:26:30AM

You should definitely keep learning scripting. For viewing man pages, though, why not just install manthor? It's a Cocoa app that automagically loads the man page you request in a regular Cocoa window, with bookmarks and all. Try it; you will like it.



Could such a script scrub duplicate chars in a text file?
Authored by: babbage on Aug 09, '02 10:12:14AM
You can do what you're asking for within 'man' itself, using syntax similar to that of the vi editor. Here's the help screen you get:
Most commands optionally preceded by integer argument k.  Defaults in brackets.
Star (*) indicates argument becomes new default.
-------------------------------------------------------------------------------
<space>                 Display next k lines of text [current screen size]
z                       Display next k lines of text [current screen size]*
<return>                Display next k lines of text [1]*
d or ctrl-D             Scroll k lines [current scroll size, initially 11]*
q or Q or <interrupt>   Exit from more
s                       Skip forward k lines of text [1]
f                       Skip forward k screenfuls of text [1]
b or ctrl-B             Skip backwards k screenfuls of text [1]
'                       Go to place where previous search started
=                       Display current line number
/<regular expression>   Search for kth occurrence of regular expression [1]
n                       Search for kth occurrence of last r.e [1]
!<cmd> or :!<cmd>       Execute <cmd> in a subshell
v                       Starts $EDITOR or /usr/bin/vi at current line
ctrl-L                  Redraw screen
:n                      Go to kth next file [1]
:p                      Go to kth previous file [1]
:f                      Display current file name and line number
.                       Repeat previous command
-------------------------------------------------------------------------------

This could all be inherited from the current $EDITOR environment variable, and seeing as I haven't currently set $EDITOR it might be defaulting to the 'more' pager command. In other words, if you want to page through files with a different pager then set $EDITOR and the man command will use it instead.

*testing...*

No, the variable to set seems to be $PAGER, not $EDITOR. So, if you want to use the less command (which has a much richer syntax) and you're using the default tcsh as your shell, then run "setenv PAGER less" to add the variable to your current environment [or put this in a login script so that it works all the time] and then run man again. It should now support more interesting syntax than that offered by, well, 'more'. Sure enough, I tested this as I was writing: I set less as my $PAGER, viewed a long manpage, and hit 'h' to bring up help. The display is different from what you see above, and much longer than can reasonably be pasted into this form. See for yourself if you'd like to try it :-)
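
The setenv syntax is specific to csh and tcsh. In a Bourne-style shell (sh, bash, ksh, zsh) the equivalent would look roughly like this. It is a sketch that assumes 'less' is installed and on your PATH:

```shell
# Bourne-style equivalent of tcsh's "setenv PAGER less".
PAGER=less
export PAGER

# To try it for a single command without changing the environment:
#   PAGER=less man finger
```

Putting the first two lines in a login script such as ~/.profile would make the setting persistent, in the same way as the tcsh login script mentioned above.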


Use one of the
Authored by: SeanAhern on Aug 09, '02 01:57:26PM

I was able to do what you want with the following command:

man printf | colcrt > some_file

That has the extra characters stripped out.
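
The doubled letters come from the way man pages mark bold and underlined text for a terminal: a bold character is written as the character, a backspace, and the same character again, and an underlined character is written as an underscore, a backspace, and the character. If colcrt is not available, a rough alternative is to delete every character-plus-backspace pair with sed. `strip_overstrike` is a hypothetical helper name, and the sketch assumes the output really uses this overstrike convention:

```shell
# Hypothetical helper: remove "X<backspace>" pairs from standard input,
# leaving one copy of each bold or underlined character.
strip_overstrike() {
    bs=$(printf '\b')        # a literal backspace character
    sed "s/.$bs//g"
}

# Example: man finger | strip_overstrike > manfinger.txt
```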



On the subject of perl...
Authored by: robh on Aug 09, '02 08:43:32AM

Here, again, is my slightly better Perl regexp renaming script:

http://www.imdb.demon.co.uk/OSX/rename

The key difference is that this script will show you what renaming is about to take place and ask you to confirm this is what you want. I find it way too dangerous to do regexp renaming without a warning about what action will take place.
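
The preview-then-confirm idea can be sketched in plain shell as well. This is not the linked script, just an illustration of the same behaviour for the underscore case. `squeeze_name`, `preview_squeeze`, and `confirm_and_squeeze` are hypothetical helper names, and the files are assumed to be in the current directory with no newlines in their names:

```shell
# Hypothetical helper: collapse every run of underscores in one name.
squeeze_name() { printf '%s\n' "$1" | sed 's/__*/_/g'; }

# Print "old -> new" for each given name that would change.
preview_squeeze() {
    for old in "$@"; do
        new=$(squeeze_name "$old")
        if [ "$old" != "$new" ]; then
            printf '%s -> %s\n' "$old" "$new"
        fi
    done
}

# Show the planned renames, ask for confirmation, then rename.
confirm_and_squeeze() {
    preview_squeeze "$@"
    printf 'Rename these files? [y/N] '
    read -r answer
    case $answer in
        [Yy]*) ;;
        *) echo 'Nothing renamed.'; return 1 ;;
    esac
    for old in "$@"; do
        new=$(squeeze_name "$old")
        # Skip unchanged names and never overwrite an existing file.
        if [ "$old" != "$new" ] && [ ! -e "$new" ]; then
            mv -- "$old" "$new"
        fi
    done
}

# Example: confirm_and_squeeze *__*
```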



On the subject of perl...
Authored by: look on Feb 03, '05 10:23:48AM

dead link



On the subject of perl...
Authored by: clith on Aug 10, '02 12:15:35PM
I wrote my own script to do this and called it 'rn' [since I don't read news using rn any more :-)]. I've put a copy of rn on my mac.com home page. Enjoy. You use it like this: % rn P100501 sunday- if you had files named P1005011.jpg P1005012.jpg P1005013.jpg, you would now have files named "sun-1.jpg", "sun-2.jpg", and "sun-3.jpg".

awk, sed, Perl, etc.
Authored by: victory on Aug 09, '02 06:32:11PM
What a nice series of improvements. I suppose I should have titled the submission as 'Remove multiple chars from filenames with awk one-liners' or something. I actually do most of my day-to-day work in Perl and C, but thought it would be fun to try and accomplish the task using sed/awk simply because: 1] It was a chance to mess with two tools that I haven't used much. 2] I like one-liners. Sometimes it's fun to see how much you can get away with at the shell prompt without resorting to writing a separate script/app.

Perl is quite a nifty language (you can still write one-liners; I particularly like eisforian's example posted above), and one of the informal themes of Perl is that there are usually several equally valid ways of accomplishing a task (TMTOWTDI). As for learning the Perl language, I echo the earlier suggestion: any of the O'Reilly books is a good place to start. A collection of 5 O'Reilly Perl books is also available on CD-ROM. A bit expensive, but it's a nice way to have the references available at all times. (I got tired of lugging my printed copies back and forth from work, or invariably not having them around when I needed to look something up.)


ksh and its friends
Authored by: osxpez on Aug 14, '02 10:10:03AM
There's a thread in the forums right now about renaming. I've been advertising using the shell for some cases of renaming. This one could be written:
for file in *__*; do mv "$file" "${file//__/_}"; done
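
A single pass that substitutes pairs of underscores does not, in general, reduce a run of any length to exactly one underscore. One way to handle that with nothing but POSIX shell parameter expansion is to keep replacing the first pair until none is left. `squeeze_underscores` is a hypothetical helper name, and this is a sketch rather than a tested replacement for the one-liner:

```shell
# Hypothetical helper: collapse runs of underscores using only
# POSIX parameter expansion (no awk, sed, or Perl).
squeeze_underscores() {
    name=$1
    while :; do
        case $name in
            # Text before the first "__", one "_", text after that pair.
            *__*) name="${name%%__*}_${name#*__}" ;;
            *)    break ;;
        esac
    done
    printf '%s\n' "$name"
}

# Example use in the current directory:
# for file in *__*; do
#     [ -e "$file" ] && mv -i "$file" "$(squeeze_underscores "$file")"
# done
```

The `[ -e "$file" ]` guard covers the case where nothing matches and the shell leaves the pattern `*__*` unexpanded.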

