

Handle DOS Line Endings
Authored by: kerbaugh on Feb 09, '03 01:48:58AM

In its raw form, the HTML has DOS line endings, which make it hard to manipulate via UNIX tools. My preferred version of a script to display just the fortune part of the page strips off the carriage returns first, thus converting to UNIX line endings. Here's the script I would use for this purpose:

#!/bin/sh
#
# fetch a fortune from thinkgeek

curl -s http://www.thinkgeek.com/fortune.shtml |
sed -n '/(refresh for another)/,/table\>/{
/<p>/,/<\/p>/{
s/^M//g
/<\/*p>/d
/^$/d
s/<[BbRr]*>/\
/g
s/&gt;/>/g
s/&lt;/</g
p
}
}'

Note, however, that the carriage return, depicted here as "^M" the way it would appear on the command line, is a literal control character and must be entered with a tool that can insert one. To my knowledge, only UNIX text editors can do it. I'd love to know how to make BBEdit do it, if anyone knows how. In a UNIX text editor, this character is entered with the key sequence <Control>-v <Control>-m.

I've included substitutions to convert the escape sequences for "greater than" and "less than". One can add more as the need arises. I don't understand the author's s/<\;/</g and s/>\;/>/g substitutions, but the reason for them may eventually become evident. If anyone knows it, please let me know. As an aside, my script has yet to return a blank line.
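
For anyone whose editor cannot enter the literal control character, one possible workaround is to let the shell produce it instead. What follows is only a minimal sketch, assuming a POSIX sh with printf available; the variable name cr and the helper strip_cr are made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Hold a literal carriage return in a variable. Command substitution
# strips trailing newlines only, so the CR survives.
# ("cr" is an illustrative name, not something from the original script.)
cr=$(printf '\r')

# strip_cr: hypothetical helper that deletes a trailing CR from each
# line read on standard input.
strip_cr() {
    sed "s/${cr}\$//"
}

# Example: pass two DOS-terminated lines through the helper.
printf 'one\r\ntwo\r\n' | strip_cr
```

The same variable could presumably stand in for the ^M inside a larger sed script, provided the script is double-quoted so the shell expands it.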



Handle DOS Line Endings
Authored by: kal on Feb 09, '03 05:43:49AM

>In its raw form, the html has DOS line endings, which
>makes it hard to manipulate via UNIX tools.

Not if you use the right unix tool :-) The unix "tr" (translate) command should help you get rid of the carriage return character. For instance on the file test:

cat test | tr -d "\r"

Should remove all carriage returns from the file test and print it to STDOUT.
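
To check that the carriage returns are really gone, one can count them before and after the filter. This is just a sketch under a few assumptions: the sample file name and its contents are invented for the demonstration, and the counts go through shell arithmetic only to tidy up the padding some versions of wc add.

```shell
#!/bin/sh
# Write a small sample file with DOS (CRLF) line endings.
# The path is hypothetical; any writable location would do.
sample="${TMPDIR:-/tmp}/crlf_demo.$$"
printf 'first line\r\nsecond line\r\n' > "$sample"

# tr -cd '\r' deletes everything except carriage returns, so wc -c
# then reports how many carriage returns are present.
before=$(( $(tr -cd '\r' < "$sample" | wc -c) ))
after=$(( $(tr -d '\r' < "$sample" | tr -cd '\r' | wc -c) ))

echo "carriage returns before: $before, after: $after"

# Clean up the sample file.
rm -f "$sample"
```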



Handle DOS Line Endings
Authored by: kerbaugh on Feb 09, '03 08:57:24AM
You're right, of course. What I should have said was that it "makes it difficult to handle with UNIX tools unless the carriage returns are removed". I couldn't figure out at first why my matches weren't matching. All UNIX text editors have convenient means of deleting things; I just used sed to do it since I was already in sed.
--
Gary
~~~~
   Line Printer paper is strongest at the perforations.
