Save webpage content with links via Pages
I frequently like to save news articles from the web for later perusal, since so many seem to disappear, as we all know. Up until now, I've used two methods for doing this. One, of course, is to print the page to a PDF. The upside to this is that all the images and formatting are preserved, but you lose the hyperlinks. The other, which I use for primarily text-only pages, is to copy the text into TextEdit. Upside? Smaller files than PDF. Downside? Still no hyperlinks.

So, I just got my copy of iWork, and I decided I'd give Pages a shot at this. I called up an article, went to the "Print Friendly Version," and copied the text and images into a blank document. Not only did it preserve the text formatting (with a little image placement tweaking), but the hyperlinks were still present and functional! Where TextEdit just treated them as styled text, and PDF export did the same, Pages correctly recognized them as hyperlinks and kept them intact. Now, how cool is that?

[robg adds: In 10.3 at least, TextEdit can retain hyperlinks in pasted text -- just make sure the document is set to RTF and not plain text. Still, this is an interesting use for Pages, as it gives you more control over the image placement of the page.]
Authored by: oink on Feb 10, '05 12:26:12PM

This is interesting. I wonder if it can be combined with yesterday's hint about printing the current page URL as a header, so that you have a saved extract of a webpage with working links as well as the source URL... wishful thinking?

Authored by: variousbronson on Feb 10, '05 01:19:58PM

MacJournal works great for this... and can keep many articles organized.

Authored by: foilpan on Feb 10, '05 02:18:34PM

why not just save the page as HTML from your browser? all this cutting and pasting is unnecessary.

or, if you really want an archived copy, just use the command line with a tool like wget. i use wget -r -l1 -k or some combination of options to get the desired results.

-r -> recursive
-l -> levels to recurse (1, in this case)
-k -> convert links so the saved copy works for local viewing
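
For anyone who wants to script the invocation above, here is a minimal sketch in Python. It assumes wget is installed and on the PATH. The function names build_wget_command and archive_page and the example URL are made up for illustration, not part of wget itself.

```python
import shutil
import subprocess


def build_wget_command(url, levels=1):
    """Return the argument list for a shallow recursive wget fetch.

    -r   recurse into linked pages
    -lN  limit recursion to N levels
    -k   convert links so the saved copy works locally
    """
    return ["wget", "-r", f"-l{levels}", "-k", url]


def archive_page(url, levels=1):
    """Run wget and return its exit code, or None if wget is not installed."""
    if shutil.which("wget") is None:
        return None
    return subprocess.run(build_wget_command(url, levels)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded here.
    print(" ".join(build_wget_command("http://www.example.com/article.html")))
```

Calling archive_page("http://www.example.com/article.html") would perform the actual download into the current directory.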

curl can fetch pages from the command line too, though unlike wget it has no recursive mode. and there are other site sucking tools out there that do the same as wget but have GUIs.

Authored by: oink on Feb 10, '05 10:16:43PM

Regarding saving web pages: more often than not, I need only a few bits of text or graphics from a web page, which I copy and paste into Hogbay Notebook, which handles all my little snippets. Often I have to do a second round of copy and paste just so I can get a record of the original URL. It would be nice to streamline it into one step. About wget: my alias file is full of wget aliases, and it is one of my most used tools. The only advantage curl has is the ability to do some limited pattern matching on the URL.

What I Really Want
Authored by: bedouin on Feb 10, '05 02:47:42PM

Is for Pages to export links of all kinds to PDFs properly.

Authored by: allanmarcus on Feb 10, '05 03:38:03PM

If you have the full version of Acrobat, you can use its Web capture feature to download not only a page, but a full site to a PDF document, including links.

And Firefox does it for you!
Authored by: a-bort on Feb 10, '05 08:06:14PM

Firefox is actually very good at saving content.
When you save just an HTML page, it will make a folder with all the involved pictures beside the saved HTML doc.

don't forget firefox extensions
Authored by: ageless on Feb 10, '05 11:14:57PM

seems like there are several billion ways to do this!

look through the popular/highly rated firefox extensions, you'll see several that are designed for saving html/text along with URL and/or page title.

And Firefox does it for you!
Authored by: pnutslab on Feb 11, '05 01:33:59AM

I would also suggest you have a look at the Firefox Scrapbook extension. This enables you to capture a complete local copy of the webpage (while retaining all external links). I have a library of all of my archived webpages which I can access quickly through the Firefox scrapbook sidebar.

Authored by: aranor on Feb 11, '05 12:28:27AM

All I can say is, wait for Tiger. I don't remember if the details have been released to the public yet, so I won't clarify this statement.

Authored by: joey03 on Feb 11, '05 07:15:35PM

I strongly recommend the wonderful app "Webstractor." Not only does it allow you to save web pages, but you can *edit* them (for removing ads, for example), and it will build a table of contents for different saved pages as well.

Or try a less stupid browser
Authored by: VRic on Feb 12, '05 12:31:31PM

There should be no need for a hint to save something. If you can't do it like in every other app (Save: cmd-S), then the browser sucks.

Well, most do.

Every major browser basically destroys what it "saves", and/or does something stupid while at it (IE re-downloads what's currently displayed, currently stored in RAM, and currently stored in disk cache; others find it cool to alter the content and relocate linked files, effectively destroying the page from a page author's perspective; etc.)

Safari 2 will save to "archives". Let's hope it's not a stupid archive format.

ALL browsers' authors except one should be ashamed of themselves after all these years. It should have been obvious from day one that the proper way to do this was to "save" to a non-proprietary "archive" file format.

Which is precisely what iCab has been doing all that time: a zip archive containing the exact hierarchy of files from that page.

It lets you save the current page with absolutely no alteration, meaning I can "save" some page of mine from my website and use that as, well, an archive of that page, to later use or modify, which no other browser allows without ridiculous wizardry (your text processor saves your documents unaltered, as you're most likely to want them later, if you want dumbed-down versions of them it's an option, but not the other way around).

This also means that pages saved using older versions of my browser, which didn't render properly, DISPLAY PERFECTLY IN LATER VERSIONS with better CSS support for example.

Yet I don't suggest switching to iCab, because I'm fed up with all the crap I'm hearing about "incomplete CSS2, blahblahblah, useless, blahblah" (which hopefully will end when preview 3 is released, as beta 3.0 has caught up on that front already). Instead I suggest you write to your browser's authors to ask for some basic iCab features like saving. How ridiculous is that?

And since no other browser seems to originate from such a brilliant individual as the single developer of iCab, you'll have to tell them how to do it cleverly: the single trick needed to save to standard unmodified zip archives and still retain full original paths, working relative and absolute links to data inside AND outside the archive, and tell the browser where to find the saved page's source (which may be buried in a deep hierarchy) is to save that file as the first one in the archive. The rest should be obvious from exploring or decompressing some iCab archives using Zipit or Stuffit Expander.
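
To illustrate the "main page goes first" trick, here is a minimal sketch in Python using the standard zipfile module. This is not iCab's actual code. The folder layout, the function names save_page_archive and find_main_page, and the example paths are all assumptions made up for the illustration.

```python
import io
import zipfile


def save_page_archive(main_path, files):
    """Build a zip archive in memory with the main page as its first entry.

    files maps archive paths (original hierarchy kept) to raw bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Write the page's own source first so a reader can locate it
        # without any extra index file.
        archive.writestr(main_path, files[main_path])
        for path, data in files.items():
            if path != main_path:
                archive.writestr(path, data)
    return buffer.getvalue()


def find_main_page(archive_bytes):
    """Return the path of the first entry, i.e. the saved page's source."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()[0]


if __name__ == "__main__":
    page = "www.example.com/news/2005/story.html"
    saved = save_page_archive(page, {
        "www.example.com/images/logo.gif": b"GIF89a",
        page: b"<html><body>story</body></html>",
    })
    print(find_main_page(saved))
```

Because every file keeps its original path inside the archive, relative links between the page and its images still resolve after unzipping, and any ordinary zip tool can open the result.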

Also, iCab has a tool to convert IE's useless uncompressed proprietary archives to plain zip iCab archives. Of course other browsers require you to decompress those and hunt for the proper page's file and lose relative links to online data, but at least it can be done. Don't expect this from others, especially M$ that leaves IE users with no future way to read "saved" pages. How safe is saving to a proprietary file format? What's the point of saving universal cross-platform web pages to a single-app file format?

iCab has been my default browser from the days of NS 3, in part because it was and still is the only browser that saves properly. So it's of course perfectly usable, even if it requires a secondary browser at hand just in case, which isn't really different from others anyway. At least I'm not constantly *censored*ing about how stupid and crappy my browser is.

Hey! funny thing, that "censor" filter in the comments ;-)

By the way, don't start me on filtering, I see close to zero ads in iCab. With all the developers' not-so-cleverness in browsers and all the ads on the web, browsing has become a torture to me without iCab ;-)

