
.webarchive components?
Authored by: Chas on Jun 09, '05 11:00:37AM

I've been using .webarchive files for a while, and I'm wondering if anyone knows how the format works. Other browsers save the HTML and the linked files separately; Mozilla, for example, saves the raw HTML in one file and the linked images (and other files) in a separate folder.
But so far I've been unable to dissect a .webarchive file and get anything out of it. I'd like to be able to access the HTML, make a few changes, and resave it, or maybe extract a couple of JPEGs (or just delete the JPEGs for advertisements before printing the page).
You can open a .webarchive file in a text editor and see the HTML, but it looks like the attached files are stored inside the same file as binary data, which is a very clumsy way to do things. It would have been a lot more useful to store these files in a package, so you could control-click and choose "Show Package Contents" to get at the data.
So, does anyone know how to get into these .webarchive files and do anything useful with them?

.webarchive components?
Authored by: Chas on Jul 09, '05 06:05:31PM

I found the solution to extracting .webarchive files, thanks to another hint at this address:

To extract an example.webarchive file, use this command:

textutil -convert html example.webarchive

This extracts the HTML file along with the attached JPEG, GIF, and other image files.
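
For anyone who wants the raw pieces without going through textutil, here is a rough Python sketch. It is not from the hint above. It assumes the .webarchive is a property list whose top-level "WebMainResource" entry holds the page and whose "WebSubresources" list holds the attached files, each with "WebResourceData" and "WebResourceURL" keys. The function name extract_webarchive and the output file naming are made up for illustration.

```python
import plistlib
from pathlib import Path
from urllib.parse import urlparse


def extract_webarchive(archive_path, out_dir):
    """Write the main HTML and each subresource of a .webarchive into out_dir.

    Assumes the archive is a property list with "WebMainResource" and
    "WebSubresources" keys (an assumption about the format, see above).
    Returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # plistlib.load detects binary vs. XML property lists on its own.
    with open(archive_path, "rb") as f:
        archive = plistlib.load(f)

    written = []

    # The page itself is stored as raw bytes under WebResourceData.
    main_file = out / "index.html"
    main_file.write_bytes(archive["WebMainResource"]["WebResourceData"])
    written.append(main_file)

    for i, res in enumerate(archive.get("WebSubresources", [])):
        # Name each file after the last path component of its URL; the
        # numeric prefix keeps files with the same name from colliding.
        url_path = urlparse(res.get("WebResourceURL", "")).path
        name = Path(url_path).name or "resource"
        target = out / f"{i:03d}_{name}"
        target.write_bytes(res["WebResourceData"])
        written.append(target)

    return written
```

Calling extract_webarchive("example.webarchive", "example_files") would put index.html and the numbered attachments in an example_files folder. The sketch ignores nested frames and does not rewrite the image links inside the HTML, so textutil is still the easier route when you just want a page you can open.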
