
Extract raw HTML from Windows' .CHM files UNIX
I recently received several work-related ebooks in .CHM format. For those who don't use Windows, CHM is Microsoft's Compiled HTML Help format: a set of HTML files compressed into a single file. On Windows, you can view them with the MS Help Viewer that ships with the OS.

As far as I can tell, there's no OS X CHM reader, so the only way to view the contents is by decompiling the CHM with chmdump. It's available as source code only, but it compiles without a hitch on 10.2. Compiling is simple:

  1. Download the file
  2. Extract it
  3. Open Terminal.app and cd to the chmtools folder
  4. Type make

And that's it. You may want to put the resulting chmdump file into a folder in your path. (I put it in /usr/local/bin.)
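
For anyone who prefers to script it, steps 3 and 4 plus the copy into your path could look roughly like the sketch below. It assumes the archive is already downloaded and extracted, that you run it with sh or bash (not tcsh), and that the build leaves a file named chmdump in the source folder. The function name build_chmdump and the example paths are made up for illustration.

```shell
#!/bin/sh
# Build chmdump from an already-extracted source folder and copy the
# resulting binary into a folder on your PATH.
# Both arguments are placeholders: the source folder is wherever you
# unpacked the archive; the destination defaults to /usr/local/bin.
build_chmdump() {
    src_dir=$1
    dest_dir=${2:-/usr/local/bin}

    # Steps 3 and 4 of the hint: cd into the folder and run make.
    # The subshell keeps the cd from affecting the caller.
    ( cd "$src_dir" && make ) || return 1

    # Copy the binary into place, creating the folder if needed.
    mkdir -p "$dest_dir" && cp "$src_dir/chmdump" "$dest_dir/"
}

# Example (paths are assumptions):
#   build_chmdump ~/Downloads/chmtools /usr/local/bin
```

Writing to /usr/local/bin usually needs admin rights, so you may have to run the script with sudo.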

To decompile a CHM, type chmdump {CHM file} {destination folder}. The output will include a number of files that the Help Viewer uses for searching and indexing. Ignore those and look for a folder of HTML documents, which you can then open in your browser.
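
As a rough sketch of that workflow in sh: the first function below just lists the HTML documents under the output folder, so the search and index files drop out of view; the second runs chmdump and then calls the first. The function names and the example file names are placeholders, and chmdump is assumed to be on your path.

```shell
#!/bin/sh
# List the HTML documents under a chmdump output folder, skipping the
# search and index files. The bracket patterns match the extension in
# any letter case (.htm, .HTM, .html, ...).
list_html() {
    find "$1" -type f \( -name '*.[Hh][Tt][Mm]' -o -name '*.[Hh][Tt][Mm][Ll]' \) | sort
}

# Decompile a CHM, then show the HTML files that came out of it.
# Assumes chmdump is on your PATH.
dump_chm() {
    chm_file=$1
    out_dir=$2
    chmdump "$chm_file" "$out_dir" && list_html "$out_dir"
}

# Example (file names are placeholders):
#   dump_chm book.chm book_html
```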


Extract raw HTML from Windows' .CHM files
Authored by: dave1212 on Aug 13, '03 02:54:55PM

Thanks. The docs for one of my CMSes (I think phpB) are in that format.

---
______

[ IE Toolbar Icons, Desktop Picures, Free MP3s ]
http://www.paulprobert.com/



Extract raw HTML from Windows' .CHM files
Authored by: bluehz on Aug 13, '03 08:03:10PM

chmdump works great, but it gives the files very user-unfriendly names. If you have VPC, you can get the MS HTML Help Workshop app here:

http://go.microsoft.com/fwlink/?LinkId=14188 (3.3 MB)

It will decompile the files nicely, with user-friendly names like "Chap01.html", etc.
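
If you'd rather stay out of VPC, one workaround is to leave the file names alone and print each file's title next to it, so you can tell which file is which. This is only a sketch in sh: the function names are made up, and it assumes each title element sits on a single line, which won't be true of every CHM.

```shell
#!/bin/sh
# Print the <title> of one HTML file (first match only).
# The bracket patterns match TITLE in any letter case, since sed has no
# portable case-insensitive flag. Titles split across lines are missed.
html_title() {
    sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee]>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' "$1" | head -n 1
}

# Print "file<TAB>title" for every HTML file under a folder.
list_titles() {
    find "$1" -type f \( -name '*.[Hh][Tt][Mm]' -o -name '*.[Hh][Tt][Mm][Ll]' \) |
    while IFS= read -r f; do
        printf '%s\t%s\n' "$f" "$(html_title "$f")"
    done
}

# Example (folder name is a placeholder):
#   list_titles book_html
```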



Extract raw HTML from Windows' .CHM files
Authored by: dave1212 on Aug 14, '03 12:26:23AM

Thanks for the tip. I'm trying to run VPC less and less, and I definitely will not install the MS update they just released.

It would be nice to have it name the files better, though, so I might use VPC for now.


Extract raw HTML from Windows' .CHM files
Authored by: mrb712 on Sep 17, '03 03:44:46PM

I followed the directions to the T, and when I typed in the command make, I got an error saying make: command not found. I'm using 10.2.6. What am I doing wrong?

---
Si les hommes sont égaux, l'amour est un défaut.



Extract raw HTML from Windows' .CHM files
Authored by: bluehz on Sep 18, '03 06:52:55PM

If the "make" command isn't available, it probably means you have not installed the Developer Tools. Have you?



Extract raw HTML from Windows' .CHM files
Authored by: mrb712 on Sep 22, '03 06:55:05PM

Ah ha! You know, that stuff has caused me so many headaches. I haven't installed them. Can I just add them to the current install of OS X 10.2.6?

Extract raw HTML from Windows' .CHM files
Authored by: mrb712 on Sep 23, '03 11:23:24AM

OK, I installed the Developer Tools and the BSD subsystem from the OS X 10.2 Install Disc 1, and I still get the command not found error.

Can someone help please?

Extract raw HTML from Windows' .CHM files
Authored by: at_sym on Sep 23, '03 04:31:14PM
What do you get when you type which make in Terminal.app?

(Man, that sounds like the beginning of a really geeky joke. :) )

Extract raw HTML from Windows' .CHM files
Authored by: mrb712 on Sep 23, '03 10:29:06PM

Never mind. Someone else pointed out that the command wasn't in a folder on my PATH, which is why it wasn't found. So I moved it there, and now it works just fine.
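
For anyone else who hits this: a quick way to check whether the shell can see a command is the sketch below. It does roughly what "which" does and, on failure, prints the folders in PATH so you can see where the shell looked. It is written for sh or bash (not tcsh), and check_tool is just an illustrative name.

```shell
#!/bin/sh
# Report where a command lives, or list the PATH folders that were
# searched if it cannot be found.
check_tool() {
    tool=$1
    if location=$(command -v "$tool"); then
        echo "$tool found at $location"
    else
        echo "$tool not found; folders searched:"
        # PATH is colon-separated; print one folder per line.
        printf '%s\n' "$PATH" | tr ':' '\n'
        return 1
    fi
}

# Example:
#   check_tool make
```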

UPDATE: xCHM binary available
Authored by: at_sym on Jan 07, '04 09:33:26AM
Chanler White just announced a Mac OS X binary for xCHM, a Unix CHM viewer. The binary is available on VersionTracker. I've just played around with it for a bit, and it seems to work pretty well. So now you can read the docs without decompiling.

CHMOX: A 100% native Cocoa CHM viewer
Authored by: Nucleus on Jun 04, '04 06:11:41AM

Open source, free, uses WebKit (the Safari engine) and CHMLIB.
http://freshmeat.net/projects/chmox/

Feedback is welcome.



CHMOX: A 100% native Cocoa CHM viewer
Authored by: Interactive on Jun 22, '04 04:23:47PM

I just downloaded CHMOX. It works perfectly. For the first time, I can open CHM files on my Mac without any side steps.

If I may, I would request that more flexible zoom (in and out) be added.



Extract raw HTML from Windows' .CHM files
Authored by: encro on Jun 23, '04 01:29:25PM

There are now quite a few utilities for working with compiled HTML files (.chm) on OS X:

CHM Viewer
http://www.jouledata.com/DesktopDefault.aspx?tabindex=1&tabid=21

xCHM
http://xchm.sourceforge.net/

Tubby
http://mikebultrowicz.com/software/tubby/

Chmox
http://sourceforge.net/projects/chmox/page/chmox



To build chmdump under 10.4.x / Darwin 8
Authored by: victory on May 05, '05 06:49:40PM
Uncomment line 23 of chmlib.h (yeah, the one that tells you it was previously commented out to build correctly under an earlier OS version):

typedef unsigned short ushort;

Under 10.4.x w/GCC 3.3/4.0 you'll get a bunch of compile warnings (mostly due to mismatched prototypes), but the app should build and work ok.
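
If you'd rather not count lines, the same edit can be scripted. The sketch below (sh, with a made-up function name) assumes the commented-out line reads "// typedef unsigned short ushort; Already defined for Darwin", and rewrites it to the bare typedef. The trailing note is dropped because it would not compile once the comment marker is gone.

```shell
#!/bin/sh
# Re-enable the ushort typedef in chmlib.h.
# Matches the commented-out line by its text rather than by line number,
# and writes to a temporary file first so a failed sed leaves the
# original untouched.
enable_ushort() {
    header=$1
    sed 's|^// *\(typedef unsigned short ushort;\).*|\1|' "$header" > "$header.new" &&
        mv "$header.new" "$header"
}

# Example (path is a placeholder):
#   enable_ushort chmtools/chmlib.h
```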

Of course, I'm not sure why anyone would be messing with this utility anymore, considering the handful of really nice Aqua-native .CHM viewers now available. But there it is...


To build chmdump under 10.4.x / Darwin 8
Authored by: qu1j0t3 on Apr 08, '10 04:13:03AM

Here is a more comprehensive diff that makes chmdump portable to 32/64 bit Intel as well. It also fixes the warnings.


diff -Naur chmtools-0.1/Makefile chmtools/Makefile
--- chmtools-0.1/Makefile 2002-01-08 15:03:47.000000000 +1100
+++ chmtools/Makefile 2010-04-08 20:19:47.000000000 +1000
@@ -1,9 +1,9 @@
LIBOBJS = chmlib.o lzx.o
-CFLAGS = -DDEBUG
+CFLAGS = -DDEBUG -g
PROGS = chmdump

chmdump: $(LIBOBJS) chmdump.o
$(LINK.c) -o $@ $^

clean:
- rm -f *.o *~ \#* core $(PROGS)
\ No newline at end of file
+ rm -f *.o *~ \#* core $(PROGS)
diff -Naur chmtools-0.1/chmlib.c chmtools/chmlib.c
--- chmtools-0.1/chmlib.c 2001-10-15 06:37:30.000000000 +1000
+++ chmtools/chmlib.c 2010-04-08 20:19:47.000000000 +1000
@@ -18,9 +18,7 @@
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

-#include <stdlib.h>
#include "chmlib.h"
-#include "fixendian.h"

#define FILELEN_HSECT 0
#define DIR_HSECT 1
@@ -39,15 +37,17 @@
#define DPRINTF while (0) fprintf
#endif

-static void
-get_guid(ubyte *buf, guid_t *guid)
-{
- memcpy(guid, buf, sizeof(guid_t));
- FIXENDIAN32(guid->guid1);
- FIXENDIAN16(guid->guid2[0]);
- FIXENDIAN16(guid->guid2[1]);
+ulong le32(ubyte *p){
+ return (p[3]<<24) | (p[2]<<16) | (p[1]<<8) | p[0];
+}
+
+ushort le16(ubyte *p){
+ return (p[1]<<8) | p[0];
}

+#define FIXENDIAN16(x) ((x) = le16((ubyte*)&(x)))
+#define FIXENDIAN32(x) ((x) = le32((ubyte*)&(x)))
+
static void
make_guid_string(guid_t *guid, char *s)
{
@@ -57,7 +57,7 @@
guid->guid3[4], guid->guid3[5], guid->guid3[6], guid->guid3[7]);
}

-static void guid_fix_endian(guid_t *guid)
+static void guid_fix_endian(guid_t *guid)
{
FIXENDIAN32(guid->guid1);
FIXENDIAN16(guid->guid2[0]);
diff -Naur chmtools-0.1/chmlib.h chmtools/chmlib.h
--- chmtools-0.1/chmlib.h 2002-01-08 15:01:31.000000000 +1100
+++ chmtools/chmlib.h 2010-04-08 20:19:47.000000000 +1000
@@ -19,9 +19,15 @@
*/

#include <stdio.h>
-typedef unsigned long ulong;
-// typedef unsigned short ushort; Already defined for Darwin
-typedef unsigned char ubyte;
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+
+// int is 4 bytes on 32-bit and 64-bit OS X; long is not.
+// see: http://developer.apple.com/Mac/library/documentation/Darwin/Conceptual/64bitPorting/transition/transition.html#//apple_ref/doc/uid/TP40001064-CH207-SW1
+typedef uint32_t ulong;
+typedef uint16_t ushort; //Already defined for Darwin
+typedef uint8_t ubyte;

typedef struct guid_t
{
diff -Naur chmtools-0.1/fixendian.h chmtools/fixendian.h
--- chmtools-0.1/fixendian.h 2001-10-14 13:29:33.000000000 +1000
+++ chmtools/fixendian.h 1970-01-01 10:00:00.000000000 +1000
@@ -1,15 +0,0 @@
-#ifdef BIG_ENDIAN
-#define EREV32(x) ((((x)&0xFF000000)>>24) | \
- (((x)&0x00FF0000)>> 8) | \
- (((x)&0x0000FF00)<< 8) | \
- (((x)&0x000000FF)<<24))
-#define FIXENDIAN32(x) (x)=EREV32((ulong)x)
-#define FIXENDIAN16(x) (x)= ((((ushort)(x))>>8) | ((ushort)((x)<<8)))
-#define COPYENDIAN32(x) EREV32((ulong)x)
-#define COPYENDIAN16(x) ((((ushort)(x))>>8) | ((ushort)((x)<<8)))
-#else
-#define FIXENDIAN32(x)
-#define FIXENDIAN16(x)
-#define COPYENDIAN32(x) x
-#define COPYENDIAN16(x) x
-#endif
diff -Naur chmtools-0.1/lzx.h chmtools/lzx.h
--- chmtools-0.1/lzx.h 2001-10-14 14:34:41.000000000 +1000
+++ chmtools/lzx.h 2010-04-08 20:19:47.000000000 +1000
@@ -1,9 +1,12 @@
-#define UBYTE unsigned char
-#define UWORD unsigned short
-#define ULONG unsigned long
-#define BYTE signed char
-#define WORD short
-#define LONG long
+#include <stdint.h>
+#include <string.h>
+
+typedef uint8_t UBYTE;
+typedef uint16_t UWORD;
+typedef uint32_t ULONG;
+typedef int8_t BYTE;
+typedef int16_t WORD;
+typedef int32_t LONG;

int LZXinit(int window);
int LZXdecompress(UBYTE *inbuf, UBYTE *outbuf, ULONG inlen, ULONG outlen);



Extracting .hlp files
Authored by: magnamous on Nov 03, '09 11:58:58PM

Just a side note about .chm's predecessor, .hlp. I have a few .hlp files that I'm trying to decompress, and the software for .chm doesn't work. To decompress .hlp files, the best method I've found requires Windows, unfortunately, but it works.

There is a command-line program called HelpDeco and a GUI for it called HlpDecoGUI. I found both via this page, which was quite helpful. So long as you have a copy of Windows, you can download these programs and get the files into .rtf format, which you can then read on your Mac. If your needs are more complicated, you can use the link above to read how to then convert to .chm, which can then be re-converted to html (thus preserving images, etc.).

I hope this helps you. Good luck!


