One method of counting words in LaTeX documents

Jun 07, '04 11:38:00AM

Contributed by: Anonymous

An internet search turns up a number of proposals for counting words in LaTeX documents, so an overview, with a pointer to the simplest (though not really elegant) solution, may be useful. Simply running a word count on a LaTeX source file also counts the markup, which we don't want. There is a free tool that removes LaTeX markup (namely detex), but other forums describe simple situations in which running wc (the Unix word count command) on its output gives the wrong answer. The same goes for command-line tools that extract text from the DVI file. Another line of attack is to extract the text from the PDF, which is trivially easy to produce from LaTeX or PostScript. There are commercial and shareware tools for this, but why pay?
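To make the markup problem concrete, here is a minimal sketch that runs wc -w directly on a tiny LaTeX file. The file name sample.tex and its contents are made up for illustration. The last step tries detex only if it happens to be installed.

```shell
# Write a tiny LaTeX file (hypothetical sample.tex) into a temp directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.tex" <<'EOF'
\documentclass{article}
\begin{document}
Hello \emph{brave} new world.
\end{document}
EOF

# wc -w counts whitespace-separated tokens, so each markup line counts too.
# The visible text has 4 words, but the naive count is 7.
naive_count=$(wc -w < "$tmpdir/sample.tex" | tr -d ' ')
echo "naive count: $naive_count"

# If detex is available, strip the markup first and count again.
if command -v detex >/dev/null 2>&1; then
  detex "$tmpdir/sample.tex" | wc -w
fi
```

On a document this small, detex should do fine. The wrong answers reported elsewhere involve less trivial input.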

So here's the trick: set the view option in Acrobat Reader to "continuous" (rather than one-page or two-page), and it becomes possible to select all the text in the PDF. Copy the selection to the clipboard, type pbpaste | wc -w in the Terminal, and voilà, there's your word count. Sorry if the hint about selecting all text in Acrobat seems trivial, but a quick internet search will convince you that lots of people believe you can only select one page at a time in Acrobat. I don't think this works with Preview.
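The Terminal step can be wrapped in a small helper, sketched below. The function name count_words is made up for this example. The pbpaste call is guarded so the snippet does nothing on systems without that command.

```shell
# Count whitespace-separated words read from standard input.
# tr strips the padding spaces that some versions of wc print.
count_words() {
  wc -w | tr -d ' '
}

# On Mac OS X, pbpaste writes the clipboard contents to standard output.
if command -v pbpaste >/dev/null 2>&1; then
  pbpaste | count_words
fi
```

Copy the text from Acrobat first, then run the snippet. The number printed is the word count of whatever is on the clipboard at that moment.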

Pros:
You see what you're counting. At the end of the day, if the issue is a word limit for a thesis or dissertation, then the PDF (i.e. the printed output) is what matters. You don't have to trust that the markup was removed correctly, because you only count what you can see. Very importantly, this means there's no need to worry about whether include commands and other LaTeX macros that might produce or remove text were handled correctly!

Cons:
It does count page numbers. However, this is not a major problem, because the number of pages is known exactly and can be subtracted from the count.
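The correction is a single subtraction, sketched here. It assumes every page carries exactly one page number and that each number is counted as one word. The function name adjusted_count and the figures 10234 and 40 are hypothetical.

```shell
# Subtract one counted "word" per page number from the raw count.
adjusted_count() {
  raw_count=$1    # result of pbpaste | wc -w
  page_count=$2   # number of pages in the PDF
  echo $((raw_count - page_count))
}

adjusted_count 10234 40   # hypothetical figures; prints 10194
```

Unnumbered pages, such as a title page, would need to be left out of the page count.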



Mac OS X Hints
http://hints.macworld.com/article.php?story=20040606044542980