
One method of counting words in LaTeX documents
An internet search brought up a number of proposals for counting words in LaTeX, so maybe someone will appreciate an overview and a pointer to the simplest (though not really elegant) solution. Simply running a word count on a LaTeX source file also counts the markup, which we don't want. There is a free tool to remove LaTeX markup (namely detex), but other forums describe simple cases where running the wc command (the Unix word count) on its output gives the wrong answer. Same story for command-line tools that extract text from the DVI file. Another line of attack is to extract the text from the PDF, which is trivially easy to produce from LaTeX or PostScript. There are commercial/shareware tools for this, but why pay?

So here's the trick: by setting the view option in the Acrobat Reader to "continuous" (rather than one-page or two-page), it becomes possible to select all text in the PDF. Copy this to the clipboard, type pbpaste | wc -w in the Terminal, and voilà! There's your word count. Sorry if the hint on how to select all text in Acrobat seems trivial, but a quick internet search will convince you there are lots of people who think it's a fact that you can only select one page at a time in Acrobat. I don't think this works with Preview.
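
For anyone who wants the Terminal step as a reusable command, here is a minimal sketch. The function name count_words is made up for illustration; pbpaste and wc -w are the commands from the hint. The arithmetic expansion is only there to strip the padding that some wc implementations print around the number.

```shell
# Hypothetical helper: count whitespace-separated words on standard input.
# $(( ... )) turns wc's possibly space-padded output into a bare number.
count_words() {
    echo "$(( $(wc -w) ))"
}

# Usage on macOS, after selecting and copying all text in Acrobat Reader:
#   pbpaste | count_words
```

Keeping the function clipboard-agnostic means the same helper works on any text piped into it, not only on what pbpaste delivers.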

Pros:
You see what you're counting. At the end of the day, if the issue is a word limit for a thesis/dissertation, then the PDF, i.e. the printed output, is what matters. You don't have to trust in the correct removal of markup because you only count what you can see. Very importantly, this means that there's no need to worry about whether include commands and other LaTeX macros that might produce or remove text were handled correctly!

Cons:
It does count page numbers. However, this is not a major problem because the number of pages is known exactly and can be subtracted from the count.
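
The page-number correction is plain subtraction, sketched below. The function name subtract_page_numbers and the figures are invented for illustration, and the sketch assumes every page carries exactly one page number that wc counts as a single word (a footer like "Page 12 of 50" would need a larger correction).

```shell
# Hypothetical helper: remove one counted word per page from the raw total.
# $1 = raw word count from wc -w, $2 = number of pages in the PDF.
subtract_page_numbers() {
    echo "$(( $1 - $2 ))"
}

# Example with made-up numbers: a raw count of 10250 words over 50 pages.
subtract_page_numbers 10250 50
```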

One method of counting words in LaTeX documents | 5 comments
One method of counting words in LaTeX documents
Authored by: joebeone on Jun 07, '04 02:00:42PM
I think the easiest method (quite similar to this) would be to do:
ps2ascii outfile.ps | wc -w


One method of counting words in LaTeX documents
Authored by: tinker on Jun 07, '04 04:13:19PM

If word of this gets out, journal editors will stop accepting the "It's in LaTeX, I can't get an exact word count" excuse!! Aaaaaaaaaaaaa!!



One method of counting words in LaTeX documents
Authored by: afried on Jun 07, '04 09:54:31PM

Great hint, I was looking for something like this for a while. Thanks



detex way of doing things
Authored by: danieleggert on Jun 16, '04 09:52:32AM

If you trust detex, then you can achieve the very same thing by simply running:

detex myfile.tex | wc



One method of counting words in LaTeX documents
Authored by: simonpie on Oct 17, '05 12:33:17PM

Excalibur will give you a word count.


