

Comments on the 'A paperless office workflow' hint
A paperless office workflow
Authored by: dewab on Jan 13, '13 05:47:45AM

Whilst I would love Apple to add OCR functionality to Preview, I just don't see it happening, for the simple reason that OCR is not always terribly accurate. It's gotten better over the years, but I've had to OCR the same scanned PDF more than once, getting different results each time, before I ended up with the text I was looking for.



A paperless office workflow
Authored by: keirthomas on Jan 14, '13 08:13:24AM

Evernote OCRs images you upload and does fuzzy recognition. If it sees what appears to be the word "Bavid" or "Oavid", for example, it'll also record "David" just in case, and all three will turn up in searches. So I can see how Apple could easily make OCR work better using this kind of technology. (Incidentally, Evernote don't let you get the OCR'd text out of the PDF; it's strictly there to aid searches.)
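Evernote's actual recognition model isn't public, so the following is only a rough Python sketch of the idea described above, under an assumed approach: for each OCR'd word, generate every spelling reachable by swapping characters that OCR commonly confuses, and index any variant that is a real dictionary word alongside the raw text. The `CONFUSIONS` table and the `variants`/`index_terms` helpers are hypothetical, made up for illustration.

```python
from itertools import product

# Hypothetical table of characters that OCR engines tend to confuse.
# Each key maps to the set of characters it might really have been.
CONFUSIONS = {
    "B": "BDO8",
    "O": "ODB0",
    "D": "DBO",
    "0": "0OD",
    "8": "8B",
    "l": "l1I",
    "1": "1lI",
    "I": "I1l",
}


def variants(word):
    """Every spelling reachable by swapping confusable characters.

    Unmapped characters contribute only themselves. Note the count grows
    multiplicatively with the number of confusable characters in the word.
    """
    choices = [CONFUSIONS.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


def index_terms(ocr_word, dictionary):
    """Terms to index for one OCR'd word: the raw text, plus any
    confusable variant that appears in the (lowercase) dictionary."""
    terms = {ocr_word}
    terms.update(v for v in variants(ocr_word) if v.lower() in dictionary)
    return terms


if __name__ == "__main__":
    words = {"david", "electricity"}
    print(index_terms("Bavid", words))  # the raw "Bavid" plus "David"
```

Indexing both the raw and the corrected spelling means a search for "David" finds the document even though the OCR layer itself still says "Bavid".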

And as for OCR not being accurate, you might be surprised. On a 300 DPI scan, Acrobat makes surprisingly few mistakes. But I don't need 100% accuracy here: the word "electricity" will be mentioned several times in an electricity bill, and it's really Quick Look that lets me recognise the one I'm actually looking for. Document management in this fashion is something that only OS X could do.

__________
Author of Mac Kung Fu
Over 400 tips, tricks, hints and hacks for OS X

http://pragprog.com/book/ktmack2/mac-kung-fu



A paperless office workflow
Authored by: dewab on Jan 17, '13 09:00:37AM

I use the ScanSnap S1500M and have tried its "built-in" OCR, as well as Acrobat's and PDFpen's. Because I'm using Hazel and AppleScript to automatically sort and rename a large number of bills and accounts, I rely heavily on identifying unique criteria (URLs, account numbers, etc.) to make sure I'm moving and renaming each bill correctly. I've found that OCR does an okay job of it, but occasionally I have to re-OCR a document before it picks up the correct information without inserting spurious spaces or misidentifying characters.
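This isn't the Hazel and AppleScript setup described above, just a rough Python sketch of the same matching idea, assuming each document type can be identified by a few strings known to appear on it. The sketch normalises both the OCR text and the identifiers by stripping all whitespace (which handles spurious spaces) and folding a few look-alike characters (`o`→`0`, `l`/`i`→`1`). The `RULES` table, the identifiers in it, and the helper names are all hypothetical.

```python
import re

# Hypothetical rules: each document type maps to identifiers (account
# numbers, URLs) that must ALL appear in its OCR text. Values are made up.
RULES = {
    "Electricity": ["1234 5678", "example-power.com"],
    "Phone": ["9876 5432"],
}

# Fold characters OCR often swaps in digit-heavy strings. Applied after
# lowercasing, so it treats both sides of a comparison identically.
LOOKALIKES = str.maketrans({"o": "0", "l": "1", "i": "1"})


def normalise(text):
    """Lowercase, drop all whitespace and fold look-alike characters, so
    '1234 5678', '12345678' and 'l234 5678' all compare equal."""
    return re.sub(r"\s+", "", text.lower()).translate(LOOKALIKES)


def classify(text, rules=RULES):
    """Return the first rule name whose identifiers all appear in text,
    or None if nothing matches (a candidate for re-OCRing)."""
    haystack = normalise(text)
    for name, identifiers in rules.items():
        if all(normalise(ident) in haystack for ident in identifiers):
            return name
    return None


if __name__ == "__main__":
    sample = "Your account: l234  5678\nVisit example-power.c om"
    print(classify(sample))  # Electricity
```

Requiring every identifier to match is a deliberate trade-off: a badly mangled identifier makes the document fall through to `None` rather than get misfiled. That mirrors the occasional re-OCR step, at the cost of a few missed matches.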

If you look at the raw text of OCR'd PDFs (using pdftotext or the like), you'll see just how hit-or-miss OCR really is.
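Here is a small Python sketch of that check. It assumes Poppler's `pdftotext` is installed and on the `PATH`; passing `-` as the output file sends the extracted text to stdout. It then compares the text layers of two OCR passes of the same scan word by word using the standard library's `difflib`. The helper names and file names are hypothetical.

```python
import difflib
import subprocess


def extract_text(pdf_path):
    """Return a PDF's text layer via pdftotext ('-' means write to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ocr_differences(first, second):
    """List (first_pass, second_pass) word runs that differ between two
    OCR passes. An empty string on one side means text was added or dropped."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical file names for two OCR passes of the same bill.
    for old, new in ocr_differences(extract_text("bill-pass1.pdf"),
                                    extract_text("bill-pass2.pdf")):
        print(f"{old!r} -> {new!r}")
```

Diffing whole words rather than characters keeps the output short, so each line is one misread token such as `'1234' -> 'l234'`.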


