OCR

A few days ago I played with different free OCR (Optical Character Recognition) programs. I would like to be able to scan all my snail mail and OCR it, which would let me search my snail mail the way I search my e-mail. Everywhere on the net, Tesseract was said to be the best one, but with version 1.03 I got nothing but garbage. The result was better with GOCR, but still not good. Then I found out that Tesseract 1.03 has problems when it is compiled in certain ways. Yesterday I downloaded 1.02, and it worked much better. Unfortunately, it does not support non-English characters like the Swedish å, ä, and ö, which I need. If, for instance, å were always recognized as the same wrong character, I could hide the problem within the search engine, but this is not the case.
GOCR is already included in Ubuntu, and Tesseract will be included in Feisty.
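The search-engine workaround would only need a character-folding step applied to both the indexed OCR text and the search query. Below is a minimal Python sketch, assuming (hypothetically) that the OCR engine always turned å and ä into "a" and ö into "o". The names `FOLD`, `search_key` and `matches` are made up for illustration:

```python
# Minimal sketch: fold Swedish letters to the (hypothetical) characters the
# OCR engine would consistently produce, so queries match the OCR output.
# Assumption: the OCR engine ALWAYS emits "a" for å/ä and "o" for ö.
FOLD = str.maketrans({"å": "a", "ä": "a", "ö": "o",
                      "Å": "A", "Ä": "A", "Ö": "O"})


def search_key(text: str) -> str:
    """Lower-case the text and fold å, ä, ö to their assumed OCR output."""
    return text.translate(FOLD).lower()


def matches(query: str, document: str) -> bool:
    """Return True if the folded query occurs in the folded document."""
    return search_key(query) in search_key(document)


if __name__ == "__main__":
    ocr_text = "Faktura från Vaxjo kommun"  # hypothetical OCR result
    print(matches("Växjö", ocr_text))
```

Because the real substitutions are not consistent, a fixed table like this cannot catch them all, which is why the workaround does not help here.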

Comments

Anonymous said…
Or HOCR, just in case you need to scan Hebrew...
See the video here:
Hebrew optical character recognition updated
