
Cuneiform OCR

Convert page images into searchable text. Talk about software, techniques, and new developments here.
eslavko
Posts: 5
Joined: 13 Apr 2011, 12:44

Cuneiform OCR

Post by eslavko » 03 Sep 2011, 10:30

Hello...

Does anyone know if there is a command-line option for Cuneiform?
I am trying to OCR my pages and have tested Tesseract and Cuneiform. The latter gives better results but lacks a command line for automation.

Slavko.

dingodog
Posts: 108
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net
Contact:

Re: Cuneiform OCR

Post by dingodog » 03 Sep 2011, 13:12

*Tesseract* and *Cuneiform* for Linux are both fully scriptable via the command line.

I built binaries of

*Tesseract 3.0* + additional language data in one file (213 MB)
- http://dokupuppylinux.tk/programs:ocr#tesseract-30

for Puppy Linux 3.01, 4.3.1, 5.2.5

and

*Cuneiform*
- http://dokupuppylinux.tk/programs:ocr#cuneiform_10
usage


cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>
for Puppy Linux 5.2.5
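
To automate this over a folder of page images, the usage line above can be wrapped in a small script. This is only a rough sketch, not a tested recipe: it assumes the `cuneiform` binary is on the PATH, that `eng` is a valid language code for your build, and that your pages are BMP files (other formats depend on how the binary was built). The helper names `build_cuneiform_cmd` and `ocr_folder` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_cuneiform_cmd(image, out_file, language="eng", html=False):
    """Build an argument list that follows the usage line quoted above."""
    cmd = ["cuneiform", "-l", language, "-o", str(out_file)]
    if html:
        # Flag taken from the usage line above; it may differ between versions.
        cmd.append("--html")
    cmd.append(str(image))
    return cmd


def ocr_folder(folder, language="eng", pattern="*.bmp"):
    """Run cuneiform once per matching image; writes one .txt next to each page."""
    results = []
    for image in sorted(Path(folder).glob(pattern)):
        out_file = image.with_suffix(".txt")
        # check=True stops the batch on the first page that fails.
        subprocess.run(build_cuneiform_cmd(image, out_file, language), check=True)
        results.append(out_file)
    return results
```

Calling `ocr_folder("scans")` would then process every `.bmp` in a hypothetical `scans` directory in name order.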

On the same page you can also find other OCR engines that are scriptable via the command line.

I'm currently working on building the latest version (4.x) of OCRopus, used by Google Books, but it is a long and difficult task: the dependencies are odd and the source code is not properly packaged, so I need to fix it before I can build it.

eslavko
Posts: 5
Joined: 13 Apr 2011, 12:44

Re: Cuneiform OCR

Post by eslavko » 04 Sep 2011, 03:25

I forgot to say that this is for Windows XP...

For Linux I know that both have command-line options (there is no GUI :D).
But for now I'm stuck on Windows. I run Ubuntu on another machine (an EMC2 machine controller), but I don't want to bloat it because it is a precise RTAI environment.
So is there a solution for Windows?

I read somewhere that the DLLs can be accessed (puma or something similar), but I couldn't find the parameter options.

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Cuneiform OCR

Post by Misty » 07 Sep 2011, 11:28

Tesseract is available for Windows in official CLI form: http://code.google.com/p/tesseract-ocr/downloads/list There are also unofficial (probably unscriptable) GUIs, but I'm not familiar with those. Cuneiform for Windows is also available, but I don't know if it's the same CLI interface as the Linux/Mac version: http://en.openocr.org/download/
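
For scripting the Tesseract CLI, something along these lines might work. It is a hedged sketch that assumes the `tesseract` executable is on the PATH and that the `eng` language data is installed; `build_tesseract_cmd` and `ocr_page` are illustrative names, not part of Tesseract.

```python
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, language="eng"):
    """Build the argument list: tesseract <image> <output base> -l <language>."""
    image = Path(image)
    # Tesseract appends ".txt" to the output base on its own.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language]


def ocr_page(image, language="eng"):
    """Run tesseract on one image and return the path of the expected text file."""
    subprocess.run(build_tesseract_cmd(image, language), check=True)
    return Path(image).with_suffix(".txt")
```

The same loop-over-a-folder approach applies: call `ocr_page` once per image file.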
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
