Daniel Reetz, the founder of the DIY Book Scanner community, has recently started making videos of prototyping and shop tips. If you are tinkering with a book scanner (or any other project) in your home shop, these tips will come in handy. https://www.youtube.com/channel/UCn0gq8 ... g_8K1nfInQ
Open Source OCR, an ABBY alternative?
- daniel_reetz
- Posts: 2797
- Joined: 03 Jun 2009, 13:56
- E-book readers owned: Used to have a PRS-500
- Number of books owned: 600
- Country: United States
- Contact:
Open Source OCR, an ABBY alternative?
Background: http://www.teleread.org/2008/05/26/amaz ... e-sourced/
Homepage:
http://en.openocr.org/
Forum thread with built packages:
http://openocr.org/forum/viewtopic.php? ... 63&start=0
-
- Posts: 10
- Joined: 27 Jul 2009, 10:49
- E-book readers owned: PRS-505
- Number of books owned: 3000
- Location: Scotland
Re: Open Source OCR, an ABBY alternative?
Since it has hOCR output, http://jimgarrison.org/moz-hocr-edit/ might be useful too.
-
- Posts: 18
- Joined: 04 Mar 2014, 00:52
Re: Open Source OCR, an ABBY alternative?
It would be nice to adapt CuneiForm to OCR DjVu files. More precisely, to convert its OCR results into the format of the DjVu text (TXT) layer.
Full details here:
http://openocr.org/forum/viewtopic.php?f=7&p=4417
Are there any volunteers?
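For anyone picking this up, here is a minimal sketch of the conversion step, assuming the OCR engine emits hOCR with simple `ocrx_word` spans in which `class` comes before `title`. The function names are made up for illustration and are not part of CuneiForm or DjVuLibre. It emits a flat, word-level text layer in the s-expression form that djvused reads (for example with its `set-txt` command), flipping the y axis because hOCR measures from the top-left corner while the DjVu text layer measures from the bottom-left.

```python
import html
import re

# Assumption: words look like
#   <span class='ocrx_word' title='bbox 10 20 50 40'>Hello</span>
# with class before title. Real hOCR can vary; use an HTML parser for that.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>"
    r"([^<]*)</span>"
)


def djvu_escape(text):
    """Escape backslashes and double quotes for a DjVu text-layer string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def hocr_words_to_djvu(hocr, page_width, page_height):
    """Build a word-level DjVu text-layer s-expression from hOCR markup."""
    words = []
    for x0, y0, x1, y1, text in WORD_RE.findall(hocr):
        text = html.unescape(text).strip()
        if not text:
            continue
        x0, y0, x1, y1 = int(x0), int(y0), int(x1), int(y1)
        # hOCR: origin top-left. DjVu text layer: origin bottom-left.
        words.append(
            '(word %d %d %d %d "%s")'
            % (x0, page_height - y1, x1, page_height - y0, djvu_escape(text))
        )
    return "(page 0 0 %d %d\n  %s)" % (page_width, page_height, "\n  ".join(words))


if __name__ == "__main__":
    sample = "<span class='ocrx_word' title='bbox 10 20 50 40'>Hello</span>"
    print(hocr_words_to_djvu(sample, 100, 200))
```

A fuller converter would also keep the line and paragraph grouping from the hOCR instead of flattening everything to words.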
- daniel_reetz
- Posts: 2797
- Joined: 03 Jun 2009, 13:56
- E-book readers owned: Used to have a PRS-500
- Number of books owned: 600
- Country: United States
- Contact:
Re: Open Source OCR, an ABBY alternative?
monday2000 wrote: Are there any volunteers?
You know our policy here, it's stated in our URL. Do It Yourself!

-
- Posts: 18
- Joined: 04 Mar 2014, 00:52
Re: Open Source OCR, an ABBY alternative?
daniel_reetz,
"Do It Yourself"
I prefer to focus on things that wouldn't be done by anyone else (such as DjVu scanning). That's more efficient than a plain "Do It Yourself" strategy.

Re: Open Source OCR, an ABBY alternative?
The http://www.onlineocr.net/ service uses Tesseract: http://code.google.com/p/tesseract-ocr/
The training process is described here: http://code.google.com/p/tesseract-ocr/ ... gTesseract .
Re: Open Source OCR, an ABBY alternative?
Someone already mentioned Tesseract (http://code.google.com/p/tesseract-ocr/). It's currently said to be the most accurate open source OCR engine, but it only handles single-column text. You have to do the layout analysis separately to break the source document up into portions that contain only single-column text. Tesseract is being developed, but not rapidly.
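To make the "layout analysis first" step concrete, here is a rough sketch of the simplest possible approach: a vertical projection profile that looks for empty gutters between text columns. It assumes an already binarized and deskewed page given as a list of rows of 0/1 pixels (1 = ink); the function names and the `min_gutter` parameter are made up for illustration. It is not how any of the engines mentioned in this thread do it, and real pages need noise handling on top.

```python
def column_ink(rows):
    """Count ink pixels in every pixel column of a binary page image."""
    width = len(rows[0])
    return [sum(row[x] for row in rows) for x in range(width)]


def find_text_columns(rows, min_gutter=3):
    """Return half-open (start, end) x-ranges of text columns.

    A run of at least `min_gutter` empty pixel columns counts as a gutter;
    shorter gaps (e.g. between letters) stay inside the same text column.
    """
    ink = column_ink(rows)
    spans = []
    start = None  # x where the current text column began
    gap = 0       # length of the current run of empty pixel columns
    for x, count in enumerate(ink):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gutter:
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, len(ink) - gap))
    return spans


def crop_columns(rows, spans):
    """Cut the page into one sub-image per detected text column."""
    return [[row[a:b] for row in rows] for a, b in spans]


if __name__ == "__main__":
    page = [
        [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
        [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0],
    ]
    print(find_text_columns(page))
```

Each cropped strip can then be passed to the OCR engine as if it were its own single-column page.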
The most rapidly developing open source OCR system is OCRopus: http://code.google.com/p/ocropus/
It has some of the leading image processing researchers working on it. If you follow the names of the people who publish papers on binarization, OCR, etc., the people working on OCRopus are up there. It's alpha code right now, but it does layout analysis and can optionally use Tesseract as a plug-in.
I'm actually building CuneiForm right now. It does appear to have layout analysis, so it may be further along than I had heard from other sources. It looks promising.
In short, none of the open source systems are as accurate as ABBYY FineReader or OmniPage, but they may get there.
Last edited by Tim on 16 Nov 2009, 21:09, edited 1 time in total.
-
- Posts: 18
- Joined: 04 Mar 2014, 00:52
Re: Open Source OCR, an ABBY alternative?
"But in short none of the open source systems are as accurate as ABBY Fine reader or Omnipage,"
Yes, sadly.
"but they may get there"
Only if Web-communities are going to contribute. Hard, but not desperate.
Re: Open Source OCR, an ABBY alternative?
What are the advantages from a software developer's standpoint of making open source software? I am new to the tech industry (I am in the legal department) and am wondering from a practical standpoint why a software developer would develop open source software. Any thoughts?
Last edited by gabossy on 13 Aug 2010, 07:23, edited 1 time in total.
Re: Open Source OCR, an ABBY alternative?
gabossy wrote: What are the advantages from a software developer's standpoint of making open source software? I am new to the tech industry (I am in the legal department) and am wondering from a practical standpoint why a software developer would develop open source software. Any thoughts?
That's a really big question, not necessarily for this forum, but from the OSI's website:
"Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in."
You can also look at what the Free Software Foundation has to say about it, but they aren't very good at explaining the practical why. Wikipedia's article has some more of the whys, but it's not all that well written.