Open Source OCR, an ABBY alternative?

Convert page images into searchable text. Talk about software, techniques, and new developments here.
daniel_reetz
Posts: 2797
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Open Source OCR, an ABBY alternative?

Post by daniel_reetz » 27 Jul 2009, 13:10


Kirtai
Posts: 10
Joined: 27 Jul 2009, 10:49
E-book readers owned: PRS-505
Number of books owned: 3000
Location: Scotland

Re: Open Source OCR, an ABBY alternative?

Post by Kirtai » 27 Jul 2009, 13:39

Since it has hOCR output, http://jimgarrison.org/moz-hocr-edit/ might be useful too.

monday2000
Posts: 18
Joined: 04 Mar 2014, 00:52

Re: Open Source OCR, an ABBY alternative?

Post by monday2000 » 22 Oct 2009, 04:36

It would be nice to adapt CuneiForm to OCR DjVu files. More precisely, to convert its OCR results into the format of the DjVu text (TXT) layer.
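As a rough illustration of that conversion, here is a minimal Python sketch. It assumes the OCR engine reports word boxes with a top-left origin, and that the target is the s-expression hidden-text syntax read by djvused's set-txt command (which measures y from the bottom of the page). The function name words_to_djvu_txt and the input layout are hypothetical and not part of CuneiForm or DjVuLibre; check the exact syntax against the djvused documentation before relying on it.

```python
def escape(text):
    # Strings are double-quoted, so escape backslashes and quotes.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def words_to_djvu_txt(page_width, page_height, lines):
    """Build a page-level hidden-text s-expression.

    lines: list of text lines; each line is a list of
    (x0, y0, x1, y1, text) tuples with a top-left origin (assumed input).
    """
    def flip(box):
        x0, y0, x1, y1 = box
        # Convert to a bottom-left origin.
        return x0, page_height - y1, x1, page_height - y0

    out = [f"(page 0 0 {page_width} {page_height}"]
    for line in lines:
        if not line:
            continue
        boxes = [flip(word[:4]) for word in line]
        # The line box is the union of its word boxes.
        lx0 = min(b[0] for b in boxes)
        ly0 = min(b[1] for b in boxes)
        lx1 = max(b[2] for b in boxes)
        ly1 = max(b[3] for b in boxes)
        out.append(f" (line {lx0} {ly0} {lx1} {ly1}")
        for (x0, y0, x1, y1), word in zip(boxes, line):
            out.append(f'  (word {x0} {y0} {x1} {y1} "{escape(word[4])}")')
        out[-1] += ")"  # close the line
    out[-1] += ")"  # close the page
    return "\n".join(out)
```

The resulting text could then be saved to a file and applied to a page with djvused; non-ASCII text may need extra escaping that this sketch does not handle.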

Full details here:

http://openocr.org/forum/viewtopic.php?f=7&p=4417

Are there any volunteers?

daniel_reetz
Posts: 2797
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Open Source OCR, an ABBY alternative?

Post by daniel_reetz » 22 Oct 2009, 08:42

monday2000 wrote:Are there any volunteers?
You know our policy here; it's stated in our URL: Do It Yourself! :)

monday2000
Posts: 18
Joined: 04 Mar 2014, 00:52

Re: Open Source OCR, an ABBY alternative?

Post by monday2000 » 28 Oct 2009, 04:20

daniel_reetz wrote:Do It Yourself
I prefer to focus on things that nobody else would do (such as DjVu scanning). That's more efficient than a plain "Do It Yourself" strategy. ;)

qwer
Posts: 11
Joined: 04 Mar 2014, 00:52

Re: Open Source OCR, an ABBY alternative?

Post by qwer » 10 Nov 2009, 09:38


Tim

Re: Open Source OCR, an ABBY alternative?

Post by Tim » 16 Nov 2009, 18:38

Someone already mentioned Tesseract (http://code.google.com/p/tesseract-ocr/). It's currently said to be the most accurate open source OCR engine, but it only handles single-column text. You have to do the layout analysis separately to break the source document up into portions that each contain only a single column of text. Tesseract is being developed, but not rapidly.
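To make the "do the layout analysis separately" step concrete, here is a minimal Python sketch of one simple approach: splitting a page into columns with a vertical projection profile. This is not what Tesseract or any other tool mentioned here does internally. It assumes an already binarized, deskewed page given as rows of 0/1 values (1 = ink), and the function name split_columns and the min_gap parameter are hypothetical.

```python
def split_columns(image, min_gap=3):
    """Return (start, end) x-ranges, end exclusive, one per text column.

    image: list of rows, each a list of 0/1 values (1 = ink).
    min_gap: minimum run of ink-free pixel columns that counts as a gutter.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: amount of ink at each x position.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans = []
    start = None  # x where the current column began, if any
    gap = 0       # length of the current ink-free run
    for x, ink in enumerate(profile):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The gutter is wide enough: close the current column.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last column, trimming any trailing blank pixels.
        spans.append((start, width - gap))
    return spans
```

Each returned range could then be cropped out and passed to the OCR engine as its own single-column image. Real pages with figures, skew, or noise need something more robust than this.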

The most rapidly developing open source OCR system is OCRopus: http://code.google.com/p/ocropus/
It has some of the leading image-processing researchers working on it. If you follow the names of the people who publish papers on binarization, OCR, etc., the people working on OCRopus are among them. It's alpha code right now, but it does layout analysis and can optionally use Tesseract as a plug-in.

I'm actually building CuneiForm right now. It does appear to have layout analysis, so it may be further along than I had heard elsewhere. It looks promising.

In short, none of the open source systems is as accurate as ABBYY FineReader or OmniPage, but they may get there.
Last edited by Tim on 16 Nov 2009, 21:09, edited 1 time in total.

monday2000
Posts: 18
Joined: 04 Mar 2014, 00:52

Re: Open Source OCR, an ABBY alternative?

Post by monday2000 » 13 Jan 2010, 16:18

Tim wrote:But in short none of the open source systems are as accurate as ABBYY FineReader or OmniPage,
Yes, sadly.
Tim wrote:but they may get there
Only if web communities contribute. Hard, but not hopeless.

gabossy

Re: Open Source OCR, an ABBY alternative?

Post by gabossy » 10 Aug 2010, 06:52

What are the advantages from a software developer's standpoint of making open source software? I am new to the tech industry (I am in the legal department) and am wondering from a practical standpoint why a software developer would develop open source software. Any thoughts?
Last edited by gabossy on 13 Aug 2010, 07:23, edited 1 time in total.

Tim

Re: Open Source OCR, an ABBY alternative?

Post by Tim » 10 Aug 2010, 07:55

gabossy wrote:What are the advantages from a software developer's standpoint of making open source software? I am new to the tech industry (I am in the legal department) and am wondering from a practical standpoint why a software developer would develop open source software. Any thoughts?
That's a really big question, and not necessarily one for this forum, but from the OSI's website:
Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.
You can also look at what the Free Software Foundation has to say about it, but they aren't very good at explaining the practical why. Wikipedia's article covers more of the whys, but it's not all that well written.
