
Bitonal conversion using Photoshop (Gimp & Imagemagic...)

Heelgrasper
Posts: 70
Joined: 19 Feb 2012, 21:04
E-book readers owned: None
Number of books owned: 500
Location: Randers, Denmark

Re: Bitonal conversion using Photoshop (Gimp & Imagemagic...

Post by Heelgrasper » 08 May 2012, 09:08

I just did a small test using Gimp 2.8 and everything seemed to work fine. I used the first version of the script, though I doubt that matters.

I took a single shot of an old bible I have for the test. It's a good test subject since it's printed in fraktur with a lot of fine details, some of them rather important: for example, f and the so-called long s are very similar. I measured the DPI of the original image at 686 and kept that in ScanTailor, so the images should be the same size in pixels.

Below here I've put some illustrations (bit big, sorry):
1. Part of original image
2. Same part after ScanTailor (set at -20 thinner)
3. Same part after using the Gimp script.

I didn't manage to get it perfect with the script (I kept getting some white in the first, very big, letter of the text), but more details are shown.

Since fraktur isn't read by most people and the text is in Danish, here's a transcription of the text (note how f, s (sometimes) and k look in the text and imagine the trouble they cause for OCR):

Hører, I Sønner! en Faders Underviis-
ning, og giver Agt, for at faae For-
stand;
2. thi jeg har givet Eder en god Lærdom;
forlader ikke min Lov.
3. Thi jeg var min Faders Søn, *min
Moders ømme og eneste Barn.
*1 Krøn. 22,5.
4. Og han lærte mig og sagde til mig:
lad dit Hjerte *holde fast ved mit Ord,
bevar mine Bud, saa skal du leve.
*Luc. 10,28.
Attachments
IMG_1402-part.jpg
scant-test-part.jpg
test-part.jpg
---
Jakob Øhlenschlæger
Randers, Denmark

The past is a foreign country: they do things differently there
L. P. Hartley

xorpt
Posts: 42
Joined: 24 Feb 2012, 01:37
E-book readers owned: Sony PRS-T1
Number of books owned: 2000

Post by xorpt » 08 May 2012, 10:11

Thanks for the feedback!

Did you try changing the Opacity setting to 50% or less? This should take care of the white in the middle of the characters. I tried with 30% and it seems to do the trick... I will take your image and see what I can improve.

Post by Heelgrasper » 08 May 2012, 11:33

I experimented with the different settings and did lower the opacity at some point, but perhaps only to 40 rather than all the way to 30. I just tried 30 to see how it looked: it was close, but there were still a few white pixels in the big "H".

Perhaps I could have gotten better results if I had gone on testing different settings but since it was just to see how well it would work I just stopped at some point. I might be doing some scans of books with this typeface at some point but not right now so I'll cross that bridge when I get to it.

For the fun of it I tried running OCR on the two different outputs using Tesseract. Tesseract has a "dan-frak" setting that seems to work okay, even though the first line has been messed up by something. Note the second-to-last line, where the improved output from Gimp resulted in the correct "saa" and not the incorrect "faa" from the ScanTailor output.

ScanTailor output:

Hørneirn, gI ,S oøgn gnievre! re An gFta, dfeorrs a Ut fnadaeer vFioisrsta1ld;
2. thi jeg har givet Eder en god Lærdom;
forlader ikke min Lov.
S. Thi jeg var min Faders Søn, «-«-min
Moders ømme og eneste Barn.
Et Krav. 22, 5.
4. Og han lærte mig og sagde til mig:
lad dit Hjerte i«holde fast ved mit Ord,
bevar.mine Bud, faa skal du leve.
'·«Luc. 10, 28.

Gimp output:

hørenri,n Ig ,S oøgn ngeivr!e re nA Fgta,d feorrs aUt nfadaeer vFioisrstand;
2. thi jeg har givet Eder en god Lærdom;
forlader ikke min Lov.
3. Thi jeg var min Faders Søn, ·min
Moders ømme og eneste Barn.
·1 skraa. 22, B.
4. Og han lærte mig og sagde til mig:
lad dit Hjerte «««holde fast ved mit Ord,
bevar mine Bud, saa skal du leve.
·Luc. 10, 28.
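
A rough way to put a number on the difference between the two OCR outputs is to score each recognized line against the hand transcription. The sketch below uses Python's standard difflib module and compares only the last verse line from the outputs above. The `similarity` helper and the variable names are made up for illustration; they are not part of Tesseract, ScanTailor or the Gimp script.

```python
import difflib

# Last verse line: hand transcription versus the two OCR results quoted above.
reference = "bevar mine Bud, saa skal du leve."
scantailor_ocr = "bevar.mine Bud, faa skal du leve."
gimp_ocr = "bevar mine Bud, saa skal du leve."


def similarity(expected, actual):
    """Return a score between 0 and 1; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    for name, text in (("ScanTailor", scantailor_ocr), ("Gimp", gimp_ocr)):
        print(f"{name}: {similarity(reference, text):.3f}")
```

Running the same comparison over every line of a page would give a simple per-page score for trying out different script settings, assuming a reference transcription is available.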

rdx
Posts: 8
Joined: 08 Jun 2012, 18:50
E-book readers owned: iRex iLiad, Sony PRS-550, Kindle, iRiver Story HD
Number of books owned: 100
Country: United States

Post by rdx » 09 Aug 2012, 16:56

hi xorpt
I just want to let you know that I came to your Gimp plugin looking for a better bitonal conversion solution, and I have to say it works so well! Thanks a lot.

Rick

Post by xorpt » 19 Aug 2012, 06:22

rdx wrote:
hi xorpt
I just want to let you know that I came to your Gimp plugin looking for a better bitonal conversion solution, and I have to say it works so well! Thanks a lot.

Rick

Thanks :) I'm happy it's useful to you. If you want to suggest improvements, please do not hesitate. I'm not a developer, but if I can find a way to do it I'll be pleased to do so...

DrCheap
Posts: 48
Joined: 07 Jan 2012, 19:27
E-book readers owned: pdf
Number of books owned: 750

Post by DrCheap » 29 Aug 2012, 10:05

Ok, mad crazy kudos and thanks for the batch GIMP processor. I am playing with it now; the results took some tinkering but look excellent. While this additional step may not make it into my mundane daily use cycle, it greatly improves my output quality for important material meant for public consumption.

My settings, by the way, for best results at least on the current book I am processing, were:
LB 20
LW 220
LG .67
DoGr1 6.0
DoGr2 1.0
GBH 1.0
GBV 1.0
LO 15
FT 255
ST 170

Only one note: Setting GBH and GBV to zero causes the script to crash.

Thanks!

Post by xorpt » 29 Aug 2012, 10:32

Hi DrCheap,

Thanks for the feedback! I'll see what I can do about the crash (it should be possible to forbid using 0 for the Gaussians).

I was planning to write a few tips and some advice about using the settings, but the most important one, as you noticed, is the Layer Opacity. The smaller it is, the fewer white parts you will get in the middle of big characters, but you can lose some details.

The second most important one is the Level White, since it lets you remove the background and avoid getting too many speckles.
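
To illustrate why the white level removes the background, here is a minimal sketch of a standard levels mapping on a single 8-bit gray value. It assumes that LB, LW and LG in the settings lists stand for Level Black, Level White and Level Gamma, and that the script applies an ordinary levels adjustment; the script's own code is not shown in this thread, so `apply_levels` is a hypothetical helper and not taken from it.

```python
def apply_levels(value, black=20, white=220, gamma=0.67):
    """Map one 8-bit gray value through a black point, white point and gamma.

    Defaults mirror the LB / LW / LG values posted above (an assumption
    about what those abbreviations mean).
    """
    # Rescale so that `black` maps to 0.0 and `white` maps to 1.0,
    # clipping everything outside that range.
    scaled = (value - black) / (white - black)
    scaled = min(max(scaled, 0.0), 1.0)
    # A gamma below 1 gives an exponent above 1, which darkens midtones.
    return round(255 * scaled ** (1.0 / gamma))


if __name__ == "__main__":
    for gray in (10, 125, 215, 235):
        print(gray, "->", apply_levels(gray))
```

Every value at or above the white level comes out as pure white, so light gray paper disappears. Raising the white level leaves more of the paper tone below it, which would fit the speckles reported with higher values, while lowering it starts to wash out the lighter edges of the text.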

Post by DrCheap » 14 Sep 2012, 23:35

Yeah, I have found the following settings getting pretty good results on most of my images lately:

LB 20
LW 230 (higher produces speckles, lower eats away at the text)
LG .67
DoG1 6
DoG2 1
GBH 1
GBV 1
LO 35 (lower is good if the text is really black but many book pages are too gray for lower)
1st Thresh 255
2nd Thresh 170

Post by xorpt » 27 Sep 2012, 07:36

A few changes, so I made a new version. The biggest change is the ability of the single image converter (not batch) to handle images in the middle of the text. This is the equivalent of the "mixed" output in ScanTailor, except you'll have to select the images a la mano :lol: It is intended to be used after the batch converter, for pages which contain both text and images.

The good thing about it is that it lets you reduce the size of the images dramatically using various dithering options. Of course, if you use these options the image quality will drop, but I find the size/quality trade-off satisfying, especially when producing PDFs for ereaders (which usually support only 16 levels of gray).

I'll do another tutorial soon (hopefully) about this in this same topic.

Bitonal Converter for GIMP v.0.3

Change log:
- image handling. To use this function select the images you want to work on using the mouse
(hold Shift to select more than one image) before using the script
- changed default opacity to 35
- changed minimum values of DoG and Gaussians to 1 instead of 0 (which made the script crash)
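
The last change amounts to refusing radii below 1 before they reach the blur steps. The script's code is not shown in this thread, so the following is only a sketch of the idea in Python; `MIN_RADIUS` and `clamp_radius` are hypothetical names.

```python
MIN_RADIUS = 1.0  # minimum described in the change log; 0 made the script crash


def clamp_radius(value, minimum=MIN_RADIUS):
    """Return the requested radius, raised to the minimum if it is too small."""
    return max(float(value), minimum)


if __name__ == "__main__":
    for requested in (0, 0.5, 1, 6):
        print(requested, "->", clamp_radius(requested))
```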

Download:
bitonal-converter-v0.3.zip (1.77 KiB)

Post by xorpt » 10 Oct 2012, 02:47

Sorry, sorry, sorry... there was a parameter missing in the batch bitonal converter and it didn't work anymore.

Here is the correct v.0.3:
bitonal-converter-v.0.3.zip (1.77 KiB)
