Daniel Reetz, the founder of the DIY Book Scanner community, has recently started making videos of prototyping and shop tips. If you are tinkering with a book scanner (or any other project) in your home shop, these tips will come in handy. https://www.youtube.com/channel/UCn0gq8 ... g_8K1nfInQ

Binarisation Question

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.
Post Reply
univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Binarisation Question

Post by univurshul » 01 Nov 2010, 22:43

Hey Tulon,

what are your thoughts regarding how Scan Tailor binarises to black and white?

I think it's the best conversion I've tried, but I was wondering what conditions would create smoother, sharper text characters.

I conducted a test with binary results here: http://www.diybookscanner.org/forum/vie ... 6118#p6118

Is this simply a normal side effect of binarisation? Know of any post-processing tools that smooth characters?

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK
Contact:

Re: Binarisation Question

Post by Tulon » 02 Nov 2010, 04:07

In Scan Tailor, I apply a Savitzky-Golay smoothing filter before binarization and do some hit-or-miss-transform-based patching after. Binarization itself is global (not adaptive) but happens after illumination equalization.
Scan Tailor experimental doesn't output 96 DPI images; that's just what your software shows when DPI information is missing. What you actually get is the input DPI times the resolution enhancement factor (for example, a 300 DPI scan with 2x enhancement yields 600 DPI output).
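
[Editor's note] The pipeline Tulon describes (smooth, threshold globally, then patch with a hit-or-miss transform) can be sketched roughly as below. This is not Scan Tailor's actual code: the toy image, the fixed threshold of 128, and the particular structuring element are all invented for illustration.

```python
import numpy as np
from scipy.ndimage import binary_hit_or_miss
from scipy.signal import savgol_filter

# Toy grayscale page (0 = black ink, 255 = white paper), invented for
# illustration: a dark vertical stroke with one noisy pixel bitten out
# of its right edge.
img = np.full((32, 32), 220.0)
img[8:24, 14:18] = 30.0   # the stroke
img[12, 17] = 220.0       # a single-pixel "pit" in its edge

# 1) Savitzky-Golay smoothing along each row before thresholding.
smoothed = savgol_filter(img, window_length=5, polyorder=2, axis=1)

# 2) Global (non-adaptive) threshold; Scan Tailor picks its own
#    threshold after illumination equalization, 128 is a placeholder.
binary = smoothed < 128   # True = ink

# 3) Hit-or-miss patching: find background pixels whose top, left and
#    bottom neighbours are all ink (one of many possible structuring
#    elements), and fill them in.
ink_pattern = np.array([[0, 1, 0],
                        [1, 0, 0],
                        [0, 1, 0]])
bg_pattern = np.array([[0, 0, 0],
                       [0, 1, 0],
                       [0, 0, 0]])
pits = binary_hit_or_miss(binary, structure1=ink_pattern,
                          structure2=bg_pattern)
patched = binary | pits   # the pit in the stroke's edge gets filled
```

The pit survives the global threshold (it is genuinely light in the input), which is exactly the kind of artifact the post-threshold patching step exists to repair.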

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Binarisation Question

Post by univurshul » 04 Nov 2010, 09:47

OK.

I was wondering what you think of applying smoothing after the image has been rendered to B/W. For example, Adobe ClearScan applies some kind of definition to text characters. An example of this is here: http://www.diybookscanner.org/forum/vie ... 6169#p6169

Would you consider this type of rendering more of an alteration of the intended output from its origins, or is it something that could/should be applied across the entire B/W-processed field (e.g., graphics, text, lines, borders, etc.)?

As you can see, capture-device hardware alone can't mitigate binarisation side effects.

User avatar
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Binarisation Question

Post by Misty » 04 Nov 2010, 09:57

I think you're a bit too focused on looking at the text at a pixel level. Scan Tailor produces 600dpi output, which is meant to be viewed scaled-down on a screen or printed out at full resolution. When scaled down, you get an anti-aliasing effect that smooths text. When printed, I find that Scan Tailor's binarized output is essentially indistinguishable from the original. I'm comparing a printout I just did on a laser printer with an original book page, and the two look so close to identical as to be impossible to tell apart without a magnifying glass.
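
[Editor's note] The anti-aliasing effect Misty describes can be sketched as plain block averaging, which is roughly what a viewer's resampler does when scaling down. The 8x8 glyph-edge tile and the 4:1 ratio are invented for illustration.

```python
import numpy as np

# Toy 1-bit glyph edge, invented for illustration: a diagonal ink
# boundary in an 8x8 tile (1.0 = ink, 0.0 = paper).
bw = np.zeros((8, 8))
for r in range(8):
    bw[r, : r + 1] = 1.0  # ink fills more of each row going down

# Downscaling 4:1 by averaging 4x4 blocks turns the hard pixel steps
# along the diagonal into intermediate greys - the smoothing effect
# you see when 600 DPI binarized output is viewed at screen size.
small = bw.reshape(2, 4, 2, 4).mean(axis=(1, 3))
```

Blocks that are all ink or all paper stay pure black or white; only the blocks straddling the edge become grey, which is why the binarized output looks smooth at normal viewing sizes even though every source pixel is 1-bit.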

Or am I misunderstanding? Are you getting text that is visibly chunky scaled to screen size or printed?
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Binarisation Question

Post by univurshul » 04 Nov 2010, 10:15

Misty wrote: Or am I misunderstanding? Are you getting text that is visibly chunky scaled to screen size or printed?
Yes, you might be thinking in another direction. And no, I am not getting visibly chunky text when it's scaled to screen size. But I am noticing there's a very interesting reason why Adobe developed ClearScan.

I also agree that viewing at 100% or reprinting at actual size is not a big issue, and I'm not interested in going there; it's a non sequitur when approaching Tulon with these questions.

That said, we're getting amazing clarity with ClearScan embedded in the OCR within Acrobat. On all reading devices, with the exception of e-ink (which I haven't tried yet), this is a big optical improvement, especially for binarised text at densities greater than 6 lines of text per vertical inch.

If you head over to the NEX-5 discussion, it's laid out with additional points: http://www.diybookscanner.org/forum/vie ... 6169#p6169

User avatar
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Binarisation Question

Post by Misty » 04 Nov 2010, 12:47

I saw your thread there! However, I noticed that your examples always seem to be from when you're zoomed in at 400-600%. Since this isn't a problem when you're zoomed out, I'm trying to get a sense of why this is an issue, which would give me a better idea of whether I can recommend anything. You mention all reading devices - is the appearance different from scaling down on a computer screen?
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Binarisation Question

Post by univurshul » 04 Nov 2010, 13:09

Correct, some of these are deep zooms. But what I'm after is how to retouch/perfect binarisation, and of course test these ideas & techniques employed by other dev teams.

Misty, you shouldn't take it as an issue. Take it as a small pearl of exploration in binarisation testing. I never took issue with the baseline people are running, and I'm not suggesting people go out and implement it. But it is interesting. And I didn't bring it up; Shaknum did. I'm simply pursuing the idea, and noticing that what's good for most can easily be made better.

I don't look at scaling too much because once it loads on an iPad or similar device, it's 'hardened concrete', if you will.

And ultimately, I just want to pick Tulon's brain before his retirement from Scan Tailor. Plus, I've learned that hundreds of capture devices, if not all of them, succumb to binarisation blocking/side effects. ST nearly levels the playing field for most of these devices.

User avatar
daniel_reetz
Posts: 2786
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States
Contact:

Re: Binarisation Question

Post by daniel_reetz » 04 Nov 2010, 13:23

univurshul wrote: Tulon's brain before his retirement from Scan Tailor
I wouldn't read his statement quite the way you did. Doesn't mean he'll go away or stop supporting ST, and it doesn't mean that we won't fork it and continue the software. Just means no new features (other than dewarping) are on the horizon.

Shaknum
Posts: 91
Joined: 16 Aug 2010, 13:10

Re: Binarisation Question

Post by Shaknum » 04 Nov 2010, 23:08

univurshul wrote: And I didn't bring it up. Shaknum did.
Quite true. I don't mean to upset people or anything; I'm just looking to raise the bar on quality a bit. Truth is, I read a lot of my scans zoomed to 200% or more, to make it easier on my eyes (I need them to last). Also, I am scanning rare, out-of-print books that I may never have a chance to get my hands on again, so I want to make sure I have scans that I won't come to regret in the future. Anyway, Misty is completely right that these things look beautiful at 100-150%, so others should not at all be discouraged. However, if there are those out there who want to push the envelope a little, I certainly appreciate all the input I've received here, and will do my best to add to that.

Post Reply