Bitonal conversion using Photoshop (Gimp & Imagemagic...)
Posted: 24 Apr 2012, 20:51
I'd like to share the results of long hours of research... I hope you Photoshop experts can suggest new approaches!
Background
I use ScanTailor for processing my scans. Although I find it very efficient for splitting pages, deskewing and setting margins around the content, I am sometimes frustrated by the bitonal output. It is usually good (or very good), especially on standard alphabetic text, but for text containing smaller details, or for other languages, it can be far from satisfactory.
I scan a lot of Japanese text for my own language study, and I need to be able to read it on my e-reader. Using grayscale output instead of bitonal gives very clean characters, but there are two drawbacks: larger PDF files (no DJVU), and less contrast on the eInk screen (actually not readable, in my opinion).
Bitonal conversion using ScanTailor
Here is an example of Japanese output from ST, using setting "0"
Scan Tailor using "Thinner -20"
As you can see in the first image, some of the characters are too thick. If you lighten the output, they improve a little, but the other characters tend to disappear. Note that I get slightly better results using ABBYY FineReader, but not significantly better...
Bitonal conversion using Photoshop
1. First, output the page from Scan Tailor using the color / grayscale output
2. Image > Mode > Grayscale
3. Image > Adjustments > Levels. Remove as much of the background as possible (or you will get speckles...) and increase the contrast as much as possible without losing too much detail.
4. Select Background Layer > Duplicate Layer
5. Select New layer -> Filter > Sketch > Photocopy. Use Detail = 1, Darkness = 50. The "photocopy" filter nicely draws the edges of each character. Drawback: it leaves the center white
6. Select New layer -> Filter > Blur > Gaussian Blur. Use 0.5 to 1 (here 0.6). The "photocopy" filter's output is jagged, so we can soften that a little by adding Gaussian blur. You can try different values to get the best compromise between detail and smoothness.
7. Select New layer -> Change "Fill" to 70%. This lets the Background layer show through the new layer, which fills in the inside of the characters that the "photocopy" filter left white.
8. New adjustment layer > Threshold. Value = 190. This converts to bitonal
Compared with ST:
As you can see, the result is not as smooth as ST's, but it keeps much more detail. In my experience the roughness is not a problem, since the PDF viewer smooths it out when displaying the page. On an e-reader the characters are quite smooth, easy to read... and very contrasty
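For anyone who wants to try the same idea outside Photoshop, here is a rough sketch of steps 3 to 8 in plain Python, working on a grayscale page stored as a list of rows of 0-255 values. Several things in it are assumptions and not taken from Photoshop: `edge_layer` is only a stand-in for the Photocopy filter (it darkens pixels that are darker than a blurred copy of their neighbourhood, which also leaves the middle of thick strokes white), its `sigma` and `gain` values are made up, the Levels limits 30 and 220 are placeholders, the 70% Fill is modelled as a simple linear mix of the two layers, and the threshold is assumed to turn values at or above 190 white. All function names are hypothetical.

```python
import math

def levels(img, black=30, white=220):
    """Step 3: map [black, white] linearly onto [0, 255], clipping the rest."""
    scale = 255.0 / (white - black)
    return [[min(255.0, max(0.0, (p - black) * scale)) for p in row] for row in img]

def _kernel(sigma):
    """Normalised 1-D Gaussian weights out to about 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def _blur_rows(img, kernel):
    """Convolve every row with the kernel, repeating edge pixels at the borders."""
    radius = len(kernel) // 2
    last = len(img[0]) - 1
    return [
        [sum(k * row[min(last, max(0, x + i - radius))] for i, k in enumerate(kernel))
         for x in range(last + 1)]
        for row in img
    ]

def _transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Step 6: separable Gaussian blur (rows first, then columns)."""
    kernel = _kernel(sigma)
    return _transpose(_blur_rows(_transpose(_blur_rows(img, kernel)), kernel))

def edge_layer(img, sigma=1.0, gain=4.0):
    """Step 5 stand-in (assumption, not Photoshop's Photocopy algorithm):
    white page, with pixels darkened where they are darker than their
    blurred surroundings. The middle of a thick stroke stays white."""
    local_mean = gaussian_blur(img, sigma)
    return [
        [min(255.0, max(0.0, 255.0 + gain * (p - m))) for p, m in zip(row, mean_row)]
        for row, mean_row in zip(img, local_mean)
    ]

def to_bitonal(img, fill=0.7, threshold=190, blur_sigma=0.6):
    """Steps 3 to 8: levels, edge layer, blur, 70% mix over the base, threshold."""
    base = levels(img)
    top = gaussian_blur(edge_layer(base), blur_sigma)
    out = []
    for top_row, base_row in zip(top, base):
        # Step 7: Fill = 70% modelled as 0.7 * top layer + 0.3 * background layer
        mixed = (fill * t + (1.0 - fill) * b for t, b in zip(top_row, base_row))
        # Step 8: values at or above the threshold become white, the rest black
        out.append([255 if v >= threshold else 0 for v in mixed])
    return out
```

Under that linear-mix assumption, the sketch also shows why 70% and 190 work together: a white stroke center (255) in the edge layer over a black stroke (0) in the background mixes to 0.7 * 255 = 178.5, which is below 190, so the inside of the character still comes out black. To run it on a real page you would load the pixels with whatever image library you prefer and pass them in as rows of gray values.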
If you have better ideas, please share