Needed DPI - a few thoughts

Heelgrasper
Posts: 70
Joined: 19 Feb 2012, 21:04
E-book readers owned: None
Number of books owned: 500
Location: Randers, Denmark

Needed DPI - a few thoughts

Post by Heelgrasper »

Note: This might not be posted in the right forum, so please move it if that's the case. I had a bit of trouble finding the right place for it.

Reading about efforts to improve bitonal output from scanning in http://www.diybookscanner.org/forum/vie ... =19&t=2554 led me to some interesting info from Cornell University Library: http://www.library.cornell.edu/preserva ... tents.html. The tutorial is rather old in digital terms, since the original is from 2000 and it hasn't been updated since 2003 (and some external links are dead), but in part 3 under "benchmarking" I found something useful that still seems to hold.

In particular it's the use of the Quality Index (QI) I find useful. For bitonal scanned printed text it's defined as:

QI = (dpi x .039h)/3

Here h = height of the characters in millimeters. If it is measured in inches, the .039 factor is omitted. The formula can of course be rearranged to calculate h from a desired QI and a known DPI, or to calculate the DPI needed to get a certain QI with a particular h:

h = 3QI/(.039 x dpi)
dpi = 3QI/(.039 x h)

The scale is such that 3.0 is barely legible, 3.6 is marginal, 5.0 is good and 8.0 is excellent. A quick test suggests this is reasonable. If letters are 2 mm high and we scan at 300 dpi we get:

QI = (300 x .039 x 2)/3 = 7.8 (or between good and excellent, closest to excellent)

In the same way one can calculate what dpi is needed to get "good" quality with characters of the same size. That would be (3 x 5)/(.039 x 2) = 192.3 dpi. This all seems very close to the general recommendations: at least 200 dpi to be able to do OCR, 300 dpi as the desired level, and more than 300 dpi only really needed for text with small fonts etc.
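For anyone who wants to play with the numbers, here is a minimal Python sketch of the three forms of the formula. The function names are made up for illustration; only the formula itself and the rounded .039 factor come from the Cornell tutorial as quoted above.

```python
MM_TO_INCH = 0.039  # rounded mm-to-inch factor used in the formula above


def quality_index(dpi, h_mm):
    """QI for bitonal scans of printed text with characters h_mm high."""
    return dpi * MM_TO_INCH * h_mm / 3


def needed_dpi(qi, h_mm):
    """DPI required to reach a given QI for characters h_mm high."""
    return 3 * qi / (MM_TO_INCH * h_mm)


def char_height_mm(qi, dpi):
    """Smallest character height (mm) that reaches a given QI at this DPI."""
    return 3 * qi / (MM_TO_INCH * dpi)


if __name__ == "__main__":
    print(f"QI at 300 dpi, 2 mm letters: {quality_index(300, 2):.1f}")
    print(f"DPI needed for QI 5.0, 2 mm letters: {needed_dpi(5.0, 2):.1f}")
```

The two printed values correspond to the 7.8 and 192.3 dpi worked out by hand above.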

A real-world example might be in order. I recently scanned a book with a resulting dpi of 370. The letters in general are 2 mm high, the numbers used in the notes only 1 mm (round figures). Here's a part of the notes on a page (not downsized):
IMG_0608-part.jpg
QI on the 2 mm sized letters would be 9.62 (above excellent) and on the 1 mm sized note numbers 4.81 (just under good). That the QI is a bit low on the small note numbers is clear when the page has been through ScanTailor with standard settings (not downsized):
0011-part.jpg
Yes, it's pretty clear that it's a "2" and a "3" but every pixel was needed and a few more wouldn't have hurt.
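The same check can be sketched in Python, rating each QI against the scale quoted earlier. This is only an illustration: the function names and the "below barely legible" label are my own assumptions, not part of the Cornell tutorial.

```python
# Scale as quoted from the Cornell tutorial: (threshold, label), highest first.
QI_SCALE = [
    (8.0, "excellent"),
    (5.0, "good"),
    (3.6, "marginal"),
    (3.0, "barely legible"),
]


def quality_index(dpi, h_mm):
    """Bitonal QI for printed text with characters h_mm high."""
    return dpi * 0.039 * h_mm / 3


def rate(qi):
    """Return the label of the highest scale level that the QI reaches."""
    for threshold, label in QI_SCALE:
        if qi >= threshold:
            return label
    return "below barely legible"  # hypothetical label for QI under 3.0


if __name__ == "__main__":
    for h_mm in (2, 1):
        qi = quality_index(370, h_mm)
        print(f"{h_mm} mm at 370 dpi: QI = {qi:.2f} ({rate(qi)})")
```

Note that the 1 mm note numbers land in the "marginal" band here because they fall just short of the 5.0 threshold for "good".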

So the conclusion must be that the QI is a useful tool for figuring out what DPI you need for a particular text. For special texts (fraktur typefaces etc.) you might need a bit more than the QI would suggest, but I haven't tested this.

There's also a QI for greyscale scans (replace 3 with 2) and, a bit confusingly, a QI based on stroke width where the QI scale is different. Stroke width is clearly a more relevant variable when it comes to line art etc., but it is also much harder to measure. It's pretty hard to tell whether the finest line is .08 mm, .10 mm or .12 mm, even though the latter is 50% wider than the first.

Inspired by the text from Cornell, I would suggest a more hands-on approach to figuring out if a setup is good enough for line art in a particular book:

1. Find the line art with the finest lines in the book
2. Take a test shot of the page with the line art. It doesn't need to be done in the actual scanning setup, as long as it's a photo of the whole page framed as it would be when scanning, so that the dpi is about the same.
3. Load the photo in PhotoShop, GIMP or whatever you like and measure how many pixels wide the finest line is.

I haven't tested it, but my guess is that if the line is at least 3 pixels wide, the setup is good enough for bitonal output. If it's 2 pixels you might get something useful in grayscale/color, but bitonal is likely to mess it up. At least this lines up nicely with the formulas from Cornell based on stroke width.
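If the stroke width can be estimated in millimeters, the pixel count can also be predicted instead of measured. A rough Python sketch follows; the function names are invented for illustration, and the 3-pixel minimum is just my untested guess from above, not a figure from Cornell.

```python
MM_PER_INCH = 25.4


def effective_dpi(image_width_px, page_width_mm):
    """DPI of a test shot, from the page's width in pixels and in millimeters."""
    return image_width_px / (page_width_mm / MM_PER_INCH)


def stroke_width_px(stroke_mm, dpi):
    """How many pixels wide a line of stroke_mm millimeters is at a given DPI."""
    return stroke_mm / MM_PER_INCH * dpi


def dpi_for_stroke(stroke_mm, min_px=3):
    """DPI needed for a line of stroke_mm to cover min_px pixels (assumed threshold)."""
    return min_px * MM_PER_INCH / stroke_mm


if __name__ == "__main__":
    for stroke_mm in (0.08, 0.10, 0.12):
        px = stroke_width_px(stroke_mm, 370)
        need = dpi_for_stroke(stroke_mm)
        print(f"{stroke_mm:.2f} mm: {px:.2f} px at 370 dpi, "
              f"about {need:.0f} dpi needed for 3 px")
```

Measuring the line in the photo, as in step 3, is still the more reliable route, since it avoids having to measure such thin strokes on paper.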
---
Jakob Øhlenschlæger
Randers, Denmark

The past is a foreign country: they do things differently there
L. P. Hartley

Re: Needed DPI - a few thoughts

Post by Heelgrasper »

Just wanted to add the link for the book that the tutorial relates to:
Anne R. Kenney and Oya Y. Rieger (ed.):
Moving theory into practice : digital imaging for libraries and archives
Research Libraries Group, Mountain View, California 2000
ISBN: 0-9700225-0-6
189 pages
http://cdm15003.contentdm.oclc.org/util ... me/270.pdf
xorpt
Posts: 42
Joined: 24 Feb 2012, 01:37
E-book readers owned: Sony PRS-T1
Number of books owned: 2000

Re: Needed DPI - a few thoughts

Post by xorpt »

Very interesting :) thanks

When scanning Japanese text, a lot of issues are caused by the furigana, i.e. the reading of kanji characters. For difficult-to-read Chinese characters, the reading is added next to them in smaller "kana" letters.

Example:

Image

Most of the time the "finest lines" in the book are the two small strokes you can see on the right side of the first reading character (it's called a dakuten)... it's probably where I should measure the QI! I'm afraid I'd need new cameras to improve this though :lol:

<spam>
BTW, I get this result on your example using my "Gimp bitonal converter v.0.2" (see http://www.diybookscanner.org/forum/vie ... =19&t=2554)

Image

I didn't fine-tune the parameters so it might be possible to do better... but it's already not so bad I think. The 2 and 3 are clearly defined.
</spam>