Let's Make A DIY Book Scanner Test Chart


reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

kasslloyd wrote:Do we want to test functionality of say content selection algorithms like in ST for the Mixed mode?
That sounds complicated enough (and ever-changing, since Tulon seems to be continually addressing such issues in the current dewarping thread in the Scan Tailor forum) to require a separate test page.
kasslloyd wrote:if it's just to test the quality of cameras and resulting books then pictures would be good like those, but also maybe an actual modern color picture over old bw/paintings.. A very dark one where black extends to all the borders and a very light one where white extends to all the borders, to see how it handles the two, imo.
My logic for the predominance of the b&w images (they are four copies of a photo, not a painting, if that matters) was two-fold: 1) b&w images are more common in books, due to the expense of color printing, and 2) b&w is better for determining even lighting, because human brains translate colors as lighter or darker in ways not always related to their actual lightness or darkness. That second point is just my barely educated guess. For those two reasons, given a choice between spreading the b&w image or a color image more predominantly around the test page, I chose the b&w. Btw, a good feature of this particular b&w image is that it goes from near complete white (the shirt collar) to near complete black (the nearby tie). I understand that is a good feature of images being used to test photographic range.

A third, more minor, rationale for giving b&w more square inches on the page than color is that color is not as reproducible as b&w. We have settled on the test page *not* needing to be true to some given original, since we a) per Daniel's last post, don't care that much about that to begin with, and b) want to accommodate people printing the test page on their own, with inevitably varying results quality-wise. But, taking those two criteria as given, I think we would still like a printed test page that is as common as possible to all of us participating in this electronic community, and my guess is that b&w is more likely to be reproduced similarly than color. Thus a b&w-dominated page will overall be more similar as produced around the country (or world) than otherwise.

Whatever we decide on the issue of the predominance of color vs. b&w on the test page, the nature of the images remains an issue. I selected the b&w image on the basis of a vague idea about the needed characteristics of a b&w image for the defined testing purposes (all contributions on that score welcome). But I can't say the same for the color image. I just picked it because it seemed to have a lot of different colors. Kasslloyd, can you say more about the benefits of the two kinds of color images you are proposing?
kasslloyd
Posts: 41
Joined: 19 Dec 2010, 21:25

Re: Let's Make A DIY Book Scanner Test Chart

Post by kasslloyd »

I'm more interested in content recognition... which is what my suggestions are based on.

As for your PDF, I would remove the separating lines; what's the basis for the lines between the text and pictures? It will also need to be made on something that can output a high quality pdf. Likely we should use 600 dpi images and probably a more common book font, like maybe one off of this list: http://fontfeed.com/archives/top-ten-ty ... n-winners/
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

kasslloyd wrote:I'm more interested in content recognition... which is what my suggestions are based on.
I do think that is a different mission from the one I am attempting. It might work fine for people to propose and discuss varying test pages with different purposes on this thread, or maybe it would work better to open new threads for them, in this case perhaps titled "Let's make a content-based DIY scanner test page." I use an Acrobat-based workflow and don't face the content-recognition problems I see discussed in the Scan Tailor forum. The mission as Daniel lays it out, and as I am trying to implement it, seems oriented toward camera operation, not post-processing.
kasslloyd wrote:I would remove the separating lines; what's the basis for the lines between the text and pictures?
The idea there was to make certain distortions obvious, but maybe the edges of the images, being so squared up, serve the same purpose.
kasslloyd wrote:It will also need to be made on something that can output a high quality pdf. Likely we should use 600 dpi images
That's a good suggestion. Another poster on this thread, if I understood him or her properly, suggested that books are rarely printed to that high a resolution, thus no need to test for it. I am not so sure about that. Trade books, maybe not. But those glossy coffee table books, and anything from National Geographic, very possibly so. The source images on the proposed test page are pretty high res and at the sizes on this proposed test page may be 600 dpi. If not we could size them down to be so, or get other images. But such resolution will only mean anything if the user's output device (or our own, if we produce a version to be sent out to people) can produce it. And even if it can, there is all that insanely complicated stuff about software and printer rendering engines that Daniel mentioned that will be mucking things up. I don't think high resolution images are a major priority for the test page project as Daniel lays it out and I am trying to implement it, but if it can be carried off it should be.
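One rough way to check whether an image "may be 600 dpi" at its placed size is to divide its pixel dimensions by the printed size in inches. The sketch below uses the 2089 x 3000 pixel dimensions of the b&w source image mentioned in this thread; the function names and the 5-inch printed height are illustrative assumptions, not measurements taken from the proposed test page.

```python
def effective_dpi(pixels: int, printed_inches: float) -> float:
    """Dots per inch when `pixels` are spread across `printed_inches`."""
    if printed_inches <= 0:
        raise ValueError("printed size must be positive")
    return pixels / printed_inches


def max_print_size(pixels: int, target_dpi: float) -> float:
    """Largest printed size (inches) that still reaches `target_dpi`."""
    if target_dpi <= 0:
        raise ValueError("target dpi must be positive")
    return pixels / target_dpi


# Assumed example: the 3000-pixel-high image placed 5 inches high.
print(effective_dpi(3000, 5.0))             # 600.0
# Widest placement of the 2089-pixel side that still gives 600 dpi.
print(round(max_print_size(2089, 600), 2))  # 3.48
```

So an image of that size only reaches 600 dpi if it is placed at roughly 3.5 x 5 inches or smaller; placed larger, its effective resolution drops below that.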
kasslloyd wrote:and probably a more common book font like maybe one off of this list: http://fontfeed.com/archives/top-ten-ty ... n-winners/
By amazing coincidence I have kicking around here somewhere Adobe's high-quality version of the number-one font on that list, Minion, complete with fancy curlicue symbols (called "ornaments" in the trade). And I would love to use that font both for its beauty and its simulation of real books. But I fear using it would introduce difficulties when printed by a local user -- a specialized font cannot be called on a user machine, and embedding fonts in documents sometimes doesn't work. Also, I am not sure what using an official book font really does for us. Although Times New Roman and other standard computer fonts are low quality by type industry standards, the quality issues have to do with aesthetics, letter combinations (for example, a high-quality font has separate letter forms to substitute for the "ff", "fl" and "fi" combinations) and other factors that do not impact the resolution issues that concern us as scanners. However, maybe I am missing something valuable gained by using professional fonts, and maybe the embedding issues I ran across years ago in Acrobat files submitted to printers no longer exist. Or we may end up preferring a high resolution image format for the document rather than Acrobat. That would treat the body of type as a whole as an image and not rely on fonts, so we could use any one we wanted. So please push back on this issue if needed.

By the way, kasslloyd, did you have anything more to say about your earlier suggestions related to the ideal characteristics of the color images?
kasslloyd
Posts: 41
Joined: 19 Dec 2010, 21:25

Re: Let's Make A DIY Book Scanner Test Chart

Post by kasslloyd »

Only that those suggestions are two types that give ST a hard time with content selection. If it's dark and black all around the edges, it has an issue with trying to select only the light parts but not the dark parts, and if it's really light it doesn't select the lighter parts... Not sure either type would be useful to just test camera function without an ST workflow.

I use ST then Acrobat, which yields a very clean, nice final product, I think.

About the DPI, the pdf you uploaded doesn't seem big enough to be 600 dpi images... That's why I said what I said.
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

kasslloyd wrote:About the DPI, the pdf you uploaded doesn't seem big enough to be 600 dpi images
True, but isn't that the idea behind compressed formats -- they store much smaller than they present?

The source images were originally in jpeg format, which is a compressed format. The b&w image was a 1.1MB jpeg, but if you look at its properties via the "resize and resample" function of a graphic program, the image has a seemingly much higher resolution of 2089 x 3000. Using the simple multiplication math I see all the time on this forum to describe photo files and their relation to CMOS megapixel stats, that should be a 6MB file.

And then somehow four copies of that 1.1 MB jpeg image plus two copies of an originally 1632 x 2016 color image, plus type and design, add up to a PDF of only 530KB. Yet you can blow the thing up to 300% -- 25 inches high if my screen could show it all -- and still not see any significant artifacts.

So through the magic of math I think you can have 600 dpi images without logically big enough files, but someone who actually knows how it all works, please weigh in.
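For what it's worth, the "6MB" figure above is the uncompressed size: width times height times bytes per pixel. Here is a minimal sketch of that arithmetic, assuming 8 bits per channel (one channel for grayscale, three for RGB). The function name is made up for illustration, and the ratio at the end simply compares that raw size with the 1.1 MB jpeg.

```python
def uncompressed_bytes(width_px: int, height_px: int, channels: int = 1) -> int:
    """Raw size of an 8-bit-per-channel image, before any compression."""
    return width_px * height_px * channels


# The b&w source image, treated as single-channel grayscale.
bw_raw = uncompressed_bytes(2089, 3000)
print(bw_raw)                         # 6267000 bytes, i.e. about 6 MB

# The same pixel dimensions stored as 3-channel RGB would be three times that.
print(uncompressed_bytes(2089, 3000, channels=3))  # 18801000

# Compression ratio implied by the 1.1 MB jpeg (taken as 1,100,000 bytes).
print(round(bw_raw / 1_100_000, 1))   # 5.7
```

JPEG compression shrinks the stored bytes without changing the pixel dimensions, so a small file size alone does not show that the resolution was reduced. Whether the PDF export also resampled the images is a separate question that the file size cannot answer.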
kasslloyd
Posts: 41
Joined: 19 Dec 2010, 21:25

Re: Let's Make A DIY Book Scanner Test Chart

Post by kasslloyd »

No... Compressed PDFs actually decrease the DPI of the images to compress them. You need to export it in "Press Quality" mode, where it does not change the dpi and saves at maximum quality.
Ryan_phx
Posts: 63
Joined: 29 Dec 2010, 14:51
E-book readers owned: Nook, Kindle DX
Number of books owned: 0
Country: USA
Location: Sandusky, OH

Re: Let's Make A DIY Book Scanner Test Chart

Post by Ryan_phx »

reggilbert wrote: True, but isn't that the idea behind compressed formats -- they store much smaller than they present?
Who knew we were using Time Lord technology--"it's bigger on the inside!"
Anonymous1

Re: Let's Make A DIY Book Scanner Test Chart

Post by Anonymous1 »

reggilbert wrote:
kasslloyd wrote:About the DPI, the pdf you uploaded doesn't seem big enough to be 600 dpi images
True, but isn't that the idea behind compressed formats -- they store much smaller than they present?

The source images were originally in jpeg format, which is a compressed format. The b&w image was a 1.1MB jpeg, but if you look at its properties via the "resize and resample" function of a graphic program, the image has a seemingly much higher resolution of 2089 x 3000. Using the simple multiplication math I see all the time on this forum to describe photo files and their relation to CMOS megapixel stats, that should be a 6MB file.

And then somehow four copies of that 1.1 MB jpeg image plus two copies of an originally 1632 x 2016 color image, plus type and design, add up to a PDF of only 530KB. Yet you can blow the thing up to 300% -- 25 inches high if my screen could show it all -- and still not see any significant artifacts.

So through the magic of math I think you can have 600 dpi images without logically big enough files, but someone who actually knows how it all works, please weigh in.
My pictures are ~6 MB, but the resolution is 5184 x 3456. I think they should be a bit smaller (for bitonal text with this camera, I get ~200 KB per page. It's a bit big, but you can zoom forever).
kasslloyd wrote:No... Compressed PDFs actually decrease the DPI of the images to compress them. You need to export it in "Press Quality" mode, where it does not change the dpi and saves at maximum quality.
Hmm, if that's for Acrobat, it could be a preset compression mode. Most non-Acrobat outputs retain the original DPIs (being a DjVu user, with my higher compression ratios, I don't have these problems ;) ), I'm pretty sure.
kasslloyd
Posts: 41
Joined: 19 Dec 2010, 21:25

Re: Let's Make A DIY Book Scanner Test Chart

Post by kasslloyd »

In Acrobat there are options to downsample images. Depending on which setting you pick, it downsamples and compresses more or less. Web mode is one of the highest compressions, i.e. it takes print quality images and resizes them down to be displayed on the screen only. Press quality mode is the highest quality of the preset settings...

All I'm saying is that for this project it should be exported at the highest quality, I think.
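To put a number on what downsampling does, here is a small sketch. It assumes an image placed at 600 dpi and a hypothetical export setting that resamples to 150 dpi; that target value is only an illustration, not a documented Acrobat preset, and the function name is made up.

```python
def downsampled_dims(width_px: int, height_px: int,
                     source_dpi: float, target_dpi: float) -> tuple[int, int]:
    """Pixel dimensions after resampling from source_dpi down to target_dpi.

    Images already at or below the target are left alone (no upsampling).
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    if target_dpi >= source_dpi:
        return width_px, height_px
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# Assumed example: the 2089 x 3000 image placed at 600 dpi, exported at 150 dpi.
print(downsampled_dims(2089, 3000, 600, 150))  # (522, 750)
```

Because both dimensions shrink by the same factor, going from 600 to 150 dpi keeps only about one sixteenth of the original pixels, which is why the export setting matters for a resolution test page.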
reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Let's Make A DIY Book Scanner Test Chart

Post by reggilbert »

Just tickling this topic. To summarize the discussion so far:

The test page discussed here is intended to measure the image produced by a scanner setup. It is not intended to test post-processing setups, a worthy goal that could be the subject of another topic.

The test page discussed here should be designed to be available both for download and for uniformly printed, relatively high-quality versions to be physically sent out from a central source. I have volunteered to produce and distribute such images, but once expenses get to $25 or so I would want help from the community.

Commenters have suggested that the test page proposed on page 2 of this thread -- look at the attached PDF, not the post's poorer-quality inline image -- should:

--have an image specifically designed for testing resolution like the Siemens star described in Wikipedia
--lose the lines between images and between text and images (as unnecessary -- straightness in the image is covered by the image edges)
--be produced in higher resolution (600 dpi max), which my freeware publishing program cannot do because of its limited output options. kasslloyd has suggested he/she might be willing to produce the final image in Adobe InDesign
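
On the first suggestion, a Siemens star is simple enough to generate rather than borrow. The following is only a rough sketch, not an agreed design for the chart: the image size, the spoke count and the plain PGM output format are all assumptions chosen for illustration.

```python
import math


def siemens_star(size: int, spokes: int) -> list[list[int]]:
    """Return a size x size grid of pixels: 0 = black, 255 = white."""
    center = (size - 1) / 2.0
    radius = size / 2.0
    sector = math.pi / spokes  # angular width of one black or white wedge
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - center, y - center
            if math.hypot(dx, dy) > radius:
                row.append(255)  # outside the circle: white background
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            wedge = int(angle / sector)
            row.append(0 if wedge % 2 == 0 else 255)  # alternate wedges
        rows.append(row)
    return rows


def write_pgm(path: str, pixels: list[list[int]]) -> None:
    """Save the grid as a binary PGM (grayscale) file."""
    height, width = len(pixels), len(pixels[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in pixels:
            f.write(bytes(row))


if __name__ == "__main__":
    # Assumed example: 1200 px square, i.e. 2 inches at 600 dpi, 36 spokes.
    write_pgm("siemens_star.pgm", siemens_star(1200, 36))
```

The wedges narrow toward the center, so the radius at which a camera capture blurs them into gray gives a rough, comparable measure of resolution.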

Some commenters questioned the need for the type and placement of the images. I think my response dealt with those objections, but am not certain. Please comment again if not.

With this post I am looking for:

1) note of any important aspect of the discussion so far that is absent or misrepresented in the above summary
2) finalizing comments that allow us to make content decisions and move to production
3) clear commitment by kasslloyd or other commenters on producing the final image in a publishing program that can output a high-dpi PDF, or another format that we determine is sufficiently universal

Thanks.