Noob questions about DPI, Scan Tailor and PDF creation


michael
Posts: 10
Joined: 04 Mar 2014, 00:52

Noob questions about DPI, Scan Tailor and PDF creation

Post by michael »

Hello, my name is Michael, and I'm a book scanning noob. ;)

I've been trying to figure out many things on my own, but I'm getting too confused about a couple of things (book scanning related, of course), so it's time to ask for help. But first, a couple of details about what I'm doing so you can see where I'm coming from.

I'm studying the history of literature (mostly in French, so pardon my English) and I often have to work with documents I can't take out of libraries. So, about 6 months ago, I decided to get a digital camera to archive some of the material I work on instead of going to the library all the time. My goal was basic and humble: I planned to work directly with the .jpg files straight out of the camera... That is, until I discovered this amazing forum/community just before last Christmas. Now you have all motivated me to do more with my images, and I'm very interested in getting better results by processing the images with Scan Tailor. The problem is that I'm trying to learn too many things at the same time (I obviously don't know much about digital photography, for a start :s) and I can't solve some problems on my own. I searched a lot on the forum and on the web, but I couldn't find answers clear or specific enough for what I'm doing (please direct me elsewhere if you think I missed something - a post answering my questions, for example). Meanwhile, I'm here to ask my very basic questions :)

First, the DPI question. After reading a bit, I both understand and don't understand what it is... :( Concretely, the images I'm working on are 72 dpi, so when I start a new project in Scan Tailor, it tells me I need to "Fix DPI" on all pages. Why can't I keep them at 72 dpi? And for the result I'm looking for (I don't OCR anything and I want to keep the original layout, so I just want to end up with all the images in a single PDF document to read on screen, not to print), is "300x300" the right choice?

Second, the PDF question. My first (kinda blindly) processed book in Scan Tailor gave me a satisfying result. To put all the images in a PDF, I just selected everything in Windows Explorer, right-clicked to print, and then converted with PrimoPDF. It worked, but it also crashed 2 times before it finally worked, and sometimes the order of the files seemed random (not good!). So after reading a bit on the forum here, I decided to get NitroPDF Pro for this step. My problem now is that the PDF created through Nitro (with the "Combine Files" option) is too "small" (visually, I mean) and I can't get it to the correct format. I'm not sure how to explain it more clearly, so here are two examples:

a) A page created with PrimoPDF (from the print menu)

b) Same page created with NitroPDF

[Edit, November 1st 2010: sorry, I changed servers and lost the example files, so I removed the links to avoid confusion...]

As you can see, the "zoom level" (I don't know what else to call this) is different. What do I need to change in the NitroPDF settings to get it to create PDFs at the same zoom level as my a) example? What am I doing wrong? Is it related to DPI settings (I think not, as both are using the same .tiff image as source)? Do you think I should use different software?

Sorry for my noobness, but I'm lost and would really appreciate some help solving these issues. Lastly, this is my first message here, but I regularly read the ongoing discussions and I want to say congratulations to you all: you are doing awesome projects (I'm far from building an elaborate book scanner!), and thanks a lot for sharing information and resources here. :)

Michael

PS. I wasn't too sure where to post this, so feel free to move it if you think it fits better elsewhere.
Last edited by Anonymous on 01 Nov 2010, 12:31, edited 1 time in total.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by rob »

OK, the first thing you need to know is that what comes out of the camera is not 72 dpi. The file may claim it is 72 dpi, but it is not because of the zoom factor of the camera and the distance of the camera to the page.

When Scan Tailor says you need to fix the dpi, what it means is that you need to tell Scan Tailor how many pixels (or dots) per inch your image has. The way to do that is to pull one of your images up in some program that will let you measure in pixels. Measure one side of the page in the image, then measure the same side of the physical book with a ruler, and then divide the pixels by the inches to get the actual dpi. Assuming that all your images were taken at roughly the same zoom factor and with the camera at roughly the same distance from the page, you can tell Scan Tailor to apply that dpi to all the images.
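
As a rough sketch, that arithmetic looks like this in Python. The function name and the sample numbers (a page edge that is 2700 pixels in the image and 9 inches on the ruler) are made up for illustration and do not come from a real scan.

```python
def measured_dpi(pixels: float, inches: float) -> float:
    """Return dots per inch for one edge measured both in pixels and in inches."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


# Hypothetical measurement: the page edge is 2700 px in the image
# and 9 inches on the physical book.
print(round(measured_dpi(2700, 9)))  # 300
```

The rounded result is the value you would enter in Scan Tailor's "Fix DPI" dialog for both the horizontal and vertical fields.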

After you do a couple of books, you'll find that you're getting roughly the same dpi each time for the same sized book. I usually get something like 270 dpi or so for large books, 400 or so for paperbacks.

One thing you should know: the image you get out of the camera is color -- effectively grayscale, though. There is an algorithm which will take a lower-resolution grayscale image and convert to a higher-resolution bilevel (black and white) image. This is perfect for text, and Scan Tailor does this in its output stage. I like to set Scan Tailor to output at 600 dpi because that seems to give the best print quality. OCR may not need more than 300 dpi.

As for your PDFs, it definitely looks like PrimoPDF kept the right dpi, while NitroPDF downsampled to a smaller dpi. Since I am on a Mac, I don't have much experience with Windows PDF creation tools, so I'm afraid you'll have to search the forum for advice. What you can do is take maybe ten images and put them in a separate test folder, and then play with PrimoPDF and NitroPDF until you find the settings that work. Then you can run it on your entire book.
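
One way to reason about the different "zoom levels" is that the page size a viewer shows is just pixels divided by dpi, so the same image looks much bigger or smaller depending on which dpi the PDF tool assumes. The sketch below only illustrates that arithmetic; the image dimensions are invented, and nothing here is confirmed to be what PrimoPDF or NitroPDF actually do.

```python
def page_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Physical page size implied by a pixel size and a dpi value."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


# Hypothetical 1800 x 2700 px page image, interpreted at two different dpi values.
for dpi in (72, 300):
    w, h = page_size_inches(1800, 2700, dpi)
    print(f"{dpi} dpi -> {w:.1f} x {h:.1f} in")
# 72 dpi -> 25.0 x 37.5 in
# 300 dpi -> 6.0 x 9.0 in
```

If two PDFs made from the same image show very different page sizes in the viewer's document properties, a mismatch of this kind is one thing worth checking in the test folder.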

I hope that helps!
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.
michael
Posts: 10
Joined: 04 Mar 2014, 00:52

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by michael »

Wow, that definitely helps. Thank you very much for taking the time to help me understand clearly.

About the PDF creation: I also have a MacBook Pro (I got it quite recently, so I'm not yet very familiar with the software available), so I can use it to create the final PDF. What software do you use for this on the Mac?

And I will try to find the correct settings for NitroPDF. Again, thanks for your help, I appreciate it.
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by rob »

Well, I use Adobe Acrobat to create a PDF. But as long as you have 10.6, you can also select all of your images, and then right click on any one and open with Preview. Then save as PDF.

Also, Preview will work for measuring your images, but only under 10.6.
michael
Posts: 10
Joined: 04 Mar 2014, 00:52

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by michael »

Will go test - thanks again for the fast answers.
StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by StevePoling »

rob wrote:When Scan Tailor says you need to fix the dpi, what it means is that you need to tell Scan Tailor how many pixels (or dots) per inch your image has. The way to do that is to pull one of your images up in some program that will let you measure in pixels. Measure one side of the book, and then physically measure the same side of the physical book with a ruler, and then divide the pixels by the inches to get actual dpi.
Every time I scan a book, I end up with some pages where the left camera sees a page and the right camera gets an empty page. Or vice-versa. Therefore, I intend to put a ruler on the empty page so that I can do what you just suggested with greater accuracy.

Is there any way I can force both cameras to use the same zoom level? Or to tell Scan Tailor that all my even images have a dpi of X and all my odd images have a dpi of Y? Or is it no big deal if the dpi is different, but close?
michael
Posts: 10
Joined: 04 Mar 2014, 00:52

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by michael »

rob wrote:Well, I use Adobe Acrobat to create a PDF. But as long as you have 10.6, you can also select all of your images, and then right click on any one and open with Preview. Then save as PDF.

Also, Preview will work for measuring your images, but only under 10.6.
Hmmm, resaving as PDF in Preview gives me the same size as NitroPDF (the b) example of my original post). :? Now I don't understand why PrimoPDF gives me a "correct" size... I guess I will stick with PrimoPDF to create the PDFs and use something like pdftk to merge the files afterwards.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by Tulon »

In case you don't have access to the physical book any more to measure it, there is still a way to roughly estimate the DPI. You do it by loading one page into a graphics editor (Gimp will do) and use a rectangular selection tool to select 6 lines of text. Gimp displays the pixel size of your selection while you are selecting it. The height of that selection is roughly your DPI. That's because most books are printed in such a way that 6 lines of text fit vertically into one inch. Some day I'll try to implement automatic DPI guessing, based on the Fourier transform and the above observation.
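
The rule of thumb above can be written as a small Python sketch. The function and parameter names are hypothetical, and the default of 6 lines per inch is the assumption stated above, so books with unusually tight or loose line spacing will give a skewed estimate.

```python
def estimate_dpi_from_lines(selection_height_px: float,
                            lines_selected: int = 6,
                            lines_per_inch: float = 6.0) -> float:
    """Estimate dpi from the pixel height of a block of text lines.

    Assumes the book is set at `lines_per_inch` lines per vertical inch.
    """
    if lines_selected <= 0 or lines_per_inch <= 0:
        raise ValueError("lines_selected and lines_per_inch must be positive")
    inches = lines_selected / lines_per_inch
    return selection_height_px / inches


# Hypothetical selection: 6 lines of text measure 290 px tall in the editor.
print(estimate_dpi_from_lines(290))  # 290.0
```

Selecting fewer lines works too, as long as the count is passed in: a 3-line selection that is 150 px tall gives the same kind of estimate with `lines_selected=3`.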

Unfortunately, currently there is no easy way to apply a particular DPI to odd or even pages only, so you better find a way to equalize the zoom on both cameras.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
michael
Posts: 10
Joined: 04 Mar 2014, 00:52

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by michael »

Tulon wrote:In case you don't have access to the physical book any more to measure it, there is still a way to roughly estimate the DPI. You do it by loading one page into a graphics editor (Gimp will do) and use a rectangular selection tool to select 6 lines of text. Gimp displays the pixel size of your selection while you are selecting it. The height of that selection is roughly your DPI. That's because most books are printed in such a way that 6 lines of text fit vertically into one inch. Some day I'll try to implement automatic DPI guessing, based on the Fourier transform and the above observation.

Unfortunately, currently there is no easy way to apply a particular DPI to odd or even pages only, so you better find a way to equalize the zoom on both cameras.
Thanks for the tip, Tulon (I indeed don't have access to the physical books anymore). I read about it in another thread, but it gave me a strange result when I tried it, so I thought I did something wrong. I will try again and will try to change the values in the project file (with Notepad++), as I probably wrongly entered 300x300 when starting the project, without knowing what I was doing. Could this be the cause of the problematic format of my PDFs at the end, then?
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Noob questions about DPI, Scan Tailor and PDF creation

Post by daniel_reetz »

michael wrote: Could this be the cause of the problematic format of my PDFs at the end, then?
Since you used two different programs to create the PDFs, the result is much more likely to be caused by the default settings in those programs than by anything in Scan Tailor. In other words, you gave two different programs the same input and you got different output from each one. That means the difference most likely lies in how NitroPDF and PrimoPDF each handle the images. As Rob said above,
rob wrote: As for your PDFs, it definitely looks like PrimoPDF kept the right dpi, while NitroPDF downsampled to a smaller dpi. Since I am on a Mac, I don't have much experience with Windows PDF creation tools, so I'm afraid you'll have to search the forum for advice. What you can do is take maybe ten images and put them in a separate test folder, and then play with PrimoPDF and NitroPDF until you find the settings that work. Then you can run it on your entire book.
I hope that helps!
Fixing your DPI in Scan Tailor might work, but it's probably more useful to play with PrimoPDF or NitroPDF at this point to figure out what they're doing. Look for menus with names like "settings" and "preferences" and see what settings you can change now that you have a little experience. Please share back what you learned, too! We could all use a little more knowledge on all kinds of tools.