HELP - Scan Tailor Project --> .pdf

clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

HELP - Scan Tailor Project --> .pdf

Post by clemd973 »

I've just finished building my scanner, and I've even got the cameras up and running with SDM...everything working beautifully. Now I'm ready to test the post processing so I loaded Scan Tailor, watched the video tutorials and even processed a test-project of about 10 pages. But I'm at a loss now as to how to get the STProject to .pdf in order to view it on my computer and mobile device. What's the most common procedure? I'm using a MacBook Pro, with Windows XP installed as well. Thanks. :?
univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: HELP - Scan Tailor Project --> .pdf

Post by univurshul »

You need PDF binding/building software. Freeware, shareware and flagship commercial packages all do this.

First, locate the output TIFFs that Scan Tailor produced.

OS X already includes the app "Preview", which can convert a series of TIFFs from Scan Tailor into a PDF. Simply open the cover image TIFF in Preview and drag the remaining images onto the opened TIFF. It should combine the TIFFs, and you can then save the combined images as a single PDF. In the ColorSync Utility app (also built into OS X), you can make custom compression settings for your PDFs too.

When you want to OCR your TIFFs, I personally recommend OmniPage Pro X. It performs OCR before it compresses and converts the image to PDF. There are several other apps as well, such as ABBYY FineReader Express, Adobe Acrobat, and Readiris.

There are also plans for some interesting PDF & DjVu software being written for the community by DIY members here, so you should also explore djvubind (http://www.diybookscanner.org/forum/vie ... ?f=3&t=521) and look for an upcoming PDF builder app as well.

But try to stay with PDFs for a while to ensure application compatibility, and don't delete your master processed images; you need to determine the best compression settings and which format will work best for you. This takes time and trial and error.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: HELP - Scan Tailor Project --> .pdf

Post by spamsickle »

Since you say you have Windows XP installed, here's what I'm doing.

A product called ImageMagick converts from Scan Tailor's TIF images to pdf, with one console command:

mogrify -format pdf *.tif

Once I have PDF versions of all the pages, I use a second tool, pdftk, to put them together with the command

pdftk p*.pdf cat output mybook.pdf

The only thing to be careful of here is not to get into an infinite loop by accidentally mixing your output with your input. All my separate pages are named either p0001.pdf or simply 0001.pdf, so my input specification is either p*.pdf or 0*.pdf, and I make sure my output name doesn't begin with either "p" or "0".
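
The caution above about mixing output with input can be sketched as a small guard around the two commands. This is only an illustration under stated assumptions: the function names collides_with_inputs and build_book are made up for the example, mogrify and pdftk are assumed to be on the PATH, and the page TIFs are assumed to be named so that their PDFs match the pattern you pass in.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; mogrify (ImageMagick)
# and pdftk are assumed to be installed and on the PATH.

# Succeeds (status 0) when the output file name matches the input glob,
# i.e. when the merged book would be picked up as one of its own inputs.
collides_with_inputs() {
    # $1 = output file, $2 = glob pattern such as 'p*.pdf'
    case $(basename "$1") in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Convert the page TIFs to PDFs, then join them into one book.
build_book() {
    # $1 = glob pattern for the page PDFs, $2 = output file
    if collides_with_inputs "$2" "$1"; then
        echo "refusing: $2 matches input pattern $1" >&2
        return 1
    fi
    mogrify -format pdf ./*.tif || return 1
    # $1 is deliberately unquoted so the shell expands the glob here.
    pdftk $1 cat output "$2"
}
```

Called as build_book 'p*.pdf' mybook.pdf, it runs the same two commands as above; called with an output name such as pbook.pdf, it stops before touching any files.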

For most books now, I'm also going through a third step, using Adobe Acrobat to OCR and output a ClearScan version of the PDF. This is a commercial product, though, unlike ImageMagick and pdftk.
dingodog
Posts: 110
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net

Re: HELP - Scan Tailor Project --> .pdf

Post by dingodog »

spamsickle wrote:Since you say you have Windows XP installed, here's what I'm doing.

mogrify -format pdf *.tif
I use sam2p (http://pts.szit.bme.hu/sam2p/)

with this script:

Code: Select all

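#!/bin/bash
# Convert every .tiff file in the current directory to a one-page PDF with sam2p.

directory=$(pwd)

for file in "$directory"/*.tiff
do
   [ -e "$file" ] || continue      # nothing matched the glob
   filename=${file%.tiff}          # strip the .tiff extension
   sam2p "$file" "$filename.pdf"   # quoted so paths with spaces work
done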
Then I also use pdftk.

spamsickle wrote:Once I have PDF versions of all the pages, I use a second tool, pdftk, to put them together with the command

pdftk p*.pdf cat output mybook.pdf
It is important to perform one further refinement: after joining the single PDFs, the XREF table must be rebuilt.

Code: Select all

pdftk *.pdf cat output mybook.pdf && pdftk mybook.pdf output fixed.pdf && mv fixed.pdf mybook.pdf
This is because when pdftk (and other software too) joins the single PDFs, the internal XREF table gets corrupted. This does not make the file unreadable, but the PDF no longer conforms to the standard, and some apps (like Ghostscript) refuse to operate on a non-standard PDF, showing the error message INVALID XREF TABLE.
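
The join-then-rewrite step above could be wrapped in a small helper, sketched here. It is not a tested recipe from this thread: merge_and_rewrite is a hypothetical name, pdftk is assumed to be on the PATH, and mktemp -d is assumed to be available (it is on typical Linux and OS X systems). The only design change from the one-liner is that the intermediate file lives in its own temporary directory, so a later *.pdf glob cannot pick it up.

```shell
#!/bin/bash
# Sketch only: merge_and_rewrite is a hypothetical helper name;
# pdftk is assumed to be installed and on the PATH.

# Join page PDFs, then pass the result through pdftk once more so it is
# written out fresh, as in the two-step command above.
merge_and_rewrite() {
    # $1 = final output file, remaining arguments = page PDFs in order
    out=$1
    shift
    # Keep the intermediate file in its own directory so a *.pdf glob
    # in the working directory can never pick it up.
    tmpdir=$(mktemp -d) || return 1
    joined="$tmpdir/joined.pdf"

    if pdftk "$@" cat output "$joined" && pdftk "$joined" output "$out"; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, merge_and_rewrite mybook.pdf p*.pdf would join the page PDFs and leave only mybook.pdf behind, provided the output name does not itself match the input pattern.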
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty »

Apologies I've been taking so long with my PDF maker. I've been busy on non-scanning projects for the past few months, which has kept me away from it, and I originally left it off when I ran into a problem with ImageMagick. I'm still aiming to get it finished in the relatively near future, and I have most of the technical issues sorted through now.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: HELP - Scan Tailor Project --> .pdf

Post by clemd973 »

univurshul wrote:You need PDF binding/building software. There is freeware, shareware and flagship software which does this.
Thanks for the information. I'm trying to shorten the learning curve with Scan Tailor, and as soon as I become adept at formatting the images/pages, I'll use this information and some of the things from the other replies to put everything in a .pdf. It would be great if Scan Tailor had this built in - sort of an all-in-one post processing program. Thanks again.
univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: HELP - Scan Tailor Project --> .pdf

Post by univurshul »

clemd973 wrote:It would be great if Scan Tailor would have this built in - sort of an all-in-one post processing program. Thanks again.
Scan Tailor just works with the images and cleans them up for later ebook construction. That alone makes it worth using separate apps before and after it. However, we do have a member spearheading the all-in-one route: http://www.diybookscanner.org/forum/vie ... ?f=3&t=302

I haven't had a chance to test it myself.

I'm actually busy testing software tools that focus on preparing images pre-Scan Tailor. I'll have a discussion posted about Adobe Lightroom 3 soon.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: HELP - Scan Tailor Project --> .pdf

Post by clemd973 »

Misty wrote:Apologies I've been taking so long with my PDF maker. I've been busy on non-scanning projects for the past few months, which has kept me away from it, and I originally left it off when I ran into a problem with ImageMagick. I'm still aiming to get it finished in the relatively near future, and I have most of the technical issues sorted through now.
Can't wait to see it. How will we know when it's up and running??? As for me, please PM me when it's ready...I'd love to beta-test it if you're planning on going that route! Philip
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: HELP - Scan Tailor Project --> .pdf

Post by clemd973 »

univurshul wrote: I'm actually busy testing software tools that focus on preparing images pre-Scan Tailor. I'll have a discussion posted about Adobe Lightroom 3 soon.
For Mac users, here's a good alternative route for pre-ScanTailor processing: http://www.diybookscanner.org/forum/vie ... ?f=3&t=527. Please let us know about the Adobe Lightroom 3 discussion.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: HELP - Scan Tailor Project --> .pdf

Post by spamsickle »

dingodog wrote: I use sam2p (http://pts.szit.bme.hu/sam2p/)
I see that sam2p has Windows binaries as well as Linux. The author claims that it's better than ImageMagick for creating PDFs, and the reasons he gives seem reasonable.

I'll give it a try. Just doing a straight no-fiddling conversion of a single TIF file from an old scan, the sam2p version was quite a bit smaller (308K vs 465K), and I can't see the difference between them. That's not necessarily a big deal if I'm going to use Acrobat's Clearscan option after the PDF has been built, but it does appear to confirm the author's claim of smaller files. He also claims faster creation and finer control. I still need to learn more about the PDF format.
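
A size comparison like the one above can be repeated on any pair of files with a few lines of shell. This is just a sketch: size_bytes and compare_sizes are hypothetical helper names, and the two file names passed in are whatever you called your own test conversions.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Print the size of a file in bytes (tr strips the padding some wc
# implementations add).
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Print both sizes and the first file's size as a whole-number
# percentage of the second.
compare_sizes() {
    a=$(size_bytes "$1")
    b=$(size_bytes "$2")
    echo "$1: $a bytes"
    echo "$2: $b bytes"
    echo "ratio: $(( a * 100 / b ))%"
}
```

For the 308K and 465K figures quoted above, the ratio works out to roughly 66%, so the sam2p file is about a third smaller.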