HELP - Scan Tailor Project --> .pdf

clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: HELP - Scan Tailor Project --> .pdf

Post by clemd973 » 23 Oct 2010, 11:48

Tulon wrote: Running batch processing on the Output stage would generate the output files and put them in the "out" subdirectory under your input directory (unless another output directory was explicitly specified).
Thanks.

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty » 25 Oct 2010, 10:39

dingodog wrote: jbig2 -s -p -v *.tif ; pdf.py output > out.pdf
Just a warning about doing it that way. Processing all TIFFs in one command like that means that all of your TIFFs are sharing the same dictionary. That means better compression, but it also means more complex decompression for software reading your PDF. I've found that even powerful computers will become very sluggish towards the end of a 200+ page PDF produced that way. Handheld devices would have even more problems! Even though it means a bigger PDF, I find it's worth it to process PDFs with that many pages using a separate dictionary for each page, which means processing the TIFFs separately and joining them later.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.

dingodog
Posts: 106
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net

Re: HELP - Scan Tailor Project --> .pdf

Post by dingodog » 25 Oct 2010, 11:42

This is why I use

*Jbig2enc+ akrykukov patch*
- http://dokupuppylinux.co.cc/programs:encoders
(it needs its own thessalonica-pdf.py)

which adds a new switch:
-P <number> --pages-per-dict <number>: pages per dictionary (default 15)

The full option list is:
-d --duplicate-line-removal: use TPGD in generic region coder
-p --pdf: produce PDF ready data
-P <number> --pages-per-dict <number>: pages per dictionary (default 15)
-s --symbol-mode: use text region, not generic coder
-t <threshold>: set classification threshold for symbol coder (def: 0.85)
-T <bw threshold>: set 1 bpp threshold (def: 188)
-r --refine: use refinement (requires -s: lossless)
-O <outfile>: dump thresholded image as PNG
-2: upsample 2x before thresholding
-4: upsample 4x before thresholding
-S: remove images from mixed input and save separately
-j --jpeg-output: write images from mixed input as JPEG
-v: be verbose

Or, with the unpatched version of jbig2enc, split the input into batches:

jbig2 -s -p -v `ls *.tiff | sed -n -e "1,100 p" | tr '\n' ' '` ; pdf.py output > 01.pdf
jbig2 -s -p -v `ls *.tiff | sed -n -e "101,200 p" | tr '\n' ' '` ; pdf.py output > 02.pdf


and so on...
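
For books with many batches, the pattern above can be generated rather than typed by hand. The following is only a sketch: it builds the same kind of command lines (using only the -s, -p and -v switches and the "pdf.py output" step quoted in this thread) and prints them for review instead of running them. The function names make_batches and batch_commands are made up for illustration, and Python's sorted() may order file names slightly differently from ls in some locales.

```python
import shlex
from typing import List, Sequence


def make_batches(files: Sequence[str], pages_per_batch: int) -> List[List[str]]:
    """Split file names into consecutive, non-overlapping batches."""
    if pages_per_batch < 1:
        raise ValueError("pages_per_batch must be at least 1")
    ordered = sorted(files)  # plain code-point order; ls may differ by locale
    return [ordered[i:i + pages_per_batch]
            for i in range(0, len(ordered), pages_per_batch)]


def batch_commands(files: Sequence[str], pages_per_batch: int = 100) -> List[str]:
    """Return one shell command line per batch, numbered 01.pdf, 02.pdf, ..."""
    commands = []
    for number, batch in enumerate(make_batches(files, pages_per_batch), start=1):
        quoted = " ".join(shlex.quote(name) for name in batch)
        commands.append(
            f"jbig2 -s -p -v {quoted} ; pdf.py output > {number:02d}.pdf"
        )
    return commands


if __name__ == "__main__":
    import glob

    # Print the commands so they can be checked before being run in a shell.
    for line in batch_commands(glob.glob("*.tiff")):
        print(line)
```

Setting pages_per_batch to 1 gives one dictionary per page, at the cost of a larger total size; the resulting PDFs still have to be joined afterwards with a separate tool.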

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty » 25 Oct 2010, 12:07

Very useful idea. It's too bad that patch has not been contributed upstream for use on other platforms.

dingodog
Posts: 106
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net

Re: HELP - Scan Tailor Project --> .pdf

Post by dingodog » 25 Oct 2010, 14:17

Even in a Linux environment it is possible to build an .exe for Windows. Unfortunately, at present I don't have MinGW or any other environment needed to build 32-bit Windows executables or, generally speaking, to cross-compile.

For other Linux users: my build, which includes the akrykukov patch adding the -P switch, can be used regardless of any libpng warnings it may print.

thessalonica-pdf.py is needed, since the naming pattern of the output files has changed in this version.
http://dokupuppylinux.co.cc/programs:encoders

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty » 28 Oct 2010, 16:00

Finally caught the resolution bug. I think you probably still have it in Linux too, because it looks like it's a bug in pdf.py that has not been patched yet. I have a hacky workaround but not a fix yet; I've contacted the developer about it.

dingodog
Posts: 106
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net

Re: HELP - Scan Tailor Project --> .pdf

Post by dingodog » 28 Oct 2010, 16:13

The latest version of

*Jbig2enc+ akrykukov patch*
- http://dokupuppylinux.co.cc/programs:encoders

has 2 patches:

- fix for preserving the DPI resolution
- fix for the file naming pattern

As far as I remember (at least on Linux), the failure to preserve the DPI resolution was a jbig2enc bug, not a pdf.py bug.

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty » 28 Oct 2010, 16:45

Can you convert a page to PDF using it and post it? I'm interested to see whether the resolution is correct or not.

dingodog
Posts: 106
Joined: 22 Jul 2010, 18:19
Number of books owned: 1000
Country: on the net
Location: on the net

Re: HELP - Scan Tailor Project --> .pdf

Post by dingodog » 28 Oct 2010, 17:54

original image (scanned at 300 DPI, 17 MB)
resulting B/W pdf (66KB)

jbig2 -T 155 -s -p -v *.png ; thessalonica-pdf.py *.jbig2 > out.pdf

Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: HELP - Scan Tailor Project --> .pdf

Post by Misty » 29 Oct 2010, 09:17

It looks like that is the wrong resolution, then. According to Acrobat, the page is 35 by 48 inches - surely not correct. Your original shows that it's 8.5x11.

That confirms my belief that pdf.py is producing incorrect PDFs on all platforms, not just Windows. The problem is in the code which generates PDF metadata; it's feeding size in pixels to data fields that are expecting physical size. Unfortunately, while I know where to apply the code, I don't know how to fix it yet; I'm not sure where to extract DPI from in the binary JBIG2 data. I've reported the bug to the author, so hopefully it will be fixed soon.
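
To make the pixels-versus-physical-size point concrete, here is a small sketch of the arithmetic only. PDF page boxes are measured in points (1/72 inch), so a pixel size has to be scaled by 72/DPI before it is written out. The function name media_box_points is made up for illustration, and the DPI is assumed to be already known; reading it out of the JBIG2 data is exactly the part that is still unsolved above.

```python
from typing import Optional, Tuple


def media_box_points(width_px: int, height_px: int,
                     dpi_x: float, dpi_y: Optional[float] = None) -> Tuple[float, float]:
    """Convert a pixel size to a PDF page size in points (1 point = 1/72 inch)."""
    if dpi_y is None:
        dpi_y = dpi_x  # assume square pixels unless told otherwise
    if dpi_x <= 0 or dpi_y <= 0:
        raise ValueError("DPI must be positive")
    return (width_px * 72.0 / dpi_x, height_px * 72.0 / dpi_y)


if __name__ == "__main__":
    # An 8.5x11 inch page scanned at 300 DPI is 2550x3300 pixels.
    width_pt, height_pt = media_box_points(2550, 3300, 300)
    print(width_pt / 72, height_pt / 72)  # page size in inches

    # Writing the pixel count straight into the page box skips the scaling,
    # so a viewer reads 2550 "points" as 2550 / 72 inches.
    print(2550 / 72)
```

With the scaling, the 300 DPI page comes out as 612 by 792 points, which is 8.5 by 11 inches. Without it, the 2550-pixel width alone is read as about 35.4 inches, which is in line with the oversized width Acrobat reports.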

Also, since I realize now that you're the maintainer of the DokuPuppyLinux page - please submit your source patches to the jbig2enc author! They seem very worthwhile to include in the upstream release.

Edit: I believe the ability to OCR these PDFs will also be affected in some software until the bug is fixed.
