PDFBeads — Convert Scanned Images to a Single PDF File


Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by Lazy_Kent »

Lazy_Kent, any ideas?
It works well in openSUSE, rubygems-1.5.0.

Code: Select all

% for i in *.tif ; do cuneiform -l ruseng -f hocr -o "${i%tif}html" "$i" ; done
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
Cuneiform for Linux 1.1.0
PUMA_XFinalrecognition failed.
Cuneiform for Linux 1.1.0
% pdfbeads -b JP2 *.tif > out_pdfbeads.pdf
Prepared data for processing 001.tif
Prepared data for processing 002.tif
Prepared data for processing 003.tif
Prepared data for processing 004.tif
Prepared data for processing 005.tif
Prepared data for processing 006.tif
Prepared data for processing 007.tif
Prepared data for processing 008.tif
Prepared data for processing 009.tif
Prepared data for processing 010.tif
Prepared data for processing 011.tif
Prepared data for processing 012.tif
Prepared data for processing 013.tif
Prepared data for processing 014.tif
Prepared data for processing 015.tif
Prepared data for processing 016.tif
JBIG2 compression complete. pages:15 symbols:7386 log2:13
JBIG2 compression complete. pages:1 symbols:294 log2:9
Processed 001.tif
  Added background image from 001.bg.jp2
Processed 002.tif
Processed 003.tif
Processed 004.tif
Processed 005.tif
Processed 006.tif
Processed 007.tif
Processed 008.tif
Processed 009.tif
Processed 010.tif
Processed 011.tif
Processed 012.tif
Processed 013.tif
Processed 014.tif
Processed 015.tif
Processed 016.tif
%
La_Tristesse, please report it to the PDFBeads bug tracker: http://rubyforge.org/tracker/?func=brow ... atid=37737
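The transcript above shows one "PUMA_XFinalrecognition failed." message but not which page caused it. A minimal sketch of the same loop that names the failing file follows. It assumes cuneiform returns a non-zero exit status when recognition fails, which the post does not confirm; `hocr_name` and `ocr_all` are hypothetical helper names used only for illustration.

```shell
# Derive the hOCR file name the same way the loop in the post does:
# 001.tif -> 001.html
hocr_name() {
  printf '%s\n' "${1%tif}html"
}

# Run cuneiform on every file given as an argument and report failures.
# ASSUMPTION: cuneiform exits with a non-zero status when OCR fails.
ocr_all() {
  for i in "$@"; do
    if ! cuneiform -l ruseng -f hocr -o "$(hocr_name "$i")" "$i"; then
      # Report the offending page on stderr so it stands out in the log
      printf 'OCR failed: %s\n' "$i" >&2
    fi
  done
}
```

Called as `ocr_all *.tif`, this keeps going after a failure, like the original loop, but prints the name of each page that could not be recognized.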
La_Tristesse
Posts: 11
Joined: 18 Jun 2011, 21:47

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by La_Tristesse »

After further investigation I discovered that tesseract must be responsible for that error. I ran pdfbeads on every single TIFF file and monitored the terminal output. It turns out that some files contain Ancient Greek characters, which tesseract recognized as German umlauts. I'm not sure whether this is the actual cause. I will try to build Cuneiform for Mac OS X, but I thought the latest version was not able to produce hOCR output?
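Running pdfbeads on one page at a time, as described above, can be scripted roughly like this. It is only a sketch: it assumes pdfbeads exits with a non-zero status when a page cannot be processed, and `find_bad_pages` is a hypothetical name, not part of pdfbeads.

```shell
# Run pdfbeads separately on each page and print the names of pages that fail.
# ASSUMPTION: pdfbeads returns a non-zero exit status on error.
find_bad_pages() {
  for f in "$@"; do
    # pdfbeads writes the PDF to stdout; discard it here and leave its
    # progress and error messages visible on the terminal
    if ! pdfbeads "$f" > /dev/null; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `find_bad_pages *.tif > bad_pages.txt` would collect the suspicious pages in a file while the messages still scroll by in the terminal.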

Off-topic: I also discovered that annotating pdfbeads-optimized PDFs on Mac OS X Lion (10.7) with the built-in application "Preview" blows the file size up again: from 3 MB to 65 MB! Adobe Acrobat, by contrast, keeps the file size small. Does anyone know the reason for that? As it stands, I would be bound to Adobe's proprietary reader.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by Misty »

While I don't know for sure, my guess would be that Preview probably recompressed the pages and did something much less efficient than PDFBeads. Having to use Acrobat would kind of suck, so I'd suggest looking for 3rd party PDF tools that let you add annotations without the bloat. There must be something freeware.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
La_Tristesse
Posts: 11
Joined: 18 Jun 2011, 21:47

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by La_Tristesse »

Misty wrote:There must be something freeware.
Sadly, there is no open-source or freeware solution that embeds the annotations inside the PDF without destroying the work of pdfbeads. I already tried Skim, which is the only one really capable of annotation, but its framework isn't supported by many applications. Since I'm using GoodReader http://www.goodiware.com/gr-man-view-pdf.html#annots for iPad to read/annotate my papers, it seems I'm bound to Adobe Acrobat Reader ...
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by Misty »

Windows user needed - I have a test version of pdfbeads available that I need a Windows user to try out.

Requirements:
ImageMagick installer from here: http://rubyforge.org/frs/download.php/6 ... 6-8-Q8.zip
Don't worry about installing the gem - I took care of that for you. But you must install ImageMagick itself using the .exe file from that installer, and choose the "update executable search path" option.

jbig2enc from here: http://soft.rubypdf.com/software/window ... -jbig2-exe
This is optional, but without it you won't get the superior JBIG2 compression. You should place it somewhere that's accessible from your PATH environment variable. If you don't know what that means, then put jbig2.exe in c:\windows.

pdfbeads is a command-line program, so you should run it from cmd.exe. Basic usage is something like this:

Code: Select all

pdfbeads *.tif > mybook.pdf
Attachments
pdfbeads.exe.zip
(3.65 MiB) Downloaded 757 times
Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by Anonymous2 »

Misty, I know this is quite an old post, but could you outline how you made this portable executable?
maciejr
Posts: 1
Joined: 20 Jul 2012, 04:25
E-book readers owned: iPad3
Number of books owned: 1000
Country: Poland

Re: PDFBeads — Convert Scanned Images to a Single PDF File

Post by maciejr »

Misty wrote:Windows user needed - I have a test version of pdfbeads available that I need a Windows user to try out.
Thanks! :D I photographed a book, processed the images with ScanTailor and I struggled to convert the results to PDF. This bundle is very helpful. BIG THANKS!
Misty wrote:Requirements:
ImageMagick installer from here: http://rubyforge.org/frs/download.php/6 ... 6-8-Q8.zip
Don't worry about installing the gem - I took care of that for you. But you must install ImageMagick itself using the .exe file from that installer, and choose the "update executable search path" option.
I did not see this option in the setup program. :? I modified the Path variable myself and it worked. :D

I got several errors about missing dictionaries in the TIF files, but they do not seem to affect the resulting PDF. :)

The files come out well compressed. They open well in GoodReader on iPad3 and the annotations made in that program do not seem to explode the PDF file (as some user reported for the Preview application in MacOSX). :)

Two minor problems: file name handling seems to be restricted to
  • the current directory only, and
  • no more than one dot (period) in the file name.
For an example of the first problem, if I am in the directory f:\books and my TIF files are in f:\books\a, then neither of these works:

Code: Select all

pdfbeads a\*.tif -o a.pdf
pdfbeads a/*.tif -o a.pdf
I get the error:

Code: Select all

pdfbeads: no pages to process
I did not try absolute paths.

The second problem (with dots) is that an input file named, say, a.b.c.tif would be matched by neither *.tif nor a.b.*.tif.
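A possible workaround for both problems, sketched for a Unix-like shell (on Windows this would need something like MSYS or Cygwin, which is an assumption beyond what was tested above). The helper names `single_dot_name` and `build_pdf_in_dir` are hypothetical, and whether these limits exist outside the Windows bundle is not known.

```shell
# Suggest a name with a single dot: a.b.c.tif -> a_b_c.tif
# Expects a bare file name, not a path containing directories.
single_dot_name() {
  case $1 in
    *.*) ;;                                  # has an extension, continue
    *) printf '%s\n' "$1"; return 0 ;;       # no dot at all, leave unchanged
  esac
  stem=${1%.*}     # everything before the last dot
  ext=${1##*.}     # everything after the last dot
  printf '%s.%s\n' "$(printf '%s' "$stem" | tr '.' '_')" "$ext"
}

# Run pdfbeads from inside the image directory so that *.tif is matched
# against the current directory; the PDF goes to the path given as $2,
# which is resolved relative to the directory the function was called from.
build_pdf_in_dir() {
  ( cd "$1" && pdfbeads *.tif ) > "$2"
}
```

With the layout from the post, `build_pdf_in_dir a a.pdf` run from the books directory would write a.pdf next to the a directory, and `mv a.b.c.tif "$(single_dot_name a.b.c.tif)"` would rename a file with extra dots before processing.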