pdfbeads on Ubuntu 18.10

gareth
Posts: 11
Joined: 18 Jul 2014, 07:56
Number of books owned: 0
Country: UK

pdfbeads on Ubuntu 18.10

Post by gareth » 09 Feb 2019, 07:05

Another version of Ubuntu, another breakage in PDFbeads and dependencies.

Has anyone seen an issue like this in post-processing ScanTailor output in Ubuntu 18.10 (cosmic):

Code:

gareth@comte:/mnt/data/gareth/scans/pcpilot118/out$ pdfbeads EPSON038.tif > tmp.pdf
[DEPRECATION] requiring "RMagick" is deprecated. Use "rmagick" instead
WARNING: Nokogiri was built against LibXML version 2.9.8, but has dynamically loaded 2.9.4
Traceback (most recent call last):
	9: from /usr/local/bin/pdfbeads:23:in `<main>'
	8: from /usr/local/bin/pdfbeads:23:in `load'
	7: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/bin/pdfbeads:220:in `<top (required)>'
	6: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/bin/pdfbeads:220:in `new'
	5: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/lib/pdfbeads/pdfpage.rb:417:in `initialize'
	4: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/lib/pdfbeads/pdfpage.rb:417:in `each'
	3: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/lib/pdfbeads/pdfpage.rb:420:in `block in initialize'
	2: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/lib/pdfbeads/pdfpage.rb:90:in `fillStencilArray'
	1: from /var/lib/gems/2.5.0/gems/pdfbeads-1.1.1/lib/pdfbeads/pdfpage.rb:249:in `processMixed'
/var/lib/gems/2.5.0/gems/rmagick-2.16.0/lib/rmagick_internal.rb:1681:in `opaque': ImageMagick library function failed to return a result. (RuntimeError)

If I run it again, it completes:

Code:

gareth@comte:/mnt/data/gareth/scans/pcpilot118/out$ pdfbeads EPSON038.tif > tmp.pdf
[DEPRECATION] requiring "RMagick" is deprecated. Use "rmagick" instead
WARNING: Nokogiri was built against LibXML version 2.9.8, but has dynamically loaded 2.9.4
This version of ImageMagick doesn't support JPEG2000 compression.
	I'll use JPEG compression instead.
Prepared data for processing EPSON038.tif
JBIG2 compression has been requested, but the encoder is not available.
  I'll use CCITT Group 4 fax compression instead.
Processed EPSON038.tif
  Added background image from EPSON038.bg.jpg

Or does anyone have an alternative post-processing tool to pdfbeads that can do the layer separation and compression without me having to fix up the toolchain every six months...

cheers
Gareth
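
Since the second run in the log above completed, one stopgap is to retry the command a few times. This is only a sketch: the `retry` helper is hypothetical, and it does nothing about the underlying crash.

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, stopping at the first success.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage (file names taken from the log above). The redirect sits inside
# sh -c so that tmp.pdf is rewritten from scratch on every attempt.
# retry 3 sh -c 'pdfbeads EPSON038.tif > tmp.pdf'
```

The usage line is commented out so the block can be sourced without pdfbeads installed.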

zbgns
Posts: 37
Joined: 22 Dec 2016, 06:07
E-book readers owned: Tolino, Kindle
Number of books owned: 600
Country: Poland

Re: pdfbeads on Ubuntu 18.10

Post by zbgns » 11 Feb 2019, 05:53

gareth wrote:
09 Feb 2019, 07:05
Another version of Ubuntu, another breakage in PDFbeads and dependencies.
The standard ImageMagick version in Ubuntu’s repositories doesn’t support JPEG 2000 compression. You need to compile a newer version of ImageMagick to get it (this may be tricky, as there are other dependencies that need to be resolved first). Moreover, no JBIG2 compression is available by default, and you need to compile the jbig2 encoder from source before pdfbeads is installed.
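
A rough way to check both points on a given machine is sketched below. The helper names are hypothetical, the parsing assumes the usual column layout of `convert -list format` in ImageMagick 6, and the encoder is assumed to be installed under the name `jbig2` (as jbig2enc does).

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME is an executable (or builtin) the shell can find.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# jp2_writable: read `convert -list format` output on stdin and succeed
# if there is a JP2 line whose mode column includes write support.
# The column layout (format, module, mode) is an assumption based on
# typical ImageMagick 6 output.
jp2_writable() {
    grep -Eq '^[[:space:]]*JP2\*?[[:space:]]+JP2[[:space:]]+[r-]w'
}

if have_tool convert && convert -list format 2>/dev/null | jp2_writable; then
    echo "ImageMagick: JPEG 2000 write support found"
else
    echo "ImageMagick: no JPEG 2000 write support found"
fi

# Assumes the encoder binary is called jbig2.
if have_tool jbig2; then
    echo "jbig2 encoder: found"
else
    echo "jbig2 encoder: not found"
fi
```

Either result only explains the two fallback messages in the log, not the crash itself.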
gareth wrote:
09 Feb 2019, 07:05

Or does anyone have an alternative post-processing tool to pdfbeads that can do the layer separation and compression without me having to fix up the toolchain every six months...
It depends on which functions of pdfbeads you find most important. I do not think there is any free program that provides functionality similar to pdfbeads in terms of separating color and b&w graphics in a DjVu-like way. But there are replacements for other functions (OCR layer, ToC, metadata etc.). I described my approach in this thread: viewtopic.php?f=19&t=3543
It works for me as a (better) pdfbeads replacement, as long as there are no color elements in a book that need to be separated from pages with text. But I’m focusing more on OCR and the highest possible compression than on color pictures, so you may have completely different priorities.

gareth
Posts: 11
Joined: 18 Jul 2014, 07:56
Number of books owned: 0
Country: UK

Re: pdfbeads on Ubuntu 18.10

Post by gareth » 11 Feb 2019, 06:48

zbgns wrote:
11 Feb 2019, 05:53
The standard ImageMagick version in Ubuntu’s repositories doesn’t support JPEG 2000 compression. You need to compile a newer version of ImageMagick to get it (this may be tricky, as there are other dependencies that need to be resolved first). Moreover, no JBIG2 compression is available by default, and you need to compile the jbig2 encoder from source before pdfbeads is installed.

Hmm, I don't think this is the cause: the Ubuntu package for ImageMagick has never had JPEG 2000 support as far as I remember, and pdfbeads has been falling back to JPEG quite happily for years (including in at least 18.04, 17.10, 17.04 and 16.04). The hard crash is new in Ubuntu 18.10.
zbgns wrote:
11 Feb 2019, 05:53
It depends on which functions of pdfbeads you find most important. I do not think there is any free program that provides functionality similar to pdfbeads in terms of separating color and b&w graphics in a DjVu-like way. But there are replacements for other functions (OCR layer, ToC, metadata etc.). I described my approach in this thread: viewtopic.php?f=19&t=3543
It works for me as a (better) pdfbeads replacement, as long as there are no color elements in a book that need to be separated from pages with text. But I’m focusing more on OCR and the highest possible compression than on color pictures, so you may have completely different priorities.

I'm mainly using it for magazine article scans, where I don't care about OCR, but image layers (grayscale/halftone and color) alongside black-and-white text are important. I could just ram it all through an imgtopdf conversion and then concatenate, I guess.

Thanks for the reply, I appreciate it.

Gareth

zbgns
Posts: 37
Joined: 22 Dec 2016, 06:07
E-book readers owned: Tolino, Kindle
Number of books owned: 600
Country: Poland

Re: pdfbeads on Ubuntu 18.10

Post by zbgns » 11 Feb 2019, 08:14

gareth wrote:
11 Feb 2019, 06:48

Hmm, I don't think this is the cause: the Ubuntu package for ImageMagick has never had JPEG 2000 support as far as I remember, and pdfbeads has been falling back to JPEG quite happily for years (including in at least 18.04, 17.10, 17.04 and 16.04). The hard crash is new in Ubuntu 18.10.

There were opinions that pdfbeads was somewhat buggy. It also looks to have been abandoned for a very long time, so compatibility issues are possible. I guess there may also be issues with the Ruby version.
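
To narrow down whether a version change is involved, it may help to record the versions in play on a working and a failing system and compare them. A minimal sketch follows; `show_version` is a hypothetical helper, and the choice of tools and gems is taken from the messages in the log earlier in the thread.

```shell
#!/bin/sh
# show_version CMD [ARGS...]: run CMD under a header line, or print a
# note if CMD is not installed.
show_version() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@" 2>&1
    else
        echo "== $1: not installed"
    fi
}

show_version ruby --version
show_version convert -version
show_version gem list rmagick
show_version gem list nokogiri
```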
gareth wrote:
11 Feb 2019, 06:48

I'm mainly using it for magazine article scans, where I don't care about OCR, but image layers (grayscale/halftone and color) alongside black-and-white text are important.

Well, so you need something like MRC (mixed raster content) compression, but I'm not aware of any free implementation of this method.
gareth wrote:
11 Feb 2019, 06:48

I could just ram it all through an imgtopdf conversion then concatenate I guess.
Do you mean img2pdf? Concatenating may not be necessary, as img2pdf can bind a number of pictures into one PDF.
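
For illustration, binding several pages in one call could look like the sketch below. The `bind_pages` wrapper and the output name `article.pdf` are hypothetical; the only img2pdf feature relied on is its `-o` output option.

```shell
#!/bin/sh
# bind_pages OUT.pdf PAGE...: bind the given images, in order, into one PDF.
bind_pages() {
    out=$1
    shift
    if ! command -v img2pdf >/dev/null 2>&1; then
        echo "img2pdf is not installed" >&2
        return 127
    fi
    img2pdf "$@" -o "$out"
}

# Usage: the shell expands the glob in sorted order, so zero-padded
# names such as EPSON038.tif keep the pages in sequence.
# bind_pages article.pdf EPSON*.tif
```

Each page stays a single image this way, so it does not reproduce the foreground/background separation that pdfbeads performs.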

BTW, I found a script that can convert a DjVu document to a PDF while preserving the segmentation. So maybe it is an option to create a DjVu file first (provided it is able to separate foreground and background and apply proper masks) and then convert it to PDF. The author claims that the output PDFs are 3 times bigger than the input DjVu files (probably due to less effective compression). I haven't tried it yet, but maybe it is a sort of workaround for your problem. Here is the link (in Russian): https://www.linux.org.ru/forum/general/13356015

EDIT
The script seems to be based on pdfbeads so it is not a workaround at all. Sorry for the confusion.
