MRC compression + text under images

Convert page images into searchable text. Talk about software, techniques, and new developments here.

Moderator: peterZ

seasalt

MRC compression + text under images

Post by seasalt »

hello -

1) I would like to understand what "MRC compression" is, and whether I can do it manually.
Can anyone explain it in "lay" language? That would be great.

I ask because the Mac version, ABBYY FineReader Express, does not have this option, whereas the Windows version, ABBYY FineReader Pro 10, does. This came from ABBYY support. Also, per support, the Mac will get an equivalent of ABBYY Pro 10 next year.


2) In the OCR application ABBYY FineReader, the PDF output options are:
in windows
a) text under images
b) text and images

in mac (express)
a) preset to text under images

(a) gives a larger PDF but more reliable OCR (apparently, as the reader will always read the text even if the output in the PDF is displayed incorrectly)

q1) What is the impact of the two Windows options on the PDF page stream?
q2) do these have any impact in c

Thank you
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: MRC compression + text under images

Post by Misty »

MRC is "mixed raster content" compression. It works by separating the image to be compressed into layers based on which type of compression is most efficient for that particular type of image.

For instance, images from Scan Tailor of pages processed in "mixed" mode contain two types of image: binarized (pure black/white) text, and 8-bit greyscale or colour images. Binarized text can be compressed extremely efficiently using compression methods designed for that kind of image, like G4 or JBIG2, but takes up a lot more space when compressed using the methods designed for 8-bit images. It gets even worse when you're trying to optimize a PDF for small file size. In that case you might want to use lossy compression and/or reduce the resolution of your images while keeping the resolution of your text high, which is impossible when everything is compressed as one image. Using MRC compression at my previous employer, I reduced master files that were often several hundred megabytes to web-ready delivery PDFs of 10-40 MB (depending on the number and size of images).
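
To make the layer idea concrete, here is a minimal Python sketch. It is not how PDFBeads or ABBYY actually do it. It assumes a page given as rows of 0-255 grey values and uses a plain darkness threshold to decide what counts as text; the function name `split_layers` and its `threshold` and `scale` parameters are made up for illustration. The text mask stays at full resolution while the background is averaged down, which is the part you cannot do when the page is compressed as a single image.

```python
def split_layers(pixels, threshold=128, scale=2):
    """Split a greyscale page into a 1-bit text mask and a smaller background.

    pixels: list of rows, each a list of grey values 0-255 (toy input format).
    Returns (mask, background). Hypothetical helper for illustration only.
    """
    height, width = len(pixels), len(pixels[0])

    # Full-resolution mask: 1 where the pixel is dark enough to count as text.
    mask = [[1 if value < threshold else 0 for value in row] for row in pixels]

    # Background: average the non-text pixels of each scale x scale block,
    # so this layer has roughly 1/scale of the original resolution.
    background = []
    for y0 in range(0, height, scale):
        out_row = []
        for x0 in range(0, width, scale):
            values = [
                pixels[y][x]
                for y in range(y0, min(y0 + scale, height))
                for x in range(x0, min(x0 + scale, width))
                if not mask[y][x]
            ]
            # A block that is entirely text gets a plain white background.
            out_row.append(sum(values) // len(values) if values else 255)
        background.append(out_row)
    return mask, background


# Tiny 4x4 "page": a dark stroke on white above a grey picture area.
page = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [200, 200, 180, 180],
    [200, 200, 180, 180],
]
mask, background = split_layers(page)
print(mask)        # full-resolution 4x4 text mask
print(background)  # half-resolution 2x2 background
```

A real encoder would then store the mask with G4 or JBIG2 and the background with a lossy 8-bit codec such as JPEG, and would use proper segmentation rather than a single threshold.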

I'm not familiar with ABBYY's support for MRC compression, I'm afraid. Several open-source utilities for doing it have been discussed on these forums, like PDFBeads or my own very simple script PDFMaker, which may or may not fit your workflow.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
seasalt

Re: MRC compression + text under images

Post by seasalt »

Thank you, Misty.
This info was exactly what I was looking for.
Does your app work on Mac?

Do you have a suggested forum where I can go to learn more, or do you run a forum?
Thank you kindly.
Misty

Re: MRC compression + text under images

Post by Misty »

My script is for Windows only (and may not be maintained any further), but PDFBeads is written in Ruby, and Mac OS X includes all of the Ruby tools you need. I haven't used it, but it seems like it should work. To use it, you'll need to install three gems using RubyGems, plus their dependency: either ImageMagick or GraphicsMagick. I'm not sure whether PDFBeads requires ImageMagick or can also work with GraphicsMagick. If you don't have a package manager installed (like Homebrew or MacPorts), and you don't think you'll need one to install other tools, you can use this ImageMagick installer for Mac OS X: https://github.com/maddox/magick-installer

Make sure you've installed Xcode from your OS disc first.

(If you do think you'll want a package manager, give Homebrew a try. It is truly the octocat's pyjamas.)

Once you've installed ImageMagick using that guide, go to a terminal and run the following command:

sudo gem install pdfbeads rmagick hpricot
And wait. This will take a while, but once it finishes it will create the pdfbeads command-line utility. I haven't used it, so you'll have to check its readme for help, sorry!
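
If you are not sure whether the install worked, a quick check like the following may help. It is only a sketch: the command names `convert` (ImageMagick) and `gm` (GraphicsMagick) are assumptions about how those tools are usually installed and may differ on your system, and `find_tools` is a made-up helper name.

```python
import shutil

# Command names are assumptions; adjust them to match your installation.
TOOLS = {
    "pdfbeads": "PDFBeads command-line utility",
    "convert": "ImageMagick",
    "gm": "GraphicsMagick",
}


def find_tools(names, which=shutil.which):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    status = path if path else "not found on PATH"
    print(f"{TOOLS[name]} ({name}): {status}")
```

Anything reported as "not found on PATH" either did not install or lives somewhere your shell does not look.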

One caveat: PDFBeads requires jbig2enc to use JBIG2 image compression under Mac OS X, and jbig2enc currently doesn't compile under OS X. Bummer. If/when I can figure out how to get it to build, I'll be contributing it to Homebrew for easy OS X installation. Without jbig2enc, you can only use the less-efficient G4 compression for binarized text.
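
The fallback described above can be sketched in a few lines of Python. This is an illustration of the decision, not PDFBeads' own logic; it assumes the jbig2enc executable is called `jbig2`, and `pick_bitonal_codec` is a hypothetical name.

```python
import shutil


def pick_bitonal_codec(which=shutil.which):
    """Return "JBIG2" if a jbig2 executable is on PATH, otherwise "G4".

    Assumes jbig2enc installs a command named "jbig2" (an assumption).
    """
    return "JBIG2" if which("jbig2") else "G4"


print("Compression for binarized text:", pick_bitonal_codec())
```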
seasalt

Re: MRC compression + text under images

Post by seasalt »

Thank you.
lupocos

Re: MRC compression + text under images

Post by lupocos »

Hey Misty!
Any news about running jbig2enc on Mac?
Thanks,

Cosimo
Misty

Re: MRC compression + text under images

Post by Misty »

I've already let Lupo know privately, but I've got a Homebrew formula working now that will be submitted soon. I'll post an update here when it's available.
seasalt

Re: MRC compression + text under images

Post by seasalt »

Hi Misty, I failed at installing ImageMagick. I have a bunch of scripts, and I am not technical enough to work out how to get the ports thingy to do its stuff on my Mac.

So if you hear of anyone who has a "package" that I can just double-click on, it would be mighty grand if you could drop me a line.

I am also trying other options in the meantime.
I use PDFClerk to create my outline (bookmarks/table of contents) and indexes (linked to the correct pages).
This application is Mac-only, so I'm not sure if you know it.
Two options:
1)
This application uses a Mac utility called ColorSync, which includes JPEG image compression only,
but you have the ability to add others.
-- If you know this utility, as a long shot, I wondered if you know of anyone who has added a compression filter to do MRC compression.

2) PDFClerk supports AppleScript (and Ruby and Python) too.
Now, I am not technical,
but I think you're telling me PDFBeads is a Ruby script.
If so, should I not be able to integrate it into PDFClerk?

Thank you for your help.
Cheers

From the ColorSync Utility help:

Editing Quartz filters
You can modify an existing Quartz filter, so you can use it to customize the color in a file.

To edit a Quartz filter

Open ColorSync Utility, in the Utilities folder in the Applications folder.

Click Filters in the toolbar.

Click the triangle beside a filter in the list, or click the Add (+) button to create a new one.

The characteristics that this filter affects are listed below the filter.

To change components of the filter, adjust the settings.

To edit a particular component, click the disclosure triangle to the left of it.

To add a new component, click the downward-pointing triangle to the right of the filter’s name, and then choose the component:

Add Color Management Component: Choose an item from this submenu to add color modifications to your filter. Choose a default profile for files without one.

Add Image Effects Component: Choose an item from this submenu to change the size, bit depth, interpolation, and compression settings for the images in your file.

Add PDF Retouch Component: Choose an item from this submenu to set how monochrome data is encoded, whether images are interpolated, and whether to create PDF/X-3 documents.

Add Domain Information: Choose this item to specify where the filter can be used. The filter can be used in applications, PDF workflows, or the Print dialog.

If a filter’s components are dimmed, that filter is locked and you cannot edit it.

To prevent a filter from being edited accidentally, click the downward-pointing triangle to the right of it, and then choose Lock. To allow it to be edited, choose Unlock.

To edit a filter while viewing how the edited filter modifies a PDF file, choose File > Open. When the PDF file appears, choose “Live Update from Filter Inspector” from the Filter pop-up menu. As you edit the filter, your changes are reflected in the PDF file.
