Acrobat Tips

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Moderator: peterZ

reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: Acrobat Tips

Post by reggilbert »

i'm surprised to hear all these reports - i have ocr'd hundreds of books in Acrobat, many with 300+ images of two pages, the last hundred or so books with all scans in grayscale (which could potentially give rise to umpausewhat's neighboring-letter issue), and never experienced anything more than a moderate slowdown in multitasking, and that rarely. my processor has never been a big deal, a q6600 quad core at the moment, 6gb ram, but previously it was a laptop processor and 3 or 4 gb ram. xp, vista oses.

is it possible that the issue here is the source images? four years ago when i started scanning books i switched from jpg to bmp as my scanner output format and the size of the resultant Acrobat files (not the scanner output files) went down by 80+ percent. i don't remember if the binding speed increased. perhaps certain source image formats ocr less efficiently.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

reggilbert wrote:is it possible that the issue here is the source images?
It's quite possible that the source image format is the culprit, considering Scan Tailor outputs .tif files...don't know if that could be changed, but even if it could, don't .tifs hold image quality better? Granted, they are huge files, but I'd rather lose time on the processing end than quality on the output end. I'm VERY satisfied with the output quality of my scans...just wish it were a little faster. Thanks for the input.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

daniel_reetz wrote:That's pretty damning in and of itself. You might consider sending it in to Adobe as a bug report.
I think I'll do just that...although a friend who processes the very same way I do, and who has been doing it much longer, seems to think that's just the way it is. I'll send something off to Adobe soon just for kicks. I'll post any reply they give.
daniel_reetz wrote:So what's the ultimate solution here - if they won't bugfix - a script to submit a few hundred pages at a time?
Other than living with it, I'm sure an Acrobat "Action" can be compiled to process incrementally. I plan to pursue that as well.
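A minimal sketch of the "few hundred pages at a time" idea, assuming the batches are handed one at a time to whatever actually runs the OCR (an Acrobat Action or a manual page-range selection). It only computes the page ranges; the function name `page_batches` and the default batch size of 100 are illustrative assumptions, and nothing here calls Acrobat itself.

```python
def page_batches(total_pages, batch_size=100):
    """Yield 1-based, inclusive (first, last) page ranges covering a document.

    Hypothetical helper: it does not talk to Acrobat, it only works out
    which page ranges to OCR in each pass.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        yield first, min(first + batch_size - 1, total_pages)


if __name__ == "__main__":
    # Example: a 500-page book processed 100 pages at a time.
    for first, last in page_batches(500, 100):
        print(f"OCR pages {first}-{last}")
```

With ranges like these, a failure partway through costs only the current batch rather than the whole book.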
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

I'm producing a few books with black-on-white text and reading them with iBooks on my iPad. I'm also scanning the covers and inserting each one as the first page of the book/PDF. I've noticed that when I process the cover image through Scan Tailor and import it into the PDF in Acrobat, it causes iBooks to crash. However, when I export the image from Lightroom 3 as a .jpg and then import that into the PDF in Acrobat, it works fine in iBooks. Anyone else run into this?
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

clemd973 wrote:OCR and RAM: I'm using a Macbook Pro with 4GB RAM. When performing OCR in Acrobat, I've found that it's better to go in increments of about 100 pages at a time since the OCR process seems to process all the pages at once and holds it in memory rather than one page at a time and releasing it from memory; therefore, going over about 100 pages may end up in using all your RAM, which will then result in an error message and will cancel the process. This really sucks if you were trying to OCR 500 pages, which takes a lot of time, only to get the error message half way or more of the way through and have to start from the beginning again.
I think I've figured this one out...I'll do some tests and hope to update this post by next weekend. Most of the time I was not saving the file to .PDF in Acrobat before running OCR/ClearScan. At that point, the compiled .tif files are 500MB - 1.5GB each...a HUGE problem both in size and for memory. I theorize that if I save the file as a .PDF prior to running OCR, then the file size will drastically decrease and OCR will be running on a much smaller file, thereby bypassing the need for more than 4GB RAM for any one project, and thus making OCR in one pass for the entire book a possibility. Just a theory at this point...stay tuned!
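One rough way to put numbers on a test like this is to compare the file size on disk before and after the save-as-PDF step. The sketch below is only an illustration: the function names and the file paths in the usage note are hypothetical, and size on disk is an assumed stand-in for memory use during OCR, which it may not track closely.

```python
import os


def size_reduction_percent(before_bytes, after_bytes):
    """Return how much smaller 'after' is than 'before', as a percentage."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return 100.0 * (before_bytes - after_bytes) / before_bytes


def compare_files(before_path, after_path):
    """Return (before_bytes, after_bytes, percent_reduction) for two files."""
    before = os.path.getsize(before_path)
    after = os.path.getsize(after_path)
    return before, after, size_reduction_percent(before, after)
```

For example, `compare_files("book_unsaved.pdf", "book_saved.pdf")` (placeholder names) would return both sizes and the percentage drop, and a negative percentage would mean the saved file came out larger.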
DamnedOwl

Re: Acrobat Tips

Post by DamnedOwl »

I've had this problem with ClearScan too.

However, as of yesterday I don't seem to have any problems with it anymore; indeed, I've just ClearScanned a 1664 page document without any problems, whereas before, 350 pages was typically the limit for me.

Not entirely certain what has made the difference, but I strongly suspect that it is because I've just installed Service Pack 1 for Windows 7 (I'm running the 64bit Ultimate version of Windows).

I read in the release notes about something that they're calling 'Dynamic Memory', which, in short, seems to be about a more efficient use of system memory. Certainly, I noticed that while a document is being ClearScanned now the system seems to periodically 'shed' a portion of the memory that is in use.

Anyway, I'd be interested to see if anybody else has noticed a similar increase in the threshold when ClearScanning documents.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

DamnedOwl wrote:I've just ClearScanned a 1664 page document without any problems, whereas before, 350 pages was typically the limit for me.

Not entirely certain what has made the difference, but I strongly suspect that it is because I've just installed Service Pack 1 for Windows 7 (I'm running the 64bit Ultimate version of Windows).
WOW!!! Even 350 pages is a problem for me...can't seem to get past 75 without the error message...1664 pgs. blows me away. I'm running OS X on a MacBook Pro (Acrobat X Pro) and can't seem to break this ceiling of ~75 pages. I'll probably check out the Acrobat forums this weekend. Will follow up later.
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

I changed the properties for importing images into one PDF file to include OCR/ClearScan in the import process, and it seems to work better. One thing that I found out, though, is that when ClearScan runs, if it doesn't recognize the font due to some minor alteration, etc., it will create its own font and embed it under the graphic. What this does is explode the file size...consider how many created fonts were added to the 1200 page book I just finished...too many to count, I'll tell you that. I'm very satisfied with the end result, but not so much with the file size. I'm trying to work on that and see if there are any other options to decrease the file size while still using Acrobat. If anyone uses Acrobat X, run OCR/ClearScan, go to the "Fonts" tab in the document properties, and take a look at the number of created fonts. If anyone has a workaround for this, please post it here!!!
DamnedOwl

Re: Acrobat Tips

Post by DamnedOwl »

You could try getting hold of a copy of PitStop Pro by Enfocus, as well as a PDF editor such as the one by Infix.

Between them I find that you can edit all the parts of the ClearScanned document that are left as images (with the text embedded invisibly underneath).

This isn't really much of a work around though if there are loads of such instances because it would take far too long.

ClearScan is particularly poor with underlined text I've noticed - especially when the word underlined contains letters that cross the underlining. You can replace the text and underline it with Infix, but again, it takes a very long time to do this if you have lots of pages full of underlined text.

By the way, I mentioned in my previous post about the update that had solved the RAM issue with ClearScan - I'm sorry I didn't make it clearer, but the update was an operating system update (Windows 7 64bit), rather than an update to Acrobat. I notice that you're running a Mac so unless you would have access to a computer running Windows 7 I suppose this isn't much use to you (though perhaps you could create a dual boot system running both Mac OS X and Windows 7?)
clemd973
Posts: 121
Joined: 22 Aug 2010, 21:20

Re: Acrobat Tips

Post by clemd973 »

DamnedOwl wrote:You could try getting hold of a copy of PitStop Pro by Enfocus, as well as a PDF editor such as the one by Infix.

Between them I find that you can edit all the parts of the ClearScanned document that are left as images (with the text embedded invisibly underneath).

This isn't really much of a work around though if there are loads of such instances because it would take far too long.

ClearScan is particularly poor with underlined text I've noticed - especially when the word underlined contains letters that cross the underlining. You can replace the text and underline it with Infix, but again, it takes a very long time to do this if you have lots of pages full of underlined text.
Yeah, the sheer number of created fonts would take forever to go through. I'll take a look at those programs, though. You never know what can be gleaned from different sources.
DamnedOwl wrote:By the way, I mentioned in my previous post about the update that had solved the RAM issue with ClearScan - I'm sorry I didn't make it clearer, but the update was an operating system update (Windows 7 64bit), rather than an update to Acrobat. I notice that you're running a Mac so unless you would have access to a computer running Windows 7 I suppose this isn't much use to you (though perhaps you could create a dual boot system running both Mac OS X and Windows 7?)
I do have a dual boot system - running XP - but it seems that incorporating ClearScan into the initial import process has all but eliminated the original problem for me. Thanks for the suggestion. Still working on the workaround for file size.