Acrobat Tips

clemd973 · Post by **clemd973** » 28 Dec 2010, 11:33

I thought I'd start this thread for closed-source junkies like myself who use Adobe Acrobat for part of their post-processing. I'm a new user and have pulled out some of my hair over the issues I've run into. I wish I had run across a resource like this, but I never found one, so why not start one now? Even if you don't use programs like Acrobat, some of the ideas and problems presented here might help you work out scripts and code for your own post-processing. I've just finished my first multicolored-text scan and am very satisfied with the results, although the final presentation still needs some tweaking. PLEASE ADD YOUR OWN TIPS/TRICKS, ETC.

clemd973 · Post by **clemd973** » 28 Dec 2010, 11:41

Problem: With pages containing multi-colored text, you can't process in Scan Tailor using the "black and white" setting, for obvious reasons.

Therefore, you must use the "color/grayscale" setting, but the background ends up looking very blotchy. Even after adjusting the colors and settings in Lightroom 3 (the program I use for pre-processing the images), the output from Scan Tailor in "color/grayscale" mode still had a slightly blotchy background, not the clean one I wanted.

Solution: I researched what could be done in Acrobat and found that I could use Edit>Preflight to separate the pages into different layers: Image layer, Text layer, and Vector Object layer. Then, in the layer command, I could hide the image layer, which effectively removes the blotchy background from view. That layer can even be locked out, i.e., made to never be visible when viewing in Acrobat, exporting, or printing.

clemd973 · Post by **clemd973** » 18 Jan 2011, 10:32

clemd973 wrote: I researched what could possibly be done in Acrobat and I found that I could use Edit>Preflight to separate the pages into different layers: Image layer, Text layer, and Vector Object layer. Then, in the layer command, I could hide the image layer, which effectively removes/hides the image layer.

It's been a while since my last update, and I've made some changes to my workflow that address the issue in my first post. I had to rework my original colored-text book because of a problem I ran into when separating the document into layers. I wanted to separate all of the text from the background image, but that wasn't always what happened. At times, the Acrobat OCR failed to recognize certain words, mostly at random, so they remained on the image layer. As a result, when I hid the background/image layer, some of the text went with it.

I've got an idea for how to resolve this and still use the "layer option," but until I can look into that further, I found another workaround. I process my images in Lightroom 3 before sending them to Scan Tailor, and as I became more familiar with the program, I was able to effectively whiten the background and darken both the black and the colored text. Then, in Scan Tailor, using the "color/grayscale" mode to keep the colored text, I could clean up the background further by selecting "white margins" and "equalize illumination". Some minor blotches remained visible in the final product, but I'm OK with that. It comes up clean and easily readable on my iPad. Moreover, I saved the Lightroom 3 settings as a preset so that I can use them again in the future.

What also adds to the ease of processing is the two LCD monitors I mounted on my scanner, which let me see what the camera is seeing as I work through the scanning process and make any needed adjustments along the way.

clemd973 · Post by **clemd973** » 23 Jan 2011, 18:41

OCR and RAM: I'm using a MacBook Pro with 4GB RAM. When performing OCR in Acrobat, I've found that it's better to go in increments of about 100 pages at a time. The OCR process seems to hold all the pages in memory at once rather than processing one page at a time and releasing it, so going over about 100 pages may use up all your RAM, which results in an error message and cancels the process. This really sucks if you were trying to OCR 500 pages, which takes a lot of time, only to get the error message halfway or more through and have to start from the beginning again.
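As a rough illustration of the "about 100 pages at a time" approach, here is a minimal Python sketch that only works out which page ranges to OCR in each pass. It does not talk to Acrobat at all; the function name `page_batches` and the default batch size are assumptions made for the example, and the ranges would still be entered by hand (or fed to whatever automation you have).

```python
def page_batches(total_pages, batch_size=100):
    """Return inclusive, 1-based (first, last) page ranges covering the book.

    batch_size=100 mirrors the rule of thumb from the post; it is not a
    limit documented anywhere, so tune it for your own machine.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # Example: a 500-page book split into 100-page OCR passes.
    for first, last in page_batches(500, 100):
        print(f"OCR pages {first}-{last}")
```

The last range is simply shorter when the page count is not a multiple of the batch size, so a 127-page book with the default setting gives two passes.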

Post by **daniel_reetz** » 23 Jan 2011, 23:57

I know I've seen similar reports around the forums -- a lot of people doing 100 or 200 pages at a time, and then binding the results. Does the Mac platform have some kind of profiler that would confirm this behavior?

Seems a shame that such nice software has problems like this in the year 2011, but since we're really pushing the limits of technology, it's in a way unsurprising.

Thanks for keeping track of this stuff, Clemd973.

clemd973 · Post by **clemd973** » 25 Jan 2011, 06:36

daniel_reetz wrote: Does the Mac platform have some kind of profiler that would confirm this behavior? Seems a shame that such nice software has problems like this in the year 2011, but since we're really pushing the limits of technology, it's in a way unsurprising.

To be honest, I'm not really sure. I'll look into it and post back.

umpausewhat · Post by **umpausewhat** » 26 Jan 2011, 21:34

I don't know exactly how Acrobat uses RAM, but I've found its OCR performance depends a lot on how clean the text is. This isn't just an image quality issue, but sometimes an issue with the printed material itself. I have mass market paperbacks in which the ink of a printed letter commonly touches the next letter. Acrobat OCR does not like this. I've crashed the OCR a few times on these types of texts. To make matters more frustrating, in these scenarios Acrobat's "ClearScan" OCR doesn't end up making the file smaller (sometimes the process increases the file size). I take it that this is because every time it runs across multiple touching letters, it has to come up with a new custom vectorized font, and as those fonts multiply, the OCR process gets cumbersome and the file size gets bigger. Maybe this is where RAM comes in, if you are using ClearScan: too many custom fonts multiplying.

When a printed text is clean and all the letters are separate, Acrobat's ocr seems to be able to handle the big books without much trouble. I don't think the RAM is too much of an issue here--when I look at my memory usage during the ocr process (using Task Manager), it tends to remain pretty constant; I've got the same amount of RAM mentioned above (4 GB). But I'm not a computer expert, so forgive oversights in any of the above.

clemd973 · Post by **clemd973** » 27 Jan 2011, 01:14

Dan, I hope this is what you were looking for. I used the Activity Monitor to assess RAM usage during the OCR process. I've got 4GB installed, and it seems that as the process went on, the available memory began to fall. I'm in no way a Mac or Adobe expert, but I know that when I include too many pages at one time to OCR with Clear Scan, it tells me I've run out of memory. When I decrease that number to about 100 - 125 pages at a time, it works fine. I think the images below help illustrate what's going on. Pay attention to what's circled in red. I've got a friend writing a script to document this more accurately. I'll show those results when I get them. Eating the RAM really sucks, but as long as there's a workaround, I'm OK with it. So far, I'm on book #5. I'll post some pics of my work soon...might have to start a new thread for that.

Image: The beginning of the 127 page OCR process with Clear Scan.

Image: Page 45 of 127.

Image: Page 78 of 127.

Image: Page 114 of 127.

Image: Page 126 of 127.
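For anyone who wants numbers instead of screenshots, the following is a minimal sketch of the kind of logging script mentioned above, not the actual script referred to in the post. It assumes a Mac or Linux machine where the standard `ps` command is available, and that you already know the process ID of the application you want to watch (Activity Monitor shows it in the PID column). Note that it records the resident memory of one process, which is related to, but not the same as, the system-wide figures Activity Monitor displays. All function names here are made up for the example.

```python
import subprocess
import time


def parse_rss_kb(ps_output):
    """Parse the output of `ps -o rss= -p PID` (resident memory in KB)."""
    return int(ps_output.strip())


def read_rss_kb(pid):
    """Ask ps for the resident set size of one process, in kilobytes."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_kb(out)


def growth_mb(samples_kb):
    """Difference between the last and the first sample, in megabytes."""
    if len(samples_kb) < 2:
        return 0.0
    return (samples_kb[-1] - samples_kb[0]) / 1024


def log_memory(pid, interval_s=5.0, count=10):
    """Sample a process `count` times, printing a timestamped reading each time."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_kb(pid))
        print(f"{time.strftime('%H:%M:%S')}  {samples[-1] / 1024:.1f} MB")
        time.sleep(interval_s)
    return samples
```

A typical use would be to start the OCR run, call `log_memory(pid, interval_s=30, count=40)` in a terminal, and then pass the returned list to `growth_mb` to see how much the process grew over the run.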

Post by **daniel_reetz** » 27 Jan 2011, 11:06

That's pretty damning in and of itself. You might consider sending it in to Adobe as a bug report.

So what's the ultimate solution here - if they won't bugfix - a script to submit a few hundred pages at a time?
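One way such a script could be structured is sketched below in Python. This is only an outline under stated assumptions: `ocr_range` is a hypothetical callable standing in for whatever mechanism actually triggers OCR on a page range, and `MemoryError` stands in for however that mechanism reports the out-of-memory failure (Acrobat itself shows an error dialog, not a Python exception). The idea is to keep finished batches and retry only the failed one with a smaller size, so a failure late in a long book does not mean starting over.

```python
def run_in_batches(total_pages, batch_size, ocr_range, min_batch=10):
    """OCR pages 1..total_pages in chunks, halving the chunk on MemoryError.

    ocr_range(first, last) is a hypothetical hook supplied by the caller.
    Returns the list of (first, last) ranges that completed.
    """
    done = []
    first = 1
    while first <= total_pages:
        last = min(first + batch_size - 1, total_pages)
        try:
            ocr_range(first, last)
        except MemoryError:
            if batch_size <= min_batch:
                raise  # already at the smallest batch; give up
            batch_size = max(min_batch, batch_size // 2)
            continue  # retry from the same page with a smaller batch
        done.append((first, last))
        first = last + 1
    return done
```

Starting with a deliberately optimistic batch size and letting the loop shrink it is a design choice for the sketch; starting at a size known to work on your machine avoids the wasted first attempt.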

rob · Post by **rob** » 27 Jan 2011, 12:04

Clearscan is an incredible memory hog. I have 2 GB on my Mac, and converting to Clearscan died after a few hundred pages (I don't remember the exact figure...)
