Cleaning up Background after ClearScan
Posted: 12 Mar 2015, 16:25
Most of the steps in my workflow are absolutely classic, but I thought I'd share it as I haven't seen the last step mentioned. The typical kind of book I scan has images, so I run Scan Tailor in grayscale.
1. Clean up in Scan Tailor (grayscale because of the images)
2. Assemble
3. ClearScan
4. Separate to Layers
5. PitStop (see below)
6. Optimize
I like how ClearScan leaves the document looking close to the original (acknowledging the risk of mis-recognized characters). One thing I don't like so much is that it leaves images from all the page backgrounds, which makes the file heavier.
I haven't yet found the perfect way to deal with these BG images, but one thing has helped a lot. I'm lucky to have access to an expensive Acrobat plugin at a friend's shop (PitStop Pro).
1. After ClearScan, I separate the PDF to layers (Tools / Print Production / Preflight / Create Separate Layers for vectors, text and images). Then I hide the text layer in order to inspect what images are left.
2. At that stage, I go through the whole document and write down the page ranges (or individual pages) where all images can be zapped (typically, an almost white background).
3. On my friend's machine, I fire up PitStop and create an Action List with the following actions:
- Select Layers by Name (equals Images)
- Select Page Range (the ranges identified in step 2)
- AND
- Select Images
- AND
- Remove Selection
Running the Action List zaps all the images from the page ranges.
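For anyone who wants to reason about steps 2 and 3 outside PitStop, here is a minimal Python sketch of the same selection logic. It does not touch a real PDF and is not PitStop's API: the `PageObject` record, the `parse_page_ranges` and `remove_selection` helpers, and the sample data are all hypothetical names made up for illustration. It only shows how the noted page ranges expand into page numbers and how the three selections combine with AND before removal.

```python
from dataclasses import dataclass


def parse_page_ranges(spec):
    """Expand a note such as "1-12, 15, 20-48" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


@dataclass(frozen=True)
class PageObject:
    """Hypothetical stand-in for one object on a page (not a real PDF structure)."""
    page: int
    layer: str
    kind: str


def remove_selection(objects, layer_name, pages, kind):
    """Drop objects matching layer AND page range AND kind; keep everything else."""
    wanted_pages = set(pages)

    def selected(obj):
        return (
            obj.layer == layer_name
            and obj.page in wanted_pages
            and obj.kind == kind
        )

    return [obj for obj in objects if not selected(obj)]


# Made-up document contents, for illustration only.
objects = [
    PageObject(1, "Images", "image"),
    PageObject(1, "Text", "text"),
    PageObject(2, "Images", "image"),
    PageObject(5, "Images", "image"),
]
kept = remove_selection(objects, "Images", parse_page_ranges("1-2, 4"), "image")
```

Only the images on the noted pages go away; the text on page 1 and the image on page 5 (outside the ranges) survive.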
Although this works to an extent, I am not fully satisfied with this solution because (a) it's not sustainable (I don't own PitStop Pro), (b) pages that have both BG and valid images must be addressed manually, and (c) there has to be a better way of mass-selecting image assets to be zapped. As mentioned in this other post, I am looking for software that shows all the image assets inside a PDF in a GUI folder-style view for easy selection and removal.
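To make the "folder-style view" idea a bit more concrete, here is a rough sketch of how such a listing might be organized. It is not an existing tool: `folder_view` and the record layout (page, name, width, height, size in bytes) are assumptions, and the records would have to come from some PDF library that can enumerate images. Sorting the largest images first within each page is also just my guess at what would make heavy backgrounds easy to spot.

```python
from collections import defaultdict


def folder_view(records):
    """Group (page, name, width, height, size_bytes) tuples into a per-page listing."""
    by_page = defaultdict(list)
    for page, name, width, height, size_bytes in records:
        by_page[page].append((name, width, height, size_bytes))

    lines = []
    for page in sorted(by_page):
        lines.append(f"page_{page:04d}/")
        # Largest images first within each page (an assumption about what is useful).
        for name, width, height, size_bytes in sorted(
            by_page[page], key=lambda record: -record[3]
        ):
            lines.append(f"    {name}  {width}x{height}  {size_bytes // 1024} KB")
    return lines


# Made-up image inventory, for illustration only.
records = [
    (2, "Im1", 2480, 3508, 409600),
    (1, "Im0", 2480, 3508, 204800),
    (2, "Im2", 600, 400, 51200),
]
listing = folder_view(records)
print("\n".join(listing))
```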