I was playing around with a few sample pages to get the hang of using Scan Tailor and Acrobat to go from original images to a final PDF. When I ran OCR on the pages using ClearScan, Acrobat did something pretty scary: it deleted chunks of text from the PDF. Attached is a sample PDF page from before and after running ClearScan OCR. As you can see, there are pieces of text missing in Sample2.pdf. Any idea how this could happen?
Adobe Acrobat deleting parts of page during OCR
Re: Adobe Acrobat deleting parts of page during OCR
I'm not sure what the problem is. I downloaded both files and reprocessed the first one. Initially I did a straight OCR in Acrobat X Pro and got good recognition. Then I went into the settings, switched to ClearScan, and got reasonable recognition (it is all there), but I ran into a problem I have had before: weird spacing issues.
This is the first OCR:
[image: result of the standard OCR]
This is the second, with ClearScan turned on:
[image: result with ClearScan turned on]
Personally, I give ClearScan a very wide berth, as the output text just isn't up to scratch for what I am doing. (I don't think it is up to scratch for anything really, as keyword searching is a joke when extra spaces are thrown in randomly.) What I don't get is your missing text.
What version of Acrobat are you using?