how to remove punch holes automatically?

Posted: 07 Nov 2016, 09:14
by Adam32
I am trying to remove punch holes from thousands of scanned documents. I can do this manually, but it is too time-consuming, so I am looking for an automatic solution.

The problem is that the punch holes are often in different places on different documents. Some holes are not level, because the document was inserted into the puncher at an angle. Text is also often aligned with the punch holes; see the attached images, where I have highlighted such text in red as an example. For these reasons a basic crop won't work without cutting off information.

Does anyone have any suggestions how I can do this automatically?

Re: how to remove punch holes automatically?

Posted: 07 Nov 2016, 14:01
by L.Willms
How are those scanned pages intended to be used? Just as images, or as material for an OCR program?

In the latter case, the OCR program will take care of it, since the holes are not recognizable as text.

On the other hand, the images of the punch holes all seem to have the same shape and form, so there might be some kind of software that can find and remove them with a pattern search.

Re: how to remove punch holes automatically?

Posted: 07 Nov 2016, 15:03
by Adam32
Thanks for your reply. The scanned images are assembled into PDF in the form of a searchable image.
L.Willms wrote: the images of the punch holes seem to have all the same shape and form, so there might be some kind of software which can in a pattern search remove those patterns.
That's exactly what I thought regarding pattern search, but have had no luck finding a solution.
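
For illustration only, here is a minimal sketch of what such a pattern search could look like. It is not taken from any product mentioned in this thread. It assumes the page has already been binarized into a grid of 0 (white) and 1 (dark) pixels, and that a small sample of one punch hole is available as a template. The names `find_template`, `erase_hits` and `min_score` are hypothetical.

```python
def find_template(image, template, min_score=0.9):
    """Return (row, col) top-left corners where the template matches.

    image and template are lists of rows holding 0 (white) or 1 (dark).
    min_score is the fraction of template pixels that must agree.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    total = th * tw
    hits = []
    # Slide the template over every position where it fits entirely.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            same = sum(
                1
                for dr in range(th)
                for dc in range(tw)
                if image[r + dr][c + dc] == template[dr][dc]
            )
            if same / total >= min_score:
                hits.append((r, c))
    return hits


def erase_hits(image, hits, th, tw):
    """Paint every matched window white (0), modifying image in place."""
    for r, c in hits:
        for dr in range(th):
            for dc in range(tw):
                image[r + dr][c + dc] = 0
```

A brute-force scan like this is slow on full-resolution scans, and it does not handle holes punched at an angle or holes of a different shape; lowering `min_score` tolerates small differences at the cost of more false matches on text.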

Re: how to remove punch holes automatically?

Posted: 07 Nov 2016, 16:46
by BruceG
If you were to OCR without an image, as said, the punch holes are not seen.
[Attachment: Text without holes.pdf (3.97 KiB)]
By having no image you cannot be 100% certain the OCR has everything right. I notice the copyright symbol was not picked up correctly. OCR programs can be trained so this is not repeated. The size of the OCR text is very small.

Re: how to remove punch holes automatically?

Posted: 07 Nov 2016, 19:31
by Adam32
BruceG wrote: If you were to OCR without an image, as said, the punch holes are not seen.
But I don't want to OCR without an image. If I do that, it is even more time-consuming, as I have to review all the OCR output because it makes a lot of mistakes. And if I send the file to someone else without having proofed it, it looks very unprofessional.

I just want to make searchable images. As L.Willms suggested, I like best the idea of finding some kind of software that can do a pattern search and remove the punch holes. I have seen some scanners advertised that remove punch holes at the scanning stage, so I guess such automatic software exists.

Re: how to remove punch holes automatically?

Posted: 07 Nov 2016, 20:17
by BruceG
A searchable image also comes with errors, which cannot be seen until one copies and pastes. If the error is in a word being searched for, it will never be found. All editing takes time, whether text or image. The quality of the original and the JPEG is very good, so I would not expect many errors.

Re: how to remove punch holes automatically?

Posted: 08 Nov 2016, 04:29
by cday
BruceG wrote: A searchable image also comes with errors, which cannot be seen until one copies and pastes.
The key benefit of searchable image is that the text displayed on screen is always the original text...
BruceG wrote:The quality of [your] original and the jpeg is very good so I would not expect many [OCR recognition] errors.
Your sample page from OmniPage 19 did seem to come out very well...

I think Nuance PaperPort is relatively inexpensive software with the capability to remove punched holes, although I haven't used it myself. Possibly there is a trial download available?

Re: how to remove punch holes automatically?

Posted: 08 Nov 2016, 05:31
by BruceG
One of the features of OmniPage is Image Enhancement, which has a number of tools, one being 'Punch Hole Remover'. This, however, would not remove the punch holes in this case. Doing OCR did. I expect that because the holes look like the moon on its back, it did not conclude them to be punch holes and so never did anything.

Re: how to remove punch holes automatically?

Posted: 08 Nov 2016, 05:54
by Adam32
BruceG wrote: This, however, would not remove the punch holes in this case. I expect that because the holes look like the moon on its back, it did not conclude them to be punch holes and so never did anything.
That's the exact same problem I have encountered with other software claiming to remove punch holes. There must be a program that can do a basic image search and remove objects with the characteristics of that shape? Do you think some sort of ImageMagick script would work?
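
One way to picture "remove objects with the characteristics of that shape" is sketched below, as an illustration only and not as an ImageMagick script or any vendor's method. Instead of testing for a perfect circle, it groups connected dark pixels into blobs and erases those that lie entirely inside an assumed margin strip and whose pixel count falls in an assumed range, so a crescent-shaped hole would still qualify. It assumes a binarized page of 0 (white) and 1 (dark) pixels. The names `find_blobs` and `remove_margin_blobs` and the parameters `margin_cols`, `min_area` and `max_area` are hypothetical.

```python
from collections import deque


def find_blobs(image):
    """Return a list of blobs; each blob is a list of (row, col) dark pixels
    joined by 4-connectivity (up, down, left, right)."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited dark pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def remove_margin_blobs(image, margin_cols, min_area, max_area):
    """Erase blobs lying wholly in the left margin with a plausible hole size.

    Returns the number of blobs erased; image is modified in place.
    """
    removed = 0
    for blob in find_blobs(image):
        in_margin = all(x < margin_cols for _, x in blob)
        if in_margin and min_area <= len(blob) <= max_area:
            for y, x in blob:
                image[y][x] = 0
            removed += 1
    return removed
```

The size limits are what keep small marks such as letters from being erased, so they would need tuning per scan resolution. A hole that touches text forms a single blob with it and would be skipped or need separate handling.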

Re: how to remove punch holes automatically?

Posted: 08 Nov 2016, 17:00
by BruceG
Do you have Acrobat? The redaction tool may be the answer. Redact to 'no colour'. I do not have Acrobat running on my machine at the moment to check. I am assuming the redaction tool will redact the whole document.