
how to remove punch holes automatically?

Adam32
Posts: 28
Joined: 28 Jun 2014, 08:55
Number of books owned: 500
Country: United Kingdom

how to remove punch holes automatically?

Post by Adam32 » 07 Nov 2016, 09:14

I am trying to remove punch holes from thousands of scanned documents. I can do this manually, but it is too time-consuming, so I am looking for an automatic solution.

The problem is that the punch holes are often in different places on different documents. Some holes are not level because the document was inserted into the punch at an angle. Another complication is that text is often aligned with the punch holes; see the attached images, where I have highlighted such text in red as an example. For these reasons a basic crop won't work without cutting off information.

Does anyone have any suggestions how I can do this automatically?
Attachments
punch holes_Page_2.jpg
punch holes_Page_1.jpg

L.Willms
Posts: 129
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: how to remove punch holes automatically?

Post by L.Willms » 07 Nov 2016, 14:01

How are those scanned pages intended to be used? Just as images, or as material for an OCR program?

In the latter case, the OCR program will take care of it, since the holes are not recognizable as text.

On the other hand, the images of the punch holes all seem to have the same shape and form, so there might be some kind of software which can find and remove those patterns with a pattern search.
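To make the pattern-search idea concrete, here is a minimal sketch of template matching in plain Python. It is not taken from any particular product. It assumes the page has already been reduced to a grid of 0 (white) and 1 (dark) pixels, that a small sample image of one hole is available as `template`, and that holes only occur within the first `margin_width` columns. The names `match_score`, `find_holes`, `erase_holes`, `margin_width` and `threshold` are all made up for this illustration. A real workflow would more likely use an image library's own template matching on the actual scans.

```python
def match_score(page, template, top, left):
    """Fraction of template pixels that agree with the page at (top, left)."""
    th, tw = len(template), len(template[0])
    agree = sum(
        page[top + r][left + c] == template[r][c]
        for r in range(th)
        for c in range(tw)
    )
    return agree / (th * tw)


def find_holes(page, template, margin_width, threshold=0.9):
    """Return (top, left) positions in the left margin that match the template.

    Only positions where the whole template fits inside the first
    margin_width columns are tested, so body text is never examined.
    """
    th, tw = len(template), len(template[0])
    max_left = min(margin_width, len(page[0])) - tw
    candidates = []
    for top in range(len(page) - th + 1):
        for left in range(max_left + 1):
            score = match_score(page, template, top, left)
            if score >= threshold:
                candidates.append((score, top, left))
    # Best matches first, then greedily drop any hit overlapping a kept one.
    candidates.sort(key=lambda cand: -cand[0])
    hits = []
    for _, top, left in candidates:
        if all(abs(top - t) >= th or abs(left - l) >= tw for t, l in hits):
            hits.append((top, left))
    return hits


def erase_holes(page, template, hits):
    """Return a copy of the page with the template's dark pixels whitened."""
    cleaned = [row[:] for row in page]
    for top, left in hits:
        for r, row in enumerate(template):
            for c, dark in enumerate(row):
                if dark:
                    cleaned[top + r][left + c] = 0
    return cleaned
```

Because the search looks for the shape rather than a fixed position, it does not matter that the holes sit at different heights on different pages. Only the dark pixels of a confirmed match are whitened, so text that lines up with the holes is left alone. Holes punched at a noticeable angle might need a lower `threshold` or several rotated templates.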

Adam32

Re: how to remove punch holes automatically?

Post by Adam32 » 07 Nov 2016, 15:03

Thanks for your reply. The scanned images are assembled into a PDF in the form of a searchable image.
L.Willms wrote: the images of the punch holes all seem to have the same shape and form, so there might be some kind of software which can find and remove those patterns with a pattern search.
That's exactly what I thought regarding pattern search, but I have had no luck finding a solution.

BruceG
Posts: 67
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: how to remove punch holes automatically?

Post by BruceG » 07 Nov 2016, 16:46

As said, if you were to OCR without an image, the punch holes are not seen.
Text without holes.pdf
(3.97 KiB) Downloaded 89 times
By having no image you cannot be 100% certain the OCR has everything right. I notice the copyright symbol was not picked up correctly. OCR programs can be trained so this is not repeated. The size of the OCR text is very small.

Adam32

Re: how to remove punch holes automatically?

Post by Adam32 » 07 Nov 2016, 19:31

BruceG wrote: As said, if you were to OCR without an image, the punch holes are not seen.
But I don't want to OCR without an image. That is even more time-consuming, because the OCR makes a lot of mistakes and I would have to review all of its output. If I send the file to someone else without having proofed it, it looks very unprofessional.

I just want to make searchable images. I like best the idea L.Willms suggested: finding some kind of software which can do a pattern search and remove the punch holes. I have seen some scanners advertised which remove punch holes at the scanning stage, so I guess such automatic software exists.

BruceG

Re: how to remove punch holes automatically?

Post by BruceG » 07 Nov 2016, 20:17

A searchable image also comes with errors, which cannot be seen until one copies and pastes. If the error is in a word being searched for, it will never be found. All editing takes time, whether text or image. The quality of the original and the JPEG is very good, so I would not expect many errors.

cday
Posts: 226
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: how to remove punch holes automatically?

Post by cday » 08 Nov 2016, 04:29

BruceG wrote: A searchable image also comes with errors, which cannot be seen until one copies and pastes.
The key benefit of a searchable image is that the text displayed on screen is always the original text...
BruceG wrote:The quality of [your] original and the jpeg is very good so I would not expect many [OCR recognition] errors.
Your sample page from OmniPage 19 did seem to come out very well...

I think Nuance PaperPort is relatively inexpensive software with the capability to remove punched holes, although I haven't used it myself. Possibly there is a trial download available?

BruceG

Re: how to remove punch holes automatically?

Post by BruceG » 08 Nov 2016, 05:31

One of the features of OmniPage is Image Enhancement, which has a number of tools, one being 'Punch Hole Remover'. This, however, would not remove the punch holes in this case. Doing OCR did. I expect that because the holes look like the moon on its back, it did not conclude them to be punch holes and so never did anything.

Adam32

Re: how to remove punch holes automatically?

Post by Adam32 » 08 Nov 2016, 05:54

BruceG wrote: This, however, would not remove the punch holes in this case. I expect that because the holes look like the moon on its back, it did not conclude them to be punch holes and so never did anything.
That's exactly the problem I have encountered with other software claiming to remove punch holes. There must be a program that can do a basic image search and remove objects with the characteristics of that shape? Do you think some sort of ImageMagick script would work?
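One way to "remove objects with the characteristics of that shape" without expecting a perfect circle is to group dark pixels into connected blobs and filter them by position and size. The sketch below is only an illustration in plain Python, not an ImageMagick script. It assumes a page already reduced to 0 (white) and 1 (dark) pixels, holes confined to the first `margin_width` columns, and hole marks whose larger dimension is clearly bigger than any letter. The names `dark_blobs`, `remove_hole_blobs`, `margin_width`, `min_size` and `max_size` are hypothetical.

```python
from collections import deque


def dark_blobs(page):
    """Yield each 4-connected group of dark pixels as a list of (row, col)."""
    rows, cols = len(page), len(page[0])
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if page[r0][c0] != 1 or seen[r0][c0]:
                continue
            blob, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and page[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            yield blob


def remove_hole_blobs(page, margin_width, min_size, max_size):
    """Whiten blobs that lie wholly in the left margin and are hole-sized.

    A blob counts as hole-sized when the larger side of its bounding box
    is between min_size and max_size, so a crescent (wide but short)
    qualifies as well as a full disc.
    Returns the cleaned copy and the number of blobs removed.
    """
    cleaned = [row[:] for row in page]
    removed = 0
    for blob in dark_blobs(page):
        blob_rows = [r for r, _ in blob]
        blob_cols = [c for _, c in blob]
        height = max(blob_rows) - min(blob_rows) + 1
        width = max(blob_cols) - min(blob_cols) + 1
        in_margin = max(blob_cols) < margin_width
        hole_sized = min_size <= max(width, height) <= max_size
        if in_margin and hole_sized:
            for r, c in blob:
                cleaned[r][c] = 0
            removed += 1
    return cleaned, removed
```

Since each blob is judged on its own, a hole punched at an angle or at a different height is still caught, and letters in the margin survive as long as they are smaller than `min_size`. Text that physically touches a hole would merge into the same blob, so that case would still need a manual check.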

BruceG

Re: how to remove punch holes automatically?

Post by BruceG » 08 Nov 2016, 17:00

Do you have Acrobat? The redaction tool may be the answer: redact to 'no colour'. I do not have Acrobat running on my machine at the moment to check. I am assuming the redaction tool will work across the whole doc.
