how to remove punch holes automatically?

Moderator: peterZ

cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: how to remove punch holes automatically?

Post by cday »

BruceG wrote:Do you have Acrobat? The redaction tool may be the answer. Redact to 'no colour'. I do not have Acrobat running on my machine at the moment to check. I am assuming redaction tool will redact the whole doc.
Googling 'remove punch holes' gives some hits, but at a quick look not obviously very useful...

A method using the Acrobat redaction tool is given in this link. It requires selecting each hole individually, so it's not what you're looking for. In any case I think it would be quicker to use software like XnView, where you can copy some white background to the clipboard and then paste it repeatedly into selections made from page to page: very quick, but a manual method, so not suitable for a large number of pages or documents.

I imagine it would be possible to design an algorithm to automatically detect punch holes of general shape, and then it could be coded, but I think you will probably be lucky to find a suitable ready-made solution...
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: how to remove punch holes automatically?

Post by L.Willms »

It looks as if your scans are all of ISO A4 paper with the punch holes for standard 4-hole binders, where the holes are 8 cm (eight centimeters) apart. And I hope that they are all scanned with the same resolution...
Adam32 wrote: Another factor is text is also often aligned with the punch holes - see attached images where I have highlighted text in red as an example.
If the above assumptions are true, one could build a workflow for an image editing program which puts a white circle large enough to cover slight displacements on all eight positions of the punch holes. I'll try to do that with Picture Window Pro, but have to learn for the first time to save workflows, which I have not done yet.
Adam32 wrote: The problem is the punch holes are often in different places on different documents. Some holes are not punched level, where the document was inserted into the puncher at an angle.
That is a real problem. What percentage of the many thousands of scanned pages is affected?
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: how to remove punch holes automatically?

Post by L.Willms »

L.Willms wrote: I'll try to do that with Picture Window Pro, but have to learn for the first time to save workflows, which I have not done yet.
But that seems not to be possible with Picture Window Pro: it seems that only global operations on the image as a whole are possible in a batch workflow, and the operations of the "tools" menu, namely "paint", which is necessary for this task, are not available.

One might try other image processing programs - Corel Paint Shop Pro is scriptable in Python, and Corel Photo Paint (which is part of the Corel Draw package) in VBscript.

Or build something yourself, without a costly proprietary program, based on the much more complex Python scripts which Matt Zucker presents on GitHub (I learned about them from posts in the Programs, Software releases, and more section of this forum):

https://mzucker.github.io/2016/09/20/noteshrink.html
https://mzucker.github.io/2016/10/11/un ... ipses.html
https://mzucker.github.io/2016/08/15/pa ... rping.html

These are much more complicated than the relatively simple task posed by the punch holes which are in fixed distances from each other, but show how to use libraries from a Python script.

As to the images where the original entered the scanner at an angle - those might be rectified by other tools. It might be that "Scan Tailor" can do that.
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: how to remove punch holes automatically?

Post by cday »

L.Willms wrote:It looks as if your scans are all of ISO A4 paper with the punch holes for standard 4-hole binders, where the holes are 8 cm (eight centimeters) apart. And I hope that they are all scanned with the same resolution...
It would be possible to paste the required oversize white circles with the required relative spacing using ImageMagick or NConvert, for example, as a single action in a batch file... That would reduce the problem to determining where to paste them...

If it weren't for the red text in the margin in some images, tall white rectangles on each side with fixed spacing could be used; otherwise eight oversize circles in fixed relative positions would be needed.

The problem of where to paste the oversize white circles might be approached by considering the way autocrop actions evidently scan in from the edges of an image until black pixels are detected (ideally text but in practice sometimes a dark mark on a scan)...

Another approach might be to move a mask with eight holes in it across the image until there are black pixels in each of the holes, that might work well if it could be implemented... To avoid having to scan the mask in two axes, it might be possible to use a mask with eight narrow vertical rectangular cutouts in it.
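
For anyone wondering where to start with the sliding-mask idea, here is a minimal sketch in plain Python, not a tested tool. It assumes the margin has already been cropped out of the page and thresholded into rows of 0/1 values (1 = dark), and that the hole spacing in pixels is known from the scan resolution. The function name and parameters are made up for illustration. The mask only moves along one axis, as suggested above, and the score is the dark-pixel count of the worst cutout, so a position only wins if every cutout sees dark pixels.

```python
def find_hole_offset(strip, spacing, window, n_holes=4):
    """Return (offset, score) for the best vertical mask position.

    strip   -- list of rows (lists of 0/1, 1 = dark) cropped from the margin
    spacing -- distance between hole cutouts, in rows
    window  -- height of each cutout, in rows
    """
    # Dark pixels per row, then a prefix sum so each cutout is summed in O(1).
    row_dark = [sum(row) for row in strip]
    prefix = [0]
    for count in row_dark:
        prefix.append(prefix[-1] + count)

    span = spacing * (n_holes - 1) + window  # rows covered by the whole mask
    best_offset, best_score = None, -1
    for offset in range(len(strip) - span + 1):
        # Score by the emptiest cutout: all cutouts must contain dark pixels.
        score = min(
            prefix[offset + k * spacing + window] - prefix[offset + k * spacing]
            for k in range(n_holes)
        )
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

A low best score would suggest that the strip contains no holes (for example the other margin of the page), so some threshold on the score would be needed in practice. Pages punched at an angle would still need deskewing first.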
mera461
Posts: 7
Joined: 27 Dec 2013, 07:08
Number of books owned: 0
Country: Denmark

Re: how to remove punch holes automatically?

Post by mera461 »

Another option would be to train the OpenCV object detection algorithm (http://docs.opencv.org/3.1.0/dc/d88/tut ... scade.html) to match the punch holes. You would probably need a lot of training material, but it sounds as if you already have the raw material ready, and as there is not a lot of variability in the punch hole marks, it is probably relatively easy(?)

You can see how it works for face recognition here: http://docs.opencv.org/3.1.0/db/d28/tut ... ifier.html.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: how to remove punch holes automatically?

Post by L.Willms »

cday wrote:
L.Willms wrote:It looks as if your scans are all of ISO A4 paper with the punch holes for standard 4-hole binders, where the holes are 8 cm (eight centimeters) apart. And I hope that they are all scanned with the same resolution...
It would be possible to paste the required oversize white circles with the required relative spacing using ImageMagick or NConvert, for example, as a single action in a batch file... That would reduce the problem to determining where to paste them...
[...]
The problem of where to paste the oversize white circles might be approached by considering the way autocrop actions evidently scan in from the edges of an image until black pixels are detected (ideally text but in practice sometimes a dark mark on a scan)...
As said, these are obviously standard four hole binders, where the holes have distances of 8 cm to each other and ~28 mm to the upper and lower edge (measured from the center of the hole). A slight displacement is taken care of by making the white circle large enough, but small enough to avoid covering the edges of the text.

On the front (odd numbered) pages the holes are on the left edge of the paper, for the back sides on the right edge of the scan. It is safe, in my opinion, to cover all eight potential positions of such a hole.
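
The arithmetic behind those positions can be sketched in a few lines of Python. The 80 mm spacing comes from the measurements above; the A4 size (210 x 297 mm) puts the outer holes at (297 - 240) / 2 = 28.5 mm from the top and bottom, which agrees with the ~28 mm figure. The 12 mm distance from the paper edge to the hole centre is an assumption taken from the usual filing-hole standard, not from these scans, and should be checked against a real page. The function names are invented for this sketch, and the image is a plain list of greyscale rows rather than any particular library's type.

```python
A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0
HOLE_SPACING_MM = 80.0   # from the thread
EDGE_OFFSET_MM = 12.0    # ASSUMPTION: hole centre to nearest paper edge
MM_PER_INCH = 25.4


def hole_centres(dpi, n_holes=4):
    """Return (x, y) pixel centres: n_holes on the left, then n_holes on the right."""
    scale = dpi / MM_PER_INCH
    # Holes are centred vertically on the sheet: 28.5 mm for four holes on A4.
    first_y = (A4_HEIGHT_MM - HOLE_SPACING_MM * (n_holes - 1)) / 2
    centres = []
    for x_mm in (EDGE_OFFSET_MM, A4_WIDTH_MM - EDGE_OFFSET_MM):
        for k in range(n_holes):
            y_mm = first_y + k * HOLE_SPACING_MM
            centres.append((round(x_mm * scale), round(y_mm * scale)))
    return centres


def paint_discs(image, centres, radius, value=255):
    """Overwrite a filled disc around each centre in a list-of-rows image."""
    height, width = len(image), len(image[0])
    for cx, cy in centres:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    image[y][x] = value
    return image
```

The radius is the knob for "large enough for slight displacements, small enough to spare the text", and it only works if every page really was scanned at the same resolution and cropped to the full sheet.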


Links: https://en.wikipedia.org/wiki/ImageMagick
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: how to remove punch holes automatically?

Post by cday »

Based on your insights that the punch holes are in fixed positions relative to each other, and also in close to fixed positions relative to the page, I think I now have a proof-of-concept Windows batch file script that removes all the holes on mixed left or right pages in one pass...

The script uses NConvert, a command line utility that is part of the XnView family of software, and pastes four small rectangles [can't see a way to do circles...] over any punch holes on the left, and a single narrow vertical rectangle over any holes on the right of the page, as there should be no text in the right margin.

I think I could probably upload the script complete with the small NConvert utility required to run it in a ZIP archive, so that it could be tested directly, if anyone using Windows is brave enough to try running it...
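
Until then, the geometry can be illustrated with a rough Python sketch. This is not the NConvert batch file itself, just the same layout on a plain list-of-rows greyscale image: one small rectangle per hole on the left, and one full-height strip on the right. The function names, the margin width and the rectangle height are all placeholders.

```python
def cover_boxes(width, height, hole_ys, margin, half_height):
    """Return (left, top, right, bottom) boxes; right and bottom are exclusive.

    hole_ys     -- expected vertical centres of the left-hand holes, in pixels
    margin      -- width of the area to cover on either side, in pixels
    half_height -- half the height of each small left-hand rectangle
    """
    # One small rectangle per hole on the left, clipped to the page.
    boxes = [
        (0, max(0, y - half_height), margin, min(height, y + half_height))
        for y in hole_ys
    ]
    # One tall strip on the right, where no text is expected.
    boxes.append((width - margin, 0, width, height))
    return boxes


def fill_boxes(image, boxes, value=255):
    """Paint each box with a flat value, modifying the image in place."""
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            image[y][left:right] = [value] * (right - left)
    return image
```

Because both sides are covered on every page, the same call works for front and back pages without knowing which is which.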
Adam32
Posts: 29
Joined: 28 Jun 2014, 08:55
Number of books owned: 500
Country: United Kingdom

Re: how to remove punch holes automatically?

Post by Adam32 »

cday wrote:Based on your insights that the punch holes are in fixed positions relative to each other, and also in close to fixed positions relative to the page, I think I now have a proof-of-concept Windows batch file script that removes all the holes on mixed left or right pages in one pass...

The script uses NConvert, a command line utility that is part of the XnView family of software, and pastes four small rectangles [can't see a way to do circles...] over any punch holes on the left, and a single narrow vertical rectangle over any holes on the right of the page, as there should be no text in the right margin.

I think I could probably upload the script complete with the small NConvert utility required to run it in a ZIP archive, so that it could be tested directly, if anyone using Windows is brave enough to try running it...

That would be great if you could upload the script. What about the fill colour of the rectangles? Is there a way to generate this automatically, by referencing the colour surrounding the punch holes? A bit like the Photoshop clone stamp tool? I know in the example images I posted, the colour is white, but on other documents it is a slightly different colour.
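
One possible way to pick the fill colour, sketched in plain Python under the assumption that the page is available as rows of (r, g, b) tuples: sample a ring of pixels just outside the hole and take the per-channel median. This is a flat fill rather than a true clone stamp, and the function name and the inner/outer radii are made up for the example.

```python
from statistics import median


def surrounding_colour(image, cx, cy, inner, outer):
    """Median RGB of pixels with inner < distance <= outer from (cx, cy).

    image is a list of rows of (r, g, b) tuples; inner should be at least
    the hole radius so that the dark hole itself is not sampled.
    """
    height, width = len(image), len(image[0])
    samples = []
    for y in range(max(0, cy - outer), min(height, cy + outer + 1)):
        for x in range(max(0, cx - outer), min(width, cx + outer + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if inner ** 2 < d2 <= outer ** 2:
                samples.append(image[y][x])
    if not samples:
        raise ValueError("ring contains no pixels")
    # The median ignores a few stray text or shadow pixels inside the ring.
    return tuple(int(median(channel)) for channel in zip(*samples))
```

The median is used instead of the mean so that a bit of text or shadow touching the ring does not darken the fill.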
Adam32
Posts: 29
Joined: 28 Jun 2014, 08:55
Number of books owned: 500
Country: United Kingdom

Re: how to remove punch holes automatically?

Post by Adam32 »

cday wrote:
L.Willms wrote: Another approach might be to move a mask with eight holes in it across the image until there are black pixels in each of the holes, that might work well if it could be implemented... To avoid having to scan the mask in two axes, it might be possible to use a mask with eight narrow vertical rectangular cutouts in it.
I really like that idea. Are there any examples of this being implemented, as I don't really know where to start.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: how to remove punch holes automatically?

Post by L.Willms »

Adam32 wrote:
cday wrote:
L.Willms wrote: Another approach might be to move a mask with eight holes in it across the image until there are black pixels in each of the holes, that might work well if it could be implemented... To avoid having to scan the mask in two axes, it might be possible to use a mask with eight narrow vertical rectangular cutouts in it.
I really like that idea. Are there any examples of this being implemented, as I don't really know where to start.
There is a quote error in the above -- I did not suggest this with the moving mask of eight holes...

Besides, there will hardly be any image with eight holes. There should be only four, and either at the right or the left edge. Unless somebody had accidentally punched the holes for the binder with the back side of the pages up, and then corrected it by punching on the correct edge...