Re: how to remove punch holes automatically?

Posted: 10 Nov 2016, 03:43
by cday
L.Willms wrote: There is a quote error in the above -- I did not suggest this with the moving mask of eight holes...

Besides, there will hardly be any image with eight holes. There should be only four, and either at the right or the left edge. Unless somebody had punched the holes for the binder accidentally with the back side of the pages up, and then corrected it with punching on the correct edge...
You are quite correct: when I was writing I was thinking too fast and visualising all the possible punch holes as being on the same page, which of course isn't the case. More brainstorming than a developed idea...

However, if it is necessary to process pages with less well-defined hole positions than the example pages uploaded, some kind of hole identification algorithm would be needed. One approach might be to move a mask with suitable cutouts a short distance in from each edge in turn until the punched holes are located; the problem would be easier if there wasn't sometimes text in the left margin. Just brainstorming again.
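
As a rough sketch of the moving-mask idea (nothing like this was actually built in this thread), the Python below slides a set of hole cutouts inward from the page edge and keeps the shift that covers the most dark pixels. The page is just a list of rows with 1 for dark and 0 for light; the function names, the cutout list and the search range are all made up for illustration.

```python
def dark_pixels_under(page, cutouts, dx):
    """Count dark pixels (value 1) inside each cutout after shifting it right by dx."""
    height, width = len(page), len(page[0])
    total = 0
    for x, y, w, h in cutouts:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x + dx, 0), min(x + dx + w, width)):
                total += page[row][col]
    return total


def find_mask_offset(page, cutouts, max_shift):
    """Return the shift in 0..max_shift that puts the most dark pixels under the cutouts."""
    return max(range(max_shift + 1), key=lambda dx: dark_pixels_under(page, cutouts, dx))


# Tiny synthetic page: 12 wide, 10 tall, two 2x2 "holes" whose left edge is at x = 3.
page = [[0] * 12 for _ in range(10)]
for top in (1, 6):
    for row in range(top, top + 2):
        for col in range(3, 5):
            page[row][col] = 1

# Cutouts as (x, y, width, height), with the mask starting flush against the left edge.
cutouts = [(0, 1, 2, 2), (0, 6, 2, 2)]
best_shift = find_mask_offset(page, cutouts, max_shift=6)
```

Text in the margin would count as dark pixels too, which is exactly why margin text makes the problem harder.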

But that's not where we're at: I must write up my proof-of-concept script for the example pages in the original post...

Re: how to remove punch holes automatically?

Posted: 10 Nov 2016, 06:22
by cday
I'm not a programmer, and I never expected even to use the command line until I got into processing scans, so my proof-of-concept script should be viewed in that light.

The attached ZIP file contains:

The punch hole removal script Convert_script.bat
A copy of the NConvert utility nconvert.exe required to run the script (actually not the current version)
Two JPEG files lh_erase and rh_erase which are the white rectangles placed over the punch holes
A folder Convert
A folder Test pages containing the two example JPEGs in the first post of the thread

To use the script, place the files to be processed in the Convert folder and then double-click on the script: a Windows command window should open, text will flash by, and when it stops the files in the Convert folder should have been processed; to close the command window press any key.

Download the ZIP archive, then extract it to any convenient location, and then check that the above files are all present: if the nconvert.exe file is missing, it will likely have been removed by your security software and it will be necessary to download a copy from

If you encounter any security warnings when double-clicking on the script, you can assume that they are standard Windows alerts for unknown executables, and that it is safe to proceed.

The script uses relative addressing for convenience, to simplify the code and allow it to be run anywhere, so the files and folders in the extracted folder must remain in exactly the same relative positions, unless the script is edited.

The script uses the NConvert watermark code to place a copy of the small white rectangle lh_erase over each of the punch hole positions on the left of the page, and a copy of the tall narrow white rectangle rh_erase over the punch hole positions on the right of the page. All the white rectangles are placed on every page, to avoid the need to determine whether a page is a left or a right page.

The sizes and positions of the white rectangles placed on the pages are set for the two example pages, and positioned using pixel coordinates relative to an origin at the top left of the page. Pages that are significantly different cannot be processed successfully using this basic method, although there would be some scope for adjusting the size of the white rectangles and their exact positions to increase tolerance of slightly different page characteristics, if an increased risk of clipping or obscuring any text in the margins is accepted. The present code assumes there will be no text in the right margin, so that a single white rectangle can be used.

The downloaded script runs this code:

Code:

nconvert -wmfile lh_erase.jpg -wmpos 20 148 -o "Convert\%%.jpg" -overwrite Convert\*.jpg
nconvert -wmfile lh_erase.jpg -wmpos 20 622 -o "Convert\%%.jpg" -overwrite Convert\*.jpg
nconvert -wmfile lh_erase.jpg -wmpos 20 1104 -o "Convert\%%.jpg" -overwrite Convert\*.jpg
nconvert -wmfile lh_erase.jpg -wmpos 20 1582 -o "Convert\%%.jpg" -overwrite Convert\*.jpg
nconvert -wmfile rh_erase.jpg -wmpos 1130 135 -o "Convert\%%.jpg" -overwrite Convert\*.jpg

The two numbers after -wmpos in each line are the x and y pixel coordinates described above, so the positions of the white rectangles can be tweaked by editing them, and different-sized masking white JPEGs could also be substituted for those downloaded.
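
If the coordinates need tweaking often, it may be easier to keep them in a small table and generate the script lines from it. The Python below is only a sketch of that idea: it uses just the NConvert options already shown in the script above, and the helper name build_commands is made up. The output is meant to be pasted into a .bat file, which is why the percent sign is doubled.

```python
def build_commands(placements, folder="Convert"):
    """Return one nconvert command line per (mask_file, x, y) placement."""
    commands = []
    for mask_file, x, y in placements:
        commands.append(
            f'nconvert -wmfile {mask_file} -wmpos {x} {y} '
            f'-o "{folder}\\%%.jpg" -overwrite {folder}\\*.jpg'
        )
    return commands


# Positions copied from the script above; edit these to tweak the masking.
placements = [
    ("lh_erase.jpg", 20, 148),
    ("lh_erase.jpg", 20, 622),
    ("lh_erase.jpg", 20, 1104),
    ("lh_erase.jpg", 20, 1582),
    ("rh_erase.jpg", 1130, 135),
]

script_lines = build_commands(placements)
print("\n".join(script_lines))
```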

The fill colour used is pure white, taken from a small area of one of the original example images, which I believe were black and white. If necessary, the fill colour could be adjusted to match another page background colour by creating new masking JPEGs. It might be possible to automate production of matching masking rectangles, but that would take more development.

The script works for me on my computer for the two test pages included, but the command line is very unforgiving and I can't guarantee that it will work straight off for someone else, although I do think it should be safe to run...

Re: how to remove punch holes automatically?

Posted: 11 Nov 2016, 16:09
by cday
Small update:

I have now downloaded and tested my ZIP on a public library computer running Windows 7, and the script ran successfully and without any warnings.

Also, while the script perfectly covers the punch holes in the two test pages, the coordinates used in the script are not quite optimal, as the tool I used to determine them was tricky to use. I have now developed a better method that I can use if required after initial tests.

Re: how to remove punch holes automatically?

Posted: 12 Nov 2016, 14:00
by Adam32
Thanks for your script, cday. There is also another method which I am trialling. I had a little help from snibgo on the ImageMagick forum.

The problem breaks down into:

1. Find the holes.

2. Replace them with the background colour.

To find the holes, snibgo suggested thresholding the image to black and white and finding all the black components larger than a certain size, e.g. 190 pixels. We also know that the holes occur in the margins of the page, so this also helps in pinpointing them.
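
To illustrate the idea only (this is not snibgo's ImageMagick code), here is a small pure-Python sketch: threshold a greyscale page, flood-fill each dark blob, and keep the blobs that are big enough and sit in a side margin. The function name, the 4-connected fill and the margin width are my own assumptions; 190 pixels is the size mentioned above.

```python
from collections import deque


def find_hole_candidates(gray, threshold=128, min_size=190, margin=150):
    """Return bounding boxes (left, top, right, bottom) of large dark blobs in the side margins."""
    height, width = len(gray), len(gray[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or gray[y0][x0] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            size, left, top, right, bottom = 0, x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                size += 1
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if not (0 <= nx < width and 0 <= ny < height):
                        continue
                    if not seen[ny][nx] and gray[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Keep the blob only if it lies wholly inside the left or right margin.
            in_margin = right < margin or left >= width - margin
            if size >= min_size and in_margin:
                boxes.append((left, top, right, bottom))
    return boxes


# Synthetic 40x30 page: white background, one 5x5 "hole" in the left margin,
# a 2-pixel speck in the right margin, and a 10x10 block of "text" in the middle.
gray = [[255] * 40 for _ in range(30)]
for y in range(3, 8):
    for x in range(2, 7):
        gray[y][x] = 0
gray[5][35] = gray[5][36] = 0
for y in range(10, 20):
    for x in range(15, 25):
        gray[y][x] = 0

holes = find_hole_candidates(gray, min_size=20, margin=10)
```

On a real scan the threshold, minimum size and margin width would all need tuning, and a hole that shows only as a thin arc may fall below the size limit.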

Once the punch holes are found, they are filled using one of snibgo's scripts, e.g. blurFill.bat. You can get the scripts from here:

Re: how to remove punch holes automatically?

Posted: 13 Nov 2016, 06:58
by L.Willms
Adam32 wrote: The problem breaks down into:

1. Find the holes.
Which is helped by your using the standard 4-hole binder, where the holes are spaced 8 cm apart and centred between the top and bottom edges of an ISO A4 sheet.
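
As a worked example of that geometry, the sketch below computes where the four hole centres should fall on an A4 page (297 mm tall, holes 80 mm apart, centred vertically) and converts them to pixels. The 150 dpi scan resolution is purely an assumption for illustration, and the function names are made up.

```python
A4_HEIGHT_MM = 297.0
HOLE_PITCH_MM = 80.0
MM_PER_INCH = 25.4


def hole_centres_mm(page_height_mm=A4_HEIGHT_MM, pitch_mm=HOLE_PITCH_MM, holes=4):
    """Distances from the top edge to each hole centre, for holes centred vertically."""
    first = page_height_mm / 2 - pitch_mm * (holes - 1) / 2
    return [first + i * pitch_mm for i in range(holes)]


def mm_to_px(mm, dpi):
    """Convert millimetres to the nearest whole pixel at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


centres_mm = hole_centres_mm()  # [28.5, 108.5, 188.5, 268.5]
centres_px = [mm_to_px(mm, dpi=150) for mm in centres_mm]  # assumed 150 dpi scan
print(centres_mm, centres_px)
```

A hole finder would then only need to look in a small window around each expected centre, near the left or right edge of the page.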

Re: how to remove punch holes automatically?

Posted: 13 Nov 2016, 13:42
by cday
Glad you've found what looks like a promising approach to the general case. Punch holes on a scan would typically be black circles, whereas in the example pages in your first post they are little more than black arcs containing many fewer pixels. Given a reasonably clean scan, though, the algorithm you are using only looks in the areas where punch holes would be expected, so it should avoid the risk of falsely identifying bold text in a heading, for example, as a punch hole.

In the unlikely event that you need to consider again the erasing method used in my example script, I can give the following update:

Although you have a very large number of documents of varying page sizes and formats, it seems likely that most or all of the documents could be placed into one or other of a fairly small number of standard formats, each of which could be processed using the same script. I now have a better method of accurately determining the position of the required erasing images using a freeware Windows program, so that if needed a series of masks for processing each standard format could be produced reasonably easily. And when, as may be the case, a document has no text in the margins, producing a suitable mask would be significantly simpler.

The processing of the pages could also be performed using the same Windows software as an alternative to using batch files, if desired, and it should also be possible to process PDF files containing multiple pages directly. However, unlike your proposed method, my method would not automatically cope well with pages that have been poorly scanned or punched.
L.Willms wrote:
Adam32 wrote: The problem breaks down into:

1. Find the holes.
Which is helped by your using the standard 4-hole binder, where the holes are spaced 8 cm apart and centred between the top and bottom edges of an ISO A4 sheet.
That was a useful insight which could have significantly assisted the development of possible alternative algorithms for processing the pages; I said that I'm not a programmer, but I have in the past conceived and developed many algorithms for purposes not related to image processing, and coded them in assembly language to be run as firmware on microprocessors.

Re: how to remove punch holes automatically?

Posted: 27 Nov 2016, 13:01
by cday
Adam32 evidently developed a satisfactory solution to his punch hole removal problem using ImageMagick as set out in this long thread.

Meanwhile, for my own interest, I have developed the method that I posted above, which removes multiple punch holes using multiple obscuring pastes, into a method that uses a single paste of a transparent mask containing the required obscuring areas. Creating a suitable mask could take longer than editing the basic code to place multiple masking areas in the required positions, but the single line of code in the script should run substantially faster when there are a large number of images to process.

Code:

nconvert -wmfile Mask.png -wmpos 0 0 -out jpeg -q 80 -o "Convert\%%.jpg" -overwrite Convert\*.jpg

I have added a term to the code to apply JPEG compression with quality = 80 to the image files produced.
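
For anyone who would rather generate the mask than draw it, the Python sketch below builds a page-sized PNG that is fully transparent except for opaque white rectangles, using only the standard library. It is an illustration, not how my Mask.png was made: the page size of 1240 x 1754 pixels, the rectangle sizes and the function names are all assumptions, and only the rectangle positions are taken from the earlier script.

```python
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_mask_png(width, height, rects, colour=(255, 255, 255)):
    """Return PNG bytes: transparent everywhere except opaque rectangles in `colour`."""
    opaque = bytes(colour) + b"\xff"
    clear = b"\x00\x00\x00\x00"
    rows = []
    for y in range(height):
        row = bytearray(clear * width)
        for rx, ry, rw, rh in rects:
            if ry <= y < ry + rh:
                x0, x1 = max(rx, 0), min(rx + rw, width)
                if x1 > x0:
                    row[x0 * 4:x1 * 4] = opaque * (x1 - x0)
        rows.append(b"\x00" + bytes(row))  # filter type 0 (None) starts each scanline
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # 8-bit RGBA
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", header)
        + _chunk(b"IDAT", zlib.compress(b"".join(rows)))
        + _chunk(b"IEND", b"")
    )


# Rectangles as (x, y, width, height); the sizes here are hypothetical.
rects = [
    (20, 148, 60, 60),
    (20, 622, 60, 60),
    (20, 1104, 60, 60),
    (20, 1582, 60, 60),
    (1130, 135, 60, 1500),
]
mask_bytes = make_mask_png(1240, 1754, rects)
```

Writing mask_bytes to a file named Mask.png in binary mode would give a file that can be used with the single-line script above.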

The above methods potentially have a more general application than obscuring punch holes in images, and could be used when it is necessary to obscure any area in a series of images, or alternatively to apply a watermark to pages, but with the limitation that the areas to be obscured must be in similar positions on each image to be processed.

Instructions to test the new method, if anyone wishes to, are similar to those given above for the original method:

o Place the images to be processed in the 'Convert' folder

o Run the script by double-clicking on the script file

o Examine the processed images in the Convert folder

For further information scroll up to my earlier post.