BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

dtic
Posts: 445
Joined: 06 Mar 2010, 18:03

BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic » 12 Oct 2018, 05:29

BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

https://github.com/nod5/BookGapCheck

dpc
Posts: 286
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc » 12 Oct 2018, 13:43

Thanks for posting. It's an interesting tool and could be quite helpful. I first heard about this technique in the video describing the Google linear traveling book scanner in 2012 and thought it could be handy (Google open sources a DIY page scanner). The section of the original YouTube video describing this page number mosaic technique can be viewed here.

At one time I thought about doing something similar but using OCR and fully-automating the process. Before starting a scan you'd define a subrect of the page where the page numbers typically lie (as BookGapCheck does). After each page is scanned, the contents of that subrect (which hopefully include the page number) would be saved as a separate small jpg image and passed to a batch command that runs an OCR program and produces a txt file containing just the page number. Another program could read that txt file, compare its page number with the previous results, and let the operator of the scanner know if they have missed or duplicated a page. It would also be possible to correlate each image file with the actual page number from the book, which could come in handy if you're searching a folder of several hundred image files for a particular page.
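
The comparison step in that pipeline could look roughly like the Python sketch below. It assumes the OCR stage has already produced one value per image, in scan order, with None where no number was recognized, and that each image holds a single page with consecutive numbering. The function name check_sequence and the message wording are hypothetical; nothing here comes from BookGapCheck or any existing tool.

```python
from typing import List, Optional


def check_sequence(page_numbers: List[Optional[int]]) -> List[str]:
    """Compare each recognized page number with the previous one.

    page_numbers holds one entry per scanned image, in scan order.
    None means the OCR step found no usable number for that image.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    previous = None  # last page number that was successfully recognized
    for index, current in enumerate(page_numbers):
        if current is None:
            problems.append(f"image {index}: no page number recognized")
            continue
        if previous is not None:
            if current == previous:
                problems.append(f"image {index}: duplicate of page {current}")
            elif current > previous + 1:
                first, last = previous + 1, current - 1
                missing = f"page {first}" if first == last else f"pages {first}-{last}"
                problems.append(f"image {index}: missing {missing} before page {current}")
            elif current < previous:
                problems.append(f"image {index}: page {current} is out of order after page {previous}")
        previous = current
    return problems


if __name__ == "__main__":
    for line in check_sequence([1, 2, 2, 3, 5, None, 6]):
        print(line)
```

Unnumbered pages (plates, chapter openers) and OCR misreads would show up as false alarms, so an operator would still need to glance at whatever the check flags.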

As I was thinking of other clever ways to use this info I believe it was about that time that my lovely wife wanted to know when I was going to get around to painting the house, so I shelved that idea and haven't done anything with it since.

BillGill
Posts: 98
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by BillGill » 13 Oct 2018, 09:29

I'm not sure that using something like that would be worth the effort. Currently I check for gaps/duplicates by going to File Explorer and viewing each page to see if there are any skipped/duplicate pages. If I had to edit each image to get the page number it would take much longer, so there would be a large loss in efficiency.

If it could be automated that might make a difference, but I'm not sure how the page number detection would work.

Bill

dpc
Posts: 286
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc » 13 Oct 2018, 13:57

I think you only need to define the subrect of a page that contains the page number once. After you've done that the program assumes the page number is in the same location relative to the edge of the page on subsequent pages. At least I hope that's the case.

The program that I wrote to control my DSLRs while scanning renames the image files to match the page number (based on an offset that I manually enter), so when I'm scanning I just need to occasionally compare the page number with the name of the image file being written to see if I'm still on track. It also makes it easier to locate the image file associated with a particular page number in a folder containing hundreds of images.
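
The offset-based renaming can be illustrated with a few lines of Python. This is only a sketch of the general idea, not the program described above: the function names, the "page_" prefix, the zero padding, and the assumption of one page per shot are all made up for the example.

```python
def page_for_shot(shot_index: int, offset: int) -> int:
    """Map a zero-based shot counter to a printed page number.

    offset is the manually entered difference between the two,
    e.g. offset=1 if the first shot is page 1.
    """
    return shot_index + offset


def filename_for_shot(shot_index: int, offset: int, extension: str = ".jpg") -> str:
    """Build a file name that carries the printed page number (hypothetical scheme)."""
    return f"page_{page_for_shot(shot_index, offset):04d}{extension}"


if __name__ == "__main__":
    # The first shot is page 1, so the offset is 1.
    print(filename_for_shot(0, 1))
    # Scanning started after front matter: shot 12 is printed page 8.
    print(filename_for_shot(12, -4))
```

If the file name and the number printed on the page ever disagree, a page was skipped or shot twice somewhere before that point.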

dtic
Posts: 445
Joined: 06 Mar 2010, 18:03

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic » 14 Oct 2018, 12:40

dpc wrote:
12 Oct 2018, 13:43
Thanks for posting. It's an interesting tool and could be quite helpful. I first heard about this technique in the video describing the Google linear traveling book scanner in 2012 and thought it could be handy (Google open sources a DIY page scanner). The section of the original YouTube video describing this page number mosaic technique can be viewed here.
Thanks, I've seen that YouTube video but hadn't noticed (or forgot) that detail.
dpc wrote:
12 Oct 2018, 13:43
At one time I thought about doing something similar but using OCR and fully-automating the process.
I considered going that route but in the end liked the image grid/mosaic method better. Someone who wants OCR could pretty easily add code for a step that runs Tesseract on the BookGapCheck output image.
BillGill wrote:
13 Oct 2018, 09:29
If I had to edit each image to get the page number
You only open a single image and draw a rectangle around the page number. The program does the rest.

BillGill
Posts: 98
Joined: 18 Dec 2016, 17:13
E-book readers owned: Calibre, FBReader
Number of books owned: 7000
Country: USA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by BillGill » Yesterday, 09:24

Is the location of that rectangle based on the image edges? The way I scan I wind up with the images moving around in the camera's field of view. Also I sometimes wind up with the images being in different orientations, at least as seen in Windows Explorer. With at least one of my scanners I wound up with the pages alternating which way was up.

Bill

dtic
Posts: 445
Joined: 06 Mar 2010, 18:03

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dtic » Yesterday, 14:07

BillGill wrote:
Yesterday, 09:24
Is the location of that rectangle based on the image edges?
Yes. It should work OK as long as the page numbers are at roughly the same x/y position in each photo. It's probably easiest to give it a try and see if it works well with the type of photos you have to work with.
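
To make the "same rectangle on every image, then a grid" idea concrete, here is a small Python sketch of the coordinate arithmetic only. It is written under assumptions and is not taken from the BookGapCheck source: the rectangle is taken to be (left, top, width, height) in pixels measured from the top-left corner of the image, and crop_box and grid_positions are hypothetical helper names. An image library would do the actual cropping and pasting; Pillow's Image.crop, for example, takes a (left, upper, right, lower) box like the one returned here.

```python
from typing import List, Tuple


def crop_box(rect: Tuple[int, int, int, int],
             image_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Turn a (left, top, width, height) rectangle into a
    (left, upper, right, lower) box clamped to the image bounds."""
    left, top, width, height = rect
    img_w, img_h = image_size
    right = min(left + width, img_w)
    bottom = min(top + height, img_h)
    return (max(left, 0), max(top, 0), right, bottom)


def grid_positions(count: int, tile_size: Tuple[int, int],
                   columns: int) -> List[Tuple[int, int]]:
    """Top-left paste position of each crop in a row-major grid."""
    tile_w, tile_h = tile_size
    return [((i % columns) * tile_w, (i // columns) * tile_h)
            for i in range(count)]


if __name__ == "__main__":
    # One rectangle, drawn once, reused for every photo.
    rect = (100, 50, 200, 80)
    print(crop_box(rect, (250, 400)))        # rectangle runs past the right edge
    print(grid_positions(5, (200, 80), 3))   # five crops, three per row
```

Because the box is tied to the image edges, a page that shifts or rotates within the frame moves its page number out of the crop, which is why a somewhat generous rectangle helps.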

dpc
Posts: 286
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: BookGapCheck: Quickly check if there are gaps or duplicates in a set of book page scan images

Post by dpc » Yesterday, 16:14

My mistake. I thought the subrect was positioned relative to the edge of the page, not the edge of the image. It could still be helpful for a number of scanner designs that confine the page to the same area of the platen (i.e. camera frame) across the scan. It won't work for situations such as Bill's scanner, or for books that have the page number printed in the upper left corner on the left side pages and the upper right corner on the right side pages. If you were to handle that left/right issue by allowing the user to specify two subrects (one for the left side pages, one for the right), that might cover Bill's case as well?
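
A rough Python sketch of the two-subrect suggestion follows. It assumes the images strictly alternate between right-side and left-side pages in scan order, which is an assumption for illustration and not something BookGapCheck is known to support; rect_for_image and its parameters are hypothetical names.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def rect_for_image(index: int, left_rect: Rect, right_rect: Rect,
                   first_is_right: bool = True) -> Rect:
    """Pick the subrect for the image at a zero-based scan position.

    Even positions share the side of the first image; odd positions
    get the other side.
    """
    is_right = (index % 2 == 0) == first_is_right
    return right_rect if is_right else left_rect


if __name__ == "__main__":
    left = (40, 30, 200, 80)     # page number in the upper left corner
    right = (1760, 30, 200, 80)  # page number in the upper right corner
    for i in range(4):
        print(i, rect_for_image(i, left, right))
```

A scanner that alternates which way is up would additionally need each image rotated to a common orientation before either rectangle is applied.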
