Book Scan Wizard

Discussion about Steve DeVore's Book Scan Wizard, a power-user package to automate scan processing.

Moderator: peterZ

steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

Book Scan Wizard

Post by steve1066d »

I decided to develop a new application designed for book-scanning post-processing. It's not quite ready for a beta, but it's getting close.

What I wanted was a tool in which I could define certain actions to be performed, like deskewing, correcting keystone distortion, and cropping, and have them automatically apply to the entire batch. I wanted a tool that could be run interactively to set up the job, with the option to run the actual full processing without user intervention. So I decided to come up with a new tool, which I’m calling BSW (Book Scan Wizard) http://bookscanwizard.sourceforge.net. I am releasing it as open source, under the GPL license.

It's a bit of a different animal from Scan Tailor in that you define just what you want done to the pages. So while Scan Tailor will try to figure out the margins by examining the pages, with BSW you click on the image corners and add a crop operation.

This works on the premise that the book scanner keeps the pages more or less in the same position from one scan to the next, so that once the operations are defined (with some separate configuration of the left and right pages), they can be applied to the remainder of the pages. The goal is to be able to set up the configuration of an entire book in less than 5 minutes, and be able to set up a bunch of books and convert them all without any user intervention.

Features


Optionally works with separate left and right image folders:
This will match left & right images by the timestamp of the images, but will use the last images from the two directories to sync the remainder. That way, even if the cameras do not have their time synchronized, it can still match up the images correctly. If there is an image that doesn't have a matching image, it will get flagged and put at the end of the list. So if a camera doesn't fire, or if you take some test shots, it won't screw up the placement of the remaining images.
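
A minimal sketch of how this kind of matching could work (not BSW's actual code). It assumes each folder yields capture times in ascending order, estimates the clock offset from the last shot of each camera, and pairs images whose corrected times agree within a tolerance. The class name `PairMatcher`, the method `match`, and the tolerance parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PairMatcher {
    /**
     * Pairs left/right capture times (milliseconds, ascending).
     * Matched pairs come first; images without a partner are flagged
     * and appended at the end of the list.
     */
    static List<String> match(long[] left, long[] right, long toleranceMs) {
        List<String> pairs = new ArrayList<>();
        List<String> unmatched = new ArrayList<>();
        // Assumption: the last image of each folder belongs to the same page turn,
        // so their time difference is the offset between the two camera clocks.
        long offset = (left.length > 0 && right.length > 0)
                ? right[right.length - 1] - left[left.length - 1]
                : 0;
        int j = 0;
        for (int i = 0; i < left.length; i++) {
            long expected = left[i] + offset;
            // Right images that are too early for this left image have no partner.
            while (j < right.length && right[j] < expected - toleranceMs) {
                unmatched.add("unmatched R" + j);
                j++;
            }
            if (j < right.length && Math.abs(right[j] - expected) <= toleranceMs) {
                pairs.add("L" + i + "+R" + j);
                j++;
            } else {
                unmatched.add("unmatched L" + i);
            }
        }
        for (; j < right.length; j++) {
            unmatched.add("unmatched R" + j);
        }
        pairs.addAll(unmatched);
        return pairs;
    }

    public static void main(String[] args) {
        // The right camera's clock runs 60 s ahead and it missed the second shot.
        long[] left = {1000, 4000, 5500, 7000};
        long[] right = {61000, 65500, 67000};
        System.out.println(match(left, right, 500));
    }
}
```

In the example the second left image has no partner, so it is flagged and moved to the end while the later pairs stay aligned.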

Rotating, fixing keystone distortion (perspective):
This is done by bringing up an image of the page, and selecting 4 corners that should be straightened out to a rectangle. It can also crop the image using the same selected corners. Or if just the rotation needs to be fixed, it can be specified by clicking two points that should be horizontal. Barrel or pincushion distortion can also be corrected.
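
For the simpler rotation-only case, the arithmetic behind "click two points that should be horizontal" can be sketched as below. This is an illustration only, not BSW's implementation; `Deskew`, `correctionDegrees`, and `rotateAbout` are hypothetical names, and it uses the standard `java.awt.geom.AffineTransform` class rather than JAI.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class Deskew {
    /** Angle (degrees) to rotate by so that the line (x1,y1)-(x2,y2) becomes horizontal. */
    static double correctionDegrees(double x1, double y1, double x2, double y2) {
        // atan2 gives the current slope angle of the line; rotating by its negative levels it.
        return -Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }

    /** Rotates the point (x,y) about the anchor (ax,ay) by the given angle in degrees. */
    static Point2D rotateAbout(double x, double y, double ax, double ay, double degrees) {
        AffineTransform t = AffineTransform.getRotateInstance(Math.toRadians(degrees), ax, ay);
        return t.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Two clicked points along a line of text that should be level.
        double angle = correctionDegrees(100, 200, 1100, 235);
        System.out.println("rotate by " + angle + " degrees");
        // After the correction the second point sits at the same height as the first.
        System.out.println(rotateAbout(1100, 235, 100, 200, angle));
    }
}
```

The four-corner keystone fix needs a full perspective transform instead of a single angle, which is beyond this sketch.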

Performs basic color corrections such as "auto levels", "levels", and gamma adjustments.

Converts to grayscale or black & white.

Calling external scripts:
If in the middle of the processing you want to call an ImageMagick script to do something fancy, that can be defined as part of the process. Or if you know Java and want to add a new operation to the program, it can be integrated easily.
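
How BSW hands a page to an external script is not shown here, so the following is only a generic sketch of calling ImageMagick from Java with `ProcessBuilder`. The class `ExternalStep` and the choice of the `-despeckle` option are illustrative assumptions, and `run` requires ImageMagick's `convert` to be on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class ExternalStep {
    /** Builds an ImageMagick command line: convert <in> -despeckle <out>. */
    static List<String> command(String inputFile, String outputFile) {
        return Arrays.asList("convert", inputFile, "-despeckle", outputFile);
    }

    /** Runs the command, forwarding its output to this console, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("page0001.tif", "page0001-clean.tif");
        System.out.println("Would run: " + String.join(" ", cmd));
        // Uncomment once ImageMagick is installed and the input file exists:
        // System.out.println("exit code " + run(cmd));
    }
}
```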

Define operations for certain pages:
For any operation, you have the ability to specify which pages the command should be run for. You can choose to do an operation on the left side pages, or on a certain page or range of pages. For example, if you want to make everything black and white with the exception of a few photo pages in the center of the book, it can be done. Or leave the cover in color, cropped less so that the whole cover is visible, while cropping the internal pages tighter. You can also indicate certain pages that shouldn’t be included in the output.
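
To illustrate the idea of per-page selection, here is a small sketch of a page filter. The selector syntax it accepts (`all`, `left`, `right`, single numbers, and `from-to` ranges separated by commas) is invented for this example and is not claimed to be BSW's syntax; `PageSelector` is a hypothetical name.

```java
final class PageSelector {
    /**
     * Returns true if the page is selected by the spec.
     * Spec examples (hypothetical syntax): "all", "left", "1, 40-43".
     */
    static boolean matches(String spec, int page, boolean isLeft) {
        for (String part : spec.split(",")) {
            String s = part.trim().toLowerCase();
            if (s.equals("all")) {
                return true;
            }
            if (s.equals("left") || s.equals("right")) {
                if (s.equals("left") == isLeft) {
                    return true;
                }
                continue;
            }
            // Either a single page number or an inclusive range "from-to".
            int dash = s.indexOf('-');
            int from = Integer.parseInt(dash < 0 ? s : s.substring(0, dash).trim());
            int to = dash < 0 ? from : Integer.parseInt(s.substring(dash + 1).trim());
            if (page >= from && page <= to) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Keep the cover and a block of photo pages in color, for example.
        String colorPages = "1, 40-43";
        for (int page : new int[] {1, 2, 41, 44}) {
            System.out.println(page + " in color: " + matches(colorPages, page, page % 2 == 0));
        }
    }
}
```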

Optionally estimates the source dpi by examining the focal length from the jpeg metadata:
The way this works is you take two pictures, one zoomed a bit out, and another zoomed a bit in, then measure the dpi of those two images. Using that information the program will interpolate to find the source dpi of other images. Assuming you keep the camera at roughly the same distance whenever you scan, and just change the zoom, you only have to do this step once. Accurately indicating the source dpi of an image will help with OCR tasks and will ensure if you print from the scan, it will be about the same size as the original.
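
The interpolation step can be sketched as follows, assuming a straight-line relationship between focal length and dpi across the two calibration shots. The description above only says "interpolate", so the linear form is an assumption. `DpiEstimator` is a hypothetical name, and reading the focal length from the JPEG metadata is left out.

```java
final class DpiEstimator {
    private final double focal1, dpi1, focal2, dpi2;

    /** Two calibration shots: (focal length, measured dpi) zoomed out and zoomed in. */
    DpiEstimator(double focal1, double dpi1, double focal2, double dpi2) {
        if (focal1 == focal2) {
            throw new IllegalArgumentException("Calibration shots need different focal lengths");
        }
        this.focal1 = focal1;
        this.dpi1 = dpi1;
        this.focal2 = focal2;
        this.dpi2 = dpi2;
    }

    /** Linear interpolation (or extrapolation) of dpi for another focal length. */
    double estimate(double focalLength) {
        double t = (focalLength - focal1) / (focal2 - focal1);
        return dpi1 + t * (dpi2 - dpi1);
    }

    public static void main(String[] args) {
        // Made-up calibration values: 250 dpi at 10 mm and 450 dpi at 20 mm.
        DpiEstimator estimator = new DpiEstimator(10, 250, 20, 450);
        System.out.println(estimator.estimate(15)); // halfway between the two shots
    }
}
```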

Scales the image to the desired dpi:
It outputs a scaled version of each page. If your two cameras had two different zoom settings, it can adjust for that so that each page comes out the same size.
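
The size arithmetic is simple: pixels divided by source dpi gives the physical size, and multiplying by the target dpi gives the output size. A tiny sketch with made-up numbers (the class `DpiScaler` is hypothetical):

```java
final class DpiScaler {
    /** Output size in pixels after resampling from sourceDpi to targetDpi. */
    static int scaledSize(int pixels, double sourceDpi, double targetDpi) {
        return (int) Math.round(pixels * targetDpi / sourceDpi);
    }

    public static void main(String[] args) {
        // The same 8.5 inch wide page shot at two different zoom settings.
        int leftWidth = scaledSize(3400, 400, 300);   // left camera: 400 dpi
        int rightWidth = scaledSize(2890, 340, 300);  // right camera: 340 dpi
        System.out.println(leftWidth + " " + rightWidth);
    }
}
```

Both pages end up 2550 pixels wide at 300 dpi, even though the source images differ in size.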

Fast:
Depending on the size of the images and the speed of the computer it will process each one in less than a second. If your computer has multiple cores, it will make use of them.
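
One common way to use multiple cores in Java is a fixed thread pool sized to the number of processors, with one task per page. The sketch below shows that pattern only; it is not taken from the BSW source, and `ParallelBatch` and `processAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelBatch {
    /** Applies the operation to every page on a pool of worker threads, keeping input order. */
    static <T, R> List<R> processAll(List<T> pages, Function<T, R> operation) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (T page : pages) {
                Callable<R> task = () -> operation.apply(page);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // waits for that page to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A page operation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in operation: a real one would load, transform, and save the image.
        List<String> pages = Arrays.asList("0001.jpg", "0002.jpg", "0003.jpg");
        System.out.println(processAll(pages, name -> name.replace(".jpg", ".tif")));
    }
}
```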

Will run anywhere that Java runs:
It is written in Java with the JAI toolkit, so it will run on many platforms, including Windows, Linux, and Mac. Note that the Mac version will run slower, because the JAI toolkit doesn’t have a native library for the Mac.

Easy to rerun the process:
Because the configuration is saved, if a mistake is made in cropping, or if a page was missing in the initial conversion, it is easy to correct it and rerun the process. Also, because the resulting tiff files can be easily regenerated, there is no need to hold onto them after creating the final pdfs or other files. Just save your source images and the configuration file, and if you ever have a need, they can be regenerated.

Status of the project
It's pretty much working, but it is still a bit rough: I don't have a good install process, and there is not much in the way of documentation. But if you know Java and are comfortable using svn to download Java source code and ant or NetBeans to compile it, feel free to check it out.

http://sourceforge.net/projects/bookscanwizard/

Steve Devore
BookScanWizard, a flexible book post-processor.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Book Scan Wizard

Post by daniel_reetz »

Steve, this is really great! Do you have a screenshot or two so I can blog it quickly (I plan to get it running tonight).
steve1066d

Re: Book Scan Wizard

Post by steve1066d »

Here's a few:
[Screenshots attached: 01.jpg, 02.jpg, 03.jpg, 04.jpg, 05.jpg]
daniel_reetz

Re: Book Scan Wizard

Post by daniel_reetz »

blogged - thanks Steve!
Anonymous1

Re: Book Scan Wizard

Post by Anonymous1 »

YES! I could never figure out how to use Phatch, and it was a real pain to tweak around. If this application supports batch cropping (I love the scripting idea, as I am an avid 'convert' user), it is in my toolkit! I'm also preparing to release a little surprise within the next few weeks ;)
matt

Re: Book Scan Wizard

Post by matt »

Looks really great, Steve!

I've grabbed the code from svn and am trying to build using ant (on Mac OS X), but am stuck at this point:

Code:

-unavailable-generate-task:
     [echo] Task required to generate JNLP file is missing, probably the library 'JWS Ant Tasks' is missing either from shared folder or from IDE installation.

BUILD FAILED
/Users/matt/bookscanwizard/nbproject/jnlp-impl.xml:67: The following error occurred while executing this line:
/Users/matt/bookscanwizard/nbproject/jnlp-impl.xml:247: No message
I have the Java Advanced Imaging software installed but perhaps in a location the build script doesn't know about? (in the Mac /Developer location).

Matt
steve1066d

Re: Book Scan Wizard

Post by steve1066d »

The webstart stuff wasn't working yet so I just removed it from the project.

So go ahead and update and try again.

Or it might have built enough for it to work anyway... is there a jar file in the dist directory?
matt

Re: Book Scan Wizard

Post by matt »

I did a svn up and the jar now builds and runs fine. Excellent! However it throws an exception when I attempt to load image files: http://cl.ly/420T2m211z2K1e0m3q23 Am I doing anything obviously wrong? Thanks!
steve1066d

Re: Book Scan Wizard

Post by steve1066d »

I've got a sample project in /examples/basic... try loading that and see if it works.

I did notice that loading from an absolute path isn't working right now; I'll fix that tomorrow.

So right now it is expecting a path that doesn't start with a /. The path should be relative from where the configuration file is.
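
Resolving a path relative to the configuration file's folder can be sketched with `java.nio.file`, as below. This only illustrates the rule described here and is not BSW's code; `ConfigPaths` and the file name `book.cfg` are made up. Note that `Path.resolve` returns its argument unchanged when that argument is already absolute, so the same call would cover both cases.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ConfigPaths {
    /** Resolves a path from the configuration against the folder holding the config file. */
    static Path resolve(Path configFile, String value) {
        Path baseDir = configFile.toAbsolutePath().getParent();
        return baseDir.resolve(value).normalize();
    }

    public static void main(String[] args) {
        Path config = Paths.get("books", "moby", "book.cfg"); // hypothetical file name
        // A value such as "test" ends up in the same folder as the config file.
        System.out.println(resolve(config, "test"));
        System.out.println(resolve(config, "../shared/covers"));
    }
}
```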

Also, instead of the screenshot, I'd really need the logging info from the command line console to figure out more.
matt

Re: Book Scan Wizard

Post by matt »

steve1066d wrote: I've got a sample project in /examples/basic... try loading that and see if it works.
Thanks, that got me started. Looks like there are a few commands that are necessary to have set. I was able to get the following simple script working -- the "Fix Perspective and Crop" command is fast and awesome!

Code:

SetPreviewScale = 1
LoadImages = test
SetDestination = out
Pages = all
PerspectiveAndCrop =  537,587, 1691,540, 1677,2616, 493,2491
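
The script above uses a plain `Command = argument` form, one command per line. As a rough illustration of how such lines could be read (this is not BSW's parser, and `ScriptReader` is a hypothetical name), each line can be split at the first `=` and the corner list turned into numbers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScriptReader {
    /** Splits each "Command = argument" line into a {command, argument} pair, in order. */
    static List<String[]> parse(String script) {
        List<String[]> commands = new ArrayList<>();
        for (String line : script.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // blank or unrecognized line
            }
            commands.add(new String[] {
                line.substring(0, eq).trim(),
                line.substring(eq + 1).trim()
            });
        }
        return commands;
    }

    /** Parses a comma-separated number list such as the PerspectiveAndCrop corners. */
    static double[] numbers(String argument) {
        String[] parts = argument.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String script = "SetPreviewScale = 1\n"
                + "LoadImages = test\n"
                + "SetDestination = out\n"
                + "Pages = all\n"
                + "PerspectiveAndCrop =  537,587, 1691,540, 1677,2616, 493,2491\n";
        for (String[] command : parse(script)) {
            System.out.println(command[0] + " -> " + command[1]);
        }
        System.out.println(Arrays.toString(numbers(parse(script).get(4)[1])));
    }
}
```

A list is used rather than a map because the order of commands matters and a command could appear more than once.
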
steve1066d wrote: Also, instead of the screenshot, I'd really need the logging info from the command line console to figure out more.
The webstart app wasn't giving any command line output, but I figured out how to run the jar from svn from the command line so I will be able to give you logging info from now on.

I'm sensing that this will end up being a very useful and well-received tool. I'm very excited to see how this project develops... Thanks for your generous efforts so far!

Matt