Preprocessing RAW images for Scantailor

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Moderator: peterZ

abmartin
Posts: 79
Joined: 15 Sep 2010, 15:33
Number of books owned: 2000
Country: USA
Location: Ohio

Re: Preprocessing RAW images for Scantailor

Post by abmartin »

A quick read of the ImageMagick manual has shown me how to get separate numbers for the RGB values, which I could then use in a script. Now I just need to figure out the formula to adjust to a target color. That color adjustment could then be applied to all the other images. It might be possible to do this entirely in ImageMagick! It would be nice to get rid of a dependency.

To get RGB results from the cropped image:
$ convert color-out.png -resize 1x1 -format "%[fx:int(255*p{0,0}.r)],%[fx:int(255*p{0,0}.g)],%[fx:int(255*p{0,0}.b)]" info:
102,100,104

To get a single channel, just remove the ones we don't want:
$ convert color-out.png -resize 1x1 -format "%[fx:int(255*p{0,0}.r)]" info:
102
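The three-channel command above can also be run once and its "R,G,B" output split in the shell, instead of calling convert once per channel. A minimal sketch, assuming bash and ImageMagick 6's `convert`; `average_rgb` and `split_rgb` are hypothetical helper names, not part of any script in this thread:

```shell
#!/bin/bash
# Split an "R,G,B" string (as printed by the -format command above) into
# three global variables. Hypothetical helper; needs only plain bash.
split_rgb() {
    local IFS=','
    read -r SOURCE_RED SOURCE_GREEN SOURCE_BLUE <<< "$1"
}

# Print the average colour of an image as "R,G,B" on a 0-255 scale.
# Resizing to 1x1 approximates the mean; p{0,0} reads that single pixel.
average_rgb() {
    convert "$1" -resize 1x1 \
        -format "%[fx:int(255*p{0,0}.r)],%[fx:int(255*p{0,0}.g)],%[fx:int(255*p{0,0}.b)]" info:
}
```

For example, `split_rgb "$(average_rgb color-test.png)"` should leave the three averages in `SOURCE_RED`, `SOURCE_GREEN` and `SOURCE_BLUE`.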

This should get us closer to the dream of a no-human-input pre-processing solution!
pablitoclavito
Posts: 39
Joined: 12 Sep 2012, 16:54
E-book readers owned: Iliad
Number of books owned: 200
Country: Spain

Re: Preprocessing RAW images for Scantailor

Post by pablitoclavito »

Wow! Amazing ideas, I wish I had the knowledge to do those things...

I was wondering if you could add another step before going to Scantailor: cropping the photos. I read somewhere in the forum that this could be done with barcodes, something like this but in your photo:
image.jpg
Sorry about the quality of the drawing :)

With those barcodes stuck to the glass, you could position them according to the dimensions of the text, or according to what you want to keep (sometimes I don't want the headers or footers).

It would be really great to have another step done automatically and very precisely.
Thanks for your work!
abmartin

Re: Preprocessing RAW images for Scantailor

Post by abmartin »

Pablitoclavito,

ppmunwarp does automatically do some cropping for you based on the size of the calibration image. This eliminates issues with scantailor detecting the gutter, and will get reasonably close to the outside of the image. If you look at the final image I posted, that basic crop has already been done. I haven't had any problem yet with scantailor. The only annoying thing is that I need to have different sizes of calibration images if I want the cropping step to be done more accurately. (I have made 5 sizes, which seem to cover basically any situation I have)

It is also beyond my abilities to add any sort of detection-based cropping. For me, ppmunwarp's crop is "good enough."

I too find it pretty impressive what programmers can achieve! I'm stuck just using other people's programs. (And have just started learning some scripting, which is really just using things in certain orders)
abmartin

Re: Preprocessing RAW images for Scantailor

Post by abmartin »

I think I have figured out how to do gray-card color correction with ImageMagick. I haven't done any real testing beyond a proof of concept at this point. I'll compare the results with UFRaw's as soon as I can (probably when I test the new auto-detection of ppi!). I prefer using ImageMagick, because a user isn't tied to the formats accepted by UFRaw. (And, for that matter, shooting in RAW isn't required anymore.)

I haven't tried to integrate this into the updated script thus far. I see that it is much improved and better organized now, in particular adding the global variables for config. I'll try to integrate this soon. I'll also take that opportunity to fix the rambling comments.

Code: Select all

#!/bin/bash

###User Configuration###

#Input File Extension
ext=CRW
#True RGB values for gray card
red=128
green=128
blue=128
#Crop Size
size=500


###Color Correction###

#Crop a square in the center of the gray card image (+repage discards the leftover crop offset)
convert "color.$ext" -gravity center -crop "${size}x${size}+0+0" +repage color-test.png

#Determine average colors in cropped area: the 1x1 resize averages the pixels, p{0,0} reads the single result
sourcered=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.r)]" info:)
sourcegreen=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.g)]" info:)
sourceblue=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.b)]" info:)

#Calculate necessary adjustments
redadjust="$(echo "scale=10; $red/$sourcered" | bc)"
greenadjust="$(echo "scale=10; $green/$sourcegreen" | bc)"
blueadjust="$(echo "scale=10; $blue/$sourceblue" | bc)"

#Adjust Colors and Output Raw images as ppm files
mogrify -format ppm -verbose  -color-matrix "$redadjust 0 0 0 $greenadjust 0 0 0 $blueadjust" *.$ext
I'm really excited! If we are both right, we may have a solution that requires no user input!
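The formula asked for earlier is a per-channel gain, target divided by measured, which the bc lines above apply as a diagonal `-color-matrix`. Below is a sketch of the same calculation in awk that also refuses a zero reading (where bc would stop with a divide-by-zero error). `gray_card_matrix` is a hypothetical helper name, and the example numbers are the card values (162,162,160) and the reading (102,100,104) quoted in this thread:

```shell
#!/bin/bash
# Build the 3x3 diagonal matrix for "-color-matrix" from a gray card reading.
# Each gain is true/measured, so the measured card colour maps onto its true
# colour. Arguments: TRUE_R TRUE_G TRUE_B SOURCE_R SOURCE_G SOURCE_B
gray_card_matrix() {
    awk -v tr="$1" -v tg="$2" -v tb="$3" \
        -v sr="$4" -v sg="$5" -v sb="$6" 'BEGIN {
        if (sr <= 0 || sg <= 0 || sb <= 0) {
            print "gray_card_matrix: a measured channel is zero" > "/dev/stderr"
            exit 1
        }
        printf "%.4f 0 0 0 %.4f 0 0 0 %.4f\n", tr/sr, tg/sg, tb/sb
    }'
}
```

With those numbers it prints `1.5882 0 0 0 1.6200 0 0 0 1.5385`; `MATRIX=$(gray_card_matrix ...) || exit 1` followed by `mogrify -format ppm -color-matrix "$MATRIX" *.JPG` would then apply it.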
pablitoclavito

Re: Preprocessing RAW images for Scantailor

Post by pablitoclavito »

OK, thanks again!
abmartin

Re: Preprocessing RAW images for Scantailor

Post by abmartin »

I do think that some other folks have written barcode detection programs. If you found one that works well, you can always just add that command at the end of the script.


I tested the new autodetection with about a dozen examples, and it seems to be right on: always within 1 ppi of my manual measurement, which of course isn't as accurate as the averaging approach. I'm sold and ready to comment out the double-check line!! Congrats to royeven!!

I have incorporated the new ImageMagick approach into the script. I have also adjusted what I originally wrote to better fit royeven's style. For color correction, I have kept the UFRaw approach and also added a quick no-color-correction option. In the Global Variables section, I added a "COLOR_CORRECTION_METHOD" option to select between ImageMagick, UFRaw and none. If I use RAW images, I still prefer the UFRaw approach, because ImageMagick actually has to call UFRaw to read the raw files anyway, and that seems extremely slow! When using JPEGs, I use ImageMagick, which is remarkably fast.
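In the script's if/elif chain, any unrecognised value falls through to "No color correction will be done", so a typo such as "Imagemagick" quietly skips the correction. A sketch that validates the setting up front with a case statement, assuming bash 4 or later for the `${1,,}` lowercase expansion; `color_method` is a hypothetical name, and accepting "none" is an added convenience rather than something the script supports:

```shell
#!/bin/bash
# Normalise COLOR_CORRECTION_METHOD and reject unknown values.
# Matching is case-insensitive; an empty value means "no correction".
color_method() {
    case "${1,,}" in
        imagemagick) echo imagemagick ;;
        ufraw)       echo ufraw ;;
        ""|none)     echo none ;;
        *)
            echo "Unknown COLOR_CORRECTION_METHOD: '$1'" >&2
            return 1 ;;
    esac
}
```

Used as `METHOD=$(color_method "$COLOR_CORRECTION_METHOD") || exit 1`, the rest of the script can then branch on `$METHOD`.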

If you use ImageMagick, be sure to change the TRUE_RED, TRUE_GREEN, and TRUE_BLUE values to match your card. I've been using my Opteka gray card, which is 162,162,160. Theoretically, this should work with almost any color. There is also a COLOR_CROP_SIZE option that selects a square area at the center of the image, from which the average color of the gray card is measured. If it is too large and extends past the edge of the gray card, that will skew the average, so adjust it if needed. (I think 500 or 250 is probably best, depending on camera resolution.) I also added a variable for the name of the color calibration image.
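To check that a given COLOR_CROP_SIZE at least fits inside the photo, and to see exactly which square is being averaged, the centred offsets can be computed in plain bash. This is a sketch; `center_crop_geometry` is a hypothetical helper, and it cannot tell whether the square stays on the gray card itself, which still has to be checked by eye:

```shell
#!/bin/bash
# Print an explicit "SxS+X+Y" geometry for a square centred in the image,
# and refuse crop sizes that do not fit.
# Arguments: IMAGE_WIDTH IMAGE_HEIGHT COLOR_CROP_SIZE
center_crop_geometry() {
    local w=$1 h=$2 s=$3
    if (( s <= 0 || s > w || s > h )); then
        echo "COLOR_CROP_SIZE $s does not fit in a ${w}x${h} image" >&2
        return 1
    fi
    # Offsets that leave equal margins on both sides (integer division)
    echo "${s}x${s}+$(( (w - s) / 2 ))+$(( (h - s) / 2 ))"
}
```

The image size can be read with `identify -format "%w %h" color.JPG`; for the 4608x3456 photos in this thread and a crop size of 500 it prints `500x500+2054+1478`.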

Code: Select all

#!/bin/bash

###Scantailor Preprocessor###
#Fixes color with a gray card and geometry with ppmunwarp and a calibration grid
#Dependencies: Imagemagick, ppmunwarp as modified by royeven, Ufraw (only needed for RAW), Ufraw-batch (only needed for RAW)
#Written by abmartin and royeven at www.diybookscanner.org

###Configuration###

# Global "variables"
INPUT_FORMAT="JPG" #Change as appropriate
OUTPUT_FORMAT="ppm"
COLOR_IMAGE="color" # The file name (without extension) of the gray card photo
CALIBRATION_IMAGE="calibration" # The file name (without extension) of the geometry calibration file
CONTROL_IMAGE="check" # The name of the deskew control file created with the -m attribute of ppmunwarp
CONTROL_PPI="check_ppi" # The name of the ppi control file created with the -m attribute of ppmunwarp
CALIBRATION_FILE="calibration.bin" # The full file name of the binary calibration data
UNWARP_OPT="-ps 60 -pv 60 -ph 90 -mul 5.08" # Options to ppmunwarp. Change according to your setup. Note: the mul attribute is a multiplier to the pixel-count. Since there are 5.08 calibration points per inch (i.e. 2*2.54), we must multiply the pixel count between two adjacent calibration dots with 5.08 to get the PPI
CORRECTED_POSTFIX="_corrected" # The postfix of corrected images. E.g. the file "PAGE01_R.ppm" becomes "PAGE01_R_corrected.ppm"
COLOR_CORRECTION_METHOD="imagemagick" #options for imagemagick and ufraw - leave blank if no color correction is desired

#Imagemagick Color Correction Options
#True RGB values of Gray Card
TRUE_RED=162
TRUE_GREEN=162
TRUE_BLUE=160
COLOR_CROP_SIZE=500 #size of the sides of a square crop in the center of the image that contains gray card

# Include current folder in search path for program ppmunwarp
PATH=$PATH:`pwd`

###Color Correction###

##Using Imagemagick to Fix Colors

if [ "$COLOR_CORRECTION_METHOD" == "imagemagick" ];
	then
	#Crop a square in the center of the gray card image (+repage discards the leftover crop offset)
	convert "$COLOR_IMAGE.$INPUT_FORMAT" -gravity center -crop "${COLOR_CROP_SIZE}x${COLOR_CROP_SIZE}+0+0" +repage color-test.png

	#Determine average colors in cropped area
	SOURCE_RED=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.r)]" info:)
	SOURCE_GREEN=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.g)]" info:)
	SOURCE_BLUE=$(convert color-test.png -resize 1x1 -format "%[fx:int(255*p{0,0}.b)]" info:)

	#Calculate necessary adjustments
	RED_ADJUST="$(echo "scale=10; $TRUE_RED/$SOURCE_RED" | bc)"
	GREEN_ADJUST="$(echo "scale=10; $TRUE_GREEN/$SOURCE_GREEN" | bc)"
	BLUE_ADJUST="$(echo "scale=10; $TRUE_BLUE/$SOURCE_BLUE" | bc)"

	#Adjust Colors and Output Raw images as ppm files
	mogrify -format ppm -verbose  -color-matrix "$RED_ADJUST 0 0 0 $GREEN_ADJUST 0 0 0 $BLUE_ADJUST" *.$INPUT_FORMAT
	
	#Remove temporary file
	rm color-test.png
	
##Using UFRaw to Fix Colors

elif [ "$COLOR_CORRECTION_METHOD" == "ufraw" ];

	#1. select an area on your gray card,
	#2. click the eyedropper, which equalizes RGB values
	#3. adjust the exposure control to get RGB values to their ultimate goal.
	#4. Input values in the script

	then
	#Interactive variables
	echo "The color calibration image is being loaded in UFRaw. Enter the following values for color correction"
	ufraw "$COLOR_IMAGE.$INPUT_FORMAT" &
	echo "Color temperature?: "
	read UFRAW_TEMP
	echo "Green Value?: "
	read UFRAW_GREEN
	echo "Exposure change?: "
	read UFRAW_EXPOSURE
	
	##Convert all files of INPUT_FORMAT to OUTPUT_FORMAT
	echo -e "\nRunning ufraw-batch, this will take a while..."
	ufraw-batch --out-type="$OUTPUT_FORMAT" --temperature="$UFRAW_TEMP" --green="$UFRAW_GREEN" --exposure="$UFRAW_EXPOSURE" *."$INPUT_FORMAT"


##No Color Correction

else
	echo "No color correction will be done"
	echo "Preparing images for ppmunwarp"
	mogrify -format ppm -verbose *.$INPUT_FORMAT
		
fi

###Geometry Correction###

##Calibration

echo -e "\nCalculating calibration data from image: $CALIBRATION_IMAGE.$OUTPUT_FORMAT"
ppmunwarp $UNWARP_OPT -m "$CONTROL_IMAGE.$OUTPUT_FORMAT" "$CALIBRATION_IMAGE.$OUTPUT_FORMAT" > "$CALIBRATION_FILE"

##Correction

echo -e "\nCorrecting geometry.  This will take some time..."
for i in *."$OUTPUT_FORMAT"; do
 if [ -e "$i" ]; then
   FILE=$(basename "$i" ."$OUTPUT_FORMAT")
   ppmunwarp $UNWARP_OPT -d "$CALIBRATION_FILE" "$i" > "$FILE$CORRECTED_POSTFIX.$OUTPUT_FORMAT"
 fi
done
	
##Comment out this line to skip the visual inspection and DPI measurement in GIMP
gimp "$CALIBRATION_IMAGE$CORRECTED_POSTFIX.$OUTPUT_FORMAT" &

##Calculating PPI in software
echo -e "\nCalculating calibration data from image: $CALIBRATION_IMAGE$CORRECTED_POSTFIX.$OUTPUT_FORMAT"
# ppmunwarp prints its report on stderr: "2>&1 >/dev/null" captures that and discards stdout
PPI=$(ppmunwarp $UNWARP_OPT -m "$CONTROL_PPI.$OUTPUT_FORMAT" "$CALIBRATION_IMAGE$CORRECTED_POSTFIX.$OUTPUT_FORMAT" 2>&1 >/dev/null)
echo "$PPI"
PPI=`echo "$PPI" | grep -o "Average: [0-9\.]*, calculated from [0-9]* of [0-9]* data points. PPI: [0-9]*" | sed -r 's/.*PPI: ([0-9]*).*/\1/g'`
echo "Calculated PPI is: $PPI"
echo "Is this correct? If not, insert corrected value now. If correct, leave empty and press enter"
read PPI_CORRECTED
if [ "$PPI_CORRECTED" != "" ]
then
	PPI=$PPI_CORRECTED
fi
echo "PPI is set to: $PPI"


###Preparing Images for Scantailor###

echo "ImageMagick will now convert the format into one useable by Scantailor"
mogrify -verbose -format tif -density "$PPI" -units PixelsPerInch -compress lzw *"$CORRECTED_POSTFIX.$OUTPUT_FORMAT"


###Housekeeping###

#Delete temporary files
rm *.ppm

I think we are there!
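A note on selecting the corrected files with a glob: in bash, square brackets form a character class, so a pattern such as `*[$CORRECTED_POSTFIX].ppm` matches any .ppm whose last name character is one of `_ c o r e t d`, not only names ending in `_corrected`. The throwaway demo below (the file names are made up for illustration) compares it with a quoted literal suffix:

```shell
#!/bin/bash
# Compare a bracket expression with a literal suffix in a bash glob.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch IMG_2505.ppm IMG_2505_corrected.ppm page_r.ppm check.ppm

CORRECTED_POSTFIX="_corrected"
shopt -s nullglob
bracket_matches=( *[$CORRECTED_POSTFIX].ppm )   # character class: too broad
literal_matches=( *"$CORRECTED_POSTFIX".ppm )   # literal suffix: intended

echo "bracket: ${bracket_matches[*]}"
echo "literal: ${literal_matches[*]}"

# Clean up the demo directory
cd - > /dev/null && rm -r "$demo_dir"
```

Here `page_r.ppm` slips into the bracket version because its name ends in "r", while the literal pattern picks up only `IMG_2505_corrected.ppm`.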
pablitoclavito

Re: Preprocessing RAW images for Scantailor

Post by pablitoclavito »

Tested with no luck here!
The dpi looks right to me, but then the process fails...
I have uploaded my calibration photos so you can see if anything is wrong with them. I did 2 tests, so there are 2 calibration images.
https://mega.co.nz/#!bZsw0LZT!V2bEBwpeJ ... KVPB9qNm6Q
https://mega.co.nz/#!KZMnlYCL!dW2JfzP9g ... H1GLupXI7I

Photos are in jpg.
I left the color correction option blank.

In the first test (calibrationA.jpg uploaded) I got this:

Code: Select all

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.895MB 0.570u 0:00.649
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.410
IMG_2505.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.398MB 0.580u 0:00.710
IMG_2505.JPG=>IMG_2505.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.550
IMG_2506.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.285MB 0.550u 0:00.720
IMG_2506.JPG=>IMG_2506.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.469
IMG_2507.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.349MB 0.560u 0:00.820
IMG_2507.JPG=>IMG_2507.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.429
IMG_2508.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.34MB 0.540u 0:00.719
IMG_2508.JPG=>IMG_2508.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.530
IMG_2509.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.319MB 0.530u 0:00.679
IMG_2509.JPG=>IMG_2509.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.480
IMG_2510.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.28MB 0.540u 0:00.740
IMG_2510.JPG=>IMG_2510.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.270u 0:00.619
IMG_2511.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.293MB 0.540u 0:00.670
IMG_2511.JPG=>IMG_2511.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.449

Calculating calibration data from image: calibration.ppm
Number of detected points: 8203
Only 28 detected points used for calibration!
Average: 81.690872, calculated from 13 of 25 data points. PPI: 415
!!! Error in ppmunwarp:
!!! Distance too large for extrapolation!

Correcting geometry.  This will take some time...
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!

Calculating calibration data from image: calibration_corrected.ppm
!!! Error in ppmunwarp:
!!! PPM image file 'calibration_corrected.ppm' has wrong preamble!
Calculated PPI is: 
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

(gimp:15301): Gimp-Widgets-CRITICAL **: gimp_device_info_set_device: assertion `(info->device == NULL && GDK_IS_DEVICE (device)) || (GDK_IS_DEVICE (info->device) && device == NULL)' failed

** (gimp:15301): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/null.png,
borders don't fit within the image

** (gimp:15301): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_vertical.png,
borders don't fit within the image

PPI is set to: 
ImageMagick will now convert the format into one useable by Scantailor
mogrify.im6: invalid argument for option `-units': -density @ error/mogrify.c/MogrifyImageCommand/4325.
In test 2 (calibrationB.jpg uploaded) I got this:

Code: Select all

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.829MB 0.590u 0:00.640
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.250u 0:00.659
IMG_2497.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.099MB 0.500u 0:01.399
IMG_2497.JPG=>IMG_2497.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.230u 0:00.760
IMG_2498.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.149MB 0.490u 0:00.700
IMG_2498.JPG=>IMG_2498.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.509
IMG_2499.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.112MB 0.490u 0:00.629
IMG_2499.JPG=>IMG_2499.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.609
IMG_2500.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.151MB 0.480u 0:00.679
IMG_2500.JPG=>IMG_2500.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.759
IMG_2501.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.179MB 0.490u 0:00.559
IMG_2501.JPG=>IMG_2501.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.460
IMG_2502.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.054MB 0.510u 0:00.720
IMG_2502.JPG=>IMG_2502.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.559

Calculating calibration data from image: calibration.ppm
Number of detected points: 7397
Only 51 detected points used for calibration!
Average: 84.041154, calculated from 28 of 48 data points. PPI: 427
!!! Error in ppmunwarp:
!!! Distance too large for extrapolation!

Correcting geometry.  This will take some time...
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!
!!! Error in ppmunwarp:
!!! Couldn't read deformation map file 'calibration.bin'!

Calculating calibration data from image: calibration_corrected.ppm
!!! Error in ppmunwarp:
!!! PPM image file 'calibration_corrected.ppm' has wrong preamble!
Calculated PPI is: 
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

(gimp:14982): Gimp-Widgets-CRITICAL **: gimp_device_info_set_device: assertion `(info->device == NULL && GDK_IS_DEVICE (device)) || (GDK_IS_DEVICE (info->device) && device == NULL)' failed

** (gimp:14982): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/null.png,
borders don't fit within the image

** (gimp:14982): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_vertical.png,
borders don't fit within the image

PPI is set to: 
ImageMagick will now convert the format into one useable by Scantailor
mogrify.im6: invalid argument for option `-units': -density @ error/mogrify.c/MogrifyImageCommand/4325.
In both tests, when gimp opens, it gives this error:
Opening '/blablabla/calibration_corrected.ppm' failed: PNM Image plug-In could not open image

Any ideas?
Thanks for all your effort and improvements.
abmartin

Re: Preprocessing RAW images for Scantailor

Post by abmartin »

I probably should think about adding in some exit codes for failed steps with error messages... I'm not sure how to do that, but the internet probably has the answer somewhere.

What's going wrong there is that ppmunwarp isn't detecting enough calibration points to actually run. (The problem shows where it says only 28 are being used.) Since ppmunwarp isn't doing its thing, the later stages churn out errors too, because there isn't even a picture to open (hence GIMP screaming at you).

To fix that, in this case, get rid of the special options for ppmunwarp. Just leave it like this:
UNWARP_OPT="-mul 5.08"

The other options are color-specific settings that make royeven's setup more accurate. I probably should remove them from the script, since they are different for everybody. With them gone, it seems to detect the points correctly from your first image. Those options only need to be added if there is a problem, and it seems to work well without them.


A warning though: ppmunwarp crops to the size of the calibration image, and yours is too small for the book you are scanning. It needs to be a bit bigger than the book itself.
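A failure like the one above can be caught before the correction loop runs by reading ppmunwarp's "Only N detected points used" line. A sketch, assuming that message keeps the wording shown in the logs here and is printed on stderr (as the PPI report is); the threshold of 100 points and the name `check_points_used` are arbitrary assumptions, not ppmunwarp limits:

```shell
#!/bin/bash
# Read ppmunwarp's calibration messages on stdin and fail when too few
# points were used. Prints the point count on success.
MIN_POINTS_USED=100   # assumed threshold, tune for your setup

check_points_used() {
    local used
    used=$(sed -n 's/^Only \([0-9]*\) detected points used.*/\1/p' | head -n 1)
    if [ -z "$used" ]; then
        echo "No point count found in ppmunwarp output" >&2
        return 2
    fi
    if (( used < MIN_POINTS_USED )); then
        echo "Only $used calibration points used; check lighting or UNWARP_OPT" >&2
        return 1
    fi
    echo "$used"
}
```

For example, `ppmunwarp $UNWARP_OPT -m check.ppm calibration.ppm 2>&1 >calibration.bin | check_points_used || exit 1` sends the messages into the pipe while the calibration data still lands in calibration.bin. With the logs above, the 28-point run would stop here, while the 1629-point run would pass.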
royeven
Posts: 19
Joined: 27 Nov 2012, 19:43
E-book readers owned: ipad
Number of books owned: 0
Country: Norway

Re: Preprocessing RAW images for Scantailor

Post by royeven »

abmartin wrote: I probably should think about adding in some exit codes for failed steps with error messages... I'm not sure how to do that, but the internet probably has the answer somewhere.
Exit codes in bash are easy. You could write something like this:

Code: Select all

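# Example condition: ppmunwarp did not produce usable calibration data
if [ ! -s "$CALIBRATION_FILE" ]
then
     echo "Calibration failed: $CALIBRATION_FILE is missing or empty" >&2
     exit 1
fi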
exit 0 means that the script completed successfully. Any other value (1 to 255) means it failed. Use a different exit code for each failure condition.
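Applied to the PPI step, that could look like the sketch below, which pulls the value out of ppmunwarp's report and returns its own exit code when none is found (the empty "PPI is set to:" case in the logs above). The code 3 and the name `extract_ppi` are illustrative choices, not part of the script:

```shell
#!/bin/bash
# One exit code per failure condition; the value is an arbitrary choice.
E_NO_PPI=3

# Read ppmunwarp's report on stdin and print the PPI, or fail with E_NO_PPI.
extract_ppi() {
    local ppi
    ppi=$(sed -n 's/.*data points\. PPI: \([0-9][0-9]*\).*/\1/p' | tail -n 1)
    [ -n "$ppi" ] || return "$E_NO_PPI"
    echo "$ppi"
}

# Usage (ppmunwarp writes its report on stderr):
# PPI=$(ppmunwarp $UNWARP_OPT -m check_ppi.ppm calibration_corrected.ppm 2>&1 >/dev/null | extract_ppi) || exit "$E_NO_PPI"
```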

pablitoclavito:
Just replace the line

Code: Select all

UNWARP_OPT="-ps 60 -pv 60 -ph 90 -mul 5.08" # Options to ppmunwarp. Change according to your setup. Note: the mul attribute is a multiplier to the pixel-count. Since there are 5.08 calibration points per inch (i.e. 2*2.54), we must multiply the pixel count between two adjacent calibration dots with 5.08 to get the PPI
with

Code: Select all

UNWARP_OPT="-mul 5.08" # Options to ppmunwarp. Change according to your setup. Note: the mul attribute is a multiplier to the pixel-count. Since there are 5.08 calibration points per inch (i.e. 2*2.54), we must multiply the pixel count between two adjacent calibration dots with 5.08 to get the PPI
and you're good to go. Sorry for leaving that part in the script. It's specific to my (for the time being) crappy lighting setup and should be fine-tuned for each individual setup.
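The arithmetic behind `-mul 5.08` can be checked by hand: 5.08 dots per inch is 2 dots per cm times 2.54 cm per inch, so PPI = (average pixel distance between adjacent dots) * 5.08. A small sketch; `spacing_to_ppi` is a hypothetical name, and rounding to the nearest integer is an assumption that happens to reproduce every "Average ... PPI" pair in the logs of this thread:

```shell
#!/bin/bash
# Convert the average dot spacing in pixels into PPI, rounded to the
# nearest integer. Optional second argument overrides the 5.08 multiplier.
spacing_to_ppi() {
    awk -v d="$1" -v mul="${2:-5.08}" 'BEGIN { printf "%d\n", d * mul + 0.5 }'
}
```

For example, `spacing_to_ppi 82.203650` prints 418, matching the "Average: 82.203650 ... PPI: 418" line reported by ppmunwarp.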
pablitoclavito

Re: Preprocessing RAW images for Scantailor

Post by pablitoclavito »

All worked OK after leaving those options out! Thanks!

I ran the same 2 previous tests again, plus 2 new tests. For each, I have uploaded the results as one image (calibration, check corrected and calibration corrected side by side).

Test A
https://mega.co.nz/#!2wAWFJyK!Vaw0-kEPj ... ND9CTQlMvk

The light reflection messed things up a little, and in the last photo there is a slight wave in the grid, but there is no text in that area, so everything was OK.

Code: Select all

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.895MB 0.510u 0:00.570
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.429
IMG_2505.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.398MB 0.460u 0:00.510
IMG_2505.JPG=>IMG_2505.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.469
IMG_2506.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.285MB 0.460u 0:00.559
IMG_2506.JPG=>IMG_2506.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.410
IMG_2507.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.349MB 0.450u 0:00.510
IMG_2507.JPG=>IMG_2507.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.500
IMG_2508.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.34MB 0.460u 0:00.530
IMG_2508.JPG=>IMG_2508.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.320
IMG_2509.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.319MB 0.450u 0:00.509
IMG_2509.JPG=>IMG_2509.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.530
IMG_2510.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.28MB 0.460u 0:00.480
IMG_2510.JPG=>IMG_2510.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.170u 0:00.559
IMG_2511.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.293MB 0.480u 0:00.719
IMG_2511.JPG=>IMG_2511.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.160u 0:00.370

Calculating calibration data from image: calibration.ppm
Number of detected points: 1733
Only 1629 detected points used for calibration!
Average: 82.203650, calculated from 1568 of 1591 data points. PPI: 418
Deskewed picture size: 3582 x 3175   (77.77% x 91.92%)

Correcting geometry.  This will take some time...

Calculating calibration data from image: calibration_corrected.ppm
Number of detected points: 1739
Only 1632 detected points used for calibration!
Average: 81.405662, calculated from 1577 of 1594 data points. PPI: 414
Deskewed picture size: 3575 x 3169   (99.83% x 99.83%)
Calculated PPI is: 414
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

PPI is set to: 414
ImageMagick will now convert the format into one useable by Scantailor
calibration_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.250u 0:00.339
calibration_corrected.ppm=>calibration_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 17.61MB 0.950u 0:01.500
check_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.310u 0:00.349
check_corrected.ppm=>check_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 3.701MB 0.470u 0:00.750
IMG_2505_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.300u 0:00.400
IMG_2505_corrected.ppm=>IMG_2505_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 14.22MB 0.900u 0:01.379
IMG_2506_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.280u 0:00.390
IMG_2506_corrected.ppm=>IMG_2506_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 13.65MB 0.860u 0:01.199
IMG_2507_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.300u 0:00.439
IMG_2507_corrected.ppm=>IMG_2507_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 14.05MB 0.860u 0:01.060
IMG_2508_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.320u 0:00.560
IMG_2508_corrected.ppm=>IMG_2508_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 13.92MB 0.850u 0:01.300
IMG_2509_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.300u 0:00.370
IMG_2509_corrected.ppm=>IMG_2509_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 13.8MB 0.860u 0:01.050
IMG_2510_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.300u 0:00.519
IMG_2510_corrected.ppm=>IMG_2510_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 13.68MB 0.850u 0:01.180
IMG_2511_corrected.ppm PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 34.14MB 0.300u 0:00.410
IMG_2511_corrected.ppm=>IMG_2511_corrected.tif PPM 3583x3176 3583x3176+0+0 8-bit DirectClass 13.8MB 0.860u 0:01.210
Test B
https://mega.co.nz/#!u1pkjKbY!PK9mFR0-R ... owIs69_gkI

In the middle photo something is wrong, but the corrected grid is perfect.
That's presumably because that area receives more light, although you can't quite see it in the photo at first. This uneven lighting is something I have to improve, but with my basic setup, that's the area that receives the most light...

Code: Select all

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.829MB 0.760u 0:00.779
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.250u 0:00.560
IMG_2497.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.099MB 0.540u 0:00.690
IMG_2497.JPG=>IMG_2497.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.260u 0:00.359
IMG_2498.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.149MB 0.540u 0:00.580
IMG_2498.JPG=>IMG_2498.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.240u 0:00.689
IMG_2499.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.112MB 0.540u 0:00.599
IMG_2499.JPG=>IMG_2499.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.230u 0:00.420
IMG_2500.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.151MB 0.530u 0:00.649
IMG_2500.JPG=>IMG_2500.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.260u 0:00.539
IMG_2501.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.179MB 0.540u 0:00.679
IMG_2501.JPG=>IMG_2501.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.340u 0:00.429
IMG_2502.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.054MB 0.540u 0:00.609
IMG_2502.JPG=>IMG_2502.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.260u 0:00.410

Calculating calibration data from image: calibration.ppm
Number of detected points: 1568
Only 1548 detected points used for calibration!
Average: 82.769399, calculated from 1508 of 1512 data points. PPI: 420
Deskewed picture size: 3642 x 3062   (79.06% x 88.65%)

Correcting geometry.  This will take some time...

Calculating calibration data from image: calibration_corrected.ppm

(gimp:2698): Gimp-Widgets-CRITICAL **: gimp_device_info_set_device: assertion `(info->device == NULL && GDK_IS_DEVICE (device)) || (GDK_IS_DEVICE (info->device) && device == NULL)' failed
Number of detected points: 1564
Only 1548 detected points used for calibration!
Average: 82.767497, calculated from 1512 of 1512 data points. PPI: 420
Deskewed picture size: 3641 x 3061   (99.98% x 100.00%)
Calculated PPI is: 420
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/null.png,
borders don't fit within the image

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_horizontal.png,
borders don't fit within the image

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/null.png,
borders don't fit within the image

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_vertical.png,
borders don't fit within the image

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_vertical-sel.png,
borders don't fit within the image

** (gimp:2698): WARNING **: Invalid borders specified for theme pixmap:
        /usr/share/themes/Lubuntu-default/gtk-2.0/images/scrollbar_horizontal-sel.png,
borders don't fit within the image

PPI is set to: 420
ImageMagick will now convert the format into one useable by Scantailor
calibration_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.260u 0:00.329
calibration_corrected.ppm=>calibration_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 17.26MB 0.930u 0:01.179
check_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.310u 0:00.429
check_corrected.ppm=>check_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 3.592MB 0.490u 0:00.699
IMG_2497_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.340u 0:00.449
IMG_2497_corrected.ppm=>IMG_2497_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 12.77MB 0.820u 0:01.050
IMG_2498_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.300u 0:00.500
IMG_2498_corrected.ppm=>IMG_2498_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 12.89MB 0.830u 0:01.049
IMG_2499_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.300u 0:00.500
IMG_2499_corrected.ppm=>IMG_2499_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 12.91MB 0.860u 0:00.990
IMG_2500_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.280u 0:00.530
IMG_2500_corrected.ppm=>IMG_2500_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 12.88MB 0.830u 0:00.960
IMG_2501_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.290u 0:00.530
IMG_2501_corrected.ppm=>IMG_2501_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 13.06MB 0.850u 0:00.969
IMG_2502_corrected.ppm PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 33.48MB 0.290u 0:00.500
IMG_2502_corrected.ppm=>IMG_2502_corrected.tif PPM 3643x3063 3643x3063+0+0 8-bit DirectClass 12.48MB 0.850u 0:00.969
I don't know what those gimp errors are, but the final photos were OK.
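The last step in the log ("ImageMagick will now convert the format into one useable by Scantailor") writes TIFFs. PPM files have no resolution field, so the PPI confirmed at the prompt has to be attached during that conversion if Scantailor is to read it from the file. I haven't read the script's source, so this is only a guess at an equivalent command. The function name `ppm_to_tiff` is hypothetical.

```shell
#!/bin/sh
# Hypothetical equivalent of the script's final step: convert each
# *_corrected.ppm to TIFF and tag it with the PPI confirmed at the prompt.
# ASSUMPTION: this mirrors what the script does; its source was not checked.

# ppm_to_tiff PPI FILE...
ppm_to_tiff() {
    ppi=$1
    shift
    for f in "$@"; do
        # -units/-density store the resolution in the TIFF header;
        # the .ppm input carries no resolution of its own.
        convert "$f" -units PixelsPerInch -density "$ppi" "${f%.ppm}.tif"
    done
}

# Example, using the PPI confirmed in the run above:
#   ppm_to_tiff 420 *_corrected.ppm
```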

NEW TESTS:
Test 3:
https://mega.co.nz/#!30wkjBgK!X0Nujkgo8 ... ddOKZ9eoeI
(If you want it, the original calibration image is here:
https://mega.co.nz/#!34J00TaT!PjFaKyuCn ... bG2E3wCRRo)

Nothing went wrong in the photos, even with the reflection at the top and the fingerprints around it.
This was the best of the four tests I have done.

Code:

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.716MB 0.520u 0:00.540
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.230
IMG_2515.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 2.956MB 0.470u 0:00.490
IMG_2515.JPG=>IMG_2515.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.239
IMG_2516.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 2.966MB 0.480u 0:00.480
IMG_2516.JPG=>IMG_2516.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.180u 0:00.370
IMG_2517.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 2.997MB 0.480u 0:00.500
IMG_2517.JPG=>IMG_2517.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.579

Calculating calibration data from image: calibration.ppm
Number of detected points: 1593
Only 1591 detected points used for calibration!
Average: 77.330805, calculated from 1554 of 1554 data points. PPI: 393
Deskewed picture size: 3394 x 2931   (73.69% x 84.86%)

Correcting geometry.  This will take some time...

Calculating calibration data from image: calibration_corrected.ppm
Number of detected points: 1595
Only 1591 detected points used for calibration!
Average: 77.137898, calculated from 1554 of 1554 data points. PPI: 392
Deskewed picture size: 3392 x 2930   (99.97% x 99.97%)
Calculated PPI is: 392
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

PPI is set to: 392
ImageMagick will now convert the format into one useable by Scantailor
calibration_corrected.ppm PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 29.86MB 0.200u 0:00.250
calibration_corrected.ppm=>calibration_corrected.tif PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 15.82MB 0.830u 0:01.109
check_corrected.ppm PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 29.86MB 0.260u 0:00.439
check_corrected.ppm=>check_corrected.tif PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 3.358MB 0.430u 0:00.640
IMG_2515_corrected.ppm PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 29.86MB 0.270u 0:00.410
IMG_2515_corrected.ppm=>IMG_2515_corrected.tif PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 11.51MB 0.730u 0:00.989
IMG_2516_corrected.ppm PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 29.86MB 0.270u 0:00.379
IMG_2516_corrected.ppm=>IMG_2516_corrected.tif PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 11.49MB 0.730u 0:01.030
IMG_2517_corrected.ppm PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 29.86MB 0.270u 0:00.400
IMG_2517_corrected.ppm=>IMG_2517_corrected.tif PPM 3395x2932 3395x2932+0+0 8-bit DirectClass 11.57MB 0.770u 0:00.890
Test 4:
https://mega.co.nz/#!KlZHkYKR!MeQ32yO9o ... b-IBqk61Ss
(If you want it, the original calibration image is here:
https://mega.co.nz/#!fkgUAY6B!A8z6jgHhN ... 7lQvjyfxno)

The wavy pattern visible in the last photo did not affect the results, because the text sits above it. It may be a reflection from the side of the glass itself; I could paint that side black... The fingerprints and the light at the top didn't mess anything up.

Code:

No color correction will be done
Preparing images for ppmunwarp
calibration.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.905MB 0.510u 0:00.559
calibration.JPG=>calibration.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.190u 0:00.359
IMG_2521.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.091MB 0.470u 0:00.510
IMG_2521.JPG=>IMG_2521.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.419
IMG_2522.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.323MB 0.510u 0:00.550
IMG_2522.JPG=>IMG_2522.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.200u 0:00.410
IMG_2523.JPG JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 3.26MB 0.480u 0:00.530
IMG_2523.JPG=>IMG_2523.ppm JPEG 4608x3456 4608x3456+0+0 8-bit DirectClass 47.78MB 0.210u 0:00.469

Calculating calibration data from image: calibration.ppm
Number of detected points: 1631
Only 1613 detected points used for calibration!
Average: 81.666453, calculated from 1561 of 1575 data points. PPI: 415
Deskewed picture size: 3559 x 3155   (77.27% x 91.32%)

Correcting geometry.  This will take some time...

Calculating calibration data from image: calibration_corrected.ppm
Number of detected points: 1644
Only 1621 detected points used for calibration!
Average: 80.894662, calculated from 1573 of 1583 data points. PPI: 411
Deskewed picture size: 3550 x 3146   (99.76% x 99.74%)
Calculated PPI is: 411
Is this correct? If not, insert corrected value now. If correct, leave empty and press enter

PPI is set to: 411
ImageMagick will now convert the format into one useable by Scantailor
calibration_corrected.ppm PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 33.71MB 0.220u 0:00.300
calibration_corrected.ppm=>calibration_corrected.tif PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 17.53MB 0.940u 0:01.129
check_corrected.ppm PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 33.71MB 0.300u 0:00.509
check_corrected.ppm=>check_corrected.tif PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 3.649MB 0.460u 0:00.760
IMG_2521_corrected.ppm PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 33.71MB 0.300u 0:00.390
IMG_2521_corrected.ppm=>IMG_2521_corrected.tif PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 12.64MB 0.820u 0:01.179
IMG_2522_corrected.ppm PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 33.71MB 0.300u 0:00.429
IMG_2522_corrected.ppm=>IMG_2522_corrected.tif PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 13.72MB 0.860u 0:01.169
IMG_2523_corrected.ppm PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 33.71MB 0.310u 0:00.419
IMG_2523_corrected.ppm=>IMG_2523_corrected.tif PPM 3560x3156 3560x3156+0+0 8-bit DirectClass 13.55MB 0.860u 0:01.059
The DPI was correct in every test.
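A quick way to sanity-check those PPI values: in every run above, the reported PPI equals the "Average" point spacing in pixels divided by 5 mm expressed in inches, rounded to the nearest integer. The 5 mm grid pitch is my inference from the numbers, not something the script prints, so treat it as an assumption. The function name `ppi_from_spacing` is hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the "PPI:" values in the ppmunwarp logs from the
# "Average:" point spacing. ASSUMPTION: the calibration grid pitch is 5 mm;
# this is inferred from the logs, not read from ppmunwarp's source.

# ppi_from_spacing SPACING_PX [PITCH_MM]
# Prints SPACING_PX / (PITCH_MM / 25.4), rounded to the nearest integer.
ppi_from_spacing() {
    awk -v s="$1" -v mm="${2:-5}" 'BEGIN { printf "%d\n", s / (mm / 25.4) + 0.5 }'
}

# "Average" values copied from the Test 3 and Test 4 logs above.
for avg in 77.330805 77.137898 81.666453 80.894662; do
    printf '%s px -> %s PPI\n' "$avg" "$(ppi_from_spacing "$avg")"
done
```

The loop prints 393, 392, 415 and 411, the same values as in the logs, and the first run's averages (82.769399 and 82.767497) both give 420.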

About the size of the grid: I made it that way on purpose, because I didn't want the headers and footers. Anyway, it came out smaller than I expected because I got the xa and xb values wrong, but I have now made a new one. Cropping is still the step where I expect the most problems: my setup is a basic one, and I have to be really careful to put the book in the same place every time...
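Since every corrected image comes out at a known PPI, a fixed crop box could be measured once in millimetres and converted to an ImageMagick geometry. This is a minimal sketch that assumes the book sits in the same place in every shot (exactly the hard part mentioned above) and that the offsets are measured from the top-left corner of a corrected image such as calibration_corrected.tif. The helper names `crop_geometry` and `crop_pages` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a fixed crop after unwarping. ASSUMPTIONS: all
# IMG_*_corrected.tif files share one PPI, the book is in the same position
# in every shot, and offsets are measured from the corrected image's corner.

# crop_geometry WIDTH_MM HEIGHT_MM X_MM Y_MM PPI
# Converts a box in millimetres to an ImageMagick WxH+X+Y geometry in pixels.
crop_geometry() {
    awk -v w="$1" -v h="$2" -v x="$3" -v y="$4" -v ppi="$5" '
        function px(mm) { return int(mm / 25.4 * ppi + 0.5) }
        BEGIN { printf "%dx%d+%d+%d\n", px(w), px(h), px(x), px(y) }'
}

# crop_pages GEOMETRY
# Crops every page image and writes NAME_cropped.tif next to it.
# +repage drops the leftover virtual-canvas offset after the crop.
crop_pages() {
    for f in IMG_*_corrected.tif; do
        [ -e "$f" ] || continue   # the glob matched nothing
        convert "$f" -crop "$1" +repage "${f%.tif}_cropped.tif"
    done
}

# Example: a 130 x 180 mm box, 10 mm from the left and 5 mm from the top,
# at the 392 PPI of Test 3 (geometry 2006x2778+154+77):
#   crop_pages "$(crop_geometry 130 180 10 5 392)"
```

This only helps if the book placement really is repeatable; if it drifts between shots, a fixed box will cut into the text.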

Excellent, guys! Thank you again!
Last edited by pablitoclavito on 01 May 2013, 14:14, edited 2 times in total.