BookBuilder

sandreas · Post by **sandreas** » 09 Oct 2014, 08:43

Hello,

Over the last few weeks I hacked together a little console utility for image postprocessing and PDF generation. There is no guarantee that it will be developed further, but I would like to show it here in case someone finds it useful. Feedback would be great: the more feedback I get, the higher the possibility of further development.

If someone would like to support development, I could provide some papers that need to be implemented (especially for content detection); my math skills are very limited.

Instructions for taking pictures:

Place the book on a dark background
The book should be fully visible
You can use your fingers to hold the book, but don't place them on the 4 corners of the book (corner detection is very important for content extraction)
Only white or light pages are supported; colored pages (like a fully red page) are not possible at the moment

Sample:

Result:

Feature list:

PDF generation and invisible text-layer embedding (Windows only; on other OSes it might work when tesseract is installed)
Batch-Converting a set of JPG-images
Image-Rotation (which has to be done manually)
Fully-Automatic Book-Edge detection and content extraction (no parameters)
Very simple finger removal (this is really not working well)
Paper-whitening
Very simple dewarp (a more complex dewarp approach is in development, but this task is very time-consuming, because content and line detection need to be done)
Very simple remaining time calculation

Print Help and Options:

java -jar bookbuilder-0.2.0.jar

Sample Usage:

java -jar bookbuilder-0.2.0.jar --input-path="data\images\book" --output-file="data\temp\out.pdf" --embed-ocr-layer --rotation-degrees=180

Download: http://www63.zippyshare.com/v/45431292/file.html

Post by **daniel_reetz** » 09 Oct 2014, 09:04

This is pretty cool! Thanks for bringing it here and sharing it. Can I host a copy of it locally?

sandreas · Post by **sandreas** » 09 Oct 2014, 10:14

Sure. If the source code were not so embarrassing, I would share it.

Perhaps in the next few weeks, once I have cleaned it up.

duerig · Post by **duerig** » 09 Oct 2014, 10:53

Hi there!

As part of my laser-scanning work, I've been developing some tools to try to automatically detect and remove both the hands and the background of the book. I think that the hand/finger detection is working pretty well, but the book background detection could use some more work.

Since you mention that you are not too confident about your finger detection code, let me share some links.

Here are two online sources that I found very helpful about using OpenCV (not sure if there is a Java port or not) in order to do hand detection:

http://docs.opencv.org/trunk/doc/py_tut ... projection
http://stackoverflow.com/questions/8593 ... ter-vision

I don't entirely understand the math behind it, but I did implement it and it seems to work well. Here is my source:

https://github.com/duerig/laser-dewarp/ ... ookmask.py
https://github.com/duerig/laser-dewarp/ ... ndmodel.py

Maybe you could share what you have found with regard to finding the corners of a book? That would be very interesting to me. My current technique subtracts the background and then performs a floodfill. And I would like to augment it with something more robust.

sandreas · Post by **sandreas** » 09 Oct 2014, 15:06

@duerig:

Thanks for this useful post. Well, I found 3 different image processing libraries that met my requirements:

OpenCV (there is a Java API, but I did not use it because it's really big and not easy to use)
OpenIMAJ (seemed good, but has huge dependencies and is kind of slow)
BoofCV (pure Java; I chose this one because it is lightweight, fast and has a nice and clean API. Unfortunately it is not as powerful as the others and is at an early stage of development. But: the developer is very motivated and shares most of my ideas)

For finger detection there are many approaches. The most promising ones I found are the one you posted and another one based on adaptive skin detection, but I have not had the time to implement the backprojection part in BoofCV. My current approach uses static HSV detection, which works fairly well but is not perfect. Perhaps I will implement an "Adaptive Skin Detection", which should work much better (OpenCV has this on board, but BoofCV is still at an early stage). I'll check the links later; perhaps I can implement some of OpenCV's algorithms for BoofCV.
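A static HSV skin check of the kind mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not BookBuilder's code: the names `is_skin` and `skin_mask` and all threshold values are assumptions picked for the example and would need tuning for real photos.

```python
import colorsys

# Hypothetical fixed thresholds (assumptions for illustration only,
# not the values used by BookBuilder).
MAX_HUE = 50 / 360               # keep reddish to yellowish hues only
MIN_SAT, MAX_SAT = 0.20, 0.70    # exclude near-gray and very vivid colors
MIN_VAL = 0.35                   # exclude very dark pixels

def is_skin(r, g, b):
    """Classify one 8-bit RGB pixel using fixed (static) HSV bounds."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h <= MAX_HUE and MIN_SAT <= s <= MAX_SAT and v >= MIN_VAL

def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of booleans (True = skin)."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

Because the bounds never change, a check like this is sensitive to lighting and skin tone, which is where an adaptive method would come in.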

My algorithm for page detection:

Draw a thin black line around the whole image (in case the book isn't fully visible)
Perform an Otsu threshold
Find all contours of the image (each as a point list)
For each contour as pointlist
- continue (skip it), when pointlist.length < max(width, height)
  biggestContour = max(biggestContour, pointlist)
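The steps above can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not the BookBuilder source: the image is taken to be a list of rows of 0-255 gray values, contour tracing itself is left to an image library, and "biggest" is assumed to mean the contour with the most points. All function names are made up for the example.

```python
def draw_black_border(gray):
    """Return a copy of the image whose outermost pixels are set to black."""
    out = [list(row) for row in gray]
    last_row, last_col = len(out) - 1, len(out[0]) - 1
    for x in range(last_col + 1):
        out[0][x] = out[last_row][x] = 0
    for y in range(last_row + 1):
        out[y][0] = out[y][last_col] = 0
    return out

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_level, best_variance = 0, -1.0
    count_low = 0   # pixels with value <= level
    sum_low = 0     # sum of their gray values
    for level in range(256):
        count_low += hist[level]
        sum_low += level * hist[level]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, variance is undefined
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level

def biggest_contour(contours, width, height):
    """Pick the longest contour, skipping those shorter than max(width, height)."""
    min_length = max(width, height)
    best = None
    for points in contours:
        if len(points) < min_length:
            continue
        if best is None or len(points) > len(best):
            best = points
    return best
```

Pixels brighter than the returned threshold would be treated as page, and `biggest_contour` returns `None` when no contour is long enough.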

Finding the corners:

Get the bounds of biggestContour (the rectangle that contains all points of the contour)
For each point in biggestContour.pointlist
- if(topLeft == null || point.distance(boundsCenter) - point.distance(boundsTopLeft) > topLeft.distance(boundsCenter) - topLeft.distance(boundsTopLeft))
  topLeft = point;
  // same for topRight, bottomRight, bottomLeft...
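As a rough Python sketch of this corner search: for each corner of the bounding rectangle, keep the contour point that is far from the bounds center and close to that bounds corner. The function name `find_corners` and the dictionary keys are assumptions for the example; `math.dist` needs Python 3.8 or newer.

```python
import math

def find_corners(points):
    """points: list of (x, y) contour points. Returns the 4 estimated page corners."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    center = ((left + right) / 2, (top + bottom) / 2)
    bounds_corners = {
        "top_left": (left, top),
        "top_right": (right, top),
        "bottom_right": (right, bottom),
        "bottom_left": (left, bottom),
    }

    def score(point, bounds_corner):
        # High when the point is far from the center and near the bounds corner.
        return math.dist(point, center) - math.dist(point, bounds_corner)

    return {
        name: max(points, key=lambda p, c=corner: score(p, c))
        for name, corner in bounds_corners.items()
    }
```

For a slightly rotated page this picks the true page corners rather than the corners of the axis-aligned bounding box.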

Hope it helps... This logic is based on my primitive geometry knowledge, but it seems to work. I think there are tons of mathematical improvements you could make here, but I won't invest more time in it until I get this dewarp thing working.

sandreas · Post by **sandreas** » 13 Oct 2014, 11:56

Major Progress in Text-Line-Detection

At the moment I am working on a robust but fast line detection algorithm to perform a text dewarp. This is far from perfect, but I would like to show the current results:

The full line-segmentation process (including painting the lines) takes less than 5 seconds in a single thread for a 4000x3000px image on a Core i5 notebook! The next step will be polishing the edges and performing the dewarp.

@duerig:
Your background detection algorithm sounds interesting. Is this done in OpenCV too? I ask because I would need it as a plain algorithm rather than a library call.

I hope I can get this done; it seems to be a complex task that takes a lot of time.

duerig · Post by **duerig** » 13 Oct 2014, 12:12

I am using OpenCV for everything, but the background detection doesn't use any special OpenCV features.

Create a new image by subtracting the background picture from the scan.
Add a 1x1 black border around it.
Flood fill from the border with a low fuzz threshold to turn the background connected to it black.
Turn everything else white. You can use a threshold of 1 to do this.

If you want to integrate hand detection, turn the hand region black before the flood fill.
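These steps can be sketched without any library in plain Python. This is an illustration under assumptions, not the code from the laser-dewarp repository: images are lists of rows of 0-255 gray values, "subtracting" is taken to mean the absolute difference, and the names `book_mask`, `fuzz` and `hand_mask` are made up for the example.

```python
from collections import deque

def book_mask(scan, background, fuzz=10, hand_mask=None):
    """Return a mask with 255 for book pixels and 0 for background pixels."""
    height, width = len(scan), len(scan[0])
    # Step 1: subtract the background picture from the scan.
    diff = [[abs(scan[y][x] - background[y][x]) for x in range(width)]
            for y in range(height)]
    # Optional: turn the hand region black before the flood fill.
    if hand_mask is not None:
        for y in range(height):
            for x in range(width):
                if hand_mask[y][x]:
                    diff[y][x] = 0
    # Step 2: add a 1-pixel black border.
    padded = [[0] * (width + 2)]
    padded += [[0] + row + [0] for row in diff]
    padded.append([0] * (width + 2))
    # Step 3: flood fill from the border; pixels within `fuzz` of black are background.
    reached = [[False] * (width + 2) for _ in range(height + 2)]
    reached[0][0] = True
    queue = deque([(0, 0)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height + 2 and 0 <= nx < width + 2
                    and not reached[ny][nx] and padded[ny][nx] <= fuzz):
                reached[ny][nx] = True
                queue.append((ny, nx))
    # Step 4: everything the fill did not reach becomes white; drop the border again.
    return [[0 if reached[y + 1][x + 1] else 255 for x in range(width)]
            for y in range(height)]
```

Because the fill starts at the border, dark spots that are fully enclosed by the book stay white, which a plain threshold on the difference image would not guarantee.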

What dewarping algorithm will you use? I have had the most success implementing the arc-based coarse dewarping method that I link to in the laser-dewarp.py script. I am working on a way to use laser photos alternated with normal scans to dewarp, so instead of looking for text lines, I look for laser lines. Other than that, I use the same kinds of dewarping algorithms as you would.

Edit: Here is a link to the arc dewarping method: http://users.iit.demokritos.gr/~bgat/3337a209.pdf

They use it as the first 'coarse' step of dewarping followed by a 'fine' dewarping which I haven't looked at much.

duerig · Post by **duerig** » 13 Oct 2014, 12:17

BTW, I might get a chance to try your corner detection algorithm tonight. The two ways I am looking at are your corner algorithm or using fingers to define page edges. This assumes that the fingers will be holding the book by the left and right page edges instead of the top or bottom.

Either way, I will assume that the book is properly aligned to a line laser guide and then chop off the left and right edges to form my final book mask.

sandreas · Post by **sandreas** » 14 Oct 2014, 15:32

Thank you for the link to the paper. I remember coming across it some time ago, but I could not find it any more... I will also take a look at your Python approach for background subtraction later; it sounds interesting...

To answer your question - I will use this dewarp for a first test: https://github.com/cxcxcxcx/imgwarp-js/

I ported it from JS to Java (OpenCV already has it, I think!) with some optimizations, and it is quite fast and reliable. The most important advantage is that you only have to specify source and destination points as warp instructions and the warp grid does the rest. That seems to be a quite usable approach for text-line dewarping, I think, but I have to test whether it really works. My tests with bigger warp distances failed, but I think text-line dewarping as a refinement should work.

My plan is to use the coordinates of the text lines as source points and warp them to the y coordinate of the first character of the line. That should straighten things up. An x-coordinate correction will be done later, after recalculating the line curviness.
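The point pairs for such a warp could be built like this. It is only a sketch of the plan described above: the function name `straightening_points` is an assumption, each text line is assumed to be a list of (x, y) samples ordered from left to right, and the warp itself (for example the ported imgwarp code) is not shown.

```python
def straightening_points(text_lines):
    """Build (source, destination) control points that flatten each text line.

    text_lines: list of lines, each a list of (x, y) samples ordered left to right.
    Every sample keeps its x coordinate and is moved to the y coordinate of the
    first sample of its line.
    """
    sources, destinations = [], []
    for line in text_lines:
        target_y = line[0][1]   # y of the first character of the line
        for x, y in line:
            sources.append((x, y))
            destinations.append((x, target_y))
    return sources, destinations
```

The two lists have the same length and order, so they can be passed pairwise to a point-based warp.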

Without having looked at the arc dewarp method you posted above, I had already built a two-phase dewarping. But there is a problem: as the paper describes (if I understood it correctly), arc dewarping is only possible if the text is laid out in blocks and the bounds of the text form a nice, clean boundary line. Unfortunately this was a very rare case in my tests (see image below). I use the corners of the book to do the initial dewarp (Phase 1), and I plan to detect the text lines and their curl to do the detail dewarp (Phase 2). This should be more robust, but also not as accurate as arc dewarping.

Phase 1 is already working:

Phase 2 is complex and very hard to generalize, especially with some odd book pages in my collection (text tables, pages that have photos with no borders, etc.)

Unfortunately I have not built a laser-guided book scanner... My only "hardware" is a $10 clip-on tripod, a Canon Ixus 255 HS with CHDK firmware, a USB cable and software for remote release via USB.

So I have to do all the work in software, but luckily it is already working as expected. Only quality improvements remain to be done...

BTW: Flood fill is easy to implement, but really slow. Did you check the connected components algorithm of OpenCV? That should speed things up.

sandreas · Post by **sandreas** » 03 Dec 2015, 17:01

Here is the current release of BookBuilder with some small fixes and better output, including a short German how-to and English command-line help:

http://fynder.de/article/freeware-bookb ... ro-35.html

Please use Google Translate for an English version.
