Over the last few weeks I hacked together a little console utility for image postprocessing and PDF generation. There is no guarantee that it will be developed further, but I would like to show it here in case someone finds it useful. Feedback would be great: the more feedback I get, the higher the possibility of further development.
If someone would like to support development, I can provide some papers that need to be implemented (especially for content detection), since my math skills are very limited.
Instructions for taking pictures:
- Place the book on a dark background
- The book should be fully visible
- You can hold the book with your fingers, but keep them away from the four corners of the book (corner detection is very important for content extraction)
- Only white or light pages are supported; colored pages (like a fully red page) do not work at the moment
Result:
Feature list:
- PDF generation with an embedded invisible text layer (Windows only; on other operating systems it might work if Tesseract is installed)
- Batch-Converting a set of JPG-images
- Image-Rotation (which has to be done manually)
- Fully-Automatic Book-Edge detection and content extraction (no parameters)
- Very simple finger removal (this does not work well yet)
- Paper-whitening
- Very simple dewarp (a more complex dewarp approach is in development, but this task is very time-consuming because content and line detection need to be done first)
- Very simple remaining time calculation
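
For anyone curious what a "very simple" remaining time calculation can look like, here is a minimal sketch. It is not taken from the bookbuilder source; the class and method names are made up for illustration, and it assumes every image takes roughly the same time to process, so the estimate is just the average time per finished image multiplied by the number of images left.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of bookbuilder.
final class RemainingTimeEstimator {

    // Linear estimate: elapsed time scaled by (images left / images done).
    static Duration estimate(Duration elapsed, int processed, int total) {
        if (processed <= 0 || processed > total) {
            throw new IllegalArgumentException("need 0 < processed <= total");
        }
        long remainingImages = total - processed;
        // Multiply before dividing to keep rounding error small.
        long remainingMillis = elapsed.toMillis() * remainingImages / processed;
        return Duration.ofMillis(remainingMillis);
    }

    public static void main(String[] args) {
        // Example: 12 of 60 images finished after 30 seconds.
        Duration left = estimate(Duration.ofSeconds(30), 12, 60);
        System.out.println("Estimated remaining: " + left.getSeconds() + " s");
    }
}
```

An estimate like this jumps around at the start of a batch and settles as more images are finished, which is usually good enough for a console progress line.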
Code:
java -jar bookbuilder-0.2.0.jar
Code:
java -jar bookbuilder-0.2.0.jar --input-path="data\images\book" --output-file="data\temp\out.pdf" --embed-ocr-layer --rotation-degrees=180
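
If you want to start the tool from another Java program instead of a shell, the command above can be assembled as an argument list. This is only a sketch: the class `BookbuilderCommand` and its `build` method are hypothetical, it uses only the flags shown in the command above, and it assumes that `--embed-ocr-layer` may be left out when no text layer is wanted. Note that the quotes from the shell command are not part of the arguments here, because no shell is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper for illustration only; not part of bookbuilder.
final class BookbuilderCommand {

    // Builds the argument list using only the flags from the example command.
    static List<String> build(String inputPath, String outputFile,
                              boolean embedOcrLayer, int rotationDegrees) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add("bookbuilder-0.2.0.jar");
        cmd.add("--input-path=" + inputPath);
        cmd.add("--output-file=" + outputFile);
        if (embedOcrLayer) {
            // Assumption: the flag is optional and can simply be omitted.
            cmd.add("--embed-ocr-layer");
        }
        cmd.add("--rotation-degrees=" + rotationDegrees);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("data\\images\\book", "data\\temp\\out.pdf", true, 180);
        // Only print the command here; running it needs the jar in the working directory.
        System.out.println(String.join(" ", cmd));
        // To actually run it:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```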
sandreas