
Open software Windows 10 Processing in 2020

Post by victoriaaustralia » 03 Oct 2020, 18:00

I had an older tutorial (2013!) which used Homer, a wrapper for Ruby/PdfBeads/Tesseract. Unfortunately it is no longer available, so I needed to create a new process. I am IT competent but not skilled, and I only really use the Windows environment. I don't have time to learn Linux or command-line approaches, powerful as they seem.

What is working currently:
Camera images go into two folders, named left and right. I find it useful to record every page, including blank pages. I stick a post-it note onto each blank page to give the camera's auto-focus something to lock onto. Having images of the blank pages helps keep the file numbering in step with the book's page numbering.
Bulk Rename Utility (I was previously using Ant Renamer) renames the files to a four-digit format: 0002.jpg, 0004.jpg, 0006.jpg and so on for the left folder images, and 0001.jpg, 0003.jpg and so on for the right folder images. The cover is named 0000.jpg.
You should then have correctly page-numbered right and left images. Before you combine the two sides, check some random pages to make sure that the file numbers match the printed page numbers. This should catch any missing pages. Interleaved image plate pages can upset the numbering.
The contents of the left and right folders are then placed in a single folder named Combined.
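
For anyone who would rather script the renaming and combining steps than use Bulk Rename Utility, here is a minimal Python sketch of the same numbering scheme. It is not part of the workflow above, only an illustration. The function names (numbered_name, combine, missing_numbers) are made up for this example. It assumes that sorting each folder by file name matches the order the pages were shot in, and that the cover image is a separate file kept outside both folders.

```python
import shutil
from pathlib import Path


def numbered_name(index, side):
    """Return the four-digit file name for the index-th image (0-based) of a side."""
    # Right pages are odd (0001, 0003, ...); left pages are even (0002, 0004, ...).
    number = 2 * index + 1 if side == "right" else 2 * index + 2
    return f"{number:04d}.jpg"


def list_jpgs(folder):
    """Return the .jpg/.JPG files in a folder, sorted by file name."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".jpg")


def combine(left_dir, right_dir, out_dir, cover=None):
    """Copy both sides into out_dir under interleaved page numbers."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    if cover is not None:
        # Assumption: the cover is a separate file, not stored in left or right.
        shutil.copy2(cover, out / "0000.jpg")
    for side, folder in (("left", left_dir), ("right", right_dir)):
        # Assumption: file-name order equals capture order.
        for index, image in enumerate(list_jpgs(folder)):
            shutil.copy2(image, out / numbered_name(index, side))


def missing_numbers(out_dir):
    """List page numbers absent between the lowest and highest numbered file."""
    present = {int(p.stem) for p in list_jpgs(out_dir) if p.stem.isdigit()}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

Calling combine("left", "right", "Combined") copies the files rather than renaming them, so the original camera images stay untouched. missing_numbers("Combined") only reports gaps in the file sequence. It cannot tell whether a file number matches the printed page number, so the random spot check is still needed, especially for books with interleaved plates.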

Process the Combined folder JPGs in Scan Tailor Advanced (still being maintained, thanks amazing IT-savvy people!).
Convert the resulting .tiff files to PDF using IrfanView and its plugins.

Obviously you can stop there with a PDF. If you need OCR and the book is appropriate to upload to Archive.org, you can take advantage of their excellent OCR engine:
Upload the PDF to archive.org.
Archive.org will OCR the PDF. Download the result as a PDF with text.
Freeware Windows workflow in 2020

Re: Open software Windows 10 Processing in 2020

Post by dpc » 05 Oct 2020, 10:04

victoriaaustralia wrote: "I stick a post-it note onto the blank page for the camera auto-focus [to] look at."
Does your camera support focus lock? If it does, you could set the focus once and it would remain set for shots of subsequent pages.
