Where do I start?

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Moderator: peterZ

Anthony
Posts: 6
Joined: 17 Aug 2011, 22:29
Number of books owned: 1000
Location: Toronto, Ontario

Where do I start?

Post by Anthony »

Hi Everyone,
After spending a fair bit of time reading through the hardware section, and more recently the software section, I am at a bit of a loss about where to start. I thought I'd do a test run on the software side before I build the hardware to match. I didn't know which would be better to start with: ST 0.9.10 or ST Enhanced 20110907. I ended up going with Enhanced because I think it is built on top of 0.9.10, and it seemed quite stable for non-coding people like myself. Then I realized there is other software: Book Scan Wizard, KuKnet FileRenamer, Renamer, another one to convert to PDF, and another one to do OCR. So with this list of programs and my growing uncertainty, I thought I would ask for help.

I'm running Windows 7. I'm looking to go from a setup with 2 cams and 2 SD cards (which is yet to be built, so I'll do it by hand for now) to a PDF file that has had OCR done on it. What's the best way to go about this? What files will I need?

Thanks in advance to anyone who can offer help or direct me to a thread that has already gone through this from start to end.

- Anthony
Ryan_phx
Posts: 63
Joined: 29 Dec 2010, 14:51
E-book readers owned: Nook, Kindle DX
Number of books owned: 0
Country: USA
Location: Sandusky, OH

Re: Where do I start?

Post by Ryan_phx »

There are a lot of different ways to do this. Here's how I do it (and mind you, this is certainly not the most streamlined or efficient way to do it--but it's the way I'm most comfortable with).

1) Create folder "Book" on computer, with sub-folders "L" and "R", and then copy files from SD card to the appropriate sub-folders
2) Use Total Commander to rename files in "L" to 001.jpg, 003.jpg, 005.jpg, etc., and files in "R" to 002.jpg, 004.jpg, 006.jpg, etc., then copy renamed files into a single folder
3) Use Book Scan Wizard to fix any perspective distortion and output at 600 DPI
4) Use ScanTailor to align, crop, set margins, and generally finish the processing
5) Use Adobe Acrobat Pro to combine the processed files into a single pdf

As I said, this isn't the most efficient method. I like it, though, because it helps me keep each step separate in my own head so that I'm less likely to miss something or mess something up. Also, all of these programs are free, except Acrobat Pro, but there are other free pdf programs out there that people have had success with.
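
For anyone who would rather script the renaming in steps 1 and 2 than do it in Total Commander, here is a minimal Python sketch. It is not part of any of the programs mentioned above; the function name `interleave_pages` and the folder arguments are made up for illustration. It assumes the camera's original file names sort in shooting order (e.g. IMG_0001.JPG, IMG_0002.JPG) and that both cameras shot the same number of pages.

```python
from pathlib import Path
import shutil


def interleave_pages(left_dir, right_dir, out_dir, ext=".jpg"):
    """Copy L images to odd page numbers and R images to even page numbers.

    Hypothetical helper: assumes file names in each folder sort in the
    order the pages were photographed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    # L starts at 001 (001, 003, ...), R starts at 002 (002, 004, ...)
    for src_dir, start in ((left_dir, 1), (right_dir, 2)):
        files = sorted(
            p for p in Path(src_dir).iterdir() if p.suffix.lower() == ext
        )
        for i, src in enumerate(files):
            dest = out / f"{start + 2 * i:03d}{ext}"
            shutil.copy2(src, dest)  # copy, so the originals stay untouched
            copied.append(dest.name)
    return sorted(copied)
```

Calling `interleave_pages("Book/L", "Book/R", "Book/all")` would leave the originals in place and fill "Book/all" with 001.jpg, 002.jpg, 003.jpg, and so on, ready for the next step. Books with more than 999 pages would need a wider number format.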
Anthony
Posts: 6
Joined: 17 Aug 2011, 22:29
Number of books owned: 1000
Location: Toronto, Ontario

Re: Where do I start?

Post by Anthony »

Thanks for the run-through, Ryan. I didn't realize that each book would take so many steps. From start to end, how long does a typical book take you to go from the original to the OCR copy? I'd be curious to see if anyone else has other approaches.
-Anthony
Ryan_phx
Posts: 63
Joined: 29 Dec 2010, 14:51
E-book readers owned: Nook, Kindle DX
Number of books owned: 0
Country: USA
Location: Sandusky, OH

Re: Where do I start?

Post by Ryan_phx »

It's not as bad as it sounds, and it doesn't take as long as you'd think. I think BSW can even handle the file renaming, if I'm not mistaken. I just never bothered to learn that feature. For me, what takes the longest is proofing the images--looking over ScanTailor's automatic content boxes to make sure that all of the page numbers, footnotes, margin notes, etc. have been included in the box, and that the picture boxes grab the entire picture. I often have to do some manual work there, and that takes a lot of time. The rest of it goes pretty fast, and doesn't require much attention.

There are other threads in this forum that explain other people's workflow and processes--your best bet is probably to read over those first.