Brainstorming ways to increase throughput

BuchScanner
Posts: 5
Joined: 04 Mar 2014, 00:52

Post by BuchScanner » 09 Feb 2010, 20:23

We have yet to construct our scanner, but we are brainstorming ways to increase throughput. We have come up with two solutions: one relatively complicated and expensive but potentially awesome, the other taking the K.I.S.S. approach to engineering.

1. Automation: platen raising, page turning (with post-processing OCR to identify missing pages), and image capture.
2. Make a dual bookholder and divide the labor: one person raises the platen and turns pages, while the other checks the pages and snaps photos.
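Option 1 mentions using OCR in post-processing to catch missing pages. A minimal sketch of just the gap check, assuming an OCR pass has already read the printed page number from each image in capture order (`None` where no number could be read). The function name `find_gaps` is hypothetical and no particular OCR tool is implied:

```python
from typing import Optional


def find_gaps(detected: list[Optional[int]]) -> list[tuple[int, int, int]]:
    """Flag stretches where printed page numbers jump more than expected.

    `detected` holds one OCR'd page number per image, in capture order;
    None marks images whose number could not be read (blank pages,
    chapter openings). Between two readable numbers a and b that are
    j - i images apart, we expect b - a == j - i; any surplus means
    pages were skipped. Returns (a, b, missing_count) tuples.
    """
    known = [(i, n) for i, n in enumerate(detected) if n is not None]
    gaps = []
    for (i, a), (j, b) in zip(known, known[1:]):
        missing = (b - a) - (j - i)
        if missing > 0:
            gaps.append((a, b, missing))
    return gaps


# The unreadable image at index 2 is assumed to be page 3; page 6 is missing.
print(find_gaps([1, 2, None, 4, 5, 7]))  # [(5, 7, 1)]
```

A negative surplus would instead point to duplicate shots or misread numbers; this sketch ignores that case.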

I understand the apparent reluctance to develop an automated DIY book scanner (cost, complexity, the need to monitor it anyway), but is there any reason scanning two books "at once" would be prohibitive? The only obstacle I can think of is writing a script to de-interleave the books, which isn't very prohibitive at all.

Thoughts?
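The de-interleaving step really is small. A sketch, assuming the operator strictly alternates between the two books and that each camera's images are handled separately, sorted by filename in capture order. The function names and the `book_a`/`book_b` folder names are hypothetical:

```python
import shutil
from pathlib import Path


def deinterleave(captures: list[str], n_books: int = 2) -> list[list[str]]:
    """Split round-robin captures (A1, B1, A2, B2, ...) into one list per book."""
    return [captures[k::n_books] for k in range(n_books)]


def split_folder(src: Path, dest_names: tuple[str, ...] = ("book_a", "book_b")) -> None:
    """Copy one camera's JPEGs into per-book subfolders, preserving order."""
    # Assumes filenames sort in capture order; the *.jpg match is case-sensitive on Linux.
    files = sorted(p.name for p in src.glob("*.jpg"))
    for name, book in zip(dest_names, deinterleave(files, len(dest_names))):
        out = src / name
        out.mkdir(exist_ok=True)
        for f in book:
            shutil.copy2(src / f, out / f)


print(deinterleave(["IMG_01", "IMG_02", "IMG_03", "IMG_04", "IMG_05"]))
# [['IMG_01', 'IMG_03', 'IMG_05'], ['IMG_02', 'IMG_04']]
```

If the operator ever shoots the same book twice in a row, the alternation breaks and every later image lands in the wrong folder, so a page-number check is still worth running afterwards.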

daniel_reetz
Posts: 2786
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Post by daniel_reetz » 09 Feb 2010, 21:37

Wow, two on one with four cameras? Sounds like a winner to me. You'd need to counterweight the platen, and the books would probably have to be identical in size and thickness?

The wider the platen gets, the harder it is to stabilize and keep from binding, in my experience.

I would love to see you go through with any/all of these ideas.

BuchScanner
Posts: 5
Joined: 04 Mar 2014, 00:52

Post by BuchScanner » 09 Feb 2010, 23:16

It's not quite that extravagant. Since the bookholder is designed to slide anyway, we envisioned a dual bookholder that slides two separate books underneath the standard two cameras. This should increase throughput slightly (not as much as actually doubling your scanners :P ), since the page of one book can be turned while images of the other are being captured. Sorry for not being clearer initially.

IcantRead
Posts: 95
Joined: 17 Sep 2009, 02:56
Number of books owned: 0
Country: United States
Location: Arizona

Post by IcantRead » 10 Feb 2010, 01:07

I think you may get the same speed out of two book scanners, though a dual bookholder could save on cost. The actual scanning is not where the time goes; that takes little to no time. Most of it is post-processing. I scanned 300 pages in about 40 minutes last night, but it took me 1 to 2 hours just to get one 50-page chapter into a PDF.

sdati

Post by sdati » 10 Feb 2010, 10:16

I agree that the actual photography is a pretty small component of the time spent putting together something readable.

For me, I can do about 500-600 pages per hour with a single camera, but then running through something like ScanTailor takes a couple more hours (much of it is unattended, but you still want to verify some of the output). Then, since I only have one camera, I have to rearrange the files in alternating order (simple enough with a little script, but still another step where I have to verify that I haven't accidentally missed a page somewhere along the line), put it all together as a PDF, etc.
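A sketch of what that single-camera reordering script might look like. It assumes one pass shoots every right-hand (recto) page, a second pass shoots every left-hand (verso) page, and the book starts on a recto; if the second pass runs back to front, set `reverse_versos`. The function name is hypothetical. The length check catches some skipped pages, but not all of them:

```python
def interleave(rectos: list[str], versos: list[str], reverse_versos: bool = False) -> list[str]:
    """Merge two single-camera passes into reading order: r1, v1, r2, v2, ..."""
    if reverse_versos:
        versos = versos[::-1]
    # A book starting on a recto has as many versos as rectos, or one fewer.
    if not 0 <= len(rectos) - len(versos) <= 1:
        raise ValueError(
            f"{len(rectos)} rectos vs {len(versos)} versos: a page may have been missed"
        )
    merged = []
    for i, recto in enumerate(rectos):
        merged.append(recto)
        if i < len(versos):
            merged.append(versos[i])
    return merged


# Second pass shot from the back of the book, so its order is reversed.
print(interleave(["p1", "p3", "p5"], ["p4", "p2"], reverse_versos=True))
# ['p1', 'p2', 'p3', 'p4', 'p5']
```

Renaming the merged list to zero-padded names (`page_0001.jpg`, ...) keeps the order intact for tools that sort by filename.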

No harm in trying for more efficiencies, but you may want to look at the end-to-end process to see where it makes sense to focus the most time.

BuchScanner
Posts: 5
Joined: 04 Mar 2014, 00:52

Post by BuchScanner » 10 Feb 2010, 17:17

Wow, we had no idea the post-processing took up so much time... What are your guys' system specs???

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Post by rob » 10 Feb 2010, 17:53

It's not the specs: it's the algorithm. The camera gives images that are not perfectly cropped, aligned, and dewarped. That's what the post-processing is for.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

BuchScanner
Posts: 5
Joined: 04 Mar 2014, 00:52

Post by BuchScanner » 10 Feb 2010, 19:24

We are asking for the specs of the computers you use for post-processing, so that we can better estimate how our own computers may perform.

sdati

Post by sdati » 10 Feb 2010, 23:26

I have two computers that I have used with ScanTailor: one is a laptop with 2GB RAM and, I think, about a 1.7GHz processor; the other is a desktop with a 3GHz processor and 3GB RAM. I'm not quite sure how long some of the steps take, as I've never really timed them. I'd guess it's about an hour per 500 pages on the faster computer if I don't do any manual tweaking, and probably 2 or 2.5 hours on the slower one. I do know that final output goes much faster if I select 300dpi instead of 600dpi (I would guess it's about 4x faster, since there are 4x fewer pixels to deal with). 300dpi is more than sufficient for my needs, and since most of my photos are effectively no more than 400dpi anyway, there's no reason to upsample.
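The 4x figure comes from pixel count growing with the square of the resolution: doubling the dpi doubles both width and height. A quick check, assuming a 6 x 9 inch page (the page size is only for illustration; the ratio is the same for any size):

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Pixels in a page image of the given size (inches) at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


low = pixel_count(6, 9, 300)   # 1800 x 2700
high = pixel_count(6, 9, 600)  # 3600 x 5400
print(low, high, high / low)   # 4860000 19440000 4.0
```

Whether output time tracks pixel count exactly depends on what ScanTailor does per page, so "about 4x faster" stays an estimate.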

If you don't mind unevenly colored backgrounds, strange rotation, and extra stuff around your page borders, then of course you can skip all that post-processing and just dump all your images into some sort of file. But long-term I plan to work on OCR, and the images need to be pretty clean for it to work well (right now I'm getting 95% accuracy or greater with Tesseract, at least on fiction; OCRopus looks better long-term, especially for more complex layouts). And even just for reading on my iPod, I prefer the images to be clean and straight so they look more polished.

BuchScanner
Posts: 5
Joined: 04 Mar 2014, 00:52

Post by BuchScanner » 11 Feb 2010, 12:32

Thanks for the information. We will focus on overclocking our computers (well, overclocking them further; we currently have a Phenom II X3 overclocked 14% to 3.2GHz 8-) ), in addition to trying the dual bookholder (DBH). We are planning to repurpose an old-ish particleboard desk and a bookshelf (built from found OSB, recycling^2 :D ) to build the DBH installation scanner and a bare-bones portable scanner. We will start the build this weekend, so look forward to pics!
