
LCD (Least Complicated Device)

Built a scanner? Started to build a scanner? Record your progress here. Doesn't need to be a whole scanner - triggers and other parts are fine. Commercial scanners are fine too.
cmahte
Posts: 3
Joined: 02 Sep 2014, 16:00
E-book readers owned: Kindle Touch, Ipad mini, Ipod Touch, BN Nook Simple Touch, Slick E701, Several Android Phones, Handspring Visor, Nokia N810
Number of books owned: 1000
Country: United States
Location: Dallas, Texas

LCD (Least Complicated Device)

Post by cmahte » 03 Sep 2014, 01:19

Introduction:

I am working on an OCR version of a series of books that is partially available on Google Books, but annoyingly incomplete and missing some of the most important parts. After exhausting all options for digital books (including chasing down the publisher 102 years later), I requested physical copies via library loan.

So... Now I need a non-destructive scanner in Texas at an unknown point in time but probably within the next week or so to scan ~18 books--some 2000-3000 pages. No time like after you begin to start planning right?

Today I reviewed what is out there in the way of non-destructive scanning technology, and I think I can build a half scanner with my 8 MP Android phone (cracked screen, bad contract; I quit using it in January), a counterbalanced platen, and a soft backing. I have access to Adobe Creative Suite and Creative Cloud, so post-processing isn't too big a deal, but I've previously scripted ImageJ for automated cropping and splitting, and I'm surprised I don't see it in use as a tool in the scanner world.
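As an illustration of the cropping-and-splitting step mentioned above, here is a minimal sketch in Python rather than ImageJ. It only computes crop boxes for a two-page spread; the function name `split_spread`, the `gutter` and `margin` parameters, and the 3264x2448 frame size are all assumptions for the example, not anything taken from the original scripts.

```python
def split_spread(width, height, gutter=0, margin=0):
    """Return (left_box, right_box) crop boxes for a two-page spread image.

    Boxes are (left, top, right, bottom) tuples in pixels.
    gutter: total width in pixels to discard around the centre fold (assumed).
    margin: border in pixels to trim from every outer edge (assumed).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if gutter < 0 or margin < 0:
        raise ValueError("gutter and margin must be non-negative")

    mid = width // 2            # assume the fold sits at the horizontal centre
    half_gutter = gutter // 2

    left_box = (margin, margin, mid - half_gutter, height - margin)
    right_box = (mid + half_gutter, margin, width - margin, height - margin)

    # Reject settings that leave nothing to crop.
    for left, top, right, bottom in (left_box, right_box):
        if right <= left or bottom <= top:
            raise ValueError("gutter/margin too large for this image")
    return left_box, right_box


if __name__ == "__main__":
    # Hypothetical 8 MP frame; real frame sizes depend on the camera.
    print(split_spread(3264, 2448, gutter=40, margin=10))
```

The boxes use the (left, top, right, bottom) convention, so they could be handed to an image library's crop call; a real script would also need to find the fold instead of assuming it is centred.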

The Plan:
Base/Cradle-- something close to
http://cool.conservation-us.org/coolaic ... 04-01.html
(with clipboards probably because I'm expecting what I get to be softbound.)

Platen-- Something based on
http://www.instructables.com/id/Book-Sc ... ages-an-h/

But with a foot pedal actuator:
https://www.google.com/shopping/product ... 7381039671

Foot pedal action moves the platen down (foot force = force on book, to control pressure). That is, the counterbalance will be heavier than the platen, and the foot pedal will provide a ~12 inch range of motion to lower the platen to book height.

Camera:
Android 8MP camera (N7189 model) - with remote camera app installed.
Camera Actuator:
http://www.amazon.com/M-Audio-Sustain-P ... B00063678K

The camera is set up and on my local network as a remote camera. The MIDI pedal is programmed, through a macro-keys program/system-defined macro, to trigger the remote camera.

Questions:
1. Has anyone used a pillow-type cradle, and do you have any feedback on its success or demise?

2. Does anyone have EXPERIENCE with OCR that can suggest the best resolution (pixels across an M in body text)?

daniel_reetz
Posts: 2797
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: LCD (Least Complicated Device)

Post by daniel_reetz » 05 Sep 2014, 09:15

How many pages are we talking?

The pillow type cradles are nice'n'gentle, I built a crude one a few years ago. Problem I had was that it seemed to slowly change angle.

I worked briefly for an organization that said that their OCR accuracy improved dramatically around 400PPI.
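To put the 400 PPI figure next to an 8 MP camera, here is a rough arithmetic sketch. The 3264x2448 sensor resolution (a common 4:3 layout for 8 MP) and the 6x9 inch page are assumptions for illustration only, and the calculation ignores lens sharpness and noise, which reduce the useful resolution further.

```python
def effective_ppi(px_long, px_short, page_in_long, page_in_short, fill=1.0):
    """Approximate pixels per inch when a page fills `fill` of the frame.

    The limiting axis decides the result, so the smaller of the two
    per-axis values is returned.
    """
    if min(px_long, px_short, page_in_long, page_in_short) <= 0:
        raise ValueError("all dimensions must be positive")
    if not 0 < fill <= 1:
        raise ValueError("fill must be in (0, 1]")
    ppi_long = px_long * fill / page_in_long
    ppi_short = px_short * fill / page_in_short
    return min(ppi_long, ppi_short)


def pixels_needed(page_in_long, page_in_short, target_ppi):
    """Pixel dimensions required to image a page at target_ppi."""
    return round(page_in_long * target_ppi), round(page_in_short * target_ppi)


if __name__ == "__main__":
    # Assumed 8 MP sensor (3264 x 2448) and an assumed 6 x 9 inch page.
    print(round(effective_ppi(3264, 2448, 9, 6), 1))
    # Pixels needed for the same page at 400 PPI.
    print(pixels_needed(9, 6, 400))
```

Under these assumptions a single page that fills the frame lands a little under 400 PPI, and a full two-page spread in one shot would be roughly half that.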

cmahte

Re: LCD (Least Complicated Device)

Post by cmahte » 05 Sep 2014, 14:14

Re: OCR

I've been testing my Android imaging "solution," and I'm about to give up on it. Lots and lots of pixel noise. I've been unable to get text recognition functioning at all, at any resolution. I don't want to have to do normalization processing prior to OCR, since that will undoubtedly make for more OCR errors. Earlier statements about 300ppi are all based on destructive, flatbed scanner work I've done when I can buy copies.

I'm probably going to look at a Fourier transform before abandoning the Android, because it's free, and I like free. However, I already know it will cause more spellcheck issues later. Most likely I'll be switching to a much slower 2-second delay on a better camera that I don't have a way to remotely activate.

Pattern detection with CCDs is a process I'm familiar with (in automated manufacturing quality control), but when I did it professionally, I was working with multi-million-dollar systems, with most of the issues already solved and invisible to me. I'd seen portrait photographs from this camera and thought it would do well enough, but my optimism was clearly also based on the memory of the perfection of a 200k CCD in a light-shielded environment, with precision-engineered mechanical, optical, and digital paths. I still think a lens attached to the Android marketplace makes the data path much, much easier to solve, but I don't see any hardware options currently worth investigating. I would love to hear if anyone has a working Android camera they recommend.

Re: About my current project

http://books.google.com/books?id=y_USAAAAYAAJ

A typical representation of the complete list of books I'm collecting is viewable on pages 204-205 of Volume 1 in this book. This link includes 2 volumes that have been bound together, so it's the page 204 that is about halfway through.

The Sunday School Commission of The New York Diocese of the Anglican Church completed, or nearly completed, a full graded set of lesson books from pre-K to post-high school. Each of the ~18 courses had 2 or 4 parts, and each part has a matching teacher's manual. So far, each part is 100-200 pages. This work was done between the 1890s and 1910s, and seems to have stopped about the beginning of the First World War. This scheme closely matches a scheme I've envisioned to bridge a gap between the end of a 1-year literacy course and being able to deal with Bibles written at a high school level.

The entire series from this Sunday School Commission is probably in the range of 30,000-50,000 pages. I've pulled and begun processing the ones I can find on the Internet Archive. I am still missing parts of 9 courses or teacher's manuals that are needed to make a complete set for my purpose. I haven't paid attention to WorldCat descriptions of page counts, but based on what I have, the gap is about 2000-2500 pages.

My primary goal is that this material becomes a source for translations into other languages, in order to end up with sets of digital and paper books which can complement existing literacy and Bible distribution projects.

From an archivist's viewpoint, the complete set as the Sunday School Commission published it should also be archived. Completing that set would require finding and scanning an additional 10,000 pages more than what I'm currently seeking. If we exclude the Google Books copies with 'restrictions' on their public domain status, it's more like 25,000 pages that will need to be scanned. Once I've completed this first phase, and have translation projects in place for the first couple of courses, I hope to come back to scanning the entire set as the SSC did it.

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: LCD (Least Complicated Device)

Post by vitorio » 12 Sep 2014, 01:51

cmahte wrote:So... Now I need a non-destructive scanner in Texas at an unknown point in time but probably within the next week or so to scan ~18 books--some 2000-3000 pages. No time like after you begin to start planning right?
Yeah, 8MP is not great, and smartphone lenses don't let you focus as well as you need to.

If you'd like to not have to build something yourself, the Austin Hackerspace has a DIY book scanner. It's free to use for members, but non-members need to have a member chaperone present at all times. For the length of time you'd need to use it, it'd probably be worth buying a single month's membership so you could sit there as long as you need to.

http://atxhackerspace.org/wiki/Book_Scanner

BruceG
Posts: 71
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: LCD (Least Complicated Device)

Post by BruceG » 12 Sep 2014, 23:17

There is a scanner made with PVC pipe that is worth looking at
http://diybookscanner.org/forum/viewtop ... =14&t=2914

I have not built one yet but will in the future.
You should be able to find 2 cheap cameras to use with it.

Recently I just used a tripod, camera pointed down, 2 pages (about A4 size) at a time. Click - turn page - click - turn page - etc.
Lighting was a desk lamp overhead
Used a Nikon S6500

Worked well for OCR
A book of 164 (double) pages took 25 min
Did over 5000 shots in 3 days
Also dropped the camera so had to buy another
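For scale, the timing above can be turned into a rough estimate for a larger job. This is only a sketch: it assumes the 164 "double pages" means 164 shots of two pages each, takes the 2500-page figure from the gap estimate earlier in the thread, and ignores setup, battery swaps, and book changes. The function name `estimate_minutes` is made up for the example.

```python
import math


def estimate_minutes(pages, pages_per_shot, shots_per_minute):
    """Estimate shooting time in minutes for a given page count."""
    if pages < 0 or pages_per_shot <= 0 or shots_per_minute <= 0:
        raise ValueError("inputs must be positive")
    shots = math.ceil(pages / pages_per_shot)  # a partial spread still needs a shot
    return shots / shots_per_minute


if __name__ == "__main__":
    rate = 164 / 25  # shots per minute, from the 164-shot, 25-minute book
    minutes = estimate_minutes(2500, 2, rate)
    print(f"{minutes:.0f} minutes, about {minutes / 60:.1f} hours")
```

At that pace a 2500-page job shot two pages at a time is on the order of a few hours of pure shooting time, before any post-processing.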

cmahte

Re: LCD (Least Complicated Device)

Post by cmahte » 13 Sep 2014, 16:02

Rob,

Re: Austin Hackerspace

I'm averse to driving I-35 to Austin. If it were in the Dallas area, I'd be very interested.

Re: Focusing smart phones

I think my phone's limitation wasn't so much the 8 MP as the quality of manufacture. I bought it direct from China, no name, at third-shift prices, knowing it to be of lesser quality. We publish globally in many languages, and I want devices like the ones the majority will be using to access our material. The sensor is clearly a reject from a name-brand phone...

I've settled on a Nikon S800c as the camera for my yet-to-be-named scanner, and since it is so cheap (54.00 on Amazon for the white case), I've bought 2, so I'm now working toward a full-spread scanner. It runs an older Android (2.3 Gingerbread), and most importantly has Bluetooth, which makes remote shutter activation easy. A tripod connector, Wi-Fi, GPS, and HDMI out are really nice extras (Wi-Fi and HDMI have applications for this project, GPS for some others). The downside is that the USB connector isn't a standard Micro or Mini, but one of those off-standard ones. What came in the box is only an external charger. Until number 2 arrives, I'm limited to ~100 images per battery, then a long wait for a battery recharge.

Attached is test image #3 from the device, which was about focusing and focal distance. Everything works well, but I've now got some design issues with reflective glass on a 90 degree design. Is there a thread discussing the best platen angles?

Bruce,

Re: Counter Balance

I'm following the idea of a counterbalance, but not PVC. I want a device that can work on any table, yet use a foot pedal, which will involve a self-adjusting mechanism...
Attachments
DSCN0028.JPG
