Posted: 13 Feb 2017, 13:18
by robinsonfaitha
What can I do to make a book scanner that meets the following requirements? Do I need extra software to do this? Can I do this with cheap/free software?

I'd like to be able to AUTOMATICALLY scan large books (greater than 8.5 in x 11 in) that have color pictures.

I'd like it to be as cheap as possible. Please include a cost estimate, as I have no idea about things like that.

I'd like to be able to have OCR for the following languages:
Welsh, Scottish Gaelic, Irish Gaelic, Chinese (traditional & simplified), Korean, Japanese, French, German, and Spanish
Preferably, it should include rarer and ancient languages like Latin, Manx, etc.
It should be able to recognize accents and other marks over letters, and ESPECIALLY IPA, for things like scanning dictionaries.

I'd like to make the scans look like ebooks you buy from Amazon (white background, not yellow, and clear & sharp black text)

I do NOT want to have the books destroyed. I need to keep a physical form.

I'd like the post-scanning stuff (color-fixing, aligning) to take as little time as possible.

I'd like the final result to have a small file size.

I'd like to be able to hire someone to build this stuff for me, so there should be plans/prints. There's a local engineering school near me (Palm Beach Gdns, FL, U.S.A) so I could hire cheap students.

Thank you! I have no idea what to do. I try reading some people's posts and I just go... " :? I knew I was never meant for engineering..." And that's probably true. I'm going into folklore and mythology studies/entrepreneurship this year as a freshman in college. Tons and tons of those books do not have e-book formats and one of my encyclopedias of Celtic folklore is falling apart. Ugh.

Posted: 14 Feb 2017, 17:49
by BruceG
You are asking a lot. Cheap and automatic do not seem to go together.
A cheap scanner would be one made from PVC piping. It was designed by David Landin, who has provided plans and videos on its construction and use. You will find details on this site. You can scale up the scanner for large books. It is not automatic, though: user input is required.

As for OCR, of the commercial software, ABBYY and OmniPage would be worth looking at. I expect both have trial versions. Tesseract is free and also worth looking at.
The best idea is to scan a few pages of each language and try them with each program. In general, the older the book, the less accurate the OCR. Paper quality, font size and printing quality all play a part.
The best OCR comes from the best scans, which depend on lighting, camera, etc. With good scans, the time spent on post-scanning work is reduced.
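
To make the "scan a few pages of each language" test concrete for the free option, here is a minimal Python sketch that builds Tesseract command lines. The language codes are, as far as I know, the ones Tesseract's trained-data files use, and each language's data file has to be installed separately. The helper names (`tesseract_command`, `ocr_samples`) and any file names are made up for illustration. Manx and IPA are left out because I am not sure stock Tesseract models exist for them.

```python
import shutil
import subprocess
from pathlib import Path

# Tesseract language codes for the languages named in this thread.
# Assumption: the matching traineddata files are installed.
LANGS = {
    "Welsh": "cym",
    "Scottish Gaelic": "gla",
    "Irish": "gle",
    "Chinese (simplified)": "chi_sim",
    "Chinese (traditional)": "chi_tra",
    "Korean": "kor",
    "Japanese": "jpn",
    "French": "fra",
    "German": "deu",
    "Spanish": "spa",
    "Latin": "lat",
}


def tesseract_command(image, lang_names, output_format="txt"):
    """Build the command line that OCRs one page image.

    output_format is a Tesseract config name such as "txt" or "pdf"
    ("pdf" produces a searchable PDF).
    """
    codes = "+".join(LANGS[name] for name in lang_names)
    out_base = str(Path(image).with_suffix(""))  # output name without extension
    return ["tesseract", str(image), out_base, "-l", codes, output_format]


def ocr_samples(samples):
    """Run Tesseract on {image_path: [language names]}.

    If Tesseract is not installed, only print the commands.
    """
    installed = shutil.which("tesseract") is not None
    for image, names in samples.items():
        cmd = tesseract_command(image, names)
        if installed:
            subprocess.run(cmd, check=True)
        else:
            print("would run:", " ".join(cmd))
```

For example, `ocr_samples({"welsh_page.png": ["Welsh"], "dictionary_page.png": ["Irish", "Latin"]})` would write `welsh_page.txt` and `dictionary_page.txt` next to the images, so the results can be compared by eye against the trial versions of the commercial programs.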

As for IPA, you would need to check whether the software covers it. I understand ABBYY can be taught a new language, but if no one has done it for IPA then I would say it is not easy. Font sets for previously unwritten languages are still being created today.

I use OmniPage and it outputs to epub, kindle, etc. Some people output to a text editor to make a book uniform. It depends how you want to use the material.
If you read on a large tablet, going only as far as a searchable PDF may be a better use of time than a full OCR conversion.

A new scanner, the CZUR ET16, may also be worth considering.

Posted: 23 Feb 2017, 23:05
by robinsonfaitha
Thank you. Sorry for the late reply. I've been struggling with Fibromyalgia and possibly a just-beginning autoimmune disorder (only time and further blood tests will tell). I wake up each morning with my favorite quote in mind: "What fresh hell is this?"

Just basically, I can't build anything myself. How much would the materials cost to make it automatically scan? I don't know how to find these pages, and frankly, I don't think I can do that. My brain capacity has been reduced to dust. Everything is a struggle for me. Some of the forum posts look like they're written in gibberish to me. I could hire a local engineering student, but anything like cleaning up scans to be easy to read, etc. would have to be easy to use. Sometimes for certain software or construction guides, even a "How-to XYZ for Dummies" isn't easy enough for me. My brain was made for liberal arts, esp. business, and other subjects like computer science. Throw in Fibro and AD(HD) and it's usually impossible. I spend about 8 hours in bed before I can handle doing simple things like making food or sending an e-mail. Most days I'm still too sick to make food or get up out of bed to get food. Additionally, most e-mails I send by that time still have multiple mistakes in them, whether it be wrong information or boatloads of typos. Ah.

On second thought, the images don't need to be pretty. Just easily readable. If I can edit the OCR text, accuracy isn't necessary. I'm willing to be able to add languages. Will those resources you listed still get the job done?

Posted: 28 Mar 2017, 18:50
by BruceG
I was waiting to reply until a CZUR ET16 scanner turned up, to see if it was suitable for you. The first one got lost, but I have now received a replacement. I have not had the time or space to test it so far.
There is a recent post here about building an automatic scanner for around $10,000 (without software), as opposed to the commercial ones at $100,000 (with software).
So I do not think you will be able to get away with not turning pages. Many DIY scanners also have a sheet of glass or plastic that lies on the pages to flatten them, which you have to contend with. The CZUR uses lasers instead of glass to flatten the output.
The other consideration is software: how much work is required to achieve what you want.
Then, for scanning and processing, there is the question of what happens when things go wrong.

When I test the CZUR ET16 I will let you know what I think.

Posted: 17 Nov 2017, 03:49
by LauraKK
I’m a college student too, and I’ve been using a CZUR book scanner for a while. I think it meets your requirements, and I’d like to share my experience using it.
1. Document size
CZUR is able to scan documents up to A3 size, which is 11.7 x 16.5 in.
2. Automatically scan books
CZUR has a technology called Auto-Flatten for scanning books efficiently.
3. OCR function
CZUR also has a built-in OCR function. The official website says it is from ABBYY and supports 187 languages. Most of the languages you mentioned are included (even Latin).
4. Background
When scanning books, you can choose between different color modes. Black and White is what you need, I guess.
5. Non-destructive
You don’t need to destroy the physical books; you just put them under the scanner. Like I said, the Auto-Flatten (they call it Flattening Curve) may help with that.
6. Processing software
The CZUR scanner software supports lots of image processing, such as trimming, image quality adjustment, rotation, etc.
7. File size
You can choose a lower image quality to get a smaller file size.
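
As a rough illustration of why a black-and-white mode gives a white background and smaller files, here is a small Python sketch. It is not CZUR's actual processing: it uses a plain fixed threshold, and the threshold of 128, the 300 dpi resolution and the function names are assumptions chosen for the example. The sizes are for uncompressed pixels only; real files are compressed and will be smaller.

```python
def binarize(gray_rows, threshold=128):
    """Turn each grey value (0-255) into pure black (0) or pure white (255).

    Yellowed paper is light grey, so it lands above the threshold and
    becomes white; dark ink lands below it and becomes black.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_rows]


def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scanned page, before any file compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# A tiny 2 x 3 "page": light values are paper, dark values are ink.
page = [[230, 215, 40],
        [225, 35, 220]]
clean = binarize(page)

# An A3 page (11.7 x 16.5 in) at an assumed 300 dpi.
color_bytes = raw_size_bytes(11.7, 16.5, 300, 24)  # 24-bit color
bw_bytes = raw_size_bytes(11.7, 16.5, 300, 1)      # 1-bit black and white
```

Under these assumptions an A3 color page is roughly 52 MB of raw pixels, while the same page in 1-bit black and white is about 2 MB, which is why pages that are only text are usually saved in black and white and only the pages with color pictures are kept in color.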

Generally speaking, I’m a happy user of CZUR. I’m particularly impressed by its speed. Its ease of use is also a big help when I scan documents.
I have to say, the CZUR ET16 is not perfect when scanning reflective materials; you need to add extra lights to avoid reflections.
