
Google Cloud Vision

JZL003


Post by JZL003 » 19 Feb 2016, 21:19

Google just released an interesting API called Google Cloud Vision: https://cloud.google.com/vision/. It does some really crazy image analysis, and it also offers OCR.

FYI, I have not used it myself, but Google offers 1,000 `units` (images) per month for free and charges $2.50 per thousand after that. Running your own software is free, but this could possibly be faster, and it's still pretty cost-effective. The main benefit I see is that, while it would lose page formatting and images (a non-trivial loss), it seems very accurate and would probably not degrade even with extreme lighting issues or perspective warping.
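To put rough numbers on that pricing, here is a minimal sketch of the arithmetic. It assumes one page image counts as one unit and that billing beyond the free tier is prorated per unit at $2.50 per thousand; the actual billing granularity may differ, and the function name `monthly_cost_usd` is just for illustration.

```python
def monthly_cost_usd(images, free_units=1000, price_per_thousand=2.50):
    """Estimate the monthly OCR cost for a given number of page images.

    Assumption: one image = one unit, and units past the free tier are
    billed pro rata rather than in whole blocks of 1,000.
    """
    billable = max(0, images - free_units)
    return billable * price_per_thousand / 1000


if __name__ == "__main__":
    # A single 300-page book fits inside the free tier.
    print(monthly_cost_usd(300))
    # Ten 500-page books in one month: 5,000 images, 4,000 of them billable.
    print(monthly_cost_usd(5000))
```

Under these assumptions, a scanner doing a few books a month would pay somewhere between nothing and a few dollars.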

I just thought it was an interesting piece of (basically) free tech.

spamsickle

Re: Google Cloud Vision

Post by spamsickle » 08 Mar 2016, 10:31

I'm not sure how you conclude that it "seems very accurate" if you haven't tried it, unless that's an assumption based on Google's reputation.

I tried it on one typical image from a recent book scan, and it dropped a lot of the text altogether. While the image was not enhanced for contrast, and was a few times larger than the "500K or less" which Google recommends for image size, I didn't consider the result complete enough to be useful.

On the plus side, it did handle the mixed French and English on the same page acceptably well, and the text it did recognize was accurate. The dropped text was puzzling -- it would recognize the beginning of the line, and the end, but would often replace large sections of the middle with a newline (\n). While my eyes don't see anything different about the text that is being skipped, I hypothesize that the results might be improved by some kind of preliminary image processing -- contrast stretching or even binarization, perhaps.
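For anyone who wants to test that hypothesis, here is a rough sketch of the two preprocessing steps mentioned, written in plain Python on a grayscale image stored as a list of rows of 8-bit values. The percentile cut-offs and the use of Otsu's method for the threshold are my own choices for illustration, not anything Google recommends, and a real pipeline would more likely use an imaging library than nested lists.

```python
def contrast_stretch(pixels, low_pct=2, high_pct=98):
    """Linearly rescale grayscale values so the chosen percentiles map to 0 and 255."""
    flat = sorted(v for row in pixels for v in row)
    lo = flat[(len(flat) - 1) * low_pct // 100]
    hi = flat[(len(flat) - 1) * high_pct // 100]
    if hi == lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[min(255, max(0, round((v - lo) * scale))) for v in row]
            for row in pixels]


def otsu_threshold(pixels):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map every pixel to pure black (0) or pure white (255)."""
    t = otsu_threshold(pixels)
    return [[255 if v > t else 0 for v in row] for row in pixels]


if __name__ == "__main__":
    # Tiny stand-in for a low-contrast scan: dark "ink" on the left, grey "paper" on the right.
    page = [[90, 100, 200, 210],
            [95, 105, 205, 215]]
    print(binarize(contrast_stretch(page)))
```

Whether either step actually stops the API from dropping the middle of lines is untested here; it is only a starting point for the experiment.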

In any case, thanks for the tip. I hadn't used Google's cloud services at all before, and now I have a $300 credit which I have to either use within a month or lose. I hope I can find time to use it, somehow.

It will be interesting to see how other online OCR-as-a-service providers compare.
