Calling from Frankfurt am Main in Germany

L.Willms
Posts: 129
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Post by L.Willms » 06 Oct 2016, 11:18

I am a Web publisher, and I scan books to extract HTML text from them to put on the Web. I do have an e-book reader, but my aim is not to scan books in order to read them on the e-book reader instead of as a stack of bound paper...

I have done this work using a flatbed scanner and ABBYY FineReader 11 for text recognition. Then comes the actual text editing, which is quite tedious because ABBYY gives me too much of its own style guessing, which I have to remove to get simple semantic markup that leaves the appearance to CSS. I would like ABBYY to deliver simply a sequence of paragraphs bracketed by <p> and </p>, with italic and bold text marked up by <em> and <b> respectively, and so on. I'll discuss possible solutions in the OCR section.
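To illustrate the kind of cleanup I mean, here is a minimal sketch in Python, using only the standard library's `html.parser`. It is not anything ABBYY provides. The tag lists (`KEEP`, `RENAME`, `SKIP`) and the names `Simplifier` and `simplify` are my own assumptions for illustration, and real OCR exports will certainly need more cases than this handles.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: the only tags worth keeping in the simplified output.
KEEP = {"p", "em", "b"}
# Assumption: presentational or alternative tags mapped onto the kept ones.
RENAME = {"i": "em", "strong": "b"}
# Elements whose whole content is dropped (stylesheets, scripts, header data).
SKIP = {"style", "script", "head"}


class Simplifier(HTMLParser):
    """Re-emit only whitelisted tags, without attributes, plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately discarded

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        tag = RENAME.get(tag, tag)
        if tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def simplify(html_text):
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `simplify('<p style="margin:0"><span class="font1">Hello <i>world</i></span></p>')` drops the style attribute and the span and turns the `<i>` into `<em>`. The sketch does not repair badly nested tags or merge paragraphs split across pages; that would still be manual work.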

The books I scan are not all in my home. Some I can access only in the reading room of the local university library or the German National Library. In the case of some old and precious books I am not allowed to make photocopies, but may be able to photograph them.

I also scan musical scores with musical "text" recognition using capella scan, a companion to the score writer capella (there is an article on the capella score writer on en.wikipedia.org, if you want to know more about it). Capella scan also uses the ABBYY recognition engine for text, and has also licensed the module for Gothic script (Fraktur), but it works only on pages which also carry musical notes.

I found this forum after stumbling, while reading the Heise Ticker, upon this article about a folding DIY book scanner, and I thought that I could also build a book scanner for myself. Some searching brought me here. Great forum with great resources and great people. Greetings to everybody, and glad to be with you!

BruceG
Posts: 67
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Calling from Frankfurt am Main in Germany

Post by BruceG » 07 Oct 2016, 05:00

Hi
I am using OmniPage instead of ABBYY FineReader, but I guess they both work the same way. The amount of text editing depends on the quality of the scan. Some books require editing on every page, others on only a few. For epub I need to remove headers and page numbers so they do not turn up in the final output. Doing this helps the text flow from page to page, though not 100%. Editing the epub file fixes this up, or I just leave it.
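As a rough illustration of the header and page-number removal, here is a small Python sketch. It is not what OmniPage does internally. It assumes each page is already available as a list of text lines, treats a line containing only a short number as a page number, and treats a first line that repeats on several pages as a running header. The function name and the `min_repeats` threshold are made up for the example.

```python
import re
from collections import Counter

# A line that holds nothing but a 1-4 digit number, optionally like "- 13 -".
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d{1,4}\s*-?\s*$")


def strip_headers_and_page_numbers(pages, min_repeats=3):
    """pages: list of pages, each page a list of text lines."""
    # Step 1: drop lines that look like bare page numbers.
    no_numbers = [
        [line for line in page if not PAGE_NUMBER.match(line)]
        for page in pages
    ]
    # Step 2: a first line seen on at least min_repeats pages is assumed
    # to be a running header.
    first_lines = Counter(page[0].strip() for page in no_numbers if page)
    headers = {text for text, n in first_lines.items() if text and n >= min_repeats}
    # Step 3: remove that header line wherever it opens a page.
    return [
        page[1:] if page and page[0].strip() in headers else page
        for page in no_numbers
    ]
```

A body line that consists only of a number would also be dropped, and headers that alternate between left and right pages need a lower `min_repeats`, so the result still wants checking by eye.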

Post an example of a scan, the epub output and the xhtml for a page so we can better understand the problems.

Re: Calling from Frankfurt am Main in Germany

Post by L.Willms » 07 Oct 2016, 09:44

BruceG wrote: Post an example of a scan, the epub output and the xhtml for a page so we can better understand the problems.
I'll open a thread in the appropriate section of the forum, the one on OCR.
