Calling from Frankfurt am Main in Germany

Moderator: peterZ

L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Calling from Frankfurt am Main in Germany

Post by L.Willms »

I am a Web publisher, and I scan books to get HTML text out of them to put on the Web. I do have an e-book reader, but my goal is not to scan books in order to read them on the reader instead of as a stack of bound paper...

I do this work using a flatbed scanner and ABBYY FineReader 11 for text recognition. Then comes the actual text editing, which is quite tedious because ABBYY gives me too much of its own style guessing, which I have to remove to get simple semantic markup that leaves the appearance to CSS. I would like ABBYY to simply deliver a sequence of paragraphs bracketed by <p> and </p>, with italic and bold text marked up by <em> and <b> respectively, and so on. I'll discuss possible solutions in the OCR section.
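To make the goal concrete, here is a minimal sketch of that kind of cleanup done after the OCR step, using only Python's standard library. It assumes the OCR export is an HTML file in which the guessed styling sits in attributes, <span>/<font> wrappers and <style> blocks; I have not checked this against actual FineReader output. The names KEEP, SKIP, SemanticCleaner and clean are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: input tag -> tag written to the output.
# Every other tag is dropped, but its text content is kept.
KEEP = {"p": "p", "em": "em", "i": "em", "b": "b", "strong": "b"}
# Elements whose content is discarded entirely.
SKIP = {"style", "script"}


class SemanticCleaner(HTMLParser):
    """Keep only whitelisted tags, without any attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in KEEP:
            # Attributes (style, class, ...) are deliberately not copied.
            self.out.append(f"<{KEEP[tag]}>")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP:
            self.out.append(f"</{KEEP[tag]}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the output stays valid markup.
            self.out.append(escape(data, quote=False))


def clean(html_text):
    """Return html_text reduced to <p>, <em> and <b> markup."""
    parser = SemanticCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<p style="font-size:11pt"><span class="f1">Hello <i>world</i></span></p>'
    print(clean(sample))
```

A whitelist is used rather than a list of things to remove, so any markup the OCR program invents later is dropped by default. Headings, lists and tables would need their own entries in KEEP.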

The books I scan are not all in my home. Some I can access only in the reading room of the local university library or the German National Library. In some cases of old and precious books I am not allowed to make photocopies, but may photograph them.

I also scan musical scores with musical "text" recognition using capella scan, a companion of the score writer capella (there is an article on capella score writer on en.wikipedia.org, if you want to know more about it). Capella scan also uses the ABBYY recognition engine for text, and has also licenced the module for gothic script (Fraktur), but only on pages which also carry musical notes.

I found this forum after I stumbled, while reading the Heise Ticker, upon this article about a folding DIY book scanner and thought that I could also build a book scanner for myself. Some searching brought me here. Great forum with great resources and great people. Greetings to everybody, and glad to be with you!
BruceG
Posts: 99
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Calling from Frankfurt am Main in Germany

Post by BruceG »

Hi
I am using OmniPage instead of ABBYY FineReader, but I guess they both work the same way. The amount of text editing depends on the quality of the scan. Some books require editing on every page, others on only a few. For epub I need to remove headers and page numbers so they do not turn up in the final output. Doing this helps the text flow from page to page, though not 100%. Editing the epub file fixes this up, or I just leave it.
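The header and page-number cleanup can also be sketched as a script run on the recognised text. This is only an illustration under assumptions: it is not an OmniPage or FineReader feature, it expects each page as a list of plain-text lines, and it treats a line as a running header only if it is the first real line on a large share of pages. The names PAGE_NUMBER and strip_headers_and_page_numbers are invented for this example.

```python
import re
from collections import Counter

# A line that holds nothing but a number, optionally wrapped in hyphens ("- 12 -").
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d{1,4}\s*-?\s*$")


def strip_headers_and_page_numbers(pages, min_share=0.5):
    """pages is a list of pages, each page a list of text lines."""
    # Count the first non-empty, non-page-number line of every page.
    first_lines = Counter()
    for page in pages:
        for line in page:
            if line.strip() and not PAGE_NUMBER.match(line):
                first_lines[line.strip()] += 1
                break

    # Text that opens at least min_share of the pages (and at least two pages)
    # is assumed to be a running header.
    threshold = max(2, min_share * len(pages))
    headers = {text for text, count in first_lines.items() if count >= threshold}

    cleaned = []
    for page in pages:
        kept = []
        first_seen = False
        for line in page:
            if PAGE_NUMBER.match(line):
                continue  # drop bare page numbers wherever they occur
            if line.strip() and not first_seen:
                first_seen = True
                if line.strip() in headers:
                    continue  # drop the running header
            kept.append(line)
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    book = [
        ["My Book Title", "Text one.", "1"],
        ["My Book Title", "Text two.", "- 2 -"],
    ]
    print(strip_headers_and_page_numbers(book))
```

Note that this also deletes any body line that consists only of a number, so tables and numbered lists would need checking by hand. Headers that alternate between left and right pages are each counted separately, which is why the default share is one half.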

Post an example of a scan, the epub output and the xhtml for a page so we can better understand the problems.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Calling from Frankfurt am Main in Germany

Post by L.Willms »

BruceG wrote: Post an example of a scan, the epub output and the xhtml for a page so we can better understand the problems.
I'll open a thread in the appropriate section of the forum, the one on OCR.