
Getting Abbyy to Recognize Chapters/Divisions

A Book and a Wand
Posts: 3
Joined: 23 Oct 2012, 22:38
E-book readers owned: kindle kobo
Number of books owned: 0
Country: Canada

Getting Abbyy to Recognize Chapters/Divisions

Post by A Book and a Wand » 24 Oct 2012, 00:45

Hi,

I'm using Abbyy to OCR my book scans. I'm scanning both fiction (literary novels) and non-fiction (academic books). The OCR itself is working well and there are few recognition errors. I read in mobi or epub format, using Calibre to convert between them.

However, I find the fiction sometimes difficult to read because the chapter divisions are stripped from the books. Some books also have little breaks within chapters -- in the hard copies that is indicated by a skipped line in the text -- that indicate scene changes/flashbacks/etc. These are also stripped. So other than paragraphs, the text I get is undivided. This actually makes the books less enjoyable since I sometimes don't realize that the chapter or scene has changed and have a little moment of confusion until it becomes clear.

I would like Abbyy to preserve the chapter structure and the divisions within chapters. Most of the fiction books in question have at most a simple number indicating the new chapter. So a new chapter is indicated by a page where the text starts halfway down the page with a number above it, and divisions within chapters are indicated by a skipped line. I would like the epubs to similarly start a new chapter where a chapter starts and to skip lines within chapters where appropriate.

How can I make this happen? If not with Abbyy, is there other software I can use instead of or along with Abbyy that will do this?
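One possible workaround, sketched here under stated assumptions rather than as a feature of Abbyy: export the OCR result as plain text and mark up the structure yourself before converting. The sketch assumes paragraphs are separated by one blank line, scene breaks by two or more blank lines, and chapter starts by a paragraph that is nothing but a number. The function name `text_to_html` and the `scenebreak` class are made up for illustration. Calibre's chapter detection is configurable, and as far as I recall its default picks up `h2` headings containing the word "chapter", but check that against your version.

```python
import html
import re

# A "chapter heading" is assumed to be a block holding only a short number.
CHAPTER_RE = re.compile(r"\d{1,3}")

def text_to_html(text):
    """Turn OCR plain text into simple HTML with chapter and scene-break markup."""
    # Split on runs of blank lines, keeping each run so its length can be checked.
    pieces = re.split(r"(\n(?:[ \t]*\n)+)", text.strip())
    out = []
    for i in range(0, len(pieces), 2):
        gap = pieces[i - 1] if i > 0 else ""
        body = " ".join(pieces[i].split())  # also joins wrapped lines
        is_chapter = CHAPTER_RE.fullmatch(body) is not None
        # Three or more newlines means at least two blank lines: a scene break.
        after_heading = bool(out) and out[-1].startswith("<h2")
        if gap.count("\n") >= 3 and not is_chapter and out and not after_heading:
            out.append('<p class="scenebreak">* * *</p>')
        if is_chapter:
            out.append(f'<h2 class="chapter">Chapter {body}</h2>')
        else:
            out.append(f"<p>{html.escape(body)}</p>")
    return "\n".join(out)
```

Because it only looks at blank lines and lone numbers, this never requires reading the text itself, so it should not spoil anything. A page number that ends up alone in its own block would be mistaken for a chapter heading, so headers and footers need to be excluded from the OCR output first.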


Re: Getting Abbyy to Recognize Chapters/Divisions

Post by A Book and a Wand » 24 Oct 2012, 01:01

One more thing I should add...

I guess if absolutely necessary I'd be willing to individually show Abbyy where the chapters start, say by right-clicking on a page and indicating "new chapter" somehow. However, I'd like to avoid actually zooming in on the pages to find the within-chapter divisions or anything like that. Since I'll be reading these books, I don't want to end up reading bits and pieces from the middle or end of the book ahead of time, since that's sure to end with at least a few spoilers.

Vidar
Posts: 8
Joined: 17 Sep 2012, 17:18
E-book readers owned: Kindle Paperwhite
Number of books owned: 400
Country: Norway

Re: Getting Abbyy to Recognize Chapters/Divisions

Post by Vidar » 24 Oct 2012, 12:57

Are you saving directly to mobi or epub in Abbyy? I use ver 7 and it can't, but the newest, ver 11, probably can. Anyhow, if it cannot do a proper job of it, that's just too bad. I save to PDF and open that in Calibre. It doesn't cause any problems, so you could try that.


Re: Getting Abbyy to Recognize Chapters/Divisions

Post by A Book and a Wand » 24 Oct 2012, 22:31

I'm using Abbyy 11. I was saving directly to epub.

So I just tried saving as a PDF ("Exact copy") and then converting to epub in Calibre. This didn't work very well. First, the text doesn't reflow well: every line in the book ends with a hard return, so the paragraphs don't hold together. Second, each scanned image includes non-text areas, i.e. black borders around the outside of the page (if the image area is 8 inches tall and the book is 6 inches, there's an inch of image noise at the top and bottom). When I save as epub in Abbyy this all goes away. When I convert in Calibre I get an image of each page after the text of that page: page text, image of page, page text, image of page, etc.

So is that what you meant when you suggested saving as PDF? If it was something else, I'll be happy to try it or hear any other ideas.
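For the hard-return problem, a rough cleanup pass over a plain-text export might look like the sketch below. It is only a heuristic and not anything Abbyy or Calibre does: it assumes a paragraph ends at a blank line, or at a line that finishes with sentence punctuation and is clearly shorter than the widest line. The function name `unwrap` and the `short_ratio` parameter are invented for this example.

```python
def unwrap(text, short_ratio=0.8):
    """Join hard-wrapped lines back into paragraphs (heuristic)."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    width = max((len(ln) for ln in lines), default=0)
    paragraphs, current = [], ""
    for ln in lines:
        stripped = ln.strip()
        if not stripped:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and stripped[0].islower():
            # Assume a word hyphenated across the line break; rejoin it.
            current = current[:-1] + stripped
        elif current:
            current += " " + stripped
        else:
            current = stripped
        ends_sentence = stripped.endswith((".", "!", "?", '"', "\u201d"))
        # A short line ending a sentence is treated as the end of a paragraph.
        if ends_sentence and len(ln) < short_ratio * width:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

The hyphen rule will occasionally merge a genuine compound such as "well-known" when it happens to break at the hyphen, and a paragraph whose last line is nearly full width will run into the next one, so the result still needs a skim.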


Re: Getting Abbyy to Recognize Chapters/Divisions

Post by Vidar » 26 Oct 2012, 12:29

Well, when I saved in Abbyy ver 7, I chose "text and pictures only" in "Formats Settings". "Pictures" means pictures on the page, not a picture of the whole page. There are other options, "pictures only" or pictures over or under the text, which preserve the whole picture of the page. In Calibre I chose my device, Kindle, not the format itself. I think this worked reasonably well. It's not perfect: some lines stop halfway and start a new line after some leading spaces, but not so much that it bothers me. I got the cover page at the beginning, but there are no more pictures. I have done this only once, except for one more try with a manual in PDF format (not made with Abbyy). It had lots of headers, bulleted lists and graphics and made a horrible mess on my Kindle. I think I went a bit too far in my first posting and am sorry to have made such a sweeping statement. For pages with more complicated layout I guess your best option will be to read directly in PDF if your device supports it.
