Text reflow in scanned page

jumpjack
Posts: 21
Joined: 04 Mar 2014, 00:53

Post by jumpjack » 15 Jan 2011, 14:59

I think the algorithm(s) used to identify lines in ScanTailor could also be used to identify single words; once you can enclose each word in a box, it's easy to extract them one by one and save each one to a file.
This would allow building an unusual HTML page in which every word is an image.
Why?
Because that way you get a book scan which, although graphical rather than textual, allows text reflow and justification, just like any e-text!
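
To make the idea concrete, here is a minimal sketch of such a page generator. It assumes the words have already been cropped into separate image files; the file names and the function name build_reflow_page are hypothetical, chosen only for illustration. Each image is an inline element separated by whitespace, so the browser can wrap and justify the "text" the same way it does with real words.

```python
from html import escape

def build_reflow_page(word_images):
    """Return an HTML string with one inline <img> per word image.

    word_images: list of image paths in reading order (hypothetical names).
    """
    parts = ['<html><body><p style="text-align: justify;">']
    for path in word_images:
        # Inline images separated by newlines behave like words:
        # the browser may break the line at any whitespace between them.
        parts.append('<img src="{}" alt="">'.format(escape(path, quote=True)))
    parts.append("</p></body></html>")
    return "\n".join(parts)

# Hypothetical file names, one per extracted word.
page = build_reflow_page(["word_0001.png", "word_0002.png", "word_0003.png"])
print(page)
```

Vertical alignment is left to the browser here, so words with descenders will probably need a per-image baseline offset in a real page.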

Is it possible to implement such a feature in ScanTailor?
I figured out which ImageMagick command turns an image into a set of separate "blobs", one per word, including punctuation:
convert test.bmp -morphology Erode Rectangle:8x3 blobs.bmp

(source image should not contain compression artifacts)
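
As far as I understand it, the command works on a white page with dark text because eroding the white background makes the dark letters grow, and an 8x3 rectangle grows them more horizontally than vertically, so the letters of one word fuse while the wider gaps between words survive. Here is a small pure-Python sketch of that smearing step on a tiny binary grid (1 = ink). The function name smear_ink and the centering of the rectangle are my own assumptions, not ImageMagick's exact behaviour.

```python
def smear_ink(grid, kw, kh):
    """Grow every ink pixel (value 1) into a kw x kh rectangle around it."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not grid[y][x]:
                continue
            # Paint the rectangle roughly centered on the ink pixel,
            # clipping at the image border.
            for dy in range(-(kh // 2), kh - kh // 2):
                for dx in range(-(kw // 2), kw - kw // 2):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        out[ny][nx] = 1
    return out

# Two "letters" separated by a 2-pixel gap fuse into one blob
# when smeared with a 3x1 rectangle.
row = [[0, 0, 1, 0, 0, 1, 0, 0]]
smeared = smear_ink(row, 3, 1)
print(smeared)  # [[0, 1, 1, 1, 1, 1, 1, 0]]
```

The rectangle size has to sit between the typical letter spacing and the typical word spacing of the scan, so 8x3 will not suit every resolution.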

I wonder whether ImageMagick could also be used to find the edges of the blobs, but so far I haven't found a way to do it.
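
Outside ImageMagick, one standard way to get a box around each blob is connected-component labelling: flood-fill each group of touching ink pixels and record its extremes. This is only a sketch of that general technique on a binary grid (1 = ink), not something ScanTailor or ImageMagick is known to do in this form; the function name blob_boxes is hypothetical.

```python
from collections import deque

def blob_boxes(grid):
    """Return (left, top, right, bottom) for each 4-connected blob of 1s."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if not grid[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            left = right = x
            top = bottom = y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes

# Three blobs: top-left square, top-right column, bottom bar.
grid = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
boxes = blob_boxes(grid)
print(boxes)  # [(0, 0, 1, 1), (4, 0, 4, 1), (1, 3, 3, 3)]
```

Each box could then be used to crop the matching region from the original, unsmeared scan and save it as one word image.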
