Text reflow in scanned page

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

jumpjack
Posts: 21
Joined: 04 Mar 2014, 00:53

Post by jumpjack »

I think the algorithm(s) ScanTailor uses to identify lines could also be used to identify single words. Once each word is enclosed in a box, it's easy to extract the words one by one and save each to its own file.
This would allow building a weird HTML page made up of all the words as images.
Why?
Because this way you get a book scan which, although graphic rather than textual, allows text reflow and justification, just like any e-text!
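
To show what such a page could look like, here is a minimal Python sketch that turns a list of word image files into HTML. The function name `build_reflow_page`, the file names and the CSS values are only assumptions for illustration; nothing here comes from ScanTailor.

```python
from html import escape


def build_reflow_page(word_images):
    """Return an HTML page showing each word image inline, in reading order."""
    tags = []
    for path in word_images:
        # One <img> per word; the newline between tags acts as a word space,
        # so the browser is free to break the line between any two words.
        tags.append('<img src="{}" alt="">'.format(escape(path, quote=True)))
    body = "\n".join(tags)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        "<style>p { text-align: justify; } "
        "img { vertical-align: baseline; margin: 0 0.15em; }</style>\n"
        "</head><body><p>\n" + body + "\n</p></body></html>\n"
    )


# Hypothetical file names, one per extracted word.
page = build_reflow_page(["word_0001.png", "word_0002.png", "word_0003.png"])
```

Since inline images are laid out like words, the browser should rewrap them when the window is resized. One known weak spot of this sketch: every image sits with its bottom edge on the baseline, so words with descenders would need a per-word vertical offset.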

Would it be possible to implement such a feature in ScanTailor?
I figured out which ImageMagick command processes an image so that it turns into a bunch of separate "blobs", one per word, punctuation included:
convert test.bmp -morphology Erode Rectangle:8x3 blobs.bmp

(The source image should be free of compression artifacts.)
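
To make the effect of that command concrete, here is a minimal pure-Python sketch of the same idea on a tiny binary bitmap. It assumes dark text on a light background, where eroding the light background amounts to growing the dark ink. The function name `smear_ink`, the radius parameters and the toy data are illustrative only, and this is not ImageMagick's implementation.

```python
def smear_ink(bitmap, rx, ry):
    """Grow every ink pixel (1) by rx columns and ry rows in each direction.

    Gaps of up to 2*rx pixels between letters close up, so the letters of
    one word merge into a single blob while wider word gaps survive.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            # Paint the whole rectangle around this ink pixel, clipped to the page.
            for yy in range(max(0, y - ry), min(height, y + ry + 1)):
                for xx in range(max(0, x - rx), min(width, x + rx + 1)):
                    out[yy][xx] = 1
    return out


# Toy text line: two "letters" one pixel apart, a wide gap, then a third letter.
line = [[1, 0, 1, 0, 0, 0, 0, 1]]
smeared = smear_ink(line, rx=1, ry=0)
# smeared == [[1, 1, 1, 1, 0, 0, 1, 1]]: the narrow gap is closed, the wide one remains.
```

A wide, flat rectangle like the 8x3 one in the command above favours horizontal merging, which is what you want: letters join along the line, but neighbouring lines stay separate.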

I wonder whether ImageMagick could also be used to find the edges of the blobs, but so far I haven't found a way to do it.
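
Outside ImageMagick, one common way to get the blob boundaries is connected-component labelling: flood-fill each blob and record its bounding box. Below is a minimal pure-Python sketch under that assumption. The name `blob_boxes` and the toy bitmap are made up for illustration, and ink pixels are taken to be 1.

```python
from collections import deque


def blob_boxes(bitmap):
    """Return (left, top, right, bottom), inclusive, for each 4-connected blob of 1s."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x] or seen[y][x]:
                continue
            # New blob found: flood-fill it breadth-first and track its extent.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Toy blob image: two word blobs on the first line, one on the second.
blobs = [
    [1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
]
boxes = blob_boxes(blobs)
# boxes == [(0, 0, 2, 1), (5, 0, 6, 1), (1, 3, 4, 3)]
```

Each box could then be used to crop the matching word out of the original, unsmeared scan. The boxes come out in the order their topmost pixel is met in a row-by-row scan, so on a real page they would still need sorting into lines before being treated as reading order.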