I think the algorithm(s) used to identify lines in ScanTailor could also be used to identify single words; once you can enclose each word in a box, it's easy to extract them one by one and save each one to a file.
This would make it possible to build an unusual HTML page that contains every word as a separate image.
Why?
Because in this way you get a book scan which, although graphic (rather than textual), allows text reflow and justification, just like any e-text!
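To make the idea concrete, here is a minimal Python sketch of how such a page could be generated. It assumes the word images have already been extracted and saved in reading order; the function name `build_reflow_page`, the file names and the `height_px` parameter are my own placeholders, not anything that exists in ScanTailor.

```python
from html import escape


def build_reflow_page(word_images, height_px=24):
    """Return an HTML page that shows each word image as an inline element.

    word_images: image file names in reading order (hypothetical names).
    height_px:   display height applied to every word image (an assumption;
                 real scans would probably need per-line scaling).
    """
    tags = []
    for name in word_images:
        tags.append(
            '<img src="{}" alt="" style="height:{}px">'.format(
                escape(name, quote=True), height_px
            )
        )
    # The newline between two <img> tags is ordinary whitespace, so the
    # browser may break the line there and stretch it when justifying.
    body = "\n".join(tags)
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        '<p style="text-align:justify">\n'
        + body
        + "\n</p>\n</body></html>\n"
    )


page = build_reflow_page(["w0001.png", "w0002.png", "w0003.png"])
print(page)
```

Since the images are inline elements separated by whitespace, the browser should treat them much like words of text: resizing the window rewraps the lines.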
Is it possible to implement such a feature in ScanTailor?
I figured out which ImageMagick command turns an image into a set of separate "blobs", one per word, punctuation included:
convert test.bmp -morphology Erode Rectangle:8x3 blobs.bmp
(the source image should not contain compression artifacts)
I wonder whether ImageMagick could also be used to find the edges of each blob, but so far I haven't found a way to do it.
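Outside ImageMagick, one way to get a box around each blob would be a plain connected-component search. The following is only a sketch in pure Python, under the assumption that the blobs image has already been loaded and thresholded into a grid of 0 (background) and 1 (blob pixel); `blob_boxes` and the tiny `page` grid are made up for illustration.

```python
from collections import deque


def blob_boxes(grid):
    """Return a (left, top, right, bottom) box for each 4-connected blob of 1s.

    grid is a list of rows; coordinates are zero-based and inclusive.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not grid[y][x] or seen[y][x]:
                continue
            # New blob found: flood-fill it and track its extremes.
            seen[y][x] = True
            queue = deque([(x, y)])
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Toy "page" with two blobs standing in for two words.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(blob_boxes(page))  # [(1, 1, 3, 2), (5, 1, 6, 2)]
```

Each box could then be used to crop the matching word out of the original (not eroded) scan. Boxes come out in the order the scan first meets each blob, so they would still need sorting into lines before being used as reading order.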
Text reflow in scanned page