I came across an article on recent developments in depixeling pixel art
http://research.microsoft.com/en-us/um/ ... index.html
I'm curious whether that could be used to improve some step in book scanning software. Or do applications like Scan Tailor and others already use such algorithms?
Depixeling Pixel Art
Moderator: peterZ
Re: Depixeling Pixel Art
I don't think this would be useful for book scanning. One of the core ways Scan Tailor reduces the size of text pages is by reducing them to only two colours, black and white, which inherently introduces pixelation. That's an important step for producing small e-books. It doesn't cause a problem for reading, because the resolution of the scanned pages is so high that at a normal reading size the sharp pixel edges get smoothed out anyway.
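To illustrate the two-colour reduction step I mean, here's a minimal sketch of fixed-threshold binarization. Scan Tailor's actual algorithm is adaptive and far more sophisticated; this toy version just shows why hard pixel edges appear once only two values remain.

```python
# Toy binarization: every grayscale pixel at or above a brightness threshold
# becomes white (255), everything else black (0). This is NOT Scan Tailor's
# real algorithm, just an illustration of the bitonal reduction.

def binarize(gray_pixels, threshold=128):
    """Map 0-255 grayscale values to pure black (0) or white (255)."""
    return [255 if p >= threshold else 0 for p in gray_pixels]

row = [12, 90, 130, 200, 255]
print(binarize(row))  # → [0, 0, 255, 255, 255]
```

With only two output values, every edge in the page becomes a hard step between black and white, which is the pixelation being discussed.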
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Re: Depixeling Pixel Art
Well, I admit I don't grasp the algorithm's steps, but I got the impression that in general it improves conversion from pixelated images to smooth vectors. My hunch was that it might be useful as a step before OCR.
Re: Depixeling Pixel Art
I doubt that would help OCR at all. In fact, it might make OCR results worse. OCR software does its own cleaning, smoothing, and similar processing behind the scenes, and those steps are all tuned specifically to that engine's recognition algorithms.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
- strider1551
- Posts: 126
- Joined: 01 Mar 2010, 11:39
- Number of books owned: 0
- Location: Ohio, USA
Re: Depixeling Pixel Art
Edit: Misty replied while I was typing. Essentially, "what she said".
dtic wrote: My hunch was that it might be useful as a step before OCR.

All the OCR engines that I know of work with raster (pixel) graphics, not vector. That said, I noticed they had comparisons of raster graphics scaled with various vector programs and raster scaling algorithms, and their vector method looked better than anything else. I don't know about everyone else, but my books normally get captured at about 375 dpi and then scaled up to 600 dpi by Scan Tailor, since Tulon recommends doing that. So your comment got me thinking that perhaps scaling as a vector image and then converting back to raster could be worthwhile.
Long story short, from my quick test on a single image, it's not worth it. I cleaned up a sample image by hand as a base image. I vectorized and scaled it with potrace, and also scaled it with scantailor without any deskew or other stuff. Then I sent both images through tesseract. The results were slightly different, but the vector-scaled version introduced as many mistakes as it corrected... and both of them only had ~5 mistakes in the whole page.
What was interesting, though, is that a vector-scaled image left in grayscale looks darn nice at ridiculous zooms, slightly better at no zoom, and produced better OCR results than the bitonal images. The tradeoff, of course, is size and compression efficiency. Still, if OCR accuracy is a must and both time and processing power aren't a concern, you could produce a vector-scaled version of the page purely to OCR and still use a bitonal version for the PDF/Djvu/whatever.
So to summarize from my horrifically un-thorough test, scaling as a vector image doesn't produce significant OCR improvements when the image is bitonal. Grayscale images have potential for better OCR, but it's uncertain whether vector scaling is critical to that or not.
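For anyone wanting to repeat the experiment, here's a rough sketch of the pipeline as I understand it from the post, expressed as the commands to run. The filenames are made up, and rsvg-convert is my assumption for the SVG-to-raster step, since the post doesn't say which rasterizer was used.

```python
# Sketch of the vector-upscale-then-OCR test: trace the bitonal scan to SVG
# with potrace, rasterize the SVG back at a higher dpi, then OCR it with
# tesseract. Builds the command lists without executing them.

def vector_rescale_commands(page="page.pbm", dpi=600):
    """Build the three commands: raster -> vector -> raster at `dpi` -> OCR."""
    stem = page.rsplit(".", 1)[0]
    svg, png = stem + ".svg", stem + "-up.png"
    return [
        ["potrace", "--svg", page, "-o", svg],                  # trace bitmap to SVG
        ["rsvg-convert", "-d", str(dpi), "-p", str(dpi),
         "-o", png, svg],                                       # rasterize at target dpi
        ["tesseract", png, stem + "-ocr"],                      # OCR the upscaled image
    ]
```

Each list could be handed to subprocess.run() in turn; keeping them as data makes the sketch easy to inspect without the tools installed.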
Re: Depixeling Pixel Art
Strider, for the sake of comparison, could you make a copy of your Scan Tailor project and output your page in full colour, then OCR that? (Or OCR the original scan.) My experience is that OCR software, or at least commercial OCR software, usually gets better results from a quality original scan than from Scan Tailor output (as documented here: http://diybookscanner.org/forum/viewtopic.php?f=3&t=356). The reason, again, is that OCR software does its own internal bitonalization and processing when performing OCR, and that processing is usually better suited to its specific OCR algorithms than software aimed at human readers, like Scan Tailor, is.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Re: Depixeling Pixel Art
Excellent link - can't believe I forgot about that one.
I redid my comparison with four versions of the images. (1) base image, color, 375 dpi; (2) scantailor, color, 600 dpi; (3) scantailor, bitonal, 600 dpi; (4) potrace of #1, grayscale, 600 dpi.
The original base image (#1) gave the best results, but (#2) was very close, essentially missing only a space between two words. The bitonal image (#3) missed a good number of spaces. Finally, the potrace version (#4) didn't miss spaces, but had other incorrect characters.
Re: Depixeling Pixel Art
Thanks Strider1551 and Misty, very interesting. Some reflections based on Strider's quick tests:
OCR'ing the base images before Scan Tailor might improve quality (as you say). But then a programming challenge is finding a way to import the OCR'ed text so that it maps to the right positions on the pages of the Scan Tailor + djvulibre .djvu output.
If OCR on ST'ed grayscale/colour output is much better than on ST'ed B&W output, then maybe a request should be made for a ST option to output in both modes. Both output versions of each page would then have the same dimensions, which might solve the challenge above.
I'm also wondering if some advanced OCR program could make use of several versions of the same page (B&W, colour and a vectorized one). Maybe some inputs are better for some aspects of the OCR process?
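On the mapping challenge: djvulibre's djvused tool can attach hidden (OCR) text to a page with its set-txt command, which reads the text from an s-expression file of words and coordinates. A rough sketch of the invocation, with hypothetical filenames; producing coordinates that actually match the output page (the hard part mentioned above) is not shown.

```python
# Hedged sketch: build the djvused call that stores hidden text for one page
# of a DjVu file and saves the result. djvused's "-e" flag takes a script of
# semicolon-separated commands. The filenames here are placeholders.

def djvused_set_text(djvu="book.djvu", page=1, sexp_file="page1-text.sexp"):
    """Build a djvused command that loads hidden text for `page` from a file."""
    script = f"select {page}; set-txt {sexp_file}; save"
    return ["djvused", djvu, "-e", script]
```

The open question from the post remains: generating that s-expression so word boxes from the pre-Scan-Tailor image line up with the processed page, which is why same-dimension dual output would help.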