Depixeling Pixel Art

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Moderator: peterZ

dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Depixeling Pixel Art

Post by dtic »

I came across an article on recent developments in depixeling pixel art:
http://research.microsoft.com/en-us/um/ ... index.html

I'm curious to hear whether that could be used to improve some step in book scanning software. Or are the applications (Scan Tailor and others) already using such algorithms?
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Depixeling Pixel Art

Post by Misty »

I don't think this would be useful for book scanning. One of the core ways Scan Tailor reduces the size of text pages is by reducing them to only two colours, black and white, which inherently introduces pixelation. That's an important step for getting small e-books. It doesn't cause a problem for reading books because the resolution of the scanned pages is so high that, when you view them at a normal reading size, the sharp edges of the pixels get smoothed out.
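For anyone unfamiliar with what "reducing to two colours" means in practice, here is a minimal sketch. It assumes a plain fixed threshold of 128, which is only an illustration: the thread does not say how Scan Tailor actually picks its threshold, and the function names `to_bitonal` and `pack_row` are made up for this example.

```python
def to_bitonal(gray, threshold=128):
    """Map each 0-255 grayscale value to 0 (black) or 1 (white).

    The fixed threshold is an assumption for illustration only.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, 8 pixels per byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the last byte with zeros
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Two rows of a tiny grayscale image (8 pixels wide).
gray = [
    [250, 240, 30, 20, 235, 245, 250, 255],
    [245, 60, 10, 15, 40, 230, 250, 255],
]
bitonal = to_bitonal(gray)
packed = [pack_row(row) for row in bitonal]
```

Each pixel goes from one byte to one bit before any compression is applied, which is where much of the size saving starts, and the hard 0/1 cut is also where the jagged edges come from.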
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Depixeling Pixel Art

Post by dtic »

Well, I admit I don't grasp the algorithm steps, but I got the impression that in general it improves conversion from pixelated images to smooth vectors. My hunch was that it might be useful as a step before OCR.
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Depixeling Pixel Art

Post by Misty »

I doubt that would help OCR at all. In fact, it might make OCR results worse. OCR software does its own cleaning, smoothing, and similar processing behind the scenes, and it's all tuned specifically to the OCR's algorithms.
strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: Depixeling Pixel Art

Post by strider1551 »

Edit: Misty replied while I was typing. Essentially, "what she said".
dtic wrote: My hunch was that it might be useful as a step before OCR.
All the OCR engines that I know of work with raster (pixeled) graphics, not vector. That being said, I noticed they had comparisons of raster graphics that had been scaled with various vector programs and raster scaling algorithms, with their vector method looking better than anything else. I don't know about everyone else, but my books normally get captured at about 375 dpi and then scaled up by Scan Tailor to 600 dpi, since Tulon recommends doing that. So your comment got me thinking that perhaps scaling as a vector image and then converting back to raster could be worthwhile.

Long story short, from my quick test on a single image, it's not worth it. I cleaned up a sample image by hand as a base image. I vectorized and scaled it with potrace, and also scaled it with Scan Tailor without any deskew or other processing. Then I sent both images through tesseract. The results were slightly different, but the vector-scaled version introduced as many mistakes as it corrected, and both of them only had ~5 mistakes in the whole page.

What was interesting, though, is that a vector-scaled image left in grayscale looks darn nice at ridiculous zooms, slightly better at no zoom, and produced better OCR results than the bitonal images. The tradeoff, of course, is size and compression efficiency. Still, if OCR accuracy is a must and neither time nor processing power is a concern, you could produce a vector-scaled version of the page purely for OCR and still use a bitonal version for the PDF/DjVu/whatever.

So to summarize from my horrifically un-thorough test: scaling as a vector image doesn't produce significant OCR improvements when the image is bitonal. Grayscale images have potential for better OCR, but it's unclear whether vector scaling is critical to that or not.
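If anyone wants to repeat this kind of comparison with a bit more rigour, a rough sketch of counting OCR mistakes against a hand-checked transcription might look like the following. It uses Python's standard `difflib`; the function name `count_char_errors` and the sample strings are made up for illustration, and the count is an approximation rather than a true edit distance.

```python
import difflib


def count_char_errors(reference, ocr_text):
    """Roughly count characters that differ between a reference and OCR output.

    Every non-matching stretch found by SequenceMatcher contributes the
    longer of its two sides, so a dropped space counts as 1 and a
    one-to-two character substitution counts as 2.
    """
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors += max(i2 - i1, j2 - j1)
    return errors


# Hypothetical sample strings, not taken from the actual test page.
reference = "the quick brown fox"
ocr_missing_space = "the quickbrown fox"
ocr_wrong_chars = "the quick brovvn fox"

for name, text in [("missing space", ocr_missing_space),
                   ("wrong chars", ocr_wrong_chars)]:
    print(name, count_char_errors(reference, text))
```

Running each image variant through the OCR engine and scoring the output against the same reference text would make it easier to tell a real difference from noise, especially across more than one page.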
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Depixeling Pixel Art

Post by Misty »

Strider, for the sake of comparison, could you make a copy of your Scan Tailor project and output your page in full colour, then OCR that? (Or OCR the original scan.) My experience is that OCR software, or at least commercial OCR software, usually gets better results from a quality original scan than from Scan Tailor output (as documented here: http://diybookscanner.org/forum/viewtopic.php?f=3&t=356). The reason, again, is that OCR software does its own internal bitonalization and processing when performing OCR, and that is usually better suited to its specific OCR algorithms than the output of software like Scan Tailor, which processes pages for human readers.
strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: Depixeling Pixel Art

Post by strider1551 »

Excellent link - can't believe I forgot about that one.

I redid my comparison with four versions of the images. (1) base image, color, 375 dpi; (2) scantailor, color, 600 dpi; (3) scantailor, bitonal, 600 dpi; (4) potrace of #1, grayscale, 600 dpi.

The original base image (#1) gave the best results, but #2 was very close, essentially only missing a space between two words. The bitonal image (#3) missed a good number of spaces. Finally, the potrace version (#4) didn't mess up spaces, but had other incorrect characters.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Depixeling Pixel Art

Post by dtic »

Thanks Strider1551 and Misty, very interesting. Some reflections based on Strider's quick tests:

OCR'ing base images before ScanTailor might improve quality (as you say). But then a programming challenge is to find a way to import the OCR'ed text so that it maps to the right position on the pages of the ScanTailor+djvulibre .djvu output.
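To make that mapping problem concrete, here is a minimal sketch of the simplest possible case. It assumes the only differences between the base image and the output page are a crop offset and a uniform rescale (for example 375 dpi to 600 dpi); deskewing, dewarping, or page splitting would need a fuller transform. The function name `map_box` and its parameters are hypothetical and not part of any existing tool.

```python
def map_box(box, src_dpi, dst_dpi, crop_left=0, crop_top=0):
    """Map an OCR word box from base-image pixels to output-page pixels.

    box is (x0, y0, x1, y1) in source pixels. crop_left and crop_top are
    the source-pixel offsets of the output page's top-left corner.
    Assumes crop plus uniform scale only (no rotation or dewarping).
    """
    scale = dst_dpi / src_dpi
    x0, y0, x1, y1 = box
    return (
        round((x0 - crop_left) * scale),
        round((y0 - crop_top) * scale),
        round((x1 - crop_left) * scale),
        round((y1 - crop_top) * scale),
    )


# A word found at (375, 750)-(750, 825) in a 375 dpi base image,
# mapped onto a 600 dpi page cropped 75 px from the left and 150 px from the top.
mapped = map_box((375, 750, 750, 825), 375, 600, crop_left=75, crop_top=150)
```

The hard part in practice is getting the crop offsets and any rotation out of the processing step, which is exactly the information that would have to be exported somehow.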

If OCR on ST'ed grayscale/color output is much better than on ST'ed BW output, then maybe a request should be made for an ST option to output in both modes. Both output versions of each page would then have the same dimensions, which might solve the challenge above.

I'm also wondering if some advanced OCR program could make use of several versions of the same page (BW, color and one vectorized). Maybe some inputs are better for some aspects in the OCR process?