djvu 1bit vs 8bits

Posted: 16 Jul 2019, 15:31
by 0kelvin
Just noticed something:

A single text page, whether grayscale or 1-bit, results in the same PDF with mixed raster content. There may be a few bytes of difference, but they look the same. With DjVu, however, the result is different. Grayscale results in somewhat bolder text and a file about 3.5 times larger. It seems to add some anti-aliasing to the text. With 1-bit the file is smaller and no blurry anti-aliasing is applied.

Some pages have grayscale images, making it impossible to convert to 1bit without destroying the images.

Re: djvu 1bit vs 8bits

Posted: 17 Jul 2019, 03:02
by zbgns
What software did you use to create the PDF and DjVu files?

Re: djvu 1bit vs 8bits

Posted: 17 Jul 2019, 14:20
by 0kelvin
PDF in Abbyy. djvu with djvu small mod.

Re: djvu 1bit vs 8bits

Posted: 17 Jul 2019, 16:07
by cday
0kelvin wrote:
17 Jul 2019, 14:20
PDF in Abbyy. djvu with djvu small mod.
Which Abbyy FineReader version? Some years ago I looked briefly at MRC in both FineReader and Nuance OmniPage 18, and with the number of options available and the interface design it wasn't easy to understand how MRC output images were produced, even after studying the limited information in the help system and PDF guides.

If the PDF output images and file sizes you are obtaining are essentially the same whether grayscale or 1-bit is selected, I would suggest that the outputs are actually being produced using the same internal settings. If you convert a page containing both a grayscale graphic and text, then with appropriate MRC settings the graphic should be rendered in grayscale and the text in 1-bit or possibly grayscale if the option is available, I'm not sure.

Do you have one or two pages you could upload, one with just text and one with mixed content, so that anyone who has time can investigate what results can be obtained?

Re: djvu 1bit vs 8bits

Posted: 17 Jul 2019, 17:33
by zbgns
0kelvin wrote:
17 Jul 2019, 14:20
PDF in Abbyy. djvu with djvu small mod.
I do not have much experience with Abbyy FineReader. However, my guess is that for 1-bit pictures JBIG2 compression is applied and no MRC compression is necessary. For grayscale images, the algorithms separate the foreground (letters) from the background. The foreground is compressed with JBIG2, the background probably with JPEG 2000. Afterwards the layers are combined (plus masks) to produce the final picture. Since the background is probably white (or near white), it can be compressed very efficiently (or even omitted), and as a result the output ends up the same in both cases (1-bit and 8-bit pictures), as only the foreground matters.
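To make the guess above concrete, here is a minimal Python sketch of an MRC-style split. It is only an illustration under assumptions: the function names, the fixed threshold of 128 and the tolerance are hypothetical, and it is not what FineReader or any DjVu encoder actually does internally.

```python
# Hypothetical sketch of an MRC-style split; not FineReader's or DjVu's real algorithm.

def mrc_split(page, threshold=128, paper=255):
    """Split a grayscale page (rows of 0..255 ints) into a 1-bit mask and a background."""
    # Mask: 1 where the pixel is dark enough to be treated as text (foreground).
    mask = [[1 if px < threshold else 0 for px in row] for row in page]
    # Background: text pixels are replaced by the paper colour, so the layer
    # stays smooth and is cheap to compress.
    background = [[paper if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, background


def is_blank(background, paper=255, tolerance=8):
    """True if every background pixel is within `tolerance` of the paper colour."""
    return all(abs(px - paper) <= tolerance for row in background for px in row)


# A tiny "scanned text" page: near-white paper with a few dark text pixels.
gray_page = [
    [255, 250,  20, 252],
    [254,  10,  15, 255],
    [255, 251, 253,  30],
]
mask, background = mrc_split(gray_page)
print(mask)                  # the 1-bit layer that would go to a bitonal coder
print(is_blank(background))  # whether the background layer could be dropped
```

If the background turns out blank, only the 1-bit mask carries information, which would explain why a text-only page gives nearly the same output from a grayscale or a 1-bit source.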

Could you please share samples of the 1-bit and 8-bit source files so that I can verify whether my theory is correct?

Moreover, as far as I understand, DjVu compression works in a similar way, although it uses other compression algorithms. Maybe you should use Abbyy FineReader to create both the PDF and the DjVu and compare them? It seems to me that Abbyy software also supports DjVu.

Re: djvu 1bit vs 8bits

Posted: 17 Jul 2019, 20:18
by 0kelvin
Maybe a single page is compressed differently? I compressed a whole book and the same issue didn't happen.

I think it was a false alarm. When I view the DjVu at 100% zoom, the quality is higher than with "fit width to the screen". Fit width seems to blur the edges of everything.
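A rough Python sketch of why a zoomed-out view can look blurry even when the stored page is pure 1-bit. It assumes the viewer shrinks the page by averaging neighbouring pixels (a simple 2x2 box filter); which filter a given viewer really uses is not known here, and `downscale_2x` is a made-up helper for illustration.

```python
def downscale_2x(img):
    """Halve an image (rows of 0..255 ints) by averaging each 2x2 block."""
    out = []
    for y in range(0, len(img) - 1, 2):
        row = []
        for x in range(0, len(img[0]) - 1, 2):
            total = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            row.append(total // 4)  # integer mean of the four pixels
        out.append(row)
    return out


# A strictly black-and-white patch (0 = ink, 255 = paper) with a ragged edge.
bitonal = [
    [0,   0,   255, 255],
    [0,   255, 255, 255],
    [0,   0,   0,   255],
    [0,   0,   255, 255],
]
small = downscale_2x(bitonal)
print(small)
```

Blocks that straddle an edge come out as intermediate grays, so the shrunken view has soft edges although the file contains only black and white pixels. At 100% zoom no resampling is needed, so the edges stay sharp.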

Re: djvu 1bit vs 8bits

Posted: 29 Jul 2019, 03:32
by Konos93a
Can you post the file size and the dimensions?

Personally I use djvusmall 0.4.4.

With a Canon A3300, 1 JPEG = 3.25 MB.

A book with 581 pages comes to 1.86 GB.

After Scan Tailor Advanced, the 581 pages take 79 MB (the change in file size depends on the size of the book).

The 581 b&w TIFFs, after djvusmall, give a DjVu with a file size of 4437 KB.

After Abbyy FineReader: DjVu + OCR is 7671 KB and PDF + OCR (with b&w) is 11920 KB, with table of contents output.
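For comparison, the figures above can be turned into per-page sizes with a few lines of Python. The sizes and page count are the ones quoted in this post; the helper name `per_page_kb` is just for illustration.

```python
PAGES = 581

# Output sizes in KB, as reported above.
sizes_kb = {
    "djvusmall DjVu": 4437,
    "FineReader DjVu + OCR": 7671,
    "FineReader PDF + OCR (b&w)": 11920,
}


def per_page_kb(total_kb, pages=PAGES):
    """Average size of one page in KB."""
    return total_kb / pages


baseline = sizes_kb["djvusmall DjVu"]
for name, kb in sizes_kb.items():
    # Size per page, and size relative to the plain djvusmall output.
    print(f"{name}: {per_page_kb(kb):.1f} KB/page, {kb / baseline:.2f}x djvusmall")
```

Note that the two FineReader files also carry an OCR text layer, so the comparison with the plain djvusmall file is not strictly like for like.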