Howto manually add colored text to DJVU files

mhr
Posts: 37
Joined: 07 May 2012, 10:12
E-book readers owned: onyx-boox-m92 sony-trs-t1
Number of books owned: 500
Country: Germany

Howto manually add colored text to DJVU files

Post by mhr »

If you want to colorize the black and white text of a DJVU document and are willing
to do this manually, you can do so as described in the following discussion.

Re: Howto manually add colored text to DJVU files

Post by mhr »

For this purpose we use a little tool called ppmrectlist, which is part of my PPM toolset.
The toolset can be downloaded from http://www.diybookscanner.org/forum/vie ... =20&t=2915

Re: Howto manually add colored text to DJVU files

Post by mhr »

There are several approaches to this task that use the photo encoder c44 of the djvulibre tool chain,
but the result often looks a bit fuzzy and the file size grows quite a bit. The djvumake utility of djvulibre offers
a neater possibility: it accepts a list of colored rectangles and uses it to create an FGbz chunk, which contains the color information of each encoded symbol.
The utility ppmrectlist extracts colored rectangular areas from a PPM file and writes such a list in the format djvumake expects.
I will give a tiny example now.

Re: Howto manually add colored text to DJVU files

Post by mhr »

Let's say we have a bitonal image example_bw.tif like this:
[Image: Black/White layer of example text image example_bw.tif (example_bw.jpg)]
We convert it with our favorite bitonal DJVU encoder, e.g. minidjvu or cjb2:

Code: Select all

cjb2 -dpi 100 -lossy example_bw.tif example_bw.djvu
This creates the DJVU text mask example_bw.djvu.
To add color to this DJVU document, load the image example_bw.tif into your favorite graphics editor.
I will use GIMP. Now do the following steps:
  • Change image mode to RGB.
  • Swap black/white with color inversion.
  • Make this text layer transparent.
  • Delete the now white non text area.
  • Add a new non-transparent layer filled with black below this text layer and select it for editing.
Now use the rectangle tool to select an area containing the letters that should get a specific color,
and fill the rectangle with that color. I like to do this by setting the background color and then pressing the Delete key.
We may get a situation like this:
[Image: Colored rectangles in GIMP (example_color.jpg)]
Now delete the text layer (or make it invisible). Then save the image to a PPM file, e.g. example_rect.ppm.
It should look like this:
[Image: Saved PPM image with rectangular color information, file example_rect.ppm (example_rect.jpg)]
Then run my utility ppmrectlist from the toolkit mentioned above on your favorite operating system:

Code: Select all

ppmrectlist example_rect.ppm > example_rect.txt
This generates the rectangle color information we need. The content of the file example_rect.txt
looks like

Code: Select all

#4be1a3:111,184,52,33#df5140:63,72,109,50#636ed7:117,13,128,52
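To check such a list before handing it to djvumake, it can be split into its entries. The following Python sketch is not part of the PPM toolset; the names ColorRect, parse_rect_list and format_rect_list are made up for this illustration, and it assumes that the four integers after the colon are x, y, width and height of a rectangle.

```python
import re
from typing import List, NamedTuple


class ColorRect(NamedTuple):
    color: str  # six hex digits without the leading '#'
    x: int      # assumed meaning of the four integers: x, y, width, height
    y: int
    w: int
    h: int


# One entry: '#', six hex digits, ':', four comma-separated integers.
_ENTRY = r"#([0-9a-fA-F]{6}):(\d+),(\d+),(\d+),(\d+)"


def parse_rect_list(spec: str) -> List[ColorRect]:
    """Split a '#rrggbb:x,y,w,h#rrggbb:...' string into entries, keeping order."""
    spec = spec.strip()
    # Reject anything that is not a plain sequence of entries.
    if not re.fullmatch(f"(?:{_ENTRY})+", spec):
        raise ValueError(f"unexpected rectangle list syntax: {spec!r}")
    return [
        ColorRect(m.group(1).lower(), *(int(m.group(i)) for i in range(2, 6)))
        for m in re.finditer(_ENTRY, spec)
    ]


def format_rect_list(rects: List[ColorRect]) -> str:
    """Inverse of parse_rect_list: join the entries back into one string."""
    return "".join(f"#{r.color}:{r.x},{r.y},{r.w},{r.h}" for r in rects)
```

Calling parse_rect_list on the content of example_rect.txt gives three entries in the order they appear in the file; this order matters when rectangles overlap.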
Finally feed this file at the appropriate place into djvumake:

Code: Select all

djvumake example.djvu INFO=,,100 Sjbz=example_bw.djvu FGbz=`cat example_rect.txt`
You can combine both programs within one call if You like:

Code: Select all

djvumake example.djvu INFO=,,100 Sjbz=example_bw.djvu FGbz=`ppmrectlist example_rect.ppm`
The backquotes make the shell run the enclosed command (cat prints the content of a file, like type on Windows)
and splice its output as text into the djvumake command line.
The Windows command prompt does not understand backquotes, so there you have to paste the content of the file example_rect.txt in place of the backquoted term, or use cygwin with a unix shell.
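Where no shell with backquote substitution is at hand, the same splice can be done by a small script. This is only a sketch in Python: the helper names read_rect_spec, djvumake_args and run_djvumake are hypothetical, and the argument list merely mirrors the djvumake call from above.

```python
import subprocess
from pathlib import Path
from typing import List


def read_rect_spec(rect_file: str) -> str:
    """Return the rectangle list stored in rect_file (what `cat` would print)."""
    return Path(rect_file).read_text()


def djvumake_args(output: str, mask: str, rect_spec: str, dpi: int = 100) -> List[str]:
    """Build the argument list of the djvumake call used in this post."""
    return [
        "djvumake",
        output,
        f"INFO=,,{dpi}",             # same INFO argument as in the post
        f"Sjbz={mask}",              # bitonal mask created by cjb2 or minidjvu
        f"FGbz={rect_spec.strip()}", # rectangle list without the trailing newline
    ]


def run_djvumake(output: str, mask: str, rect_file: str, dpi: int = 100) -> None:
    """Run djvumake; requires djvulibre to be installed and on the PATH."""
    args = djvumake_args(output, mask, read_rect_spec(rect_file), dpi)
    # Passing a list avoids any shell quoting issues with the '#' characters.
    subprocess.run(args, check=True)
```

With the files of the example, run_djvumake("example.djvu", "example_bw.djvu", "example_rect.txt") would issue the same call as the backquote version.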

Now the DJVU file example.djvu contains our final result and should look like this:
[Image: Final colored DJVU file example.djvu (example.jpg)]
The file size of the (ridiculously small) mask example_bw.djvu is 458 bytes. The file size of the color-augmented result
example.djvu is 520 bytes. For more realistic file sizes, the increase in file size remains negligible.
Note that due to the implementation of djvumake each rectangle of the image increases the final file size by three bytes
(an RGB color palette entry in the FGbz chunk). The rest of the increase is due to an index table, compressed with BZZ, which holds
one 16-bit entry into the above color palette for each encoded text symbol.

Note that it is not crucial that a text character is fully contained in a rectangle to obtain the respective color. But if there are
unconnected parts (like the dot of the letter "i" in our example) their bounding boxes must all intersect the corresponding rectangle.
If there are multiple rectangles intersecting the bounding box of a letter, the last rectangle wins.
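
The coloring rule just described can be illustrated with a few lines of Python. This is a sketch of the rule as stated in this post, not the djvumake source code; the names, the (x, y, width, height) box layout and the sample boxes are assumptions made for the illustration. A symbol whose bounding box intersects no rectangle gets None here, because the post does not say what happens to such a symbol.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)
Zone = Tuple[str, Box]           # color as "#rrggbb" and its rectangle


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def color_for(symbol_box: Box, zones: Sequence[Zone]) -> Optional[str]:
    """Color of one connected component: the last intersecting rectangle wins."""
    chosen = None
    for color, rect in zones:
        if intersects(symbol_box, rect):
            chosen = color  # a later rectangle overrides an earlier one
    return chosen


# Two overlapping rectangles, listed in the order they appear in the list.
zones = [("#ff0000", (0, 0, 50, 50)), ("#0000ff", (40, 0, 50, 50))]
stem_of_i = (20, 10, 4, 12)  # lies inside the red rectangle
dot_of_i = (20, 55, 4, 4)    # separate component just outside of it
in_overlap = (45, 10, 8, 12) # intersects both rectangles
```

Here color_for(stem_of_i, zones) yields the red entry, color_for(in_overlap, zones) yields the blue one because it comes later in the list, and color_for(dot_of_i, zones) yields None: the dot is a separate connected component, so the rectangle has to be drawn large enough to reach it.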
Last edited by mhr on 17 Sep 2013, 10:42, edited 2 times in total.

Re: Howto manually add colored text to DJVU files

Post by mhr »

I just want to mention another trick which seems to work across the tools cjb2, cpaldjvu and minidjvu.
If you have colored text (few colors, no rectangles, just like the final image of the last post), you can encode it with cpaldjvu:

Code: Select all

cpaldjvu -dpi 100 example.ppm example_dummy.djvu
Then extract the FGbz chunk from the created file example_dummy.djvu:

Code: Select all

djvuextract example_dummy.djvu FGbz=example.fgbz
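Also encode the pure black/white text with a good compressor like minidjvu in a multi-page setup. The option -i in the call below is used so that every page ends up in its own DJVU file (example.djvu, page02.djvu, ...), with result_dummy.djvu as the index file: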

Code: Select all

minidjvu -d 100 --erosion --match --smooth -r -i example.tif page02.tif ... result_dummy.djvu
Note that the option -l or --lossy implies the option --clean, which eliminates
small connected components and thereby changes the order of the connected components. That is disastrous for
this approach, so despeckling should be done before running cpaldjvu and minidjvu!

Now augment the sample page with the extracted FGbz chunk:

Code: Select all

djvumake example_color.djvu INFO=,,100 Sjbz=example.djvu FGbz=example.fgbz
And finally You bundle the document with:

Code: Select all

djvm -c book.djvu example_color.djvu page02.djvu ...
but without result_dummy.djvu.

This trick seems to work for me. It is based on the assumption that cpaldjvu and the other tools like minidjvu or cjb2
always encode the symbols in the same order.
I don't know if this assumption is always valid. It may hold if all these tools use the same library function to obtain
the connected components and keep that order. In my (so far very limited) tests it worked.

The assumption will certainly break down if a connected component has to be colored with two or more colors. Therefore logos etc. will be a problem!

In any case, always check the resulting DJVU file for correctness!

Of course all this can be automated by suitable scripts.