If you want to colorize the black-and-white text of a DjVu document and
are willing to do this manually, you can do so as described in the following discussion.
Howto manually add colored text to DJVU files
Moderator: peterZ
-
- Posts: 37
- Joined: 07 May 2012, 10:12
- E-book readers owned: onyx-boox-m92 sony-trs-t1
- Number of books owned: 500
- Country: Germany
Re: Howto manually add colored text to DJVU files
For this purpose we use a little tool, ppmrectlist, which is part of my PPM toolset.
The toolset can be downloaded from http://www.diybookscanner.org/forum/vie ... =20&t=2915
There are several approaches to this task that use the photo encoder c44 of the DjVuLibre tool chain.
The result often looks a bit fuzzy and enlarges the file size quite a bit. But the djvumake utility of DjVuLibre
offers a neat option to specify a list of colored rectangles, from which it creates an FGbz layer containing the color information of each encoded symbol.
The utility ppmrectlist extracts colored rectangular areas from a PPM file and outputs such a list in the format djvumake expects.
I will give a tiny example now.
Let's say we have a bitonal image example_bw.tif.
We convert it with our favorite bitonal DjVu encoder, e.g. minidjvu or cjb2:
Code:
cjb2 -dpi 100 -lossy example_bw.tif example_bw.djvu
This creates the DjVu text mask example_bw.djvu.
To add color to this DjVu document, load the image example_bw.tif into your favorite graphics editor.
I will use GIMP. Now do the following tasks:
- Change image mode to RGB.
- Swap black/white with color inversion.
- Make this text layer transparent.
- Delete the now white non-text area.
- Add a new non-transparent layer filled with black below this text layer and select it for editing.
Then select a rectangular area around the text you want to color and fill the rectangle with the wanted color. I like to do this by setting the background color and then pressing the Delete key.
We may end up with several colored rectangles, one for each text area.
Now delete the text layer (or make it invisible). Then save the image to a PPM file, e.g. example_rect.ppm.
It should now show only the colored rectangles on the black background.
Then use my utility ppmrectlist from the above-mentioned toolkit on your favorite operating system:
Code:
ppmrectlist example_rect.ppm > example_rect.txt
This generates the rectangle color information we need. The content of the file example_rect.txt
looks like
Code:
#4be1a3:111,184,52,33#df5140:63,72,109,50#636ed7:117,13,128,52
Finally feed this file at the appropriate place into djvumake:
Code:
djvumake example.djvu INFO=,,100 Sjbz=example_bw.djvu FGbz=`cat example_rect.txt`
You can combine both programs within one call if you like:
Code:
djvumake example.djvu INFO=,,100 Sjbz=example_bw.djvu FGbz=`ppmrectlist example_rect.ppm`
If you are using Windows, you have to paste the content of the file example_rect.txt in place of the backquoted expression,
or you use Cygwin with a Unix shell. The backquotes execute the command cat, which prints the
content of the file (like type on Windows), and splice the output as text into the djvumake command.
Now the DjVu file example.djvu should contain our final result.
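Before passing a rectangle list to djvumake it can help to look at it in a more readable form. The following is only a small sketch: list_rects is a hypothetical helper name (it is not part of ppmrectlist or DjVuLibre), and it assumes that each entry has the form #rrggbb:x,y,w,h as in the example above.

```shell
#!/bin/sh
# list_rects LIST
# Print one "color x y w h" line per entry of a rectangle list such as
# "#4be1a3:111,184,52,33#df5140:63,72,109,50".
# Hypothetical helper; assumes every entry looks like #rrggbb:x,y,w,h.
list_rects() {
    printf '%s\n' "$1" |
    tr '#' '\n' |                      # one entry per line, '#' removed
    awk -F'[:,]' 'NF == 5 { print "#" $1, $2, $3, $4, $5 }'
}

list_rects '#4be1a3:111,184,52,33#df5140:63,72,109,50#636ed7:117,13,128,52'
```

Entries that do not split into exactly five fields are silently skipped, so a shorter output than expected hints at a malformed list.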
The file example_bw.djvu of this (ridiculously small) example is 458 bytes. The color-augmented result
example.djvu is 520 bytes. For more realistic files the relative increase in file size remains negligible.
Note that due to the implementation of djvumake each rectangle of the image increases the final file size by three bytes
(an RGB color palette entry in the FGbz chunk). The rest of the increase is due to a BZZ-compressed index table with
16-bit entries into this color palette, one for each encoded text symbol.
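The palette part of this increase is easy to compute from the rectangle list. A minimal sketch, assuming one '#' per rectangle in the list; palette_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# palette_bytes LIST
# Three bytes (R, G, B) of palette per rectangle in the list.
# Hypothetical helper; it simply counts the '#' characters.
palette_bytes() {
    count=$(printf '%s' "$1" | tr -cd '#' | wc -c | tr -d ' ')
    echo $((count * 3))
}

palette_bytes '#4be1a3:111,184,52,33#df5140:63,72,109,50#636ed7:117,13,128,52'
```

For the three rectangles of the example this gives 9 bytes, so 9 of the 62 additional bytes (520 - 458) are palette entries.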
Note that a text character does not have to be fully contained in a rectangle to obtain the respective color. But if a character has
unconnected parts (like the dot of the letter "i" in our example), the bounding boxes of all parts must intersect the corresponding rectangle.
If multiple rectangles intersect the bounding box of a letter, the last rectangle wins.
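The rule can be sketched in a few lines of shell and awk. This is not the djvumake code, only an illustration of the rule as described: color_for_box is a hypothetical helper, the entries are assumed to be #rrggbb:x,y,w,h, and rectangles that merely touch the box at an edge are (by assumption) not counted as intersecting.

```shell
#!/bin/sh
# color_for_box LIST X Y W H
# Print the color of the last rectangle in LIST that intersects the
# bounding box X,Y,W,H; print nothing if no rectangle intersects it.
# Hypothetical helper illustrating the "last rectangle wins" rule.
color_for_box() {
    printf '%s\n' "$1" | tr '#' '\n' |
    awk -F'[:,]' -v x="$2" -v y="$3" -v w="$4" -v h="$5" '
        # Fields: $1 color, $2 x, $3 y, $4 width, $5 height.
        NF == 5 && x < $2 + $4 && $2 < x + w &&
                   y < $3 + $5 && $3 < y + h { hit = "#" $1 }
        END { if (hit != "") print hit }'
}

# Two overlapping rectangles: the later entry wins inside the overlap.
color_for_box '#ff0000:0,0,10,10#00ff00:5,5,10,10' 6 6 2 2
```

A character with unconnected parts, like the "i", corresponds to one call per part; it is colored uniformly only if every call returns the same color.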
Last edited by mhr on 17 Sep 2013, 10:42, edited 2 times in total.
I just want to mention another trick which seems to work across the tools cjb2, cpaldjvu and minidjvu.
If you have colored text (few colors, no rectangles, just like the final image of the last post), then you can encode it with cpaldjvu:
Code:
cpaldjvu -dpi 100 example.ppm example_dummy.djvu
Then you extract the FGbz layer from the created file example_dummy.djvu:
Code:
djvuextract example_dummy.djvu FGbz=example.fgbz
You also encode the pure black/white text with a good compressor like minidjvu in a multi-page setup:
Code:
minidjvu -d 100 --erosion --match --smooth -r -i example.tif page02.tif ... result_dummy.djvu
Note that the option -l or --lossy implies the option --clean, and the latter eliminates
small connected components, which in turn changes the order of the connected components. That is disastrous for
this approach. Therefore despeckling should be done before running cpaldjvu and minidjvu!
Now you augment the sample page with the old FGbz chunk:
Code:
djvumake example_color.djvu INFO=,,100 Sjbz=example.djvu FGbz=example.fgbz
And finally you bundle the document with
Code:
djvm -c book.djvu example_color.djvu page02.djvu ...
but without result_dummy.djvu.
This trick seems to work for me. It is based on the assumption that cpaldjvu and the other tools like minidjvu or cjb2
always encode the symbols in the same order.
I don't know whether this assumption is always valid. It may hold if all these tools use the same library function to find
the connected components and keep that order. In my (so far very limited) tests it worked.
The assumption will certainly break down if a connected component is to be colored with two or more colors. Logos etc. will therefore be a problem!
In general, always check the resulting DjVu file for correctness!
Of course all this can be automated by suitable scripts.
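As a starting point, the three per-page steps could be wrapped like this. It is only a sketch: run and colorize_page are hypothetical helper names, the file naming scheme is my assumption, and the commands are the same cpaldjvu, djvuextract and djvumake calls as above. With DRY_RUN=1 the commands are only printed, which is useful for checking them first.

```shell
#!/bin/sh
# run CMD ARGS...
# Execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# colorize_page COLOR_PPM MASK_DJVU OUT_DJVU [DPI]
# Hypothetical helper: take the colors from COLOR_PPM (via cpaldjvu) and
# the text mask from MASK_DJVU, and write the colored page to OUT_DJVU.
colorize_page() {
    ppm=$1 mask=$2 out=$3 dpi=${4:-100}
    base=${out%.djvu}                      # assumed naming scheme
    run cpaldjvu -dpi "$dpi" "$ppm" "${base}_dummy.djvu" &&
    run djvuextract "${base}_dummy.djvu" "FGbz=${base}.fgbz" &&
    run djvumake "$out" "INFO=,,$dpi" "Sjbz=$mask" "FGbz=${base}.fgbz"
}

# Dry run: print the commands instead of executing them.
DRY_RUN=1 colorize_page example.ppm example.djvu example_color.djvu 100
```

The script does not verify that both encoders produced the symbols in the same order, so the result still has to be checked by eye.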