
Introducing djvubind for djvu file creation

Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: Introducing djvubind for djvu file creation

Post by Lazy_Kent » 30 Oct 2010, 10:25

strider1551, could you please change "/usr/etc" to "/etc"?
While building packages for openSUSE I got this error:

Code:

... checking filelist
djvubind: "/usr/etc" is not allowed anymore in FHS 2.2.
djvubind: "/usr/etc/djvubind" is not allowed anymore in FHS 2.2.
djvubind: "/usr/etc/djvubind/config" is not allowed anymore in FHS 2.2.

Thanks.

strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Post by strider1551 » 30 Oct 2010, 10:46

Someone else caught that horrifically embarrassing mistake last night. I adjusted the distutils script (setup.py) and released version 1.0.1 earlier this morning.
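
For anyone hitting the same packaging error: distutils interprets a relative directory in data_files as relative to the installation prefix, so a relative "etc/djvubind" ends up under "/usr/etc" when the prefix is "/usr". The following is only a minimal sketch of that path resolution, not the actual djvubind setup.py; the function name resolve_data_dir and the default prefix are assumptions made for illustration.

```python
import posixpath


def resolve_data_dir(directory, prefix="/usr"):
    """Mimic how a data_files target directory is resolved at install time.

    Assumption: relative directories are joined to the installation
    prefix, while absolute directories are used unchanged.
    """
    if posixpath.isabs(directory):
        return directory
    return posixpath.join(prefix, directory)


# A relative target lands under the prefix; an absolute one does not.
print(resolve_data_dir("etc/djvubind"))
print(resolve_data_dir("/etc/djvubind"))
```

Under that assumption, switching the target to an absolute path is what keeps the config file out of "/usr/etc".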

Nothing like writing software to teach you some humility time and time again.

Post by Lazy_Kent » 31 Oct 2010, 08:38

1. I can't disable OCR.
User config:

Code:

cores = 2
ocr = False
ocr_engine = cuneiform
cuneiform_options = -l ruseng
tesseract_options =
bitonal_encoder = minidjvu
color_encoder = csepdjvu
c44_options =
cjb2_options = -lossy
csepdjvu_options =
minidjvu_options = --dpi 300 --pages-per-dict 80 --verbose

Code:

% djvubind -v --no-ocr
djvubind version 1.0.1
Executing with these parameters:
{'ocr_engine': 'cuneiform', 'tesseract_options': '', 'verbose': True, 'cjb2_options': '-lossy', 'cuneiform_options': '-l ruseng', 'bitonal_encoder': 'minidjvu', 'color_encoder': 'csepdjvu', 'ocr': False, 'quiet': False, 'minidjvu_options': '--dpi 300 --pages-per-dict 80 --verbose', 'win_path': 'C:\\Program Files\\DjVuZone\\DjVuLibre\\;C:\\Program Files\\Tesseract-OCR;C:\\Program Files\\ImageMagick-6.6.5-Q16', 'cores': 2, 'csepdjvu_options': '', 'c44_options': ''}

* Collecting files to be processed.
* Analyzing image information.
  Spawning 2 processing threads.
* Performing optical character recognition.
  Spawning 2 processing threads.
* Encoding all information to book.djvu.

2. During recognition the program removes the output hOCR files, so there is no text in book.djvu. Tested with Tesseract 3.00 and Cuneiform 1.0.0, Python 3.0 and 3.1, and djvulibre 3.5.21 and 3.5.22.
Attachments:
book.djvu.zip: output book (335.6 KiB)
scans.zip: input scans (1.51 MiB)

Post by strider1551 » 31 Oct 2010, 11:12

1. Thanks for catching that. I reworked a lot of the code recently and that fell through the cracks. It's now fixed in the repository, and a 1.0.2 release should follow later this week.
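
A minimal sketch of the intended behavior, assuming the command line overrides the config file and that the OCR step is skipped whenever the merged "ocr" option is false. This is not djvubind's actual code; load_options and planned_steps are hypothetical names, and only the --no-ocr flag comes from the thread.

```python
import argparse


def load_options(config, argv):
    """Merge config-file values with command-line flags (flags win)."""
    parser = argparse.ArgumentParser(prog="djvubind-sketch")
    # default=None lets us tell "flag not given" apart from "flag given".
    parser.add_argument("--no-ocr", dest="ocr", action="store_false", default=None)
    args = parser.parse_args(argv)

    options = dict(config)
    if args.ocr is not None:
        options["ocr"] = args.ocr
    return options


def planned_steps(options):
    """Return the processing steps, skipping OCR when it is disabled."""
    steps = ["collect", "analyze"]
    if options["ocr"]:
        steps.append("ocr")
    steps.append("encode")
    return steps


print(planned_steps(load_options({"ocr": True}, ["--no-ocr"])))
```

The point of the sketch is the explicit check in planned_steps: printing 'ocr': False in the parameter dump is not enough if the OCR stage never consults that value.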

2. This seems to be the same as issue 15. I've been developing with cuneiform-0.8.0, since that is the latest available from Gentoo's Portage. cuneiform-0.9.0 and above changed the format of the .hocr files, so my parser cannot read the output of these newer versions. I have some examples of the new format and just need to find the time to sit down and work with them. In the meantime you could switch the ocr-engine to tesseract, which should still work (it only switches automatically if cuneiform crashes, not when it gives output I didn't expect).
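
The fallback rule described above can be sketched as follows. This is an illustration under stated assumptions, not djvubind's code: the engine functions are stand-ins, and a crash is modeled as a RuntimeError.

```python
def run_ocr(page, engines):
    """Try each (name, engine) pair in order; fall back only on a crash."""
    last_error = None
    for name, engine in engines:
        try:
            # Whatever the engine returns is accepted as-is, even if it
            # is empty or in a format the caller cannot parse.
            return name, engine(page)
        except RuntimeError as err:
            last_error = err
    raise RuntimeError("all OCR engines failed") from last_error


# Hypothetical stand-ins for real OCR engines.
def crashing_engine(page):
    raise RuntimeError("engine crashed")


def empty_engine(page):
    return ""


def working_engine(page):
    return "text for " + page


print(run_ocr("155.tif", [("cuneiform", crashing_engine), ("tesseract", working_engine)]))
print(run_ocr("155.tif", [("cuneiform", empty_engine), ("tesseract", working_engine)]))
```

In the second call the first engine "succeeds" with unusable output, so no fallback happens, which matches the empty text layer reported here.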

Post by Lazy_Kent » 31 Oct 2010, 12:27

I think there is another problem with inserting the text layer.
Trying the tesseract engine:

Code:

% djvubind --ocr-engine=tesseract --tesseract-options="-l rus" -v
djvubind version 1.0.1
Executing with these parameters:
{'ocr_engine': 'tesseract', 'tesseract_options': '-l rus', 'verbose': True, 'cjb2_options': '-lossy', 'cuneiform_options': '-l ruseng', 'bitonal_encoder': 'minidjvu', 'color_encoder': 'csepdjvu', 'ocr': False, 'quiet': False, 'minidjvu_options': '--dpi 300 --pages-per-dict 80 --verbose', 'win_path': 'C:\\Program Files\\DjVuZone\\DjVuLibre\\;C:\\Program Files\\Tesseract-OCR;C:\\Program Files\\ImageMagick-6.6.5-Q16', 'cores': 2, 'csepdjvu_options': '', 'c44_options': ''}

* Collecting files to be processed.
* Analyzing image information.
  Spawning 2 processing threads.
* Performing optical character recognition.
  Spawning 2 processing threads.
* Encoding all information to book.djvu.

At the same time, I monitored the working directory:

Code:

% inotifywait -m -r --format '%:e %f' tst
...
CREATE 155_box.box
OPEN 155_box.box
MODIFY 155_box.box
MODIFY 155_box.box
CLOSE_WRITE:CLOSE 155_box.box
...
CLOSE_NOWRITE:CLOSE 155.tif
CREATE 155_txt.txt
OPEN 155_txt.txt
MODIFY 155_txt.txt
MODIFY 155_txt.txt
CLOSE_WRITE:CLOSE 155_txt.txt
OPEN 155_box.box
ACCESS 155_box.box
CLOSE_NOWRITE:CLOSE 155_box.box
OPEN 155_txt.txt
ACCESS 155_txt.txt
CLOSE_NOWRITE:CLOSE 155_txt.txt
DELETE 155_box.box
DELETE 155_txt.txt
...
CLOSE_NOWRITE:CLOSE 157.tif
CREATE enc_temp.djvu
OPEN enc_temp.djvu
MODIFY enc_temp.djvu
...

As far as I can see, the tesseract output files were deleted before djvu encoding started. The whole log is attached.
Attachments:
inotify.log: inotify log (8.77 KiB)

Post by strider1551 » 01 Nov 2010, 13:10

Lazy_Kent wrote: "As far as I can see tesseract output files were deleted before starting djvu encoding."
Yes, that is what happens. All the files are read, parsed, and formatted into the djvused format. That information is then kept internally in a variable, so the files are no longer needed once they have been read.
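
To illustrate the idea, here is a minimal sketch of turning tesseract box output into a djvused-style hidden-text expression held in a string. It is not djvubind's parser. It assumes each box line reads "glyph left bottom right top [page]" with a bottom-left origin, and it emits only flat char zones directly under the page zone; the function name box_to_djvused is hypothetical.

```python
def box_to_djvused(box_text, page_width, page_height):
    """Convert tesseract box-file text into a djvused hidden-text string."""
    chars = []
    for line in box_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        glyph = fields[0]
        x0, y0, x1, y1 = (int(value) for value in fields[1:5])
        # Escape backslashes and double quotes inside the quoted string.
        escaped = glyph.replace("\\", "\\\\").replace('"', '\\"')
        chars.append('(char {} {} {} {} "{}")'.format(x0, y0, x1, y1, escaped))
    body = "".join("\n  " + zone for zone in chars)
    return "(page 0 0 {} {}{})".format(page_width, page_height, body)


sample = "a 10 20 30 40 0\nb 35 20 50 40 0"
print(box_to_djvused(sample, 100, 200))
```

Once a string like this exists in memory, the .box and .txt files can be deleted without affecting the later encoding step, which is consistent with the inotify log above.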

I do not have the Russian language files for tesseract, but just running with "-l eng" produced a normal (albeit completely incorrect) text layer. If you could give me the tesseract output files, I'll take a look and see if something language- or encoding-related in them is messing up the parser:

Code:

tesseract "input.tif" "out_box" -l rus batch makebox
tesseract "input.tif" "out_txt" -l rus batch

Post by Lazy_Kent » 01 Nov 2010, 14:19

Doesn't work for me even with "-l eng".
Russian output files attached.
Attachments:
tesseract_out.zip: Tesseract output files (117 KiB)

Post by Lazy_Kent » 04 Nov 2010, 03:47

strider1551, it works now. Thanks.
May I set a custom DPI?

Post by strider1551 » 04 Nov 2010, 06:46

That's great news, glad it works.

djvubind will determine the resolution of the images on its own and pass it to the encoder. If you also specify a resolution in an encoder option in the config file, it will end up running a command like "cjb2 -dpi 300 -lossy -dpi 400 image.tif". I have no idea what happens when you pass along two resolutions; it may take the second one and use that, or it may crash and complain. Hence, the config file recommends that you don't specify a resolution.
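
One way to avoid the ambiguity is to reject a user-supplied resolution before the command is built. The sketch below only assembles an argument list and does not run cjb2; the function name build_cjb2_command and the output file name are hypothetical, and this is not how djvubind currently behaves.

```python
import shlex


def build_cjb2_command(image, detected_dpi, user_options, output="page.djvu"):
    """Build a cjb2 argument list with exactly one -dpi option."""
    extra = shlex.split(user_options)
    if "-dpi" in extra:
        # The detected resolution is always passed, so a second -dpi
        # from the config file would make the command ambiguous.
        raise ValueError("remove -dpi from cjb2_options; it is set automatically")
    return ["cjb2", "-dpi", str(detected_dpi)] + extra + [image, output]


print(build_cjb2_command("image.tif", 300, "-lossy"))
```

Failing early with a clear message is a design choice; silently dropping the user's value would be an alternative, but it hides the conflict from the user.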

So if the images themselves have the correct resolution, there's no need to set it in djvubind. If they have an incorrect resolution, you can take your chances and see what happens, or you can fix the images with ImageMagick (the -density option, I believe).

Images coming from scantailor should have the correct resolution.

caudwell
Posts: 2
Joined: 15 Sep 2010, 19:45

Re: Introducing djvubind for djvu file creation

Post by caudwell » 23 Nov 2010, 22:44

Got this working on Ubuntu 10.04, it's great!

Thanks strider!
