
Introducing djvubind for djvu file creation

gsloop

Re: Introducing djvubind for djvu file creation

Post by gsloop » 31 Mar 2011, 15:09

Sure, I'd love a GUI.

But I can also do just fine with a CL program too.

So, if it's just for me, it seems like a lot of work for not much reward [at least for the "masses"]. :)

My hope is that we can create a fairly turn-key system that will handle most everything and won't need to cost a bunch. [In this case, it would all be "free" in both senses of the word - which is even better.]

So, I'm working on a FC14 vmware image. [The part I'm most unclear about is how to use the "shared" folders feature in VMware in a Linux VM. I don't want to have to pull the images into the VM environment/disk - I want to use files outside the VM and write the result outside the VM too...]

I'll let you all know how it goes - I'd guess we'll see more use with something that's more turn-key.

-Greg

gsloop

Re: Introducing djvubind for djvu file creation

Post by gsloop » 01 Apr 2011, 20:09

Ok, I just did my first DJVU with djvubind on my FC14 VM.

Not bad - however a few questions.

I'm probably mostly going to be doing PDF and/or DJVU for myself, but I also have a Kindle and others with Nooks etc.

So, epub and mobi output would be nice. What is the best way to go about that with djvubind? Is it possible? Also, when it did the djvu it still didn't do OCR even though I have tesseract installed. [I didn't pass it any config params - just ran it in the directory with the TIFFs - but it appeared by the config that it _should_ have done OCR...]

I know, I should read the docs - even as thin as they are - but could anyone give me some pointers?

-Greg

strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: Introducing djvubind for djvu file creation

Post by strider1551 » 02 Apr 2011, 16:34

gsloop wrote: So, epub and mobi output would be nice. What is the best way to go about that with djvubind? Is it possible?
Nope. I have no experience with epub or mobi formats, and even then, djvubind is not a generic image-to-format kind of tool. Its only purpose is to tie together existing tools to easily create high quality djvu files.
gsloop wrote: Also, when it did the djvu it still didn't do OCR even though I have tesseract installed.
Now that doesn't sound right at all. Just so we're on the same page, the ocr won't be a separate file... it will be embedded in the djvu file itself - so if you open the djvu file and can search through it, then it did ocr. If that isn't the case, I would ask that you open an issue on the tracker and I can get more specifics from you there.
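
For anyone who wants to check this without opening a viewer, here is a minimal sketch that asks DjVuLibre's djvutxt tool to dump the hidden text layer and reports whether anything came back. It is not part of djvubind. The function names are made up for illustration, and it assumes djvutxt is installed and that a file with no OCR produces only whitespace or page separators.

```python
import shutil
import subprocess


def has_text_layer(djvutxt_output: str) -> bool:
    """Return True if the dumped text contains anything besides whitespace.

    Form feeds and newlines count as whitespace here, so output that holds
    only page separators is treated as "no OCR text".
    """
    return bool(djvutxt_output.strip())


def djvu_is_searchable(path: str) -> bool:
    """Run djvutxt (from DjVuLibre) on a file and check for hidden text."""
    if shutil.which("djvutxt") is None:
        raise RuntimeError("djvutxt (part of DjVuLibre) was not found on PATH")
    result = subprocess.run(
        ["djvutxt", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if djvutxt fails
    )
    return has_text_layer(result.stdout)
```

Calling djvu_is_searchable("book.djvu") (the filename is a placeholder) should return True when the OCR text was embedded. If it returns False, that would be worth mentioning in the issue report.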

steve009

Re: Introducing djvubind for djvu file creation

Post by steve009 » 25 Dec 2011, 21:50

I've been experimenting with djvubind. There are a few issues I ran into sometimes. First, when I had blank pages the process would hang. After I inserted small "slashes" into the page, that seemed to fix the problem. Another thing is I had some pages where the image is typography and graphics. I think it might have confused the ocr program. The program exits with error status 1. Is there a way to explicitly exclude certain pages from ocr on the command line?

strider1551

Re: Introducing djvubind for djvu file creation

Post by strider1551 » 26 Dec 2011, 12:07

steve009 wrote:I've been experimenting with djvubind. There are a few issues I ran into sometimes.
I am glad to help. If you wouldn't mind, file these on the tracker as two separate issues. The bug report form prompts for more specific information that will help me understand the situation, and you can attach an image so I can try to duplicate the bug on my own machines. I also get notified much quicker there than for posts here, and the forums don't get mucked up as much.
steve009 wrote:First, when I had blank pages the process would hang. After I inserted small "slashes" into the page, it seems to fix the problem.
I have processed both blank images and pictures of blank pages before. It could be something about that specific image, possibly in combination with whatever ocr program, version, OS, etc.
steve009 wrote:Another thing is I had some pages where the image is typography and graphics. I think it might have confused the ocr program. The program exits with error status 1. Is there a way to explicitly exclude certain pages from ocr on the command line?
No, the only option is to turn off ocr for everything. Again, I've done pages with images and text, so I need the bug report with some more specifics about your environment.

steve009

Re: Introducing djvubind for djvu file creation

Post by steve009 » 15 Jan 2012, 02:32

Thanks for the reply. I'm quite happy with the results I am getting. I'm using Arch Linux, and djvubind is in the Arch User Repository. Actually, right now I am processing a book. I used cuneiform. Every so often there is a buffer overflow, and then the ocr switches to tesseract. My computer is a laptop, dual core Intel 1.3 GHz with 4 GB RAM. The problem might just be that my computer is slow. So far I can get the processing to finish after many hours. I will post any issues on the tracker if they occur.
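
The "switches to tesseract" behaviour can be pictured with a small sketch like the one below. This is only an illustration of the general try-one-engine-then-the-next idea, not djvubind's actual code. The function name and the engine callables are hypothetical, and each callable is assumed to raise an exception when its OCR program crashes (for example, a subprocess.CalledProcessError from a wrapper that uses check=True).

```python
from typing import Callable, List, Sequence, Tuple

# Each engine is a (name, callable) pair; the callable takes an image path
# and returns the recognised text, or raises if the OCR program fails.
OcrEngine = Tuple[str, Callable[[str], str]]


def ocr_with_fallback(image_path: str, engines: Sequence[OcrEngine]) -> Tuple[str, str]:
    """Try each engine in order and return (engine_name, text) from the first success."""
    errors: List[str] = []
    for name, engine in engines:
        try:
            return name, engine(image_path)
        except Exception as exc:  # a crash in one engine should not stop the book
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```

With cuneiform listed first and tesseract second, a page that crashes cuneiform would still get text from tesseract, at the cost of the time spent on the failed attempt.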

Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: Introducing djvubind for djvu file creation

Post by Anonymous2 » 15 Jan 2012, 15:59

Hello fellow Archer ;)

Cuneiform constantly segfaults and crashes for me, so I think it might be cuneiform's fault, not djvubind's. If you don't have it already, get tesseract-svn from the AUR. It has page layout analysis (but I have no idea how to use it) and produces good results for me, even for pre-1900 books.

Anonymous2

Re: Introducing djvubind for djvu file creation

Post by Anonymous2 » 15 Jan 2012, 16:02

While I'm here, @strider1551, would it be too much work for you to implement a dummy progress() function for djvubind?

The only reason I can't include djvubind as a module in Bindery is because it doesn't have progress reporting, which is a core component of mine. If you want, I could write up a patch for you.
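
To make the request concrete, here is a rough sketch of the kind of hook I mean. None of these names exist in djvubind or Bindery. They are placeholders, and the real patch would depend on how djvubind loops over pages internally.

```python
from typing import Callable, Iterable, List


def dummy_progress(done: int, total: int) -> None:
    """Default callback that does nothing, so command-line use is unchanged."""


def process_pages(
    pages: Iterable[str],
    handle_page: Callable[[str], str],
    progress: Callable[[int, int], None] = dummy_progress,
) -> List[str]:
    """Process each page and call progress(done, total) after every one."""
    page_list = list(pages)  # materialise so the total is known up front
    results = []
    for index, page in enumerate(page_list, start=1):
        results.append(handle_page(page))
        progress(index, len(page_list))
    return results
```

A GUI would pass its own callback (for example, one that updates a progress bar), while the command-line tool would keep the no-op default.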

strider1551

Re: Introducing djvubind for djvu file creation

Post by strider1551 » 16 Jan 2012, 07:49

Anonymous2 wrote: Cuneiform constantly segfaults and crashes for me, so I think it might be cuneiform's fault, not djvubind's.
Yes, that is definitely cuneiform's fault. From my experience, cuneiform crashes on upwards of 90% of the images I give it on a 64-bit Linux platform (32-bit does much better, but not perfect) - and that's calling it on its own outside of djvubind. I have no love for that program for that and a few other reasons, but people here requested its inclusion and supposedly it does better with non-English languages than tesseract does.
Anonymous2 wrote: While I'm here, @strider1551, would it be too much work for you to implement a dummy progress() function for djvubind? The only reason I can't include djvubind as a module in Bindery is because it doesn't have progress reporting, which is a core component of mine. If you want, I could write up a patch for you.
Yeah, go ahead and make a patch so I can see what exactly you need.

strider1551

Re: Introducing djvubind for djvu file creation

Post by strider1551 » 04 Mar 2012, 10:05

I just put up the next stable version of djvubind (1.2.0), which can be downloaded from the project's download page. A Debian package will not be available until sometime after March 11th, because I'm away from my desktop and the virtual machine that I create/test the Debian package in.

I strongly recommend taking the time to upgrade, especially because tesseract-3.01 would crash djvubind on blank images prior to this release. As always, bug reports and feature requests can be submitted to the issue tracker. I'm always paranoid that I did something stupid just before a release, so please don't hesitate to report bugs.

...and thanks for the support. The last version was out for a year and had 773 downloads, which blows my mind.
