
Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Forgreforn
Posts: 1
Joined: 13 Apr 2017, 11:09
E-book readers owned: 0
Number of books owned: 3
Country: United States
Location: buffalo new york

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by Forgreforn » 21 Apr 2017, 05:35

Thanks a lot for the source code, really appreciate your effort! :D

Konos93a
Posts: 110
Joined: 19 Sep 2016, 10:00
E-book readers owned: kobo aura,kindle 1
Number of books owned: 3000
Country: greece

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by Konos93a » 13 Oct 2017, 13:27

It took me 2 days to figure out, but this is my method with FineReader 12 and DjvuSmall 0.4.4:

1 export TIFF files from Scan Tailor Experimental
2 make a djvu file with DjvuSmall (let's name it test1.djvu)
3 drag and drop test1.djvu into ABBYY FineReader, run Read, and then save a djvu (let's name it test3.djvu) with these settings: [Image]
4 with ABBYY FineReader save a pdf (let's name it test2.pdf) with these settings: [Image]
5 with HandyOutliner for DjVu, add and edit bookmarks from the pdf to the djvu like this: [Image]

Done: we have a djvu with OCR and bookmarks. The file does grow, though: a 500-page book from black-and-white TIFFs went from 6 MB to 8 MB,
and a 600-page one from mixed TIFFs went from 30 MB to 45 MB.
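
To put those numbers in proportion, here is a small sketch of the arithmetic. It only uses the figures quoted above; the function name is made up for illustration.

```python
def growth_percent(before_mb: float, after_mb: float) -> float:
    """Return the size increase as a percentage of the original size."""
    return (after_mb - before_mb) / before_mb * 100.0


# Figures quoted in the post above (size before and after adding OCR and bookmarks).
examples = {
    "500 pages, black and white": (6, 8),
    "600 pages, mixed": (30, 45),
}

for label, (before, after) in examples.items():
    pct = growth_percent(before, after)
    print(f"{label}: {before} MB -> {after} MB (+{pct:.0f}%)")
```

So the text layer and bookmarks cost roughly a third extra on the bitonal book and half again on the mixed one.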

b0bcat
Posts: 36
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by b0bcat » 12 Feb 2018, 13:37

I just added an update to the DRAFT MUG's GUIDE at viewtopic.php?f=19&t=2759
and as a lot actually relates to DjVu format it occurred to me to copy the relevant part here which I now do as follows:
As will be seen from that thread, free tools on MS Windows can create rather small DjVu files even from greyscale tiff files at higher resolution (e.g. 400dpi and up if needed), and a searchable hidden text layer beneath the page image can be included, again using free software as shown there. The problem of creating a DjVu file that includes one or more pages of mixed text and pictures (where the encoding sometimes leaves artifacts that spoil the picture part) can be worked round: save such page(s) e.g. in DjVuSolo as 'photo' (as opposed to e.g. 'scanned' or 'perfect'), then substitute them for the copies that were DjVu encoded as 'scanned' or another default mode, using the Edit function of e.g. DjVuToy. (This workaround is an inferior route to a similar result as the djvu_imager and djvu_small application suite[*], which I found had more steps to learn before practical implementation.)
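
For anyone who prefers a command line to the DjVuToy Edit dialog, the same page substitution can probably be scripted. The sketch below is not the method described above: it swaps in DjVuLibre's djvm tool (its -d and -i options), assumes djvm is installed and on the PATH, and uses made-up file names. djvm edits the document in place, so try it on a copy first.

```python
import subprocess


def replace_page_commands(doc: str, new_page: str, page_number: int) -> list[list[str]]:
    """Build the two djvm calls that swap one page of a bundled DjVu document.

    Page numbers are 1-based. The old page is deleted first, then the
    separately encoded page is inserted at the same position.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    return [
        ["djvm", "-d", doc, str(page_number)],            # delete the old page
        ["djvm", "-i", doc, new_page, str(page_number)],  # insert the replacement
    ]


def replace_page(doc: str, new_page: str, page_number: int) -> None:
    """Run the commands; assumes DjVuLibre's djvm is available."""
    for cmd in replace_page_commands(doc, new_page, page_number):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: book_copy.djvu is a copy of the book,
    # page_012_photo.djvu is page 12 re-encoded in 'photo' mode.
    for cmd in replace_page_commands("book_copy.djvu", "page_012_photo.djvu", 12):
        print(" ".join(cmd))
```

Running the file as is only prints the two commands, so they can be checked before anything is changed.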

[*] http://www.djvu-soft.narod.ru/scan/djvu_imager_en.htm

I don't know if licensing issues affect the permitted use of DjVu format by e.g. archive.org but their Luradoc compressed pdf files I find are a very inferior substitute; even a good multi-format reader like SumatraPDF stalls and halts in page turning while it labours to decompress whereas in my experience DjVu files scroll smoothly without such hesitation.

Last, I find DjVu metadata can now be viewed/edited using an MS Windows explorer extension:
https://www.cuminas.jp/en/downloads
DjVu Shell Extension Pack

I haven't yet tested whether Phil Harvey's updated exiftool can operate likewise on a DjVu file:
https://www.sno.phy.queensu.ca/~phil/exiftool/

b0bcat
Posts: 36
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by b0bcat » 05 Sep 2018, 18:39

How time flies... my existing installed version of gImageReader (Windows 7) is v.3.2.3 and going to
https://github.com/manisandro/gImageReader/releases
I see it's now up to v.3.2.99 (Feb 24 2018) the change list for which includes among other items:
"Add support for reading DJVU documents".
I haven't installed the newer version which is stated to be intended as a beta for 3.3.0 so I don't know if this new functionality means a hidden text layer can also be written to the input djvu file - very nice if it can be.
The other desired improvement I've had in mind is the ability to save the project rather than just the output (e.g. where checking the ocr accuracy and editing errors remains incomplete between sessions). Must review and see if that's since been included or is in prospect.

Konos93a
Posts: 110
Joined: 19 Sep 2016, 10:00
E-book readers owned: kobo aura,kindle 1
Number of books owned: 3000
Country: greece

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by Konos93a » 06 Sep 2018, 03:16


b0bcat
Posts: 36
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by b0bcat » 18 Sep 2018, 17:20

Sadly my old version 9 of ABBYY FineReader can read but not write DjVu format and I'm too tight to upgrade :|

Interesting to see that about 5 minutes into the video you open the djvu file with what looks like WinDjView:
https://sourceforge.net/projects/windjview

- running that program I find in its "Edit" menu a sub-item: "Add Bookmark". I haven't had to create an ebook for quite a long while but taking a random djvu file and first making a copy, I then played with the menu item and added a few specimen bookmarks - the methodology being the same as in my antique copy of Adobe Acrobat Pro, viz, place mouse cursor in location on desired page in DjVu reader's mainscreen, open the "Edit" menu, select "Add Bookmark", type in the description and OK. (WinDjView appears to save the bookmarks without being told).

WinDjView also allows exporting the bookmarks it has created, but I'm wondering if it's a format specific to that program: opening the same djvu file in DjVuLibre's DjView, no bookmarks were visible (I'm assuming it can ordinarily read bookmarks), nor in SumatraPDF, which also reads DjVu (being a portable install here, maybe that affected it), nor in a couple of DjVu readers on an Android tablet. Anyway, if the WinDjView bookmarks export file isn't a standard format (though it seems to be xml), it might be worth tinkering briefly to see if changing the header or other particulars makes it universally accepted, then re-importing. Currently, when I try to import WinDjView's exported bookmarks into a bookmarks-empty third copy of the same djvu file using HandyOutliner, I get the error message "The root element must be OutlineRoot or BookmarkExport." Could be worth brief experimentation, as creating bookmarks manually in this way seems slightly more intuitive for the inexpert, and workable at least for a small book without needing to learn another program.
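
As a first step in that experiment, the root element of an export file can be checked before trying the import. This is only a sketch: the two accepted names are taken from the error message above, and the sample XML is hypothetical, since the real WinDjView export layout is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Root element names taken from the HandyOutliner error message.
ACCEPTED_ROOTS = {"OutlineRoot", "BookmarkExport"}


def root_tag(xml_text: str) -> str:
    """Return the root element name, without any namespace prefix."""
    tag = ET.fromstring(xml_text).tag
    return tag.rsplit("}", 1)[-1]


def is_importable(xml_text: str) -> bool:
    """True if the root element is one of the names the error message asks for."""
    return root_tag(xml_text) in ACCEPTED_ROOTS


# Hypothetical sample, NOT the real WinDjView export format.
sample = '<content><bookmark title="Chapter 1" page="5"/></content>'
print(root_tag(sample), is_importable(sample))
```

If the root name is the only difference, renaming it might be enough, but the child elements may well differ too, so this check only shows where the import fails first.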

EDIT: I just found this: https://sourceforge.net/p/djvuoutline/
"A program for easy creating and editing outline (bookmarks, contents) in djvu books. It maintains formulas for recalculating page numbers (e. g. for books with photo insets or with double pages). Also multiple-book outline can be easily created."

Konos93a
Posts: 110
Joined: 19 Sep 2016, 10:00
E-book readers owned: kobo aura,kindle 1
Number of books owned: 3000
Country: greece

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by Konos93a » 19 Sep 2018, 13:50

I use HandyOutliner to make djvu files with bookmarks, so that on my jailbroken Kindle with KOReader I can see how many pages are left until the end of the chapter.

1 open an OCR'd pdf/djvu/txt/odt and copy the text of the table of contents [Image]

2 create a new txt file, open it in Notepad++ and paste the text there [Image]

3 edit it so that only the titles and page numbers remain, like this [Image]

4 then use HandyOutliner to create a file with bookmarks [Image]

5 write the outline and done [Image]
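
The steps above can also be sketched in a few lines of Python, for anyone curious what the conversion involves. This is not HandyOutliner's code, just the same idea under some assumptions: each cleaned-up line ends with a page number, the output is the "(bookmarks ...)" outline text that DjVuLibre's djvused reads, and the offset parameter (for books whose printed page numbers differ from the DjVu page numbers) is my own addition.

```python
import re

# A title, then spaces or dot leaders, then the page number at the end of the line.
LINE_RE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$")


def parse_toc(text: str, offset: int = 0) -> list[tuple[str, int]]:
    """Turn 'Title ... 15' lines into (title, page) pairs; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match.group("title").strip(), int(match.group("page")) + offset))
    return entries


def quote(title: str) -> str:
    """Wrap a title in double quotes, escaping backslashes and quotes."""
    return '"' + title.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_djvused_outline(entries: list[tuple[str, int]]) -> str:
    """Format a flat (single-level) outline in djvused's bookmarks syntax."""
    lines = ["(bookmarks"]
    for title, page in entries:
        lines.append(f' ({quote(title)} "#{page}")')
    lines[-1] += ")"  # close the outer (bookmarks list
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toc = "Preface ..... 5\nChapter 1 The Beginning 15\nChapter 2 Scanning 42\n"
    print(to_djvused_outline(parse_toc(toc)), end="")
```

Saved to a file such as outline.txt (name made up), the result could then be written into a copy of the book with DjVuLibre, assuming it is installed: djvused book.djvu -e 'set-outline outline.txt' -s. Nested chapters and sub-chapters are left out to keep the sketch short.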

In WinDjView, if you press Ctrl+, you can change the display colors [Image]
