Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Forgreforn
Posts: 1
Joined: 13 Apr 2017, 11:09
E-book readers owned: 0
Number of books owned: 3
Country: United States
Location: buffalo new york

Re: Make a djvu file and add ocr: DjVuToy; TiffDjvuOcr; CuneiDjVu

Post by Forgreforn »

Thanks a lot for the source code, really appreciate your effort! :D
Konos93a
Posts: 186
Joined: 19 Sep 2016, 10:00
E-book readers owned: kobo aura,kindle 1,kindle pw3,pocketbook inkpad 2
Number of books owned: 3000
Country: greece

Post by Konos93a »

It took me 2 days to figure out, but this is my method using ABBYY FineReader 12 and DjVu Small 0.4.4:

1 export TIFF files from Scan Tailor Experimental
2 make a DjVu file with DjVu Small (let's name it test1.djvu)
3 drag and drop test1.djvu into ABBYY FineReader; after it has been read, save a DjVu (let's name it test3.djvu) with these settings: [Image]
4 with ABBYY FineReader, save a PDF (let's name it test2.pdf) with these settings:
[Image]
5 with HandyOutliner for DjVu, add and edit bookmarks from the PDF to the DjVu like this: [Image]

Done: we have a DjVu with OCR and bookmarks. The file does grow, though: a 500-page book made from black-and-white TIFFs goes from 6 MB to 8 MB,
and a 600-page book made from mixed TIFFs goes from 30 MB to 45 MB.
b0bcat
Posts: 49
Joined: 30 Nov 2012, 21:37
Number of books owned: 0
Country: UK

Post by b0bcat »

I just added an update to the DRAFT MUG's GUIDE at viewtopic.php?f=19&t=2759
and as a lot actually relates to DjVu format it occurred to me to copy the relevant part here which I now do as follows:
As will be seen from that thread, using free tools (MS Windows) one can not only create rather small DjVu files, even from greyscale TIFF files with greater definition (e.g. 400 dpi and up if needed), but also include a searchable text layer beneath the image, again using free software as shown there. The problem of creating a DjVu file that includes one or more pages of mixed text and pictures (which otherwise sometimes results in artifacts spoiling the picture part) can be worked around by saving such page(s) as 'photo' (as opposed to e.g. 'scanned' or 'perfect') in e.g. DjVuSolo, and then substituting them for the corresponding pages that were DjVu-encoded as 'scanned' or in other default modes, using the Edit function of e.g. DjVuToy. (This workaround is an inferior route to much the same result as the djvu_imager and djvu_small application suite[*], which I found had more steps to learn before practical implementation.)

[*] http://www.djvu-soft.narod.ru/scan/djvu_imager_en.htm

I don't know if licensing issues affect the permitted use of DjVu format by e.g. archive.org but their Luradoc compressed pdf files I find are a very inferior substitute; even a good multi-format reader like SumatraPDF stalls and halts in page turning while it labours to decompress whereas in my experience DjVu files scroll smoothly without such hesitation.

Last, I find DjVu metadata can now be viewed/edited using an MS Windows explorer extension:
https://www.cuminas.jp/en/downloads
DjVu Shell Extension Pack

I haven't yet tested whether Phil Harvey's updated exiftool can operate likewise on a DjVu file:
https://www.sno.phy.queensu.ca/~phil/exiftool/

Post by b0bcat »

How time flies... my existing installed version of gImageReader (Windows 7) is v.3.2.3 and going to
https://github.com/manisandro/gImageReader/releases
I see it's now up to v.3.2.99 (Feb 24 2018) the change list for which includes among other items:
"Add support for reading DJVU documents".
I haven't installed the newer version which is stated to be intended as a beta for 3.3.0 so I don't know if this new functionality means a hidden text layer can also be written to the input djvu file - very nice if it can be.
The other desired improvement I've had in mind is the ability to save the project rather than just the output (e.g. where checking the ocr accuracy and editing errors remains incomplete between sessions). Must review and see if that's since been included or is in prospect.

Post by Konos93a »


Post by b0bcat »

Sadly my old version 9 of ABBYY FineReader can read but not write DjVu format and I'm too tight to upgrade :|

Interesting to see that @5m of the video you open the djvu file with what looks like WinDjView:
https://sourceforge.net/projects/windjview

- running that program I find in its "Edit" menu a sub-item: "Add Bookmark". I haven't had to create an ebook for quite a long while but taking a random djvu file and first making a copy, I then played with the menu item and added a few specimen bookmarks - the methodology being the same as in my antique copy of Adobe Acrobat Pro, viz, place mouse cursor in location on desired page in DjVu reader's mainscreen, open the "Edit" menu, select "Add Bookmark", type in the description and OK. (WinDjView appears to save the bookmarks without being told).

WinDjView also allows exporting the bookmarks it has created, but I'm wondering if it's a format specific to that program: opening the same djvu file in DjVuLibre's DjView, no bookmarks were visible (I'm assuming it can ordinarily read bookmarks), nor in SumatraPDF, which also reads DjVu (being a portable install here, maybe that affected it), nor in a couple of DjVu readers on an Android tablet. Anyway, if the WinDjView bookmarks export file isn't a standard format (though it seems to be xml), it might be worth tinkering briefly to see if changing the header or other particulars makes it universally accepted, then re-importing. Currently, trying to import WinDjView's exported bookmarks (into a bookmarks-empty third copy of the same djvu file) using HandyOutliner, I get the error message "The root element must be OutlineRoot or BookmarkExport." Could be worth brief experimentation, as creating bookmarks manually in this way seems slightly more intuitive for the inexpert and workable at least for a small book without needing to learn another program.

EDIT: I just found this: https://sourceforge.net/p/djvuoutline/
"A program for easy creating and editing outline (bookmarks, contents) in djvu books. It maintains formulas for recalculating page numbers (e. g. for books with photo insets or with double pages). Also multiple-book outline can be easily created."

Post by Konos93a »

I use HandyOutliner to make DjVu files with bookmarks so that I can see how many pages are left until the end of the chapter on a jailbroken Kindle with KOReader.

1 open an OCRed pdf/djvu/txt/odt and copy the text of the table of contents [Image]

2 create a new txt file, open it in Notepad++ and paste the text there
[Image]

3 edit it so that only titles and page numbers remain, like this
[Image]

4 then use HandyOutliner to create a file with bookmarks
[Image]

5 write the outline and done
[Image]
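
Step 3 (reducing the pasted table of contents to titles and page numbers) can also be scripted instead of done by hand. The following is only a rough sketch, assuming each useful line ends in a page number, optionally preceded by dot leaders; the names `TOC_LINE` and `parse_toc` are made up for this example and have nothing to do with HandyOutliner itself.

```python
import re

# A title, optional dot leaders or spaces, then a trailing page number.
# This pattern is an assumption about what a pasted contents page looks like.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$")


def parse_toc(text):
    """Return (title, page) pairs for every line that ends in a page number."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from copy and paste
        match = TOC_LINE.match(line)
        if match:
            entries.append((match.group("title").strip(), int(match.group("page"))))
        # lines without a trailing number (e.g. a "Contents" heading) are dropped
    return entries


if __name__ == "__main__":
    pasted = "Contents\n\nPreface ........ 3\nChapter 1 Introduction 5\n1.2 Methods .. 17\n"
    for title, page in parse_toc(pasted):
        print(title, page)
```

Lines the pattern cannot handle (Roman-numeral pages, titles wrapped over two lines) would still need fixing by hand in the text editor.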

In WinDjView, pressing Ctrl+, lets you change the display colors [Image]

Post by b0bcat »

Another way to create an outline/set of bookmarks for a DjVu file without FineReader and without needing any intermediate PDF format is manually, per bookmark (easy for a small number of bookmarks):

1) open the DjVu file in STDU Viewer:
http://www.stdutility.com/index.html
2) in main pane select text and right click to add bookmark. Amend as desired and click OK.
3) repeat as necessary.
4) counter-intuitively, saving as copy or just closing the DjVu file does not seem to embed the bookmarks just added, you need to export them to file (xml) which does create the right format which Handy Outliner can then read. [Feature request to STDU Viewer developer: can you add option to embed bookmarks so created direct into the DjVu file?]
5) run Handy Outliner and select (a) the DjVu file and (b) the xml bookmarks file exported from STDU Viewer. However without more, after writing to the DjVu file this will result typically in many misplaced page references - on my limited testing I deduce STDU Viewer adds (or calculates by reference to) the printed page number rather than the "physical"/sequential page number of the DjVu file (e.g. if a book has covers and blank pages its printed page numbers will not correspond with the "physical" page numbers). Solution: before writing to DjVu file, either (a) open the xml file in a text editor and go through all page references to check they are correct against an open copy of the DjVu file in a DjVu reader and amend as necessary; or (b) make the necessary page adjustments in Handy Outliner itself (which I haven't yet explored) then in either case, final step:
6) in Handy Outliner click Write Outline and after a few seconds a confirmation splash screen comes up confirming bookmarks written to the DjVu file. Then open the DjVu file and test bookmarks working correctly.
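
When the mismatch described in step 5 is the same for every bookmark (printed page numbers lagging the sequential page numbers by a fixed amount), the correction is a single constant added to each page reference. A minimal sketch, assuming the outline has already been reduced to (title, page) pairs; `shift_pages` is a hypothetical helper, not a feature of STDU Viewer or Handy Outliner.

```python
def shift_pages(entries, offset, first_page=1):
    """Add a constant offset to every page reference in a flat outline.

    entries: list of (title, page) pairs, page being the number as exported.
    offset:  how many pages to add (may be negative).
    """
    shifted = []
    for title, page in entries:
        new_page = page + offset
        if new_page < first_page:
            # A result before the first page means the offset is wrong for this entry.
            raise ValueError(f"{title!r}: page {page} + offset {offset} is before page {first_page}")
        shifted.append((title, new_page))
    return shifted


if __name__ == "__main__":
    exported = [("Preface", 3), ("Chapter 1", 5)]
    print(shift_pages(exported, 1))
```

A book with unnumbered plates or inserts in the middle would need a different offset per page range, which this sketch does not attempt.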

You can also clean out unwanted bookmarks by e.g. loading the DjVu file into djvu Bookmark Tool 2.0,
https://sourceforge.net/projects/windjv ... 0Tool/2.0/
- select remove bookmark and save OK, then start again.

For such a basic objective this methodology is still somewhat convoluted, yet it does seem that e.g. STDU Viewer, DjVuLibre/DjView, WinDjView themselves don't support creating AND embedding bookmarks/the DjVu outline, although WinDjView's accompanying 'djvu Bookmark Tool 2.0' comes close to forming an integral solution. As for e.g. DjVuToy I can't find any bookmark functionality although it handles annotations/hidden text.
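
A different route, not tested in this thread, is the command-line tool djvused that ships with DjVuLibre: it reads an outline from a text file of the form `(bookmarks ("title" "#page") ...)`. The sketch below only builds that text from a flat list of (title, page) pairs. The function name `djvused_outline` is invented here, nested bookmarks are left out, and non-ASCII titles are not addressed.

```python
def djvused_outline(entries):
    """Build a flat djvused outline from (title, page) pairs."""

    def quote(text):
        # Backslashes and double quotes must be escaped inside the quoted strings.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["(bookmarks"]
    for title, page in entries:
        # "#N" points at page N of the document.
        lines.append(f' ({quote(title)} "#{page}")')
    lines.append(")")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = djvused_outline([("Preface", 3), ("Chapter 1", 5)])
    with open("outline.txt", "w", encoding="utf-8") as handle:
        handle.write(text)
```

The resulting file would then be applied to a copy of the book with something like `djvused book.djvu -e "set-outline outline.txt" -s`, and checked with `djvused book.djvu -e print-outline`.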

Post by b0bcat »

This doesn't really merit a reply-to-self but absent a visible means of editing my previous post to add this, here goes...

1) writing to DjVu (or pdf) the outline/bookmarks output by STDUViewer: having tested a little further, it seems the misplaced page references can almost universally, maybe even always, be fixed easily in HandyOutliner. Press the ± icon (plus sign above minus sign) in the 'Outline' section of the GUI, advance the offset from 0 to 1 (or as applicable, though I find that it's always been 1 so far), select 'Scope' as 'all nodes' and press 'Accept'. Then, back in the main GUI, save the changed outline before writing to the DjVu or pdf (although I guess it will still work without saving the changed outline file first).

2) sometimes HandyOutliner does not issue an error message but fails to write the outline to a target DjVu file, and this does not seem to depend solely or at all on the number of bookmark entries in the outline. If so, one possible solution, again under MS Windows, is "DjVuoutline" version 1.1:
http://djvu-soft.narod.ru/soft/
http://djvu-soft.narod.ru/soft/djvuoutline_v1_1.rar
vadim.flyagin [~at~] gmail.com

I haven't had any need to deploy its nested bookmarks capability yet but it has shown itself capable of writing at least a simple outline to target DjVu file where previously HandyOutliner failed.

Basic method:

- in the program's GUI, open target DjVu file; then
- compose bookmarks in the GUI by typing the description then pressing tab then typing the page number and finally pressing return; then ditto for next line till all bookmarks composed. (See program's included help page for formatting, including formatting of nested bookmarks).
- or compose the outline elsewhere*, putting a tab (the Unicode control character shown in the program's help file example) between x) each bookmark description and y) the associated target page number, then paste that outline into the GUI; then
- press save to write outline to target file.

So far this has resulted in successful writing of bookmarks although no result code is given.
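
The "description, tab, page number" lines described above are easy to generate rather than type. A minimal sketch for a flat outline only, assuming a list of (title, page) pairs; the formatting of nested bookmarks is documented in the program's help page and is not reproduced here, and `tab_outline` is a name made up for this example.

```python
def tab_outline(entries):
    """Return one 'description<TAB>page' line per bookmark, ready for pasting."""
    lines = []
    for title, page in entries:
        # Collapse any tabs or line breaks inside the title so that the only
        # tab on the line is the one separating title and page number.
        clean_title = " ".join(title.split())
        lines.append(f"{clean_title}\t{page}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tab_outline([("Preface", 3), ("Chapter 1", 5), ("Index", 210)]))
```

The printed text can be copied from the console or redirected to a file and pasted into the GUI.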

* STDUViewer outputs its saved outline as xml, a format not friendly to editing. So far all I have found are XMLVP and alternatively XMLDocViewer, which present the whole contents of the outline including formatting codes; the substance at least can then be copied into e.g. Notepad and edited to recreate the outline in the format that can be pasted into and processed by DjVuoutline. Tiresome, but it at least avoids re-typing the descriptions and page numbers.
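
The copy-and-edit step might be avoided by reading the xml with a short script. The sketch below is only a guess at the approach: the element and attribute names STDU Viewer actually writes have not been checked here, so `title_attr` and `page_attr` are placeholders to be replaced after looking at a real export, and the sample document in the demo is invented. It flattens any nesting into a plain list.

```python
import xml.etree.ElementTree as ET


def entries_from_xml(xml_text, title_attr="title", page_attr="page"):
    """Collect (title, page) pairs from every element carrying both attributes.

    title_attr and page_attr are hypothetical names; adjust them to whatever
    the exported file really uses.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for element in root.iter():  # document order, root included
        title = element.get(title_attr)
        page = element.get(page_attr)
        if title is not None and page is not None and page.strip().isdigit():
            entries.append((title, int(page)))
    return entries


if __name__ == "__main__":
    # Invented sample, NOT a real STDU Viewer export.
    sample = (
        '<OutlineRoot>'
        '<Item title="Preface" page="3"><Item title="Notes" page="4"/></Item>'
        '<Item title="Index" page="210"/>'
        '</OutlineRoot>'
    )
    for title, page in entries_from_xml(sample):
        print(f"{title}\t{page}")
```

The demo prints each bookmark as description, tab, page number, which is the layout described in the basic method above.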

Post by b0bcat »

Hah, it's always the way: post something then find the more elegant solution.

In the particular circumstances I since found that changing the filename of the target DjVu (which included an apostrophe) to x.DjVu (for example) and re-running the write command in HandyOutliner successfully wrote the bookmarks. I then changed the file name back to its original.
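
The rename, write, rename-back routine can be wrapped so that the original file name is restored even if something goes wrong in between. A small sketch, assuming the temporary name x.djvu is free in the same folder; `temporarily_renamed` is a hypothetical helper, and the HandyOutliner step itself remains manual.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporarily_renamed(path, safe_name="x.djvu"):
    """Rename a file to a plain name for the duration of a with-block."""
    folder = os.path.dirname(os.path.abspath(path))
    safe_path = os.path.join(folder, safe_name)
    if os.path.exists(safe_path):
        # Refuse to overwrite an existing file with the temporary name.
        raise FileExistsError(safe_path)
    os.rename(path, safe_path)
    try:
        yield safe_path
    finally:
        # Restore the original name whether or not the block succeeded.
        os.rename(safe_path, path)


# Possible use (file name invented for illustration):
#
# with temporarily_renamed("Author's Book.djvu") as safe:
#     input(f"Write the outline to {safe} in HandyOutliner, then press Enter ")
```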