Missing OCR text in djvubind?
Posted: 19 Jan 2012, 15:14
Hi, I have a TIFF that looks like this (in Swedish) after running it through ScanTailor:
I am now trying djvubind on it, and get a resulting .djvu with embedded OCR. Great! But some text is missing. As you can see below, many complete lines are missing, as are the page numbers and the heading (I hope you don't have to understand Swedish to see the missing lines?). If I open it in djview4, copy the entire page as text and paste it, I get:
När kan jag använda jordbrukstraktorn
Traktorns grundutrustning
Traktorns skogsutrustning
Montering av vinsch
Montering av linkran
Kontroll och skötsel av utrustning
Personlig skyddsutrustning
Planering av drivningstrakten
Sortiment till stickväg
Stammar eller stamdelar till stickväg
Stammar till avlägg
Sortiment till avlägg
Ekonomi
In djvubind I have selected tesseract as the OCR engine, and have added "-l swe" as an option in the config (otherwise it didn't work at all).
When I run "tesseract -l swe" from the command line, though, it gives me all the lines (output below). So something is strange, and I would appreciate help with debugging.
|NNEHÅLL
När kan jag använda jordbrukstraktorn
iskogen?
Traktorns grundutrustning
Traktorns skogsutrustning
Kraftöverföringsaxlar
Traktorkärran
Vinschar
Montering av vinsch
Linkranar
Montering av linkran
Griplastare
Hjälpmedel
Stållinor
Kontroll och skötsel av utrustning
Personlig skyddsutrustning
Körteknik
Planering av drivningstrakten
Drivningsmetoder
Sortiment till stickväg
Stammar eller stamdelar till stickväg
Stammar till avlägg
Sortiment till avlägg
Ekonomi
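To pin down exactly which lines go missing, the two outputs can be compared mechanically instead of by eye. Below is a minimal sketch in Python, assuming both texts have been saved as plain strings; the function name `missing_lines` is made up for illustration, and the sample strings are abbreviated from the two outputs above. The embedded text could presumably also be dumped with djvulibre's djvutxt rather than copied from djview4, but that step is not shown here.

```python
def missing_lines(reference, candidate):
    """Return the non-empty lines of `reference` that never appear in `candidate`.

    Lines are compared after stripping surrounding whitespace, and the
    original order of `reference` is preserved.
    """
    seen = {line.strip() for line in candidate.splitlines() if line.strip()}
    return [
        line.strip()
        for line in reference.splitlines()
        if line.strip() and line.strip() not in seen
    ]


# Abbreviated sample: what command-line tesseract produced.
tesseract_text = """\
Traktorns skogsutrustning
Kraftöverföringsaxlar
Traktorkärran
Vinschar
Montering av vinsch
"""

# Abbreviated sample: what was copied out of the .djvu in djview4.
djview_text = """\
Traktorns skogsutrustning
Montering av vinsch
"""

# Print every line that tesseract found but the .djvu text layer lacks.
for line in missing_lines(tesseract_text, djview_text):
    print(line)
```

Running this on the full page text from both sources would give the complete list of dropped lines, which might show a pattern (for example, only short single-word lines being lost).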
djvubind-1.1.0
tesseract-3.01
djvulibre-bin 3.5.24-8
Ubuntu 11.10 amd64
Learning to scan, my aim is mostly to scan misc documents, but I got this book I'd like to scan first.