using scantailor and ocropus

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

rwreed
Posts: 21
Joined: 23 Jan 2011, 16:15

using scantailor and ocropus

Post by rwreed »

Hi,

I love Scan Tailor. It is a wonderful program, and my hat is off to the developer and contributors. I have been using Google's Tesseract to do the OCR, but I am annoyed by the generic nature of the output. I understand that Ocropus will do page layout, so I thought I'd give it a try. But when I ran Ocropus on a Scan Tailor TIFF, I got an error message saying it only supports 8 channel TIFFs. Any idea how I can output an 8 channel TIFF?

Thanks so much in advance.
Randy
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: using scantailor and ocropus

Post by spamsickle »

I assume this means 8 bits per channel, and I always thought that's what Scan Tailor used.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: using scantailor and ocropus

Post by Tulon »

Except in B/W mode, where it's 1 bit per channel (and one channel total).
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
ahmad
Posts: 24
Joined: 28 Dec 2010, 11:26

Re: using scantailor and ocropus

Post by ahmad »

Hi, rwreed, and welcome.

There are many ways to turn your images into 8-bit TIFFs... it might help if you told us which operating system you are using.

Tesseract can do positional OCR... you might want to look here - http://code.google.com/p/tesseract-ocr/ ... id=263#c10. I tried Ocropus some months back, but found Tesseract to have better output and a definite advantage in terms of simplicity and ease-of-use.
strider1551
Posts: 126
Joined: 01 Mar 2010, 11:39
Number of books owned: 0
Location: Ohio, USA

Re: using scantailor and ocropus

Post by strider1551 »

rwreed wrote: I understand that Ocropus will do page layout
I know this is a little off-topic from the 8-bit question, but I believe tesseract-3.00 added page layout analysis (see here).
rwreed
Posts: 21
Joined: 23 Jan 2011, 16:15

Re: using scantailor and ocropus

Post by rwreed »

Hi all,
Thanks for the replies. My system is Ubuntu 10.10.

I've been trying Tesseract 3.0, and I have to say I like the job it does recognizing text, though I am still unsure how to get it to output paragraphs separately. Perhaps I'll start a thread over in software.

Anyway, if there's a way to get Scan Tailor to output 8-bit TIFFs, I'd be interested to hear it. It sounds like I need to change the output mode; is that right?

thanks again
Randy
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: using scantailor and ocropus

Post by spamsickle »

I think if you output in "color" mode, they'll be 8-bit. B&W mode is one bit, and mixed mode is mostly one bit.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: using scantailor and ocropus

Post by Tulon »

spamsickle wrote: and mixed mode is mostly one bit.
Mixed mode is actually never 1 bit. That's an implementation detail. It's always the same as "Color / Grayscale" mode, that is 8 bit Grayscale or 8+8+8 bit RGB or 8+8+8+8 bit RGBA.

I would actually report a bug to Ocropus developers regarding this issue. I mean it's just lame not to accept B/W TIFFs.

The "convert" command that's part of the ImageMagick suite should be able to convert from B/W to grayscale or color. I don't know the required command line options though.
Anonymous1

Re: using scantailor and ocropus

Post by Anonymous1 »

Ubuntu 10.10 is the system of choice, my friend ;)

Just use ImageMagick for this:

    convert input.tiff -depth 8 output.tiff
If you want to overwrite the originals, just run mogrify:

    mogrify -depth 8 input.tiff
The code is untested, so it might not work. But I doubt that it'll screw anything up (just don't run mogrify until you know it does what you want it to).
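
For a whole Scan Tailor output folder, the same idea can be run in a loop that writes to a separate directory, so the originals are never touched. This is only a sketch: the function name and the directory arguments are made up for illustration, and it assumes ImageMagick's convert is on the PATH. The added -type Grayscale is a precaution, since some ImageMagick versions may still write a 1-bit file when the page contains only black and white pixels.

```shell
#!/bin/sh
# Sketch: copy every TIFF in a source directory to an output directory as
# 8-bit grayscale, leaving the originals untouched.
# Assumptions: ImageMagick's `convert` is installed; the function name
# to_8bit_gray and its two directory arguments are hypothetical.
to_8bit_gray() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.tif "$src_dir"/*.tiff; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        # -type Grayscale requests a grayscale (not bilevel) image;
        # -depth 8 requests 8 bits per sample.
        convert "$f" -type Grayscale -depth 8 "$out_dir/$(basename "$f")" || return 1
    done
}

# Example call (directory names are placeholders):
#   to_8bit_gray out out-8bit
```

Before feeding the results to Ocropus, it is worth checking what was actually written, for example with identify -format '%z' on one of the new files, which should print the bit depth ImageMagick reads back.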