Scan Tailor
Moderator: peterZ
- IcantRead
- Posts: 95
- Joined: 17 Sep 2009, 02:56
- Number of books owned: 0
- Country: United States
- Location: Arizona
Re: Scan Tailor
I was wondering if there is a way in Scan Tailor to take a color document and run black and white on it, but instead of turning the dark parts black, keep them in their original color. This would make most of the background white, and those little specks that would otherwise turn black would still be close to white. I'm not sure I'm asking this right; I hope someone understands me.
Re: Scan Tailor
You want to make the background a uniform color (white) but leave text and pictures unchanged in brightness and color?
- IcantRead
- Posts: 95
- Joined: 17 Sep 2009, 02:56
- Number of books owned: 0
- Country: United States
- Location: Arizona
Re: Scan Tailor
Kind of. OK, when you apply black and white to a picture, it will take anything close to white and make it white. It also does the same thing to black. I was wondering, is there any way to take what it puts out as white, and make that white, but on the other hand have what the program puts out as "black" not become black but keep its original color? The reason I was thinking this would be a good thing is that most of my textbooks don't have too many pictures, they just have some color. It seems a waste to keep the entire file in color when only 2%-5% of it is in color. Then again, things are lost when it is in black and white.
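The request above amounts to using the black-and-white result as a mask rather than as the output. A minimal Python sketch of that idea follows. It assumes a fixed global brightness threshold as a stand-in for whatever binarization the program actually uses, and the function names are made up for illustration.

```python
# Pixels are (R, G, B) tuples with channel values 0-255.
WHITE = (255, 255, 255)


def luminance(pixel):
    """Approximate perceived brightness using the Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def whiten_background(image, threshold=128):
    """Set every pixel the threshold would call 'white' to pure white and
    leave every pixel it would call 'black' at its original color.

    The fixed threshold is an assumption, not Scan Tailor's algorithm.
    """
    return [
        [WHITE if luminance(px) >= threshold else px for px in row]
        for row in image
    ]
```

With this, off-white paper becomes pure white while dark or strongly colored pixels pass through unchanged, which is the behavior the post asks for.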
-
- Posts: 596
- Joined: 06 Jun 2009, 23:57
Re: Scan Tailor
daniel_reetz wrote: I have the faint impression that it is possible to apply one setting to all pages by editing the actual project file produced by Scan Tailor. Can anyone confirm this?
Yes, this is possible if you have an editor which can handle regular expressions. It's a bit beyond the capabilities of editors which only do exact-match replacements, as the content box which Scan Tailor finds automatically is unlikely to be the same from one page to the next. I use UltraEdit ($50 from http://www.ultraedit.com) to do it in Windows.
I think the reason Tulon has not implemented this feature is because the content selection is taking place after the deskew step, and the content rectangle is specified in the transformed space. It is unlikely that one page will be exactly the same as the next in the amount of rotation that's required to deskew it, and consequently the transformed space will be different from one page to the next. Forcing a fixed content rectangle onto pages which vary is thus technically incorrect, but I've never been one to let perfection stand in the way of "good enough".
The problem that remains for me is that I've stopped processing the "left" and "right" pages separately in ScanTailor and combining them afterwards, because sometimes the right-hand pages came out a different size than the left-hand pages, and the end result when they were combined in a single PDF (while still "good enough" for me) could be irritating. I'm now combining left and right into a single interleaved file (with a Perl script that renames and moves the originals) and letting ScanTailor work its magic on the whole book in one pass. While all the right-hand images can be sufficiently similar for the editing trick to work, and all the left-hand images, it's almost impossible to get the left-hand and right-hand images to have pages in the same place. For that, I'm going to want to apply one content rectangle to every other image (as can be done currently with Tulon's "rotation" step), and that's beyond what UltraEdit can handle too, unless someone can think of a regular expression that can distinguish between even and odd numbers.
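On that last point: a decimal number's parity depends only on its final digit, so a regular expression can tell even from odd. A minimal Python sketch, assuming the page number is the trailing run of digits in a file stem (the `page_0001` style of name is hypothetical):

```python
import re

# A decimal number is odd exactly when its last digit is odd,
# so a character class on the final digit is enough.
ODD = re.compile(r"\d*[13579]$")
EVEN = re.compile(r"\d*[02468]$")


def is_odd_page(stem: str) -> bool:
    """Return True if the stem ends in an odd number."""
    return ODD.search(stem) is not None


def is_even_page(stem: str) -> bool:
    """Return True if the stem ends in an even number."""
    return EVEN.search(stem) is not None
```

The same character classes should carry over to any editor with regular-expression search, though the anchor for "end of the number" will depend on how page identifiers appear in the file being edited.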
So at this point, to do what I want to do (and what apparently a lot of others want to do as well), I'll either need to write another script or modify the Scan Tailor code. My preference is to make the change to Scan Tailor, and I'm in the process of going through the code to see how to accomplish it.
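For the script route, one possible shape is sketched below in Python. It assumes the project file is XML, and every element and attribute name in it (`page`, `id`, `content-rect`, `x`, `y`, `width`, `height`) is a hypothetical placeholder rather than Scan Tailor's real schema, so the names would need to be checked against an actual project file. As noted above, a fixed rectangle ignores each page's own deskew, so this is a "good enough" approach at best.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL project layout: these element and attribute names are
# placeholders for illustration, not Scan Tailor's actual schema.
SAMPLE = """
<project>
  <page id="1"><content-rect x="10" y="12" width="500" height="800"/></page>
  <page id="2"><content-rect x="40" y="15" width="505" height="790"/></page>
  <page id="3"><content-rect x="11" y="9" width="498" height="805"/></page>
  <page id="4"><content-rect x="42" y="14" width="502" height="795"/></page>
</project>
""".strip()


def apply_rect_by_parity(root, odd_rect, even_rect):
    """Overwrite each page's content rectangle with the one chosen for its
    parity (odd pages are one side of the book, even pages the other)."""
    for page in root.iter("page"):
        rect = page.find("content-rect")
        if rect is None:
            continue  # nothing to overwrite on this page
        chosen = odd_rect if int(page.get("id")) % 2 else even_rect
        for key, value in chosen.items():
            rect.set(key, str(value))
    return root


root = ET.fromstring(SAMPLE)
apply_rect_by_parity(
    root,
    odd_rect={"x": 10, "y": 10, "width": 500, "height": 800},
    even_rect={"x": 40, "y": 10, "width": 500, "height": 800},
)
```

Parsing the file instead of pattern-matching it sidesteps the even/odd problem entirely, since the page number is available as an integer.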
Re: Scan Tailor
IcantRead wrote: Kind of. OK, when you apply black and white to a picture, it will take anything close to white and make it white. It also does the same thing to black. I was wondering, is there any way to take what it puts out as white, and make that white, but on the other hand have what the program puts out as "black" not become black but keep its original color? The reason I was thinking this would be a good thing is that most of my textbooks don't have too many pictures, they just have some color. It seems a waste to keep the entire file in color when only 2%-5% of it is in color. Then again, things are lost when it is in black and white.
It sounds to me like maybe you want to output 4-bit or 8-bit TIFF instead of monochrome. That wouldn't be so good for photos, but fine for illustrations or the occasional colored boxes or text. I don't think ScanTailor has that ability today, but it might not be too much of a stretch to add a new output type to replace the current binarization routines.
-
- Posts: 290
- Joined: 20 Jun 2009, 12:19
- E-book readers owned: SONY PRS-505, Kindle DX
- Number of books owned: 9999
- Location: Grand Rapids, MI
Re: Scan Tailor
This raises a question I've wondered about (and I apologize if it's been asked before): what are the best settings if you want to OCR the output of ScanTailor? I really like the "clean" look of b/w, but I dislike the sort of jaggy appearance. I figure the anti-aliasing that multiple gray levels give you makes the page more pleasant to look at, but it may confound OCR.
Do you use one set of ScanTailor settings if you're going to read the output yourself, and a different set of settings if you're going to hand it off to an OCR?
-
- Posts: 596
- Joined: 06 Jun 2009, 23:57
Re: Scan Tailor
IcantRead wrote: I was wondering, is there any way to take what it puts out as white, and make that white, but on the other hand have what the program puts out as "black" not become black but keep its original color?
You might try applying the "mixed" mode of output to all the pages. I assume you have some keywords in color, like a contextual editor would give you for a programming language. I've gotten to the point where I just run "mixed" as my default. I'm sure it's probably slower than specifying either B&W or color, because the application tries to determine what is text and what is picture. I think there's a better than 50/50 chance that it will identify your colored text as pictures and output it the way you hope. If not, you'll have the option to manually specify them as pictures yourself, which could be more hassle than it's worth to you if there are several per page, but then I don't really know what it's worth to you.
-
- Posts: 24
- Joined: 28 Jul 2009, 01:27
- E-book readers owned: lBook V8, lBook V3
- Number of books owned: 0
- Location: Sofia, Bulgaria
Re: Scan Tailor
StevePoling wrote: Do you use one set of ScanTailor settings if you're going to read the output yourself, and a different set of settings if you're going to hand it off to an OCR?
Yes. For making DjVu files, I always use output at 600 dpi. But for OCR with ABBYY FineReader, 600 dpi is good only if your text is smaller than 10 pt. For text of 12 pt and above, 300 dpi is enough; if you OCR such text at 600 dpi, strange things happen (e.g. quotes are recognized as a superscript "4"; why not "9", I have no idea).
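To put rough numbers on that rule of thumb: a point is 1/72 inch, so the nominal height of the type in pixels is point size times dpi divided by 72. The Python sketch below encodes only the two thresholds stated in the post; the function names are made up, and sizes between 10 and 12 pt are left undecided because the post gives no guidance there.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch


def em_height_px(point_size, dpi):
    """Nominal type height in pixels for a given point size and resolution."""
    return point_size * dpi / POINTS_PER_INCH


def suggested_ocr_dpi(point_size):
    """Resolution suggested by the rule of thumb in the post above."""
    if point_size < 10:
        return 600
    if point_size >= 12:
        return 300
    return None  # the post says nothing about sizes from 10 up to 12 pt
```

For example, 12 pt type is nominally 50 pixels tall at 300 dpi and 100 pixels tall at 600 dpi.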
Re: Scan Tailor
I have a few suggestions:
1. A feature to ignore headers and footers (such as page numbers for instance). This would work great for "pageless" OCR-ing (such as continuous text files).
2. An "UNDO" feature with a Ctrl+Z shortcut and maybe a button to the left of the screen.
3. An extra step after 6 Output. Maybe called "Edit" or "Retouch" or something like that capable of basic monochrome editing. Or perhaps instead of an extra step, there could be a tab right under "Picture Zones" at step 6. Sure you can do this in other image editors (such as GIMP or even MS Paint) but it would really be nice to have it integrated. The mouse wheel could increase/decrease the size of the brush. The left mouse button could be white and the right mouse button black. Please think about it.
PS: I don't know if it's a bug or not but every page I've scanned is slightly tilted to the left. You can observe this when selecting content.
-
- Posts: 4
- Joined: 04 Mar 2014, 00:52
Re: Scan Tailor
If this is a place for feature requests here are a couple more:
- On the page layout step I'd like to be able to specify the output page size instead of just the margins.
- Be able to skip steps. For example, most of the time I probably don't need the fix orientation and split page steps.
- I found in the code where it disables CCITT Group 4 compression (line 267 of TiffWriter.cpp):
Code:
// Don't use CCITTFAX4 compression, as Photoshop
// has problems with it.
//compression = COMPRESSION_CCITTFAX4;
I'm not sure a bug in Photoshop should determine the level of compression. I've uncommented this for my purposes.
Really great job with Scan Tailor. It has a lot of potential.