
ST suggestions and feature requests go here.

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: ST suggestions and feature requests go here.

Post by Tulon » 05 Nov 2011, 11:49

leescott wrote:The 1.0betaC (beta2?) build I used yesterday opened it fine.
1.0betaC? That's odd - I never put anything but numbers there.
leescott wrote:It can't open that file. I haven't found any other problems.
OK. I thought you couldn't install or run Scan Tailor itself.

What Windows version are you running?

Is anyone else experiencing this?
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.

leescott
Posts: 19
Joined: 29 Apr 2010, 03:17

Re: ST suggestions and feature requests go here.

Post by leescott » 05 Nov 2011, 19:47

Tulon wrote:1.0betaC? That's odd - I never put anything but numbers there.
Sorry! I forgot. It should be beta 2 or 3.
Tulon wrote:What Windows version are you running? Is anyone else experiencing this?
My Windows is XP.

Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: ST suggestions and feature requests go here.

Post by Anonymous2 » 06 Nov 2011, 17:20

Tulon, does Scan Tailor have debugging output when run from a terminal window? I think integrating a debug log tab/window would be useful for situations like this.

Also, doesn't Qt4 depend upon the Microsoft Visual C++ redistributable? That might be an issue.

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: ST suggestions and feature requests go here.

Post by Tulon » 06 Nov 2011, 17:50

Anonymous2 wrote:Tulon, does Scan Tailor have debugging output when run from a terminal window?
Scan Tailor doesn't really have any logging. The debugging you can activate from the menu is purely visual and is meant for debugging image processing algorithms.
Anonymous2 wrote:Also, doesn't Qt4 depend upon the Microsoft Visual C++ redistributable? That might be an issue.
Anything compiled with Visual Studio will have such a dependency. I don't see any issues here though.

Anonymous2
Posts: 97
Joined: 18 Oct 2011, 16:05

Re: ST suggestions and feature requests go here.

Post by Anonymous2 » 07 Nov 2011, 12:56

I'll hack with the source code a little bit and see if I can create a debug window of sorts (and basic logging for a few of the problematic areas). I made one for Bindery in PyQt4, so I doubt the process would be much different.

sanzoghenzo
Posts: 5
Joined: 11 Nov 2011, 07:35
Number of books owned: 0

Re: ST suggestions and feature requests go here.

Post by sanzoghenzo » 15 Nov 2011, 06:22

Hi, first of all thanks for this great piece of software! It saves me some kg of books for my trips :)
Second, forgive me if this has already been asked, but I couldn't find it.

Short version: Would it be possible to use the Tesseract OCR library (or OCRopus) to output hOCR files or, even better, ePub files?

Long version:
I would like to convert my scanned pages to ePub format and keep the pictures in place between the text.
I can do it with an old copy of ABBYY FineReader on my PC (JPEG -> RTF with images -> ePub), but I prefer to work on my Mac (and the demo of FineReader for Mac doesn't handle pictures the way I would like).
I've heard about the hOCR format, and in the specification I see that it supports ocr_photo (which defines a box/polygon that is to be treated as an image).
Since the Scan Tailor workflow already defines the image boxes, I was wondering whether that information could be passed to Tesseract/OCRopus to keep it consistent (and maybe speed up OCR).
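
To make the idea concrete, here is a minimal Python sketch of how already-known picture boxes could be written out as hOCR ocr_photo elements. The function name and the (x0, y0, x1, y1) pixel tuple are assumptions for illustration only; nothing here reflects how Scan Tailor stores its picture zones, and the only hOCR detail relied on is the "bbox x0 y0 x1 y1" property in the title attribute.

```python
def ocr_photo_element(box):
    """Return an hOCR ocr_photo element for one picture box.

    `box` is a hypothetical (x0, y0, x1, y1) tuple in pixels, with the
    origin at the top-left corner of the page image.
    """
    x0, y0, x1, y1 = box
    if x0 > x1 or y0 > y1:
        raise ValueError("box must be given as (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1")
    # hOCR carries layout information in the title attribute.
    return f"<div class='ocr_photo' title='bbox {x0} {y0} {x1} {y1}'></div>"


def ocr_photo_elements(boxes):
    """Return one ocr_photo element per box, one element per line."""
    return "\n".join(ocr_photo_element(box) for box in boxes)
```

An OCR step that accepted such boxes could skip those regions, but whether Tesseract or OCRopus can take them as input is exactly the open question here.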

I don't know whether it's better/easier to pass the information to an external program or to integrate OCR capability into Scan Tailor.
The last step is to convert the hOCR HTML into ePub (maybe calibre will do it someday...)
Thanks for your attention!

dtic
Posts: 461
Joined: 06 Mar 2010, 18:03

Re: ST suggestions and feature requests go here.

Post by dtic » 27 Nov 2011, 21:05

Tulon wrote:Scan Tailor never used more than one core for processing tasks. It does use additional threads and therefore additional cores for auxiliary tasks, like loading thumbnails.
Hi Tulon and others. Here are some more thoughts on the processing speed and multiple cores. By using two, four or more CPU cores (if available) ScanTailor would be much faster. Here is a manual process to speed up processing using two cores:

1. Add 5 sample photos of book spreads to ScanTailor
2. Work through all steps like usual, except the last step (output)
3. file > save project as > test.ScanTailor

4. make two copies of all the images: from "folder\" to "folder\1\" and "folder\2\"
5. select half of the pages in the right hand pane (page 6-10 in my example), right click > remove from project
6. file > save project as > test1.ScanTailor
7. file > open > test.ScanTailor
8. select the other half of the pages in the right hand pane (page 1-5 in my example), right click > remove from project
9. file > save project as > test2.ScanTailor
10. open test1.ScanTailor in a text editor and add "1\" to the variable for outputDirectory and "1/" for directory path, then save the file
example before: outputDirectory="C:\out"
example after: outputDirectory="C:\1\out"
11. open test2.ScanTailor in a text editor and add "2\" to the variable for outputDirectory and "2/" for directory path, then save the file
12. open test1.ScanTailor in ScanTailor and run the output process
13. start a second instance of ScanTailor, open test2.ScanTailor and run the output process
14. Once both ScanTailor instances are done processing, put all files from each of the out folders in one folder. Then generate a pdf or djvu like usual.
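
Steps 10 and 11 are the easiest ones to script. Below is a minimal Python sketch that reproduces the before/after example for outputDirectory by treating the project file as plain text. The function name is made up for illustration, it only touches the first outputDirectory="..." it finds, and it does not handle the separate "directory path" change, since the exact layout of that part of the file isn't shown here.

```python
import re
from pathlib import PureWindowsPath


def add_subfolder_to_output_dir(project_text, subfolder):
    """Insert `subfolder` before the last component of outputDirectory.

    Example: outputDirectory="C:\\out" with subfolder "1" becomes
    outputDirectory="C:\\1\\out".
    """
    pattern = re.compile(r'outputDirectory="([^"]*)"')

    def rewrite(match):
        original = PureWindowsPath(match.group(1))
        new_path = original.parent / subfolder / original.name
        # A function replacement is inserted literally, so the
        # backslashes in the Windows path are not treated as escapes.
        return f'outputDirectory="{new_path}"'

    new_text, count = pattern.subn(rewrite, project_text, count=1)
    if count == 0:
        raise ValueError("no outputDirectory attribute found")
    return new_text
```

A script would read test1.ScanTailor as text, call this with "1", write the result back, and then do the same for test2.ScanTailor with "2".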

Steps 1-3 are of course manual. But many of 4-14 can be automated through scripting. However, steps 5 and 8 are currently hard to script. But I think they'd be scriptable if only the ScanTailor UI had these features:
A. put information on the total number of pages in the project in the ScanTailor window title
(note: it is not reliable to only multiply the number of input images by two because the user might remove some pages manually in step 2 and some or all inputs might be images of single pages)
B. add keyboard control for extending selection in the right pane: let shift+down extend the selection with one downwards in the list. (That's how selection works in Windows Explorer)
C. add keyboard control for the "remove from project" command. For example delete or shift+delete. C isn't strictly necessary since a script simulating mouse clicks can work, but scripting keyboard commands is easier and more reliable.

I know that Tulon's ScanTailor development is currently in pause mode. But even a comment on how complex it might be to add A and B might be of use. My thinking is that A and B seem less complex than, for example, changes to the core text processing ScanTailor does, so maybe some other dev reading this who can't work on the core stuff might still be able to look into adding A and B. I can't. But I can script a bit, so if A and B were added then I'd take a stab at making a Windows script that automates 4-14, with an ini setting for what number of CPU cores to use.

edit: Oh wait! I forgot to look more into the .ScanTailor files. They're all plaintext. So a script might process them directly for the other steps too, using regexp or some other method. So as an alternative route, does anyone have hints on the easiest way to "split" a .ScanTailor file to get the same results as in steps 5 and 8 above?
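
Whichever way the project file ends up being split, the first thing a script needs is to decide which pages go to which instance. Here is a small Python sketch of that part only, generalised from two halves to any number of cores. It works on a plain list of page identifiers and does not parse the .ScanTailor format at all; the function name is hypothetical.

```python
def split_into_chunks(pages, parts):
    """Split `pages` into `parts` contiguous chunks of near-equal size.

    Chunk sizes differ by at most one, and earlier chunks get the
    extra pages when the total does not divide evenly.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(pages), parts)
    chunks = []
    start = 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append(list(pages[start:start + size]))
        start += size
    return chunks
```

For the 10-page example with two cores this gives pages 1-5 and 6-10, the same halves kept in test1.ScanTailor and test2.ScanTailor; each project copy keeps its own chunk and drops the rest.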

dtic
Posts: 461
Joined: 06 Mar 2010, 18:03

Re: ST suggestions and feature requests go here.

Post by dtic » 28 Nov 2011, 14:01

Ok, so I figured out how to "split" the project file in a functional way. Next I want to run two instances of command-line ScanTailor on one project file each.

When I go to the command-line and run
scantailor-cli.exe "C:\test.ScanTailor"
only the scantailor-cli help file is displayed and no processing starts. What other parameters are required for scantailor-cli to process based on the directions in the project file? The help file doesn't answer that AFAICT.
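
For the "two instances at once" part, the launching itself is simple once a working command line is known. A minimal Python sketch, assuming each instance can be expressed as an argument list: the helper name is made up, and the demo deliberately uses the Python interpreter as a stand-in because the right scantailor-cli arguments for a project file are still the open question above.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return
    their exit codes in the same order as `commands`.

    Each command is a list of arguments, as accepted by subprocess.Popen.
    """
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Stand-in commands only; replace them with the real per-project
    # invocations once the required parameters are known.
    demo = [[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]]
    print(run_in_parallel(demo))
```

With one command per split project file, the operating system is free to schedule each process on its own core.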

dtic
Posts: 461
Joined: 06 Mar 2010, 18:03

Re: ST suggestions and feature requests go here.

Post by dtic » 29 Nov 2011, 18:33

I have made an autohotkey script that does steps 4-14 with two instances of ScanTailor processing simultaneously. I need to clean it up a bit and add support for more instances before posting but it works reliably so far.

However, command-line usage would be more reliable than my current solution, which sends keyboard commands to the ScanTailor UI and waits for a measured drop in ST's CPU usage before acting.
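
The "wait for the CPU usage to drop" logic can be sketched roughly like this in Python. This is not the AutoHotkey script, just an illustration of the idea: the function name, the threshold and the sample counts are all assumptions, and sample_cpu is a caller-supplied function that returns the current CPU usage of the ST process in percent (how to measure that is left out).

```python
import time


def wait_until_idle(sample_cpu, threshold=5.0, quiet_samples=3,
                    interval=1.0, timeout=3600.0, sleep=time.sleep):
    """Return True once `sample_cpu()` has stayed below `threshold`
    for `quiet_samples` consecutive readings, or False on timeout.

    Requiring several quiet readings in a row avoids reacting to a
    brief dip while processing is still running.
    """
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        if sample_cpu() < threshold:
            streak += 1
            if streak >= quiet_samples:
                return True
        else:
            streak = 0  # usage went back up, start counting again
        sleep(interval)
    return False
```

The weakness is the same as with any heuristic of this kind: a long pause in processing looks identical to being finished, which is why an exit code from a command-line run would be more trustworthy.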

My last post asked what other parameters are needed to process a project file through scantailor-cli. But a more basic question is: is project file use currently supported at all by scantailor-cli?

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: ST suggestions and feature requests go here.

Post by Tulon » 30 Nov 2011, 04:35

dtic wrote:My last post asked what other parameters are needed to process a project file through scantailor-cli. But a more basic question is: is project file use currently supported at all by scantailor-cli?
I happen to know very little about the CLI version. Try asking this on the development mailing list.
