ScanTailor_multi_core -- speed up processing


dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

ScanTailor_multi_core -- speed up processing

Post by dtic »

ScanTailor_multi_core
[Screenshot: scantailor_multi_core.png]
Processes step 6 (Output) in ScanTailor up to two or four times faster by running multiple instances in parallel on a dual- or quad-core CPU.

Made in Autohotkey by nod5 = dtic as free software (GPL3).

More details and download

This is Windows software, but the steps are straightforward, so something similar could likely be done for Linux. Anyone attempting that might find the comments and regexps in the source useful.
eL_PuSHeR
Posts: 125
Joined: 28 Jun 2010, 15:25

Re: ScanTailor_multi_core -- speed up processing

Post by eL_PuSHeR »

It sounds cool.
scanster
Posts: 9
Joined: 10 Sep 2012, 01:37
Number of books owned: 0
Country: USA

Re: ScanTailor_multi_core -- speed up processing

Post by scanster »

I really need the multicore processing in ScanTailor. Very sad to not find it there. This process you outlined has given me some hope but when I tried your program (thank you for posting it online), it unfortunately didn't work for me. I tried 4 and then 2 cores. In both cases, 4 (and 2) scan tailor windows opened and actually the first page was processed in each but that was it. The process stopped (even though the little UI said it was still processing, there was nothing going on in the machine). I am wondering if there may be a small bug or something that you may have already fixed. Could you please let me know... I just need a faster way to get through scan tailor which has every feature I need but just not fast enough. Thank you so much.
dtic

Re: ScanTailor_multi_core -- speed up processing

Post by dtic »

Try this: in the .ahk source file add this

Code:

sleep, 1000
on a new line directly after line 303 (line 303 is "sendinput, {space}").
Then run the .ahk file (you need AutoHotkey installed for that).
scanster

Re: ScanTailor_multi_core -- speed up processing

Post by scanster »

Thanks for this.

Have you ever come across any way of doing this on a Linux box? It looks like AutoHotkey doesn't work on Linux. Ideally I'd like to turn the ScanTailor processing into a Linux script. I'll ask the same question over at the ScanTailor forum.

Thank you!
jbaiter
Posts: 98
Joined: 17 Jun 2013, 16:42
E-book readers owned: 2
Number of books owned: 0
Country: Germany
Location: Munich, Germany

Re: ScanTailor_multi_core -- speed up processing

Post by jbaiter »

You can take a look at the scantailor plugin for spreads: https://github.com/jbaiter/spreads/blob ... ntailor.py
It splits up a ScanTailor configuration file into as many smaller files as the CPU has cores and runs ScanTailor in parallel on every one of them.
If you know Python, it should be fairly trivial to adapt it to a standalone script, as you should be able to keep most of the code. You'd only need to refactor it a bit so that "split_configuration" and "generate_output" are standalone-functions and write some code that calls them, just like the "process" method does.
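To make the splitting idea a bit more concrete, here is a minimal Python sketch. It is not the spreads code: the function name `chunk_ranges` and the file names are made up for illustration, and it only shows how a page list could be cut into one contiguous slice per CPU core. Writing each slice into its own project file and running ScanTailor on it is left out.

```python
import math
import multiprocessing


def chunk_ranges(num_items, num_pieces):
    """Return (start, end) index pairs that together cover range(num_items).

    Each piece holds at most ceil(num_items / num_pieces) items, so fewer
    than num_pieces ranges come back when there are not enough items.
    """
    if num_items <= 0 or num_pieces <= 0:
        return []
    per_piece = int(math.ceil(num_items / float(num_pieces)))
    ranges = []
    for start in range(0, num_items, per_piece):
        ranges.append((start, min(start + per_piece, num_items)))
    return ranges


if __name__ == "__main__":
    # Hypothetical page names; a real script would read them from the project file.
    pages = ["page%03d.tif" % i for i in range(10)]
    for start, end in chunk_ranges(len(pages), multiprocessing.cpu_count()):
        print(pages[start:end])
```

Contiguous slices keep neighbouring pages together in one sub project, which makes it easy to see afterwards which part a page came from.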
spreads: Command-line workflow assistant
jbaiter

Re: ScanTailor_multi_core -- speed up processing

Post by jbaiter »

There you go, just adapted it for you on my ride to work ;-)

http://git.io/2sE44Q

Download the script, and run with your saved ScanTailor project file:

Code:

$ chmod +x scantailor_multicore.py
$ ./scantailor_multicore.py /path/to/project.ScanTailor /path/to/output/directory
mhr
Posts: 37
Joined: 07 May 2012, 10:12
E-book readers owned: onyx-boox-m92 sony-trs-t1
Number of books owned: 500
Country: Germany

Re: ScanTailor_multi_core -- speed up processing

Post by mhr »

I also have a solution to offer under Linux, in the same spirit as jbaiter's solution above,
for when a multicore-capable ScanTailor is not available.
You split the ScanTailor project file book.ScanTailor into multiple sub project files.
Then you spawn several scantailor-cli processes, which create all the output images.
I only focus on batch processing the output images once all previous filters have been applied.
The advantage of my approach is that you can afterwards update your old project file and
continue editing the whole project with the batch-processed files.

My scripts are written in Python 2.x and should also work under Windows with minor adjustments.

I like to use one fixed directory layout to avoid passing parameters around, so my scripts are hard-wired to this setup.
In particular, the number of sub processes should be chosen to suit your machine (variable 'nprocess' in 'rose_scantailor').
Adapting the scripts to a different setup is not hard, though. My setup is always (seen from the current directory):

Code:

Scantailor input:  'book/*.tif' or 'book.tif' (multiple tiff image file)
Scantailor projectfile: 'book.ScanTailor'
Image output directory: 'book/out' or 'out'
After all image settings have been made in ScanTailor (including the output settings), go to the above directory and
invoke rose_scantailor. This creates the sub project files book-[0,1,2,...].ScanTailor (fast).
Then issue the command rose_scantailor_spawn. This executes scantailor-cli multiple times (wait).
Finally, issue the command rose_scantailor again. This creates the new combined project file book-new.ScanTailor
from the sub project files and the old project file (fast).
You can now replace the old project file with this new one if everything is OK. I like to be able to check that everything
has gone right after each substep, but you could also combine the three steps into one.
Finally, you can continue to edit your ScanTailor project book-new.ScanTailor (or book.ScanTailor after
replacement).
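The spawn step above could be sketched roughly like this in Python 3. This is not the code from the scripts in this post: `run_all` is a hypothetical helper, and the commands are harmless placeholders that only print a file name. A real run would build one scantailor-cli call per sub project file; its arguments are deliberately not shown here. Only the `nprocess` limit is taken from the description.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, nprocess):
    """Run each command (a list of arguments) with at most nprocess at once.

    Returns the exit codes in the same order as the commands.
    """
    # Threads are enough here: each one just waits for its child process.
    with ThreadPoolExecutor(max_workers=nprocess) as pool:
        return list(pool.map(subprocess.call, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for one scantailor-cli call per file.
    subprojects = ["book-0.ScanTailor", "book-1.ScanTailor", "book-2.ScanTailor"]
    commands = [[sys.executable, "-c", "print(%r)" % name] for name in subprojects]
    print(run_all(commands, nprocess=2))
```

Checking the returned exit codes before merging is a cheap way to notice a sub process that failed.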

It was not obvious how to get ScanTailor to accept the created files as up to date. One potential show stopper is
that ScanTailor recalculates every automatic setting for the current set of files. If only a subset of the pages is processed,
the calculated picture dimensions naturally change slightly. This changes the width and height of the calculated
output images, and ScanTailor will reject the created files.

The solution I have chosen to avoid this is to find the page with the maximum width and the page with the maximum
height and to duplicate these (up to two) pages in all sub projects. The drawback is that every sub process calculates these
pages, so a race condition might occur. When updating the new project file, the highest time stamp
of all sub project files is written for these pages, and ScanTailor seems to be happy with this. The worst that can happen is
that the corresponding page is not accepted. Then you can recalculate these two pages later in the ScanTailor GUI.
The critical pages are marked in the info file 'book-info.txt', which is written by 'rose_scantailor'.
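The duplication trick can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the code from the zip file: the names `critical_pages` and `split_with_critical` are hypothetical, and the page sizes are passed in as a plain dict instead of being read from a project file.

```python
def critical_pages(sizes):
    """Return the names of the widest and the tallest page (one or two names).

    sizes maps a page name to a (width, height) tuple.
    """
    widest = max(sizes, key=lambda name: sizes[name][0])
    tallest = max(sizes, key=lambda name: sizes[name][1])
    return sorted({widest, tallest})


def split_with_critical(pages, sizes, num_parts):
    """Split pages into contiguous parts and add the critical pages to each.

    Every part then contains the overall maximum width and height, so the
    sizes computed per part should agree with those of the full project.
    """
    if not pages or num_parts < 1:
        return []
    critical = critical_pages(sizes)
    per_part = -(-len(pages) // num_parts)  # ceiling division
    parts = []
    for start in range(0, len(pages), per_part):
        part = list(pages[start:start + per_part])
        part += [p for p in critical if p not in part]
        parts.append(part)
    return parts


if __name__ == "__main__":
    # Made-up page sizes in pixels.
    sizes = {"p1": (100, 200), "p2": (120, 190), "p3": (110, 210), "p4": (90, 180)}
    print(split_with_critical(["p1", "p2", "p3", "p4"], sizes, 2))
```

Because the critical pages appear in several parts, more than one process writes their output, which is the race condition mentioned above.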

And here are the corresponding script files:
rose_scantailor_merge_split_spawn.zip
Scantailor scripts for use of multiple cores (by mhr)
dtic

Re: ScanTailor_multi_core -- speed up processing

Post by dtic »

I've put up an updated version of ScanTailor_multi_core that lets the user choose any number of cores (previously there was only a 2 or 4 core option).
0kelvin
Posts: 29
Joined: 10 Nov 2012, 17:14
Number of books owned: 0
Country: Brazil

Re: ScanTailor_multi_core -- speed up processing

Post by 0kelvin »

This multicore script isn't working right for me. It has two bugs. First, it changes the resolution per part: all pages in a part are adjusted to the largest page of that part, so the final page sizes differ between parts (off by some pixels). Second, it fails to use mixed or color modes and outputs everything in black and white, even though I selected mixed mode for all pages.

Instead, I'm just opening four instances of ScanTailor and splitting the book into four parts.