Introducing spreads: command-line workflow tool

spamsickle · Post by **spamsickle** » 03 Sep 2013, 06:47

I'm curious to know what is required to implement a plugin for Spreads. Is any application with a CLI a candidate, or only applications for which one has the source available? I assume strictly GUI applications can't be plugged in, but maybe I'm mistaken.

I know your documentation says your plugin capability is based on some framework some guy wrote, but I'm not familiar with his framework and figured it would be easier (for me) to ask here. If the explanation requires more than a couple of sentences, I don't mind if one of those sentences includes "so you should just go read that documentation."

Thanks to the person who posted the pointer to somebody who was using infrared-enabled Pentax cameras for scanning. I wasn't aware these existed. I assume Spreads can only pull real-time images from cameras which support CHDK, but maybe that's another mistaken assumption. It might still be possible for Spreads to flash an infrared LED on a timer (or under keyboard control), while using native camera capability to display the images on monitors in real time.

My current non-mainstream scanning setup consists of two CHDK-enabled Canon S5-IS cameras on tripods, shooting books in a cradle held open as flat as I can with a couple of fingers. This setup completely avoids problems with reflection and debounce, but the resulting images are not as flat as they'd be with a platen, and also vary in size slightly from the front of the book to the back. Back on the plus side, my setup also avoids the time consumed and fatigue racked up by raising and lowering a platen (cradle), but even so I find I don't want to spend much more than an hour at a stretch scanning books. My cameras fire automatically every 6 seconds under the control of a CHDK script, which is usually plenty of time to turn the page, visually verify "this is the correct next page", smooth both leaves flat, and position my hold-down fingers in empty margin, with 2-3 seconds to spare. Ignoring the time required to set up a new book (typically less than a minute), this gives me approximately 1200 pages per hour throughput, which generally means 1-4 books at a stretch.

I've just seen an example of hOCR output, and I envision using something like that to "flatten" my pages after OCR, without the overhead and artifacts of an image-based dewarping step such as the one implemented by Scan Tailor. hOCR has bounding boxes for each word, so straightening the line should theoretically be simply a matter of properly aligning all the words in each line of text. You mentioned in the "Daniel Learns About Spreads" thread that you thought it might be possible to use Spreads to correlate the OCR candidates for multiple (commercial) OCR applications, which is what motivated my question on plugins. All of these applications can create PDF output. I believe the PDF stores OCRed text as characters and positions, but I haven't verified that that's true in all cases.
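As a rough illustration of the hOCR idea above, here is a minimal sketch that pulls word bounding boxes out of one hOCR line and shifts every word so its bottom edge sits on a common line. The function names `word_boxes` and `straighten` are made up for this example, the regular expression assumes the `class` attribute comes before `title` in each word span, and using the box bottom as a baseline is a simplification (descenders will throw it off). It only translates whole words; it does nothing about warping inside a word.

```python
import re
from statistics import median

# Matches hOCR word spans such as:
#   <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
# Assumption: the class attribute appears before the title attribute.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>"
    r"([^<]*)</span>"
)


def word_boxes(hocr_line):
    """Return (text, x0, y0, x1, y1) for every ocrx_word span in one hOCR line."""
    return [
        (m.group(5), *map(int, m.group(1, 2, 3, 4)))
        for m in WORD_RE.finditer(hocr_line)
    ]


def straighten(words):
    """Shift each word vertically so its bottom edge lands on the median bottom."""
    baseline = median(y1 for _, _, _, _, y1 in words)
    aligned = []
    for text, x0, y0, x1, y1 in words:
        dy = baseline - y1  # vertical offset needed for this word
        aligned.append((text, x0, y0 + dy, x1, y1 + dy))
    return aligned
```

Feeding the output of `word_boxes` into `straighten` gives the new box positions for one text line; a real tool would still have to cut the word images out of the page and paste them at those positions.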

Assuming I had PDF output for my book from some set of commercial/open OCR applications, and an application I'd written myself which could pull the OCR information from PDFs, could Spreads be an appropriate tool to present the mismatched text along with the original image of the page?

jbaiter · Post by **jbaiter** » 03 Sep 2013, 07:53

spamsickle wrote:I'm curious to know what is required to implement a plugin for Spreads. Is any application with a CLI a candidate, or only applications for which one has the source available? I assume strictly GUI applications can't be plugged in, but maybe I'm mistaken.

Any application with a CLI (or an API, even better) can be called from a plugin, yes. Strictly GUI applications could be scripted (with something like AutoHotkey) but this is a huge pain in the ass and not very portable. I try to stick with CLI-applications as far as possible, as I want to be able to use spreads on a computer without a graphical environment.

I know your documentation says your plugin capability is based on some framework some guy wrote, but I'm not familiar with his framework and figured it would be easier (for me) to ask here. If the explanation requires more than a couple of sentences, I don't mind if one of those sentences includes "so you should just go read that documentation."

Well, you don't have to worry too much about that framework; it's only there to discover plugins, activate them and expose them to the application. As a plugin author, you only have to subclass the DevicePlugin or HookPlugin class and implement the hooks where you want your code to run. For example, to write a custom postprocessing plugin, you would create a new class "MyPostPlugin" that inherits from "HookPlugin", implement the process method, and you're good.
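To make the shape of such a plugin concrete, here is a small sketch. The `HookPlugin` class below is only a stand-in written for this example so that the snippet runs on its own; the real base class ships with spreads, and its constructor and the exact signature of `process` may differ, so check the spreads documentation before copying this. What the hook does (writing a manifest of captured images) is also just an invented example.

```python
import os


class HookPlugin(object):
    """Stand-in for the spreads base class (assumed interface, not the real one)."""

    def __init__(self, config):
        self.config = config

    def process(self, path):
        """Postprocessing hook; the default implementation does nothing."""


class MyPostPlugin(HookPlugin):
    """Example postprocessing hook: write a manifest of the captured images."""

    def process(self, path):
        # Collect the JPEG files found in the project directory.
        images = sorted(
            name for name in os.listdir(path)
            if name.lower().endswith((".jpg", ".jpeg"))
        )
        # Record them in a plain-text manifest next to the images.
        with open(os.path.join(path, "manifest.txt"), "w") as fp:
            fp.write("\n".join(images))
        return images
```

A plugin that wraps an external CLI tool would follow the same pattern, with the body of `process` calling the tool (for instance through `subprocess`) instead of listing files.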

Thanks to the person who posted the pointer to somebody who was using infrared-enabled Pentax cameras for scanning. I wasn't aware these existed. I assume Spreads can only pull real-time images from cameras which support CHDK, but maybe that's another mistaken assumption. It might still be possible for Spreads to flash an infrared LED on a timer (or under keyboard control), while using native camera capability to display the images on monitors in real time.

Currently, yes. The real-time images are also limited to the GUI at the moment, but I plan to refactor that code a bit to allow for live preview images from a greater variety of devices across all interfaces. I'm not sure if I understood the second part, the infrared LED triggers a shot on those Pentax cameras?

My cameras fire automatically every 6 seconds under the control of a CHDK script, which is usually plenty of time to turn the page, visually verify "this is the correct next page", smooth both leaves flat, and position my hold-down fingers in empty margin, with 2-3 seconds to spare.

That workflow should be reproducible with spreads, all it really would have to do is fire the cameras at a fixed interval (easy to do, as I already said). Where do you verify the images? On the devices themselves or on the computer?
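Firing at a fixed interval can be sketched in a few lines. This is not code from spreads: `shoot_at_interval` and its `capture` callback are hypothetical names, and the callback stands in for whatever actually triggers the cameras. The loop schedules each shot relative to the start time rather than sleeping a fixed amount after each capture, so a slow capture does not push the following shots later and later.

```python
import time


def shoot_at_interval(capture, interval, shots,
                      sleep=time.sleep, clock=time.monotonic):
    """Call capture(n) `shots` times, spaced `interval` seconds apart.

    `sleep` and `clock` are injectable so the loop can be tested without
    actually waiting.
    """
    next_shot = clock()
    for number in range(shots):
        delay = next_shot - clock()
        if delay > 0:
            sleep(delay)  # wait until the scheduled time for this shot
        capture(number)
        next_shot += interval  # schedule relative to the plan, not to "now"
```

With the 6-second rhythm described above, a call would look like `shoot_at_interval(trigger_cameras, 6, 200)`, where `trigger_cameras` is whatever function fires both devices.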

hOCR has bounding boxes for each word, so straightening the line should theoretically be simply a matter of properly aligning all the words in each line of text.

That sounds like a great idea for dewarping! Are you aware of some tool that does that? -- edit: After some thought, I'm not sure if this would help that much with page warping, often the word is warped itself, and by aligning the word with the rest of the line, it would still not look very good, possibly even worse. Maybe with an OCR engine that operated on the character level, but I think all of them only go down to individual words.

You mentioned in the "Daniel Learns About Spreads" thread that you thought it might be possible to use Spreads to correlate the OCR candidates for multiple (commercial) OCR applications, which is what motivated my question on plugins. All of these applications can create PDF output. I believe the PDF stores OCRed text as characters and positions, but I haven't verified that that's true in all cases.

Yes, that's how PDF stores the OCRed text, at the character level. A problem here is that formatting isn't preserved (I think, I'm not too sure!) and PDF has no idea of things like lines, paragraphs and page sections at the "hidden text"-level, unlike the XML-output from the OCR engines. Plus, PDF is a b*** to parse, we're currently doing something like that at my workplace and it's really not fun to work with. There's a reason why there's not too many nice open-sourced PDF libraries around....

Assuming I had PDF output for my book from some set of commercial/open OCR applications, and an application I'd written myself which could pull the OCR information from PDFs, could Spreads be an appropriate tool to present the mismatched text along with the original image of the page?

I think that might be a bit too ambitious to do in one plugin. My plan was to keep the plugin code as simple as possible, just call some external tool or library, integrate the results and be done, ideally in less than 1000 lines of code or a single class. Your approach would probably best be done in an external application (much like the ScanTailor plugin), that called all those OCR engines, compared the output and waited for the user to approve of the merged versions.
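The comparison step of such an external application could start out as something like the sketch below. It assumes the text has already been extracted from each engine's output and split into words; `ocr_mismatches` is a name invented for this example. It uses the standard library's `difflib` to align two word lists and returns only the stretches where they disagree, which is what a user would be asked to review against the page image.

```python
import difflib


def ocr_mismatches(words_a, words_b):
    """Return (words_from_a, words_from_b) pairs where two OCR results differ."""
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep replacements, insertions and deletions only
    ]
```

Comparing more than two engines would need either pairwise runs against one reference result or a proper multi-sequence alignment, which this sketch does not attempt.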

cday · Post by **cday** » 03 Sep 2013, 11:30

Extracting text from PDF files:

If only the OCR text is required -- and not the word positions on the page as would be required for a searchable image of the page -- it can be exported by simply selecting the text, copying to the clipboard, and pasting into a text editor or word processor: Ctrl + A, then Ctrl + C, then Ctrl + V.

That way text files containing the OCR output from multiple PDF versions could easily be generated as a basis for further processing.

Edit:

If the above process could be implemented successfully, the resulting output could then conceivably be used to correct misidentified words in a master PDF searchable image 'text + word positions' file, given that PDF files are text files that can be edited.

spomwii · Post by **spomwii** » 03 Sep 2013, 16:28

jbaiter wrote:Glad you got so far, spomwil, and welcome to the murky waters of close-to-metal software...
Errors from the "PTPDevice" class generally indicate that something is going wrong when communicating with the device. There could be a myriad reasons for this, but here are some pointers from my own experiences:

Are both devices running from the same USB hub? I've had issues with that, try attaching both of them directly to your computer (ideally not to the front panel either, there might be another hub there)

Check your cables (that one kept me busy for a whole week...)

Check your system log, are there any unusual messages concerning the USB subsystem? (/var/log/syslog)

Are there any services running on your computer that might interfere with the camera communication? See Mark's article [1] for more hints

Can you trigger the cameras with Mark's script? [2]
It seems that parallel triggering is still kind of unstable, maybe a command-line flag to switch to alternate triggering would be wise?

Concerning those 'insufficient permissions' errors, check out the updated FAQ [3] (information on how to fix the permissions permanently will follow in the next few days...)
And for the error message concerning "keep" during download, that's a known bug in the CLI wizard and has been fixed in the development version [4], so you might want to install from GitHub.

[1] https://github.com/markvdb/diybookscann ... ur-cameras
[2] https://github.com/markvdb/diybookscanner
[3] http://spreads.readthedocs.org/en/latest/faq.html
[4] https://github.com/jbaiter/spreads/comm ... 390e5f68a8

Hi Jbaiter,

I am trying everything I can think of now but I am getting nowhere. Earlier today I tried to install Debian but I ran into similar problems. I am back on Ubuntu now, but I switched from 12.04 to 13.04 to see if there was any difference.

The difference now is that spreads is stopping earlier in the process. When I run "spread wizard ~/my_book" I get this error:

(.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book
spreads encountered an error:
Traceback (most recent call last):
  File "/home/tommy/.spreads/bin/spread", line 39, in <module>
    spreads.cli.main()
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 279, in main
    config.dump(filename=cfg_path)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/confit.py", line 804, in dump
    default_conf = next(x for x in self.sources if x.default)
StopIteration

Do you know what this means?

I am running Ubuntu on a laptop so I do not think there is a USB hub inside it.
I have switched to new USB cables, but I am not getting to the point where I connect the cameras anymore, so I have not tested them yet.
I am not able to find anything in the syslog but I am not a Linux expert...
I have disabled backends permanently to avoid interference.
I have tried to install Mark's script again but I am not able to compile ptpcam anymore.

Edit: I tried to install from GitHub this time.

jbaiter · Post by **jbaiter** » 04 Sep 2013, 03:03

spomwii wrote: (.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book
spreads encountered an error:
Traceback (most recent call last):
  File "/home/tommy/.spreads/bin/spread", line 39, in <module>
    spreads.cli.main()
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 279, in main
    config.dump(filename=cfg_path)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/confit.py", line 804, in dump
    default_conf = next(x for x in self.sources if x.default)
StopIteration

That error means that spreads cannot write a default configuration file to your home directory. Do you have a folder ".config" in /home/tommy? If not, create one. You can force spreads to skip that dumping step by copying the file "default_config.yaml" from the spreads subdirectory of the source tree to "~/.config/spreads/config.yaml"; that should get rid of the error.
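For anyone puzzled by the bare "StopIteration" at the end of that traceback: the last frame calls `next()` on a generator expression, and `next()` raises StopIteration when nothing in the sequence matches. The sketch below reproduces that behaviour with a stand-in `Source` class (invented for this example, not the real class from confit.py) and shows the two-argument form of `next()`, which returns a fallback value instead of raising. It is written for Python 3.

```python
class Source(dict):
    """Hypothetical stand-in for a configuration source with a `default` flag."""

    def __init__(self, values, default=False):
        super().__init__(values)
        self.default = default


def find_default(sources):
    """Return the first source flagged as default, or None if there is none."""
    return next((s for s in sources if s.default), None)


# A list of sources in which none is marked as the default one.
user_only = [Source({"example": 1})]

try:
    # Same pattern as the line in the traceback: no match, so next() raises.
    next(s for s in user_only if s.default)
except StopIteration:
    print("no default source found")
```

This only explains the exception type; whether a missing default source is really what happened on that machine cannot be told from the traceback alone.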

spomwii · Post by **spomwii** » 04 Sep 2013, 15:47

I do not know what I did wrong on the last installation so I have reinstalled again. Back on Ubuntu 12.04 now.

I am still getting errors that I don't understand. I would be very grateful if you have any idea of what's wrong here.

Code: Select all

(.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book2
Please connect and turn on the devices.
Press any key to continue.
Detecting devices.
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
    response_callback(func(e, *args, **kwds))
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
    match = extension.plugin.match(device)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
    and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
    response_callback(func(e, *args, **kwds))
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
    match = extension.plugin.match(device)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
    and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
==========================
 Starting capturing process
 ==========================
Setting up devices for capturing.
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
spreads encountered an error:
Traceback (most recent call last):
  File "/home/tommy/.spreads/bin/spread", line 39, in <module>
    spreads.cli.main()
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 298, in main
    args.func(args)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 175, in wizard
    capture(devices=devices)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 101, in capture
    workflow.prepare_capture(devices)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/workflow.py", line 51, in prepare_capture
    exc_info=sys.exc_info(exc))
TypeError: exc_info() takes no arguments (1 given)

After trying several times I was actually able to capture a few images one time but when I tried to exit I got the "keep not found" again.
Any chance you could give me a little step-by-step guide to get the most recent version from GitHub? (I installed by using "pip install spreads" now)
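A note on the last line of that listing: "TypeError: exc_info() takes no arguments (1 given)" is a secondary error raised while the program was trying to log the original problem (the script timeouts). `sys.exc_info()` is always called without arguments and returns the exception currently being handled. The sketch below shows the usual pattern; `prepare` and the logger name are made up for illustration and are not taken from spreads.

```python
import logging
import sys

logger = logging.getLogger("capture_demo")


def prepare(device):
    """Run a device setup callable and log any failure with its traceback."""
    try:
        device()
    except Exception:
        # sys.exc_info() takes no arguments; it returns a
        # (type, value, traceback) tuple for the active exception.
        exc_type, exc_value, _ = sys.exc_info()
        # exc_info=True tells logging to attach the active traceback itself.
        logger.error("Preparing device failed: %s", exc_value, exc_info=True)
        return exc_type.__name__
    return None
```

So the TypeError hides the real failure rather than describing it; the lines worth chasing are the "Script timed out" messages just before it.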

spomwii · Post by **spomwii** » 04 Sep 2013, 16:17

Sometimes I get to start scanning, but it is very unstable, with a lot of "Script timed out, retrying..." messages.

Code: Select all

Press 'b' to capture.
PTPDevice[left]: Script timed out, retrying...
Shot 2 pages [551/h]PTPDevice[right]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
Shot 4 pages in 0.6 minutes, average speed was 517 pages per hour=========================
 Starting download process
=========================
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
    response_callback(func(e, *args, **kwds))
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
    match = extension.plugin.match(device)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
    and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
Failed to connect (attempt 1), retrying in 1 s...
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
    response_callback(func(e, *args, **kwds))
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
    match = extension.plugin.match(device)
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
    and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
Failed to connect (attempt 1), retrying in 1 s...
There is a problem with your configuration file(s):
keep not found
Could not close session!
pyptpchdk: Could not close session!
Could not close session!
pyptpchdk: Could not close session!
(.spreads)tommy@tommy-ThinkPad-T61:~$

spamsickle · Post by **spamsickle** » 04 Sep 2013, 18:30

jbaiter wrote:I'm not sure if I understood the second part, the infrared LED triggers a shot on those Pentax cameras?

That's my understanding -- there's a $10 remote that can trigger the cameras, and someone had rigged an optical audio cable (I'm pretty sure that's what he called it, but I'm not sure what it is) to "pipe" the light signal to the cameras. They were pressing the button on the remote to trigger the cameras on their scanner.

jbaiter wrote: That workflow should be reproducible with spreads, all it really would have to do is fire the cameras at a fixed interval (easy to do, as I already said). Where do you verify the images? On the devices themselves or on the computer?

On the cameras themselves. The LCD display of the S5 IS pivots out and rotates, so I can see them both as I scan the book. The "image verify" function even shows a zoomed-in section for a couple of seconds, so I can confirm that the text is legible, but since I'm using the focus lock that's pretty much a given anyway.

jbaiter wrote: That sounds like a great idea for dewarping! Are you aware of some tool that does that? -- edit: After some thought, I'm not sure if this would help that much with page warping, often the word is warped itself, and by aligning the word with the rest of the line, it would still not look very good, possibly even worse. Maybe with an OCR engine that operated on the character level, but I think all of them only go down to individual words.

No, I'm not aware of a tool that does that (I originally read this as "Are you aware of some fool that does that?"), though there are tools which will generate PDF from text. I expect the tool (which I may have to write) will make some intelligent guesses about how the page was typeset by looking at the original images -- the amount of space between lines would be a constant that's calculated by averaging the values over the whole page, for instance. With accurate OCR and "good enough" font matching, I'd expect to be able to generate a reasonable facsimile of the original text, and eliminate the original image for everything except real images.

spomwii · Post by **spomwii** » 10 Sep 2013, 14:37

Does anyone know what this error means?

Code: Select all

 (python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed

I am trying to get the gui version to work on Linux Mint.

spomwii · Post by **spomwii** » 10 Sep 2013, 15:56

Hmm. I am getting closer but still have problems. Now I am able to capture images but something is failing while downloading the images.
I get this error "The left and right camera produced an inequal amount of images, please fix the problem!
Will not combine images"

When I look in the project folder there are 3 folders. They are called "raw", "left" & "right". There are images in the folder called "right", but the folder called "left" is empty. It looks to me like spreads does not download images from the left camera.
If I switch the places of the cameras (USB cables) the result is the same: only images in the "right" folder.

In the terminal window in the background I get this message:

Code: Select all

(python:4635): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
Traceback (most recent call last):
  File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/gui/gui.py", line 390, in doDownload
    self.combine_btn.clicked.connect(get_pluginmanager()['combine']
AttributeError: 'Extension' object has no attribute 'download'
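When chasing a mismatch between the two cameras like the one described above, it can help to count the files in each folder after every attempt instead of opening them by hand. A minimal sketch, assuming the project layout from the post ("left" and "right" subfolders); `image_counts` is a name invented here and is not part of spreads:

```python
from pathlib import Path


def image_counts(project_dir, sides=("left", "right")):
    """Count the files in each per-camera subfolder of a project directory."""
    counts = {}
    for side in sides:
        folder = Path(project_dir) / side
        if folder.is_dir():
            counts[side] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[side] = 0  # a missing folder counts as no images
    return counts
```

Calling `image_counts` on the project folder returns a dictionary with one count per side, so unequal numbers are visible at a glance.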
