Introducing spreads: command-line workflow tool
Moderator: peterZ
- spamsickle
- Posts: 596
- Joined: 06 Jun 2009, 23:57
Re: Introducing spreads: command-line workflow tool
I'm curious to know what is required to implement a plugin for Spreads. Is any application with a CLI a candidate, or only applications for which one has the source available? I assume strictly GUI applications can't be plugged in, but maybe I'm mistaken.
I know your documentation says your plugin capability is based on some framework some guy wrote, but I'm not familiar with his framework and figured it would be easier (for me) to ask here. If the explanation requires more than a couple of sentences, I don't mind if one of those sentences includes "so you should just go read that documentation."
Thanks to the person who posted the pointer to somebody who was using infrared-enabled Pentax cameras for scanning. I wasn't aware these existed. I assume Spreads can only pull real-time images from cameras which support CHDK, but maybe that's another mistaken assumption. It might still be possible for Spreads to flash an infrared LED on a timer (or under keyboard control), while using native camera capability to display the images on monitors in real time.
My current non-mainstream scanning setup consists of two CHDK-enabled Canon S5-IS cameras on tripods, shooting books in a cradle held open as flat as I can with a couple of fingers. This setup completely avoids problems with reflection and debounce, but the resulting images are not as flat as they'd be with a platen, and also vary in size slightly from the front of the book to the back. Back on the plus side, my setup also avoids the time consumed and fatigue racked up by raising and lowering a platen (cradle), but even so I find I don't want to spend much more than an hour at a stretch scanning books. My cameras fire automatically every 6 seconds under the control of a CHDK script, which is usually plenty of time to turn the page, visually verify "this is the correct next page", smooth both leaves flat, and position my hold-down fingers in empty margin, with 2-3 seconds to spare. Ignoring the time required to set up a new book (typically less than a minute), this gives me approximately 1200 pages per hour throughput, which generally means 1-4 books at a stretch.
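As a quick check of that throughput figure, here is a back-of-the-envelope calculation. It assumes both cameras fire together once per interval and that each shot yields exactly one page; the function name is only for illustration.

```python
# Back-of-the-envelope check of the throughput quoted above.
SECONDS_PER_HOUR = 3600


def pages_per_hour(interval_s, cameras=2):
    """Pages captured per hour when every camera fires once per interval."""
    return SECONDS_PER_HOUR / interval_s * cameras


print(pages_per_hour(6))  # 1200.0
```

Setup time between books and any re-shoots are ignored, so real-world numbers would come out somewhat lower.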
I've just seen an example of hOCR output, and I envision using something like that to "flatten" my pages after OCR, without the overhead and artifacts of an image-based dewarping step such as the one implemented by Scan Tailor. hOCR has bounding boxes for each word, so straightening the line should theoretically be simply a matter of properly aligning all the words in each line of text. You mentioned in the "Daniel Learns About Spreads" thread that you thought it might be possible to use Spreads to correlate the OCR candidates for multiple (commercial) OCR applications, which is what motivated my question on plugins. All of these applications can create PDF output. I believe the PDF stores OCRed text as characters and positions, but I haven't verified that that's true in all cases.
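To make the alignment idea concrete, here is a minimal sketch that reads word bounding boxes from one hOCR line and shifts every word vertically so that all box bottoms sit on a common line. It rests on several assumptions: words are marked up as `ocrx_word` spans with the `class` attribute ahead of `title`, attributes use single quotes, and the bottom edge of the box is used as a rough stand-in for the text baseline. The function name `straighten_line` is made up for this example.

```python
import re
from statistics import median_low

# Matches hOCR word spans such as:
# <span class='ocrx_word' id='w1' title='bbox 10 22 50 40'>Hello</span>
# Assumes single-quoted attributes with class appearing before title.
WORD_RE = re.compile(
    r"<span class='ocrx_word'[^>]*title='bbox (\d+) (\d+) (\d+) (\d+)[^']*'[^>]*>"
    r"([^<]*)</span>"
)


def straighten_line(hocr_line):
    """Return (text, x0, y0, x1, y1) tuples with all box bottoms aligned."""
    words = []
    for m in WORD_RE.finditer(hocr_line):
        x0, y0, x1, y1 = (int(v) for v in m.groups()[:4])
        words.append((m.group(5), x0, y0, x1, y1))
    if not words:
        return []
    # Use the median bottom edge as the common line for every word.
    baseline = median_low(y1 for _, _, _, _, y1 in words)
    return [
        (text, x0, y0 + (baseline - y1), x1, baseline)
        for text, x0, y0, x1, y1 in words
    ]


sample = (
    "<span class='ocrx_word' id='w1' title='bbox 10 22 50 40'>Hello</span> "
    "<span class='ocrx_word' id='w2' title='bbox 60 25 110 43'>brave</span> "
    "<span class='ocrx_word' id='w3' title='bbox 120 28 160 46'>world</span>"
)
for word in straighten_line(sample):
    print(word)
```

This only moves whole word boxes; it does nothing about distortion inside a word, so it is a starting point rather than a full dewarping step.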
Assuming I had PDF output for my book from some set of commercial/open OCR applications, and an application I'd written myself which could pull the OCR information from PDFs, could Spreads be an appropriate tool to present the mismatched text along with the original image of the page?
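For the comparison step itself, a small sketch using Python's standard `difflib` module shows how mismatched words from two engines could be located once the text has been extracted. The two word lists are invented sample data, and `mismatches` is a hypothetical helper, not part of spreads.

```python
from difflib import SequenceMatcher


def mismatches(words_a, words_b):
    """Return (a_words, b_words) pairs for every region where the lists differ."""
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented output of two OCR engines for the same line of text.
engine_a = "The quick brown fox jumps".split()
engine_b = "The quick brovvn fox jumps".split()
print(mismatches(engine_a, engine_b))  # [(['brown'], ['brovvn'])]
```

Each returned pair could then be shown to the user next to the corresponding region of the page image for approval.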
- jbaiter
- Posts: 98
- Joined: 17 Jun 2013, 16:42
- E-book readers owned: 2
- Number of books owned: 0
- Country: Germany
- Location: Munich, Germany
- Contact:
Re: Introducing spreads: command-line workflow tool
spamsickle wrote: I'm curious to know what is required to implement a plugin for Spreads. Is any application with a CLI a candidate, or only applications for which one has the source available? I assume strictly GUI applications can't be plugged in, but maybe I'm mistaken.
Any application with a CLI (or an API, even better) can be called from a plugin, yes. Strictly GUI applications could be scripted (with something like AutoHotkey) but this is a huge pain in the ass and not very portable. I try to stick with CLI applications as far as possible, as I want to be able to use spreads on a computer without a graphical environment.
spamsickle wrote: I know your documentation says your plugin capability is based on some framework some guy wrote, but I'm not familiar with his framework and figured it would be easier (for me) to ask here. If the explanation requires more than a couple of sentences, I don't mind if one of those sentences includes "so you should just go read that documentation."
Well, you don't have to worry too much about that framework; it's only there to discover plugins, activate them and expose them to the application. As a plugin author, you only have to subclass the DevicePlugin or HookPlugin class and implement the hooks at which you want your code to run. For example, to write a custom postprocessing plugin, you would create a new class "MyPostPlugin" that inherits from "HookPlugin", implement the process method and you're good.
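A rough sketch of what such a plugin could look like follows. Note the assumptions: the `HookPlugin` class below is a stand-in defined only so the example runs on its own, and the argument passed to `process` is a guess; the real base class and hook signature should be taken from the spreads documentation.

```python
# Stand-in for the base class that spreads provides. It is defined here only
# so that the sketch is runnable; it is NOT the real spreads class.
class HookPlugin(object):
    def __init__(self, config=None):
        self.config = config or {}


class MyPostPlugin(HookPlugin):
    def process(self, path):
        """Hypothetical postprocessing hook; the `path` argument is assumed."""
        print("postprocessing images in %s" % path)
        return path


plugin = MyPostPlugin()
plugin.process("/tmp/my_book")
```

In a real plugin the body of `process` would typically call the external tool or library and integrate its results.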
spamsickle wrote: Thanks to the person who posted the pointer to somebody who was using infrared-enabled Pentax cameras for scanning. I wasn't aware these existed. I assume Spreads can only pull real-time images from cameras which support CHDK, but maybe that's another mistaken assumption. It might still be possible for Spreads to flash an infrared LED on a timer (or under keyboard control), while using native camera capability to display the images on monitors in real time.
Currently, yes. The real-time images are also limited to the GUI at the moment, but I plan to refactor that code a bit to allow for live preview images from a greater variety of devices across all interfaces. I'm not sure I understood the second part: does the infrared LED trigger a shot on those Pentax cameras?
spamsickle wrote: My cameras fire automatically every 6 seconds under the control of a CHDK script, which is usually plenty of time to turn the page, visually verify "this is the correct next page", smooth both leaves flat, and position my hold-down fingers in empty margin, with 2-3 seconds to spare.
That workflow should be reproducible with spreads; all it really would have to do is fire the cameras at a fixed interval (easy to do, as I already said). Where do you verify the images? On the devices themselves or on the computer?
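Firing at a fixed interval could be sketched roughly like this. The `trigger` callable is a placeholder for whatever actually fires the cameras (it is not a spreads function), and the `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import time


def capture_at_interval(trigger, interval_s, shots,
                        sleep=time.sleep, clock=time.monotonic):
    """Call trigger() `shots` times, one call every `interval_s` seconds."""
    next_shot = clock()
    for _ in range(shots):
        delay = next_shot - clock()
        if delay > 0:
            sleep(delay)
        trigger()
        # Schedule relative to the previous target time, not to "now", so a
        # slow trigger() call does not make the schedule drift.
        next_shot += interval_s


# Short interval so the demo finishes quickly; the workflow described in
# this thread would use interval_s=6.
capture_at_interval(lambda: print("click"), interval_s=0.01, shots=3)
```

Scheduling against a running target time keeps the rhythm steady, which matters when the operator is turning pages in time with the shutter.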
spamsickle wrote: hOCR has bounding boxes for each word, so straightening the line should theoretically be simply a matter of properly aligning all the words in each line of text.
That sounds like a great idea for dewarping! Are you aware of some tool that does that? -- edit: After some thought, I'm not sure this would help that much with page warping. Often the word itself is warped, and after aligning the word with the rest of the line it would still not look very good, possibly even worse. Maybe with an OCR engine that operated on the character level, but I think all of them only go down to individual words.
spamsickle wrote: You mentioned in the "Daniel Learns About Spreads" thread that you thought it might be possible to use Spreads to correlate the OCR candidates for multiple (commercial) OCR applications, which is what motivated my question on plugins. All of these applications can create PDF output. I believe the PDF stores OCRed text as characters and positions, but I haven't verified that that's true in all cases.
Yes, that's how PDF stores the OCRed text, at the character level. A problem here is that formatting isn't preserved (I think, I'm not too sure!) and PDF has no idea of things like lines, paragraphs and page sections at the "hidden text" level, unlike the XML output from the OCR engines. Plus, PDF is a b*** to parse; we're currently doing something like that at my workplace and it's really not fun to work with. There's a reason why there aren't too many nice open-source PDF libraries around....
spamsickle wrote: Assuming I had PDF output for my book from some set of commercial/open OCR applications, and an application I'd written myself which could pull the OCR information from PDFs, could Spreads be an appropriate tool to present the mismatched text along with the original image of the page?
I think that might be a bit too ambitious to do in one plugin. My plan was to keep the plugin code as simple as possible: just call some external tool or library, integrate the results and be done, ideally in less than 1000 lines of code or a single class. Your approach would probably best be done in an external application (much like the ScanTailor plugin) that called all those OCR engines, compared the output and waited for the user to approve the merged versions.
spreads: Command-line workflow assistant
Re: Introducing spreads: command-line workflow tool
Extracting text from PDF files:
If only the OCR text is required -- and not the word positions on the page as would be required for a searchable image of the page -- it can be exported by simply selecting the text, copying to the clipboard, and pasting into a text editor or word processor... Ctrl + A then Ctrl + C then Ctrl + V .
That way text files containing the OCR output from multiple PDF versions could easily be generated as a basis for further processing.
Edit:
If the above process could be implemented successfully, the resulting output could then conceivably be used to correct misidentified words in a master PDF searchable image 'text + word positions' file, given that PDF files are text files that can be edited.
Re: Introducing spreads: command-line workflow tool
Hi Jbaiter,
jbaiter wrote: Glad you got so far, spomwil, and welcome to the murky waters of close-to-metal software...
Errors from the "PTPDevice" class generally indicate that something is going wrong when communicating with the device. There could be a myriad reasons for this, but here are some pointers from my own experiences:
It seems that parallel triggering is still kind of unstable, maybe a command-line flag to switch to alternate triggering would be wise?
- Are both devices running from the same USB hub? I've had issues with that, try attaching both of them directly to your computer (ideally not to the front panel either, there might be another hub there)
- Check your cables (that one kept me busy for a whole week...)
- Check your system log, are there any unusual messages concerning the USB subsystem? (/var/log/syslog)
- Are there any services running on your computer that might interfere with the camera communication? See Mark's article [1] for more hints
- Can you trigger the cameras with Mark's script? [2]
Concerning those 'insufficient permissions' errors, check out the updated FAQ [3] (information on how to fix the permissions permanently will follow in the next few days...)
And for the error message concerning "keep" during download, that's a known bug in the CLI wizard and has been fixed in the development version[4], so you might want to install from GitHub.
[1] https://github.com/markvdb/diybookscann ... ur-cameras
[2] https://github.com/markvdb/diybookscanner
[3] http://spreads.readthedocs.org/en/latest/faq.html
[4] https://github.com/jbaiter/spreads/comm ... 390e5f68a8
I am trying everything I can think of now but I am getting nowhere. Earlier today I tried to install Debian but I ran into similar problems. I am back on Ubuntu now, but I switched from 12.04 to 13.04 to see if there was any difference.
The difference now is that spreads is stopping earlier in the process. When I run "spread wizard ~/my_book" I get this error:
(.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book
spreads encountered an error:
Traceback (most recent call last):
File "/home/tommy/.spreads/bin/spread", line 39, in <module>
spreads.cli.main()
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 279, in main
config.dump(filename=cfg_path)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/confit.py", line 804, in dump
default_conf = next(x for x in self.sources if x.default)
StopIteration
Do you know what this means?
I am running Ubuntu on a laptop so I do not think there is a USB hub inside it.
I have switched to new USB cables, but I am no longer getting to the point where I connect the cameras, so I have not tested them yet.
I am not able to find anything in the syslog but I am not a Linux expert...
I have disabled backends permanently to avoid interference.
I have tried to install Mark's script again but I am not able to compile ptpcam anymore.
Edit: I tried to install from GitHub this time.
- jbaiter
- Posts: 98
- Joined: 17 Jun 2013, 16:42
- E-book readers owned: 2
- Number of books owned: 0
- Country: Germany
- Location: Munich, Germany
- Contact:
Re: Introducing spreads: command-line workflow tool
spomwii wrote: (.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book
spreads encountered an error:
Traceback (most recent call last):
File "/home/tommy/.spreads/bin/spread", line 39, in <module>
spreads.cli.main()
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 279, in main
config.dump(filename=cfg_path)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/confit.py", line 804, in dump
default_conf = next(x for x in self.sources if x.default)
StopIteration
That error means that spreads cannot write a default configuration file to your home directory. Do you have a folder ".config" in /home/tommy? If not, create one. You can force spreads to skip that dumping step by copying the file "default_config.yaml" from the spreads subdirectory of the source tree to "~/.config/spreads/config.yaml"; that should get rid of the error.
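The manual workaround of copying the default configuration into place could be scripted along these lines. This is only a sketch: `install_default_config` is a made-up helper, and the `source_tree` argument (the location of your spreads checkout) is an assumption you have to supply yourself.

```python
import os
import shutil


def install_default_config(source_tree, home=None):
    """Copy spreads/default_config.yaml to ~/.config/spreads/config.yaml."""
    if home is None:
        home = os.path.expanduser("~")
    src = os.path.join(source_tree, "spreads", "default_config.yaml")
    dest_dir = os.path.join(home, ".config", "spreads")
    # Create ~/.config and ~/.config/spreads if they do not exist yet.
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest = os.path.join(dest_dir, "config.yaml")
    shutil.copyfile(src, dest)
    return dest


# Example call (the checkout path is hypothetical):
# install_default_config("/home/tommy/src/spreads")
```

Note that this overwrites an existing config.yaml, so back that file up first if you have already customised it.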
spreads: Command-line workflow assistant
Re: Introducing spreads: command-line workflow tool
I do not know what I did wrong on the last installation so I have reinstalled again. Back on Ubuntu 12.04 now.
I am still getting errors that I don't understand. I would be very grateful if you have any idea of what's wrong here.
After trying several times I was actually able to capture a few images once, but when I tried to exit I got the "keep not found" error again.
Any chance you could give me a little step-by-step guide to get the most recent version from GitHub? (I installed by using "pip install spreads" this time.)
Code: Select all
(.spreads)tommy@tommy-ThinkPad-T61:~$ spread wizard ~/my_book2
Please connect and turn on the devices.
Press any key to continue.
Detecting devices.
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
response_callback(func(e, *args, **kwds))
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
match = extension.plugin.match(device)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
response_callback(func(e, *args, **kwds))
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
match = extension.plugin.match(device)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
==========================
Starting capturing process
==========================
Setting up devices for capturing.
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
PTPDevice[left]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
spreads encountered an error:
Traceback (most recent call last):
File "/home/tommy/.spreads/bin/spread", line 39, in <module>
spreads.cli.main()
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 298, in main
args.func(args)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 175, in wizard
capture(devices=devices)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/cli.py", line 101, in capture
workflow.prepare_capture(devices)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/workflow.py", line 51, in prepare_capture
exc_info=sys.exc_info(exc))
TypeError: exc_info() takes no arguments (1 given)
Re: Introducing spreads: command-line workflow tool
Sometimes I get as far as starting to scan, but it is very unstable, with a lot of "Script timed out, retrying..." messages.
Code: Select all
Press 'b' to capture.
PTPDevice[left]: Script timed out, retrying...
Shot 2 pages [551/h]PTPDevice[right]: Script timed out, retrying...
PTPDevice[right]: Script timed out, retrying...
Shot 4 pages in 0.6 minutes, average speed was 517 pages per hour
=========================
Starting download process
=========================
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
response_callback(func(e, *args, **kwds))
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
match = extension.plugin.match(device)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
Failed to connect (attempt 1), retrying in 1 s...
stevedore.extension: error calling 'chdkcamera': 'Device' object has no attribute 'bInterfaceSubClass'
stevedore.extension: 'Device' object has no attribute 'bInterfaceSubClass'
Traceback (most recent call last):
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/stevedore/extension.py", line 145, in _invoke_one_plugin
response_callback(func(e, *args, **kwds))
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreads/plugin.py", line 312, in match
match = extension.plugin.match(device)
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/dev/chdkcamera.py", line 231, in match
and hex(device.bInterfaceSubClass) == "0x1")
AttributeError: 'Device' object has no attribute 'bInterfaceSubClass'
Failed to connect (attempt 1), retrying in 1 s...
There is a problem with your configuration file(s):
keep not found
Could not close session!
pyptpchdk: Could not close session!
Could not close session!
pyptpchdk: Could not close session!
(.spreads)tommy@tommy-ThinkPad-T61:~$
- spamsickle
- Posts: 596
- Joined: 06 Jun 2009, 23:57
Re: Introducing spreads: command-line workflow tool
jbaiter wrote: I'm not sure if I understood the second part, the infrared LED triggers a shot on those Pentax cameras?
That's my understanding -- there's a $10 remote that can trigger the cameras, and someone had rigged an optical audio cable (I'm pretty sure that's what he called it, but I'm not sure what it is) to "pipe" the light signal to the cameras. They were pressing the button on the remote to trigger the cameras on their scanner.
jbaiter wrote: That workflow should be reproducible with spreads, all it really would have to do is fire the cameras at a fixed interval (easy to do, as I already said). Where do you verify the images? On the devices themselves or on the computer?
On the cameras themselves. The LCD display of the S5 IS pivots out and rotates, so I can see them both as I scan the book. The "image verify" function even shows a zoomed-in section for a couple of seconds, so I can confirm that the text is legible, but since I'm using the focus lock that's pretty much a given anyway.
jbaiter wrote: That sounds like a great idea for dewarping! Are you aware of some tool that does that? -- edit: After some thought, I'm not sure if this would help that much with page warping, often the word is warped itself, and by aligning the word with the rest of the line, it would still not look very good, possibly even worse. Maybe with an OCR engine that operated on the character level, but I think all of them only go down to individual words.
No, I'm not aware of a tool that does that (I originally read this as "Are you aware of some fool that does that?"), though there are tools which will generate PDF from text. I expect the tool (which I may have to write) will make some intelligent guesses about how the page was typeset by looking at the original images -- the amount of space between lines would be a constant that's calculated by averaging the values over the whole page, for instance. With accurate OCR and "good enough" font matching, I'd expect to be able to generate a reasonable facsimile of the original text, and eliminate the original image for everything except real images.
Re: Spreads on Linux Mint: First Impressions
Does anyone know what this error means?
I am trying to get the GUI version to work on Linux Mint.
Code: Select all
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
(python:2849): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
Re: Spreads on Linux Mint: First Impressions
Hmm. I am getting closer but still have problems. Now I am able to capture images but something is failing while downloading the images.
I get this error "The left and right camera produced an inequal amount of images, please fix the problem!
Will not combine images"
When I look in the project folder there are 3 folders, called "raw", "left" and "right". There are images in the folder called "right", but the folder called "left" is empty. It looks to me like spreads does not download images from the left camera.
If I switch the cameras around (swap the USB cables) the result is the same: only images in the "right" folder.
In the terminal window in the background I get this message:
Code: Select all
(python:4635): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion `value >= min && value <= max' failed
Traceback (most recent call last):
File "/home/tommy/.spreads/local/lib/python2.7/site-packages/spreadsplug/gui/gui.py", line 390, in doDownload
self.combine_btn.clicked.connect(get_pluginmanager()['combine']
AttributeError: 'Extension' object has no attribute 'download'