Proposed 100% Linux Workflow: Capture-Process-OCR
Moderator: peterZ
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
That sounds fantastic! Can you post a few pictures or video of the process? Also, do you have anyone who is visually impaired testing the process? The scanning to OCR process is an extremely liberating one for visually impaired people and it would be especially good if your process worked for them as well. Of course, depending on their impairment, they may not be able to look at the images, but with a good process, they don't need to. Along those lines, you could consider contacting the people at bookshare.org to see if they have any resources or pointers to resources that could help you.
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
These are good ideas... I'll try to take some photos the next time I'm in front of the system. Visual impairment is an interesting question... capturing would be pretty easy but previewing/flagging images might be tricky... though the previews will ultimately be displayed fullscreen, so I guess if you have a sufficiently big/high contrast monitor... Honestly I don't know much about this world... I'm very nearsighted and think with practice I could go through the process easily without my glasses, but know it's not the same. Would LOVE to connect with Bookshare at some point and trade some knowledge; suspect this is a community where our work could have deep impact. Hate to think there's an entire class of the population for whom accessing a ton of great literature is difficult/impossible, particularly when there are copyright exceptions providing mechanisms for that access.
Initial setup and calibration is really the sticking point. Ideally what I want is a prompt that just asks you to input book dimensions, then tells you how high to place the camera and sets zoom levels automatically. In a distant future this could even be done automatically: place the book or a calibration card on the platen, start the program, the camera zooms out and takes a preview image, and based on that the system decides the vertical position required to center it and the zoom level to crop out the borders. But that's not high priority at the moment.
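A rough sketch of the "input book dimensions, get camera height" prompt, in Python. It assumes a simple pinhole model and that you know the camera's field-of-view angles at the zoom setting in use. The function name, the 5% margin and the angles in the example are all made up for illustration and are not measured from any real camera.

```python
import math

def camera_height_mm(page_w_mm, page_h_mm, hfov_deg, vfov_deg, margin=1.05):
    """Lens-to-platen distance at which the whole page fits in the frame.

    Pinhole model: a frame of width w seen from distance d satisfies
    w = 2 * d * tan(hfov / 2), and likewise for the height.
    `margin` leaves a little border around the page (assumed value).
    """
    need_w = page_w_mm * margin / (2 * math.tan(math.radians(hfov_deg) / 2))
    need_h = page_h_mm * margin / (2 * math.tan(math.radians(vfov_deg) / 2))
    # The tighter of the two constraints decides the height.
    return max(need_w, need_h)

if __name__ == "__main__":
    # Hypothetical page size and field-of-view angles, for illustration only.
    height = camera_height_mm(150, 230, hfov_deg=40.0, vfov_deg=53.0)
    print("Place the lens about %.0f mm above the platen" % height)
```

Going the other way (fixed camera height, pick the zoom) would need a table of field-of-view per zoom step for the specific camera, which would have to be measured.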
I guess you'd have to ask the scantailor crew about visual impairment and postprocessing. I know at one point there was conversation about inserting markers on a page itself to denote border and spine, but I haven't really been following their progress for a little while (no offense, guys, want to dive in, just superbusy!).
- daniel_reetz
- Posts: 2812
- Joined: 03 Jun 2009, 13:56
- E-book readers owned: Used to have a PRS-500
- Number of books owned: 600
- Country: United States
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
"Would LOVE to connect with Bookshare at some point and trade some knowledge;"
I've spoken with them, and can connect you with some people, maybe. (got your email, BTW)
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
benjamin wrote:
One thing I'm still struggling with is an effective image preview mechanism. The problem I was having was that ImageMagick was taking up to 10 seconds to rotate and scale each page, making post-capture "preview" far too slow. I've also yet to find a way to effectively close a program from bash other than pkill, and for some reason that leads to system instability over time. I think I may have that beat now that I've gotten a handle on gphoto's ability to access rotate options on the camera itself, but need to test this and before any release we'd need to ensure this works on cameras other than the SX100's. I suspect there may be a way to capture a still image from the viewfinder, which would solve the problem, but haven't looked into this yet.
I'm a little late to this conversation, but for image display have you looked at Xv? It's pretty lightweight. Would the PBM toolkit be of any use for image manipulation? Maybe IM isn't cleanly handling SIGTERM from your pkill command...
---john.
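On the pkill point, one alternative is to keep a handle on the viewer process you started and signal only that process, instead of killing by name. A minimal sketch in Python, assuming the viewer is launched from the same script. The helper name and the 3 second grace period are made up, and the sleeping child below is just a stand-in for a real viewer command.

```python
import subprocess
import sys

def close_viewer(proc, grace_seconds=3.0):
    """Ask a child process to exit, escalating only if it ignores the request."""
    if proc.poll() is None:              # still running
        proc.terminate()                 # sends SIGTERM on POSIX
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()                  # SIGKILL as a last resort
            proc.wait()                  # reap it so no zombie is left behind
    return proc.returncode

if __name__ == "__main__":
    # Stand-in for the real viewer (e.g. an image viewer plus a file name).
    viewer = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    close_viewer(viewer)
```

Waiting on the child after signalling it matters: otherwise finished viewers pile up as zombie processes over a long scanning session. Whether that is what causes the instability described above is only a guess.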
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
This is helpful feedback... I'll look into these options when I return at the end of the month. One current script we've been playing with uses convert -> montage -> ristretto, which has the best output results but is also slow and crashes after a while. This may be a better alternative. It looks like the combo you're suggesting might also play more easily with Kommander, which I think is the path to a quick & dirty GUI.
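For reference, a sketch of how the convert -> montage step could be assembled, written in Python so the commands can be inspected before anything is run. This is not the actual script. The file names, rotation angles and preview size are assumptions. The one deliberate choice is ImageMagick's `-define jpeg:size=` hint, which has to come before the input file and lets the JPEG decoder load a reduced-size image; whether that helps with the 10 seconds per page is untested here.

```python
def preview_commands(left, right, out,
                     rotate_left=90, rotate_right=270, size="800x600"):
    """Build argv lists for a two-page preview: two convert calls, then montage.

    The rotation angles and size are placeholders; adjust for the real rig.
    """
    cmds = []
    thumbs = []
    for src, angle in ((left, rotate_left), (right, rotate_right)):
        thumb = src + ".preview.jpg"
        cmds.append([
            "convert",
            "-define", "jpeg:size=" + size,  # decode hint, placed before the input
            src,
            "-rotate", str(angle),
            "-resize", size,
            thumb,
        ])
        thumbs.append(thumb)
    # Put the two previews side by side with no spacing between them.
    cmds.append(["montage", *thumbs, "-tile", "2x1", "-geometry", "+0+0", out])
    return cmds

if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for cmd in preview_commands("left.jpg", "right.jpg", "pair.jpg"):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the commands look right. The finished `pair.jpg` would then go to whichever viewer ends up being used.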
Re: Proposed 100% Linux Workflow: Capture-Process-OCR
Regarding an image viewer: my favourite one is geeqie (previously gqview). It has some advanced features, supports some kind of remote control (I haven't tested it in depth: geeqie --remote-help), lets you configure the type of filtering for zoom/fit to window, etc ...
During scanning, I use it for quality control by sorting the directory by date, pressing R for reload, Pos1 (Home) for the most recent picture - but I do this in chunks of ~50 images. My setup captures images at a constant interval (7 secs) in which I flip the page. Then, after ~50 images, I do some qc, repeat ...
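The same "sort by date, check every ~50 images" routine could be scripted roughly like this in Python. It is only a sketch: the function names and the list of extensions are invented for illustration, and it uses file modification time as a stand-in for capture date.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")  # assumed extensions

def images_by_date(directory):
    """Image paths in `directory`, oldest first, sorted by modification time."""
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTS)
    ]
    return sorted(paths, key=os.path.getmtime)

def newest_image(directory):
    """Most recent capture, or None if there are no images yet."""
    paths = images_by_date(directory)
    return paths[-1] if paths else None

def qc_due(directory, already_checked, chunk=50):
    """True once at least `chunk` images have arrived since the last check."""
    return len(images_by_date(directory)) - already_checked >= chunk
```

A capture loop could call `qc_due()` after each shot and pause for a check when it returns True, passing the new image count as `already_checked` next time round.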