Proposed 100% Linux Workflow: Capture-Process-OCR

Moderator: peterZ

Tim

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by Tim »

That sounds fantastic! Can you post a few pictures or video of the process? Also, do you have anyone that is visually impaired testing the process? The scanning to OCR process is an extremely liberating one for visually impaired people and it would be especially good if your process worked for them as well. Of course, depending on their impairment, they may not be able to look at the images, but with a good process, they don't need to. In that line, you could consider contacting the people at bookshare.org to see if they have any resources or pointers to resources that could help you.
benjamin
Posts: 58
Joined: 04 Mar 2014, 00:53

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by benjamin »

These are good ideas... I'll try to take some photos the next time I'm in front of the system. Visual impairment is an interesting question... capturing would be pretty easy but previewing/flagging images might be tricky... though the previews will ultimately be displayed fullscreen, so I guess if you have a sufficiently big/high contrast monitor... Honestly I don't know much about this world... I'm very nearsighted and think with practice I could go through the process easily without my glasses, but know it's not the same. Would LOVE to connect with Bookshare at some point and trade some knowledge; suspect this is a community where our work could have deep impact. Hate to think there's an entire class of the population for whom accessing a ton of great literature is difficult/impossible, particularly when there are copyright exceptions providing mechanisms for that access.

Initial setup and calibration is really the sticking point. Ideally what I want is a prompt that just asks you to input book dimensions, then tells you how high to place the camera and sets zoom levels automatically. In a distant future this could even be done automatically: place the book or a calibration card on the platen, start the program, the camera zooms out and takes a preview image, and based on that the system decides the vertical position required to center it and the zoom level to crop out the borders. But that's not high priority at the moment.
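
As a rough illustration of the "input book dimensions, get a camera height" idea, here is a minimal Python sketch. It assumes a simple pinhole model with the camera pointing straight down at the platen; the function name, the margin parameter, and the field-of-view angles are all hypothetical and would have to be measured for a real camera at a given zoom setting.

```python
import math


def camera_height_mm(page_width_mm, page_height_mm, hfov_deg, vfov_deg, margin=0.05):
    """Smallest lens-to-platen distance at which the whole page fits in frame.

    hfov_deg / vfov_deg are the camera's horizontal and vertical fields of
    view at the current zoom (assumed known; they are not read from the camera).
    """
    # Pad the page on every side so a small border stays visible around it.
    width = page_width_mm * (1 + 2 * margin)
    height = page_height_mm * (1 + 2 * margin)
    # Pinhole model: the visible extent at distance d is 2 * d * tan(fov / 2).
    d_for_width = width / (2 * math.tan(math.radians(hfov_deg) / 2))
    d_for_height = height / (2 * math.tan(math.radians(vfov_deg) / 2))
    # The page must fit in both directions, so the larger distance wins.
    return max(d_for_width, d_for_height)
```

Calling camera_height_mm(150, 230, hfov_deg=50, vfov_deg=38) would return a distance in millimetres for a 150 x 230 mm page; the two angles there are placeholders, not SX100 specifications.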

I guess you'd have to ask the scantailor crew about visual impairment and postprocessing. I know at one point there was conversation about inserting markers on a page itself to denote border and spine, but I haven't really been following their progress for a little while (no offense, guys, I want to dive in, just super busy!).
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by daniel_reetz »

Would LOVE to connect with Bookshare at some point and trade some knowledge;
I've spoken with them, and can connect you with some people, maybe. (got your email, BTW)
johnh
Posts: 2
Joined: 04 Mar 2014, 00:52

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by johnh »

benjamin wrote:
One thing I'm still struggling with is an effective image preview mechanism. The problem I was having was that ImageMagick was taking up to 10 seconds to rotate and scale each page, making post-capture "preview" far too slow. I've also yet to find a way to effectively close a program from bash other than pkill, and for some reason that leads to system instability over time. I think I may have that beat now that I've gotten a handle on gphoto's ability to access rotate options on the camera itself, but need to test this and before any release we'd need to ensure this works on cameras other than the SX100's. I suspect there may be a way to capture a still image from the viewfinder, which would solve the problem, but haven't looked into this yet.
I'm a little late to this conversation but for image display have you looked at Xv? It's pretty lightweight. Would the PBM toolkit be of any use for image manipulation? Maybe IM isn't cleanly handling SIGTERM from your pkill command...
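
On the pkill point, one alternative to matching processes by name is to keep a handle on the viewer you launched and signal only that process. The Python sketch below is an assumption about how that could look, not a tested fix for the instability described: it sends SIGTERM, waits briefly, and only then falls back to SIGKILL. The function name and the grace period are made up for illustration.

```python
import subprocess


def close_viewer(proc, grace_seconds=2.0):
    """Stop a child process started with subprocess.Popen and return its exit code."""
    # Nothing to do if the process has already exited.
    if proc.poll() is not None:
        return proc.returncode
    # Ask politely first: terminate() sends SIGTERM on Linux.
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM; kill() sends SIGKILL, which cannot be caught.
        proc.kill()
        return proc.wait()
```

A script would start the viewer with something like subprocess.Popen(["xv", "page.jpg"]) (command and filename are placeholders), keep the returned object, and pass it to close_viewer() before showing the next page.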

---john.
benjamin
Posts: 58
Joined: 04 Mar 2014, 00:53

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by benjamin »

This is helpful feedback... I'll look into these options when I return at the end of the month. One current script we've been playing with uses convert -> montage -> ristretto, which has the best output results but is also slow and crashes after a while. This may be a better alternative. It looks like the combo you're suggesting might also play more easily with Kommander, which I think is the path to a quick & dirty GUI.
wels
Posts: 21
Joined: 04 Mar 2014, 00:52

Re: Proposed 100% Linux Workflow: Capture-Process-OCR

Post by wels »

Regarding an image viewer: my favourite one is geeqie (previously gqview). It has some advanced features, supports some kind of remote control (I haven't tested it in depth; see geeqie --remote-help), and lets you configure the type of filtering for zoom/fit to window, etc.
During scanning, I use it for quality control by sorting the directory by date, pressing R to reload, then Pos1 (the Home key) to jump to the most recent picture, but I do this in chunks of ~50 images. My setup captures images at a constant interval (7 seconds), during which I flip the page. Then, after ~50 images, I do some QC and repeat.
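
The chunked check described above could also be scripted. The following Python sketch is only an illustration under assumptions: it orders captures by file modification time and hands back a batch once enough unchecked images have accumulated. The function names, the suffix list, and the idea of tracking an "already checked" count are hypothetical and not part of geeqie.

```python
from pathlib import Path

# File extensions treated as captures (an assumption; adjust to the camera's output).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def images_by_date(directory):
    """Return the image files in `directory`, oldest first, by modification time."""
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime)


def qc_due(directory, already_checked, chunk_size=50):
    """Return the next batch to review once `chunk_size` unchecked images exist, else []."""
    pending = images_by_date(directory)[already_checked:]
    return pending[:chunk_size] if len(pending) >= chunk_size else []
```

The last element of images_by_date() is the most recent capture, which is the one worth opening first when checking focus or a missed page flip.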