Voice Control

DIY Book Scanner Skunk Works. Share your crazy ideas and novel approaches. Home of the "3D structure of a book" thread.

Moderator: peterZ

duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Voice Control

Post by duerig »

Voice recognition has come a long way in recent years, and it is very accurate when the vocabulary is small. There are open source packages for voice-controlled projects on the Raspberry Pi. This would free your hands for page flipping, and you could either avoid a foot pedal altogether or repurpose your feet for platen movement on some scanners.

The vocabulary would be very simple:

SHOOT
RETAKE

Anything more complicated like changing settings would be accomplished through normal interfaces.

Ideally, we could have the equivalent words in other languages as well, and choosing them would be part of setup.
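
A minimal sketch of how such a vocabulary could be wired up, assuming a plain lookup table per language that is selected during setup. The names `VOCABULARY`, `build_lookup` and `action_for`, and the German example words, are illustrative assumptions; they are not part of Spreads or of any speech library.

```python
# Hypothetical command table: language code -> {spoken word: action}.
# The German entries are example words only.
VOCABULARY = {
    "en": {"shoot": "SHOOT", "retake": "RETAKE"},
    "de": {"aufnahme": "SHOOT", "nochmal": "RETAKE"},
}


def build_lookup(language):
    """Return the word -> action mapping for the language chosen at setup."""
    try:
        return dict(VOCABULARY[language])
    except KeyError:
        raise ValueError(f"no vocabulary for language {language!r}")


def action_for(word, lookup):
    """Map a recognized word to an action, or None if it is not a command."""
    return lookup.get(word.strip().lower())
```

Anything the recognizer returns that is not in the table maps to `None` and can simply be ignored, which keeps settings changes and other complicated actions on the normal interfaces.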

Now I need to do some research to find the best library to use.
Gerard
Posts: 154
Joined: 17 Oct 2010, 07:15
Number of books owned: 0
Location: Berlin (Germany)

Re: Voice Control

Post by Gerard »

http://updates.html5rocks.com/2013/01/V ... Speech-API

Maybe you could use an Android phone as a remote control.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Voice Control

Post by duerig »

Thanks, Gerard. I will want to look into that when I get this more integrated into Spreads. Since I am running on a Raspberry Pi, it is always best to offload the work onto the client if possible. :)

I've also been working with Pocketsphinx to see how well that works. The nice thing is that it is very accurate when I say one of the keywords. It almost always detects when I say 'scan' or 'retake' and can distinguish between them very well.

The downside is that it is also very good at taking other noises and sounds and interpreting them as 'scan' or 'retake'. This problem of out-of-vocabulary words doesn't seem to have any good solution in Pocketsphinx, especially when you have a very small vocabulary. I have a few other possible solutions I want to explore, though.

Oddly enough, Pocketsphinx interprets the click of the camera shutter as the word 'scan'. Which of course makes it take another picture and generate another click. When I hooked it up to my prototype scanning workflow, it was in an infinite loop until I killed it.
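
One way to break this kind of feedback loop, sketched here under assumptions, is to ignore any keyword detection that arrives within a short cooldown after a shot, so the shutter click cannot trigger the next picture. The class name `CooldownTrigger` and the 1.5 second default are hypothetical; this is not taken from the prototype described above, and it does nothing about other out-of-vocabulary noises.

```python
import time


class CooldownTrigger:
    """Drop keyword detections that arrive too soon after the last shot."""

    def __init__(self, cooldown_seconds=1.5, clock=time.monotonic):
        # cooldown_seconds is a guess; it should exceed the delay between
        # sending the trigger and the shutter click being heard.
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._last_shot = None

    def should_fire(self):
        """Return True if a detection right now should trigger the camera."""
        now = self.clock()
        if (self._last_shot is not None
                and now - self._last_shot < self.cooldown_seconds):
            return False  # probably the shutter click, ignore it
        self._last_shot = now
        return True
```

The recognizer loop would call `should_fire()` each time it hears 'scan' and only trigger the camera when it returns `True`. The trade-off is that a genuine command spoken inside the cooldown window is dropped as well.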
jesu_krist
Posts: 5
Joined: 10 Sep 2017, 08:31
Number of books owned: 0
Country: Italy

Re: Voice Control

Post by jesu_krist »

In the broader context of voice controlling the camera(s) during acquisition, I've found this simple solution:

This is my scanning rig (viewtopic.php?p=20885#p20885). I use the TwoCamControl AutoHotKey script to trigger the camera(s); the script in turn is activated by keystrokes or other custom actions performed on peripherals. My idea was to voice-trigger the camera(s) through speech recognition software capable of simulating keystrokes: enter Vocola.

Vocola 3 (http://vocola.net/v3/) uses the built-in Windows Speech Recognition as input for local and global hotkeys and shortcuts. I just use the internal mic of my laptop to trigger the camera(s), with no significant latency. Vocola interprets my commands as keystrokes, so now I scan hands-free, much faster, and with as little effort as possible. The easiest way to prevent ambient noise from triggering the camera(s) is to lower the mic sensitivity and to choose commands that sound unique. Because I only need to "shoot" the cameras, I have just one voice command: I say "now!" every time I need the camera(s) to shoot, which is interpreted as "F8" on the keyboard. The software recognizes "now!" much more easily than "shoot!"; it never fails (at least in my case: I'm not a native English speaker).
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Voice Control

Post by dtic »

jesu_krist wrote: 12 Sep 2017, 09:21 Vocola 3 (http://vocola.net/v3/) uses the built-in Windows Speech Recognition as input for local and global hotkeys and shortcuts.
Nice solution! I'm not familiar with Vocola. How long is the delay between the phrase you say and the action?
jesu_krist
Posts: 5
Joined: 10 Sep 2017, 08:31
Number of books owned: 0
Country: Italy

Re: Voice Control

Post by jesu_krist »

In my case, with a one-word command and using the internal mic of my laptop, the delay is under 2 seconds. With a little practice, and for such a repetitive task, I'm able to voice the command while I'm still turning the pages (or positioning the book), so that in the end there is almost no latency.