
Issue distinguishing commands from normal speech with SAPI
<p>I'm working on a personal project involving microphones in my apartment that I can issue verbal commands to. To accomplish this, I've been using the Microsoft Speech API, specifically SpeechRecognitionEngine from System.Speech.Recognition in C#. I construct a grammar as follows:</p>
<pre><code>// validCommands is a Choices object containing all valid command strings
// recognizer is a SpeechRecognitionEngine
GrammarBuilder builder = new GrammarBuilder(recognitionSystemName);
builder.Append(validCommands);
recognizer.SetInputToDefaultAudioDevice();
recognizer.LoadGrammar(new Grammar(builder));
recognizer.RecognizeAsync(RecognizeMode.Multiple);
// etc ...
</code></pre>
<p>This works well when I actually give it a command. It hasn't misidentified one of my commands yet. Unfortunately, it also tends to pick up random talking as commands! I've tried to ameliorate this by prefacing the command <em>Choices</em> object with a "name" (<em>recognitionSystemName</em>), which I use to address the system. Oddly, this doesn't seem to help. I am restricting it to a set of predetermined command phrases, so I would have thought that it could detect when speech wasn't any of the strings. My best guess is that it's assuming that all sound is a command and picking the best match from the command set. Any advice on improving this system so that it no longer triggers on conversation not directed at it would be very helpful.</p>
<p>Edit: I've moved the name recognizer to a separate SpeechRecognitionEngine, but the accuracy is awful.
Here's a bit of test code I wrote to examine the accuracy:</p>
<pre><code>using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Speech.Recognition;

namespace RecognitionAccuracyTest
{
    class RecognitionAccuracyTest
    {
        static int recogcount;

        [STAThread]
        static void Main()
        {
            recogcount = 0;
            System.Console.WriteLine("Beginning speech recognition accuracy test.");

            SpeechRecognitionEngine recognizer;
            recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("Octavian")));
            recognizer.SpeechHypothesized += new EventHandler&lt;SpeechHypothesizedEventArgs&gt;(recognizer_SpeechHypothesized);
            recognizer.SpeechRecognized += new EventHandler&lt;SpeechRecognizedEventArgs&gt;(recognizer_SpeechRecognized);
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            while (true) ;
        }

        static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            System.Console.WriteLine("Recognized @ " + e.Result.Confidence);
            try
            {
                if (e.Result.Audio != null)
                {
                    System.IO.FileStream stream = new System.IO.FileStream("audio" + ++recogcount + ".wav", System.IO.FileMode.Create);
                    e.Result.Audio.WriteToWaveStream(stream);
                    stream.Close();
                }
            }
            catch (Exception) { }
        }

        static void recognizer_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            System.Console.WriteLine("Hypothesized @ " + e.Result.Confidence);
        }
    }
}
</code></pre>
<p>If the name is "Octavian", it recognizes stuff like "Octopus", "Octagon", "Volkswagen", and "Wow, really?". I can clearly hear the difference in the associated audio clips. Any ideas on making this not awful would be great.</p>
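<p>A minimal sketch of one way to test the "best match" guess above: treat a SpeechRecognized event as a command only when its confidence clears a cutoff, and log what the engine itself rejects. The cutoff value (<em>MinConfidence</em>), the helper <em>IsConfident</em>, and the class name are assumptions made for illustration, not part of the original test code. A suitable value would have to be found by logging confidences for real commands and for background talk.</p>
<pre><code>using System;
using System.Globalization;
using System.Speech.Recognition;

class ConfidenceFilterSketch
{
    // Hypothetical cutoff (assumption); tune it against logged confidences.
    const float MinConfidence = 0.9f;

    // True when a result is confident enough to be treated as a command.
    static bool IsConfident(float confidence)
    {
        return confidence &gt;= MinConfidence;
    }

    [STAThread]
    static void Main()
    {
        using (SpeechRecognitionEngine recognizer = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            recognizer.SetInputToDefaultAudioDevice();
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder("Octavian")));
            recognizer.SpeechRecognized += new EventHandler&lt;SpeechRecognizedEventArgs&gt;(OnRecognized);
            recognizer.SpeechRecognitionRejected += new EventHandler&lt;SpeechRecognitionRejectedEventArgs&gt;(OnRejected);
            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            // Block until Enter is pressed instead of spinning in a busy loop.
            Console.ReadLine();
        }
    }

    static void OnRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        // The engine reports its best grammar match; the cutoff decides whether to act on it.
        string verdict = IsConfident(e.Result.Confidence) ? "Accepted" : "Ignored";
        Console.WriteLine(verdict + " \"" + e.Result.Text + "\" @ " + e.Result.Confidence);
    }

    static void OnRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
        // Raised when the engine finds no grammar match it considers good enough.
        Console.WriteLine("Rejected by engine @ " + e.Result.Confidence);
    }
}
</code></pre>
<p>Whether a single cutoff separates "Octavian" from "Octopus" is exactly what the logged values would show; this sketch only makes that comparison visible.</p>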
 
