    <p>Whoever gave you your instructions wasn't telling you to count zero crossings on the results of the DFT/FFT. That would be meaningless. (If they were telling you to do that, they were clueless. You have my permission to laugh at them for giving you such ridiculous instructions.) Rather, they were telling you to count zero crossings on the original time-domain data, and also to look at the FFT of your data.</p> <p>However,</p> <ul> <li><p>Zero crossing rate is a pretty crappy starting point for speech recognition. Maybe you can get somewhere with it. With only slight hyperbole, I can say zero crossing is the least robust DSP analysis you can do. However, it is also simple, and speech recognition research has been going on a long time, so maybe there's some research on it. UPDATE/CORRECTION: this is a bit of hyperbole. Actually, I believe a lot of speech recognition techniques DO use zero crossing, but you should know what you are doing first, because it's not very robust and is sensitive to many kinds of errors, like octave errors. When you use zero crossing, it's a good idea to low-pass filter (maybe aggressively) first. Definitely consider other factors.</p></li> <li><p>Understanding the output of an FFT is something that's asked so often here that I wrote a blog entry. Usually people are trying to track pitch, and you should do that too, actually, but there's other stuff you can get from the FFT that is important in speech, like the frequency centroid and the relative strengths of different frequencies. Start here: <a href="http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html" rel="nofollow">http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html</a></p></li> <li><p>You might also want to consider simply filtering important speech frequencies (to find out what these are, start with the Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Manner_of_articulation" rel="nofollow">"manner of articulation"</a>.
For example, by following the link to Sibilant, you'll learn that "[s] has the most acoustic strength at around 8,000 Hz". Neato!) You can get that info from an FFT or by filtering. There are advantages and disadvantages to each. You may want to look into the speech recognition literature to see what they use.</p></li> </ul>
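<p>To make the first two points concrete, here is a minimal sketch, not a definitive implementation. It counts zero crossings on the time-domain samples before and after a crude low-pass filter, and computes a frequency centroid from the magnitude spectrum. Everything specific in it is an assumption for illustration: the 8 kHz sample rate, the synthetic two-tone test signal, the 8-sample moving average standing in for a proper low-pass filter, and the function names. The naive DFT is only there to keep the example dependency-free; in practice you would use a real FFT library.</p>

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for this illustration


def moving_average(samples, width):
    """Crude low-pass filter: mean of each run of `width` samples (no padding)."""
    return [
        sum(samples[i:i + width]) / width
        for i in range(len(samples) - width + 1)
    ]


def zero_crossing_rate(samples, sample_rate):
    """Sign changes per second, counted on time-domain samples."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings * sample_rate / (len(samples) - 1)


def magnitude_spectrum(samples):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (stand-in for a real FFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(samples)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


# Synthetic test signal (an assumption, not real speech):
# a 100 Hz tone plus a strong 3000 Hz component.
signal = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE + 0.3)
    + 0.8 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t in range(800)
]

raw_zcr = zero_crossing_rate(signal, SAMPLE_RATE)
smooth_zcr = zero_crossing_rate(moving_average(signal, 8), SAMPLE_RATE)
centroid = spectral_centroid(signal, SAMPLE_RATE)

print(f"ZCR on raw samples:  {raw_zcr:.0f} crossings/s")
print(f"ZCR after low-pass:  {smooth_zcr:.0f} crossings/s")
print(f"Spectral centroid:   {centroid:.0f} Hz")
```

<p>A pure tone crosses zero twice per cycle, so the 100 Hz component alone would give roughly 200 crossings per second. On the raw samples the count is dominated by the 3000 Hz component, which is why filtering first matters; after the moving average, the count falls back close to the 100 Hz figure. The centroid, by contrast, comes from the spectrum, not from zero crossings of the spectrum.</p>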