<p>While separating the individual speakers is quite a difficult problem, you can automatically split the audio wherever there are pauses. This produces a series of files that will likely be easier to manage, since speakers often alternate between pauses.</p>
<p>This approach requires the open source Julius speech recognition decoder package, which is available in many Linux package repositories. I use the Ubuntu multiverse repository.</p>
<p>Here is the site: <a href="http://julius.sourceforge.jp/en_index.php" rel="nofollow">http://julius.sourceforge.jp/en_index.php</a></p>
<hr>
<p><strong>Step 0: Install Julius</strong></p>
<pre><code>sudo apt-get install julius
</code></pre>
<p><strong>Step 1: Segment Audio</strong></p>
<pre><code>adintool -in file -out file -filename myRecording.wav -startid 0 -freq 44100 -lv 2048 -zc 30 -headmargin 600 -tailmargin 600
</code></pre>
<ul>
<li><p><em>-startid</em> is the starting segment number that will be appended to the filename</p></li>
<li><p><em>-freq</em> is the sample rate of the source audio file</p></li>
<li><p><em>-lv</em> is the audio level above which voice detection will be active</p></li>
<li><p><em>-zc</em> is the number of zero crossings per second above which voice detection will be active</p></li>
<li><p><em>-headmargin</em> and <em>-tailmargin</em> are the amounts of silence, in milliseconds, kept before and after each audio segment</p></li>
</ul>
<p>Note that -lv and -zc will have to be adjusted for the attributes of your particular audio recording, while -headmargin and -tailmargin will have to be adjusted for the speaking styles of your particular speakers. The values given above have worked well for my voice recordings in the past.</p>
<p>Here is the documentation: <a href="http://julius.sourceforge.jp/juliusbook/en/adintool.html" rel="nofollow">http://julius.sourceforge.jp/juliusbook/en/adintool.html</a></p>
<hr>
<p>In my experience, preprocessing the audio with compression and normalization gives better results and requires less adjustment of the Julius arguments. These preprocessing steps run before Step 0 and Step 1 (hence the negative numbers); they are recommended but not required.</p>
<p>Preprocessing requires the open source SoX audio toolkit package. This is also available in many Linux package repositories. I use the Ubuntu universe repository.</p>
<p>Here is the site: <a href="http://sox.sourceforge.net" rel="nofollow">http://sox.sourceforge.net</a></p>
<hr>
<p><strong>Step -2: Install SoX</strong></p>
<pre><code>sudo apt-get install sox
</code></pre>
<p><strong>Step -1: Preprocess Audio</strong></p>
<pre><code>sox myOriginalRecording.wav myRecording.wav gain -b -n -8 compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 gain -b -n -2
</code></pre>
<ul>
<li><p><em>gain -b -n</em> balances and normalizes the audio to a given level</p></li>
<li><p><em>compand</em> compresses (in this case) the audio based on the parameters</p></li>
</ul>
<p>Note that the compand parameters may take some time to understand fully. The values given above have worked well for my voice recordings in the past.</p>
<p>Here is the documentation: <a href="http://sox.sourceforge.net/sox.html" rel="nofollow">http://sox.sourceforge.net/sox.html</a></p>
<hr>
<p>While this will not identify each speaker for you, it will greatly simplify the task of doing it by ear, which may end up being the only option for a while. I do hope you find a practical solution if one is already available.</p>
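<hr>
<p>To make the <em>-lv</em> and <em>-zc</em> options from Step 1 more concrete, here is a rough Python sketch of level-gated zero-crossing detection. It only illustrates the general idea and is not Julius's actual implementation: the function names, the 0.1 second frame length, and the synthetic test tone are all assumptions made for this example.</p>

```python
import math


def count_level_crossings(frame, level):
    """Count sign flips between samples whose magnitude reaches `level`.

    Samples quieter than `level` are ignored, so low-level noise around
    zero does not register as a crossing.
    """
    crossings = 0
    last_sign = 0
    for sample in frame:
        if sample >= level:
            sign = 1
        elif sample <= -level:
            sign = -1
        else:
            continue  # below the level threshold: not counted
        if last_sign != 0 and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def detect_active_frames(samples, sample_rate, level=2048, zc_per_sec=30,
                         frame_sec=0.1):
    """Return one boolean per frame: True where the frame looks like voice.

    `level` plays the role of -lv and `zc_per_sec` the role of -zc.
    `frame_sec` is a hypothetical frame length chosen for this sketch.
    """
    frame_len = max(1, round(sample_rate * frame_sec))
    needed = zc_per_sec * frame_sec  # crossings required within one frame
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(count_level_crossings(frame, level) > needed)
    return flags


def tone(freq, amplitude, seconds, sample_rate):
    """Synthetic sine tone as 16-bit-style integer samples (test input only)."""
    total = int(seconds * sample_rate)
    return [int(amplitude * math.sin(2 * math.pi * freq * n / sample_rate))
            for n in range(total)]


if __name__ == "__main__":
    rate = 8000
    signal = [0] * rate + tone(440, 8000, 1.0, rate)  # 1 s silence, 1 s tone
    print(detect_active_frames(signal, rate))
```

<p>In this sketch, raising <code>level</code> makes quiet passages count as silence, and raising <code>zc_per_sec</code> demands more activity before a frame is treated as speech. That mirrors why both values need tuning per recording.</p>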