StackOverflow2013


plurals

POcocktail party algorithm SVD implementation ... in one line of code?
primarykey
Id
20414667
data
AcceptedAnswerId
33616878
AnswerCount
2
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-12-06T02:11:10.193
FavoriteCount
55
LastActivityDate
2015-11-09T19:39:59.873
LastEditDate
2014-07-28T12:35:31.837
LastEditorUserId
411972
OwnerUserId
411972
ParentId
0
PostTypeId
1
Score
69
ViewCount
27791
LastEditorDisplayName
text
Body
In a slide within the introductory lecture on machine learning by Stanford's Andrew Ng at Coursera, he gives the following one-line Octave solution to the cocktail party problem, given that the audio sources are recorded by two spatially separated microphones:
<pre><code>[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x'); </code></pre>
At the bottom of the slide is "source: Sam Roweis, Yair Weiss, Eero Simoncelli" and at the bottom of an earlier slide is "Audio clips courtesy of Te-Won Lee". In the video, Professor Ng says,
<blockquote> "So you might look at unsupervised learning like this and ask, 'How complicated is it to implement this?' It seems like in order to build this application, it seems like to do this audio processing, you would write a ton of code, or maybe link into a bunch of C++ or Java libraries that process audio. It seems like it would be a really complicated program to do this audio: separating out audio and so on. It turns out the algorithm to do what you just heard, that can be done with just one line of code ... shown right here. It did take researchers a long time to come up with this line of code. So I'm not saying this is an easy problem. But it turns out that when you use the right programming environment many learning algorithms will be really short programs." </blockquote>
The separated audio results played in the video lecture are not perfect but, in my opinion, amazing. Does anyone have any insight into how that one line of code performs so well? In particular, does anyone know of a reference that explains the work of Te-Won Lee, Sam Roweis, Yair Weiss, and Eero Simoncelli with respect to that one line of code?

UPDATE

To demonstrate the algorithm's sensitivity to microphone separation distance, the following simulation (in Octave) separates the tones from two spatially separated tone generators.
<pre><code>% define model
f1 = 1100;               % frequency of tone generator 1; unit: Hz
f2 = 2900;               % frequency of tone generator 2; unit: Hz
Ts = 1/(40*max(f1,f2));  % sampling period; unit: s
dMic = 1;                % distance between microphones centered about origin; unit: m
dSrc = 10;               % distance between tone generators centered about origin; unit: m
c = 340.29;              % speed of sound; unit: m / s

% generate tones
figure(1);
t = [0:Ts:0.025];
tone1 = sin(2*pi*f1*t);
tone2 = sin(2*pi*f2*t);
plot(t,tone1);
hold on;
plot(t,tone2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('tone 1', 'tone 2');
hold off;

% mix tones at microphones
% assume inverse square attenuation of sound intensity (i.e., inverse linear attenuation of sound amplitude)
figure(2);
dNear = (dSrc - dMic)/2;
dFar = (dSrc + dMic)/2;
mic1 = 1/dNear*sin(2*pi*f1*(t-dNear/c)) + ...
       1/dFar*sin(2*pi*f2*(t-dFar/c));
mic2 = 1/dNear*sin(2*pi*f2*(t-dNear/c)) + ...
       1/dFar*sin(2*pi*f1*(t-dFar/c));
plot(t,mic1);
hold on;
plot(t,mic2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('mic 1', 'mic 2');
hold off;

% use svd to isolate sound sources
figure(3);
x = [mic1' mic2'];
[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
plot(t,v(:,1));
hold on;
maxAmp = max(v(:,1));
plot(t,v(:,2),'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -maxAmp maxAmp]); legend('isolated tone 1', 'isolated tone 2');
hold off;
</code></pre>
After about 10 minutes of execution on my laptop computer, the simulation generates the following three figures, which show that the two isolated tones have the correct frequencies.
<img src="https://i.stack.imgur.com/5FOS9.png" alt="Figure 1"> <img src="https://i.stack.imgur.com/TyU8G.png" alt="Figure 2"> <img src="https://i.stack.imgur.com/83RI2.png" alt="Figure 3">

However, setting the microphone separation distance to zero (i.e., dMic = 0) causes the simulation to instead generate the following three figures, which show that the simulation could not isolate a second tone (confirmed by the single significant diagonal term returned in svd's s matrix).
<img src="https://i.stack.imgur.com/q9QbR.png" alt="Figure 1 with no mic separation"> <img src="https://i.stack.imgur.com/qpaGo.png" alt="Figure 2 with no mic separation"> <img src="https://i.stack.imgur.com/KKUp1.png" alt="Figure 3 with no mic separation">

I was hoping the microphone separation distance on a smartphone would be large enough to produce good results, but setting the microphone separation distance to 5.25 inches (i.e., dMic = 0.1333 meters) causes the simulation to generate the following, less than encouraging, figures, which show higher frequency components in the first isolated tone.
<img src="https://i.stack.imgur.com/Y62sE.png" alt="Figure 1 on smartphone"> <img src="https://i.stack.imgur.com/auJGT.png" alt="Figure 2 on smartphone"> <img src="https://i.stack.imgur.com/vslRE.png" alt="Figure 3 on smartphone">
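One detail that may be worth checking in the simulation above is the orientation of `x`. The sketch below is only an illustration under assumptions of mine: the sample count `N`, the 8 kHz rate, the two test tones, and the names `xCols`, `xRows`, `A`, and `B` are all hypothetical and do not come from the lecture. It evaluates the argument of `svd` for both layouts and prints the resulting sizes. Which layout the lecture slide intends is not stated in this question.

```octave
% Hypothetical small example: N samples recorded by 2 microphones.
N = 200;
t = (0:N-1) / 8000;                              % assumed 8 kHz sampling
mic1 = sin(2*pi*440*t) + 0.5*sin(2*pi*1000*t);   % assumed mixtures, for illustration only
mic2 = 0.5*sin(2*pi*440*t) + sin(2*pi*1000*t);

% Layout used in the simulation: one column per microphone (N x 2).
xCols = [mic1' mic2'];
A = (repmat(sum(xCols.*xCols,1), size(xCols,1), 1) .* xCols) * xCols';
disp(size(A));                                   % N x N

% Alternative layout: one row per microphone (2 x N).
xRows = [mic1; mic2];
B = (repmat(sum(xRows.*xRows,1), size(xRows,1), 1) .* xRows) * xRows';
disp(size(B));                                   % 2 x 2
```

In the simulation above, `t = [0:Ts:0.025]` yields roughly 2900 samples, so with the column layout the `svd` call operates on a matrix of roughly 2900-by-2900, which may account for much of the run time.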
Tags
<matlab><octave><linear-algebra><numerical-methods><svd>
Title
cocktail party algorithm SVD implementation ... in one line of code?
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USgregS
UserOwnerUserId
1. USgregS
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POcocktail party algorithm SVD implementation ... in one line of code?
 UserUserId
 USlearnvst
 VoteTypeVoteTypeId
 VTFavorite
2. VO
 singulars
 PostPostId
 POcocktail party algorithm SVD implementation ... in one line of code?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POcocktail party algorithm SVD implementation ... in one line of code?
 UserUserId
 USCtrlV
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. I have vague memories of this lecture, but can't remember what `x` is; is it the spectrogram of the waveform, or what?
 singulars
 PostPostId
 POcocktail party algorithm SVD implementation ... in one line of code?
 UserUserId
 USIsaac
2. Professor Ng, at t=5:30 in introductory video 4 on unsupervised learning, seems to suggest that x is a vector of the audio samples. Perhaps that repmat section in the svd argument is implementing some sort of power normalization of the signal.
 singulars
 PostPostId
 POcocktail party algorithm SVD implementation ... in one line of code?
 UserUserId
 USgregS
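
As a hedged reading of the `repmat` term discussed in the comments above: assuming `x` holds one microphone per row, so that each column $x_n$ is one two-channel sample (an assumption the question does not confirm), the argument of `svd` works out to a norm-weighted sum of outer products rather than a normalization:

\[
M \;=\; \sum_{n=1}^{N} \lVert x_n \rVert^{2} \, x_n x_n^{\top}, \qquad M \in \mathbb{R}^{2 \times 2}.
\]

Here `sum(x.*x,1)` supplies the squared norms $\lVert x_n \rVert^{2}$, `repmat` copies them to every row so that each column of `x` is scaled by its own squared norm, and the final product with `x'` accumulates the outer products. The symbols $M$, $x_n$, and $N$ are notation introduced for this note only.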
